Lucene


CHAPTER 6

Extending search


that can be ordered alphabetically. Handling numbers is basically the same, except that converting them to a text format is left up to you. In this section, our example scenario indexes an integer id field so that range queries can be performed. If we indexed the toString representations of the integers 1 through 10, the order in the index would be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9, which is not the intended order at all. However, if we pad the numbers with leading zeros so that all numbers have the same width, the order is correct: 01, 02, 03, and so on. You’ll have to decide on the maximum width your numbers need; we chose 10 digits and implemented the following pad(int) utility method (see footnote 3):

    import java.text.DecimalFormat;

    public class NumberUtils {
        // Ten zeros: numbers are formatted with at least 10 digits.
        private static final DecimalFormat formatter =
            new DecimalFormat("0000000000");

        // Returns n with leading zeros; 10 characters for non-negative n.
        public static String pad(int n) {
            return formatter.format(n);
        }
    }
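The ordering problem described above can be reproduced without an index. The following is a minimal sketch, assuming that plain String sorting is a fair stand-in for the order of terms in the index; the class PaddedOrderDemo and its sortedAsText method are hypothetical names introduced only for this illustration, and pad(int) mirrors the 10-digit pattern of NumberUtils.

```java
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not part of the book's code.
class PaddedOrderDemo {

    // Same 10-digit pattern as NumberUtils.pad(int). A new DecimalFormat is
    // created per call here because DecimalFormat is not thread-safe.
    static String pad(int n) {
        return new DecimalFormat("0000000000").format(n);
    }

    // Returns the integers from..to as strings, sorted by plain
    // String comparison (assumed here to match index term order).
    static List<String> sortedAsText(int from, int to, boolean padded) {
        List<String> terms = new ArrayList<>();
        for (int i = from; i <= to; i++) {
            terms.add(padded ? pad(i) : Integer.toString(i));
        }
        Collections.sort(terms);
        return terms;
    }

    public static void main(String[] args) {
        // Unpadded: "10" sorts directly after "1".
        System.out.println(sortedAsText(1, 10, false));
        // Padded: text order agrees with numeric order.
        System.out.println(sortedAsText(1, 10, true));
    }
}
```

Running main prints the two orderings one after the other, which makes it easy to check a chosen width against the largest number you expect to index.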

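The numbers need to be padded during indexing. This is done in our test setUp() method on the id keyword field:

    import junit.framework.TestCase;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.WhitespaceAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.RAMDirectory;

    public class AdvancedQueryParserTest extends TestCase {
        private Analyzer analyzer;
        private RAMDirectory directory;

        protected void setUp() throws Exception {
            super.setUp();
            analyzer = new WhitespaceAnalyzer();
            directory = new RAMDirectory();

            // true: create a new index in the in-memory directory
            IndexWriter writer = new IndexWriter(directory, analyzer, true);
            for (int i = 1; i <= 500; i++) {
                Document doc = new Document();
                // Index the zero-padded form so term order matches numeric order
                doc.add(Field.Keyword("id", NumberUtils.pad(i)));
                writer.addDocument(doc);
            }
            writer.close();
        }
    }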

With this index-time padding, we’re only halfway there. A query expression for IDs 37 through 346 phrased as id:[37 TO 346] won’t work as expected with the

3. Lucene stores term information with prefix compression so that no penalty is paid for large shared prefixes like this zero padding.
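Why the unpadded bounds in id:[37 TO 346] fail against padded terms can be seen with ordinary string comparison. This is a minimal sketch and does not use Lucene at all: String.compareTo stands in for index term order (an assumption made for illustration), and RangeBoundsDemo, inRange, and countMatches are hypothetical names. It counts how many of the 500 padded ids fall inside an inclusive range, first with raw bounds and then with padded ones.

```java
import java.util.Locale;

// Hypothetical demo class; String.compareTo stands in for index term order.
class RangeBoundsDemo {

    // Zero-pads a non-negative int to 10 digits, the width chosen for NumberUtils.
    static String pad(int n) {
        return String.format(Locale.ROOT, "%010d", n);
    }

    // True when term lies in the inclusive range [lower, upper] under string ordering.
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Counts how many padded ids 1..500 fall inside the given bounds.
    static int countMatches(String lower, String upper) {
        int count = 0;
        for (int i = 1; i <= 500; i++) {
            if (inRange(pad(i), lower, upper)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Raw bounds: every padded term starts with '0' and sorts before "37".
        System.out.println(countMatches("37", "346"));
        // Padded bounds: the range covers ids 37 through 346.
        System.out.println(countMatches(pad(37), pad(346)));
    }
}
```

The raw bounds also fail on their own terms, since "37" sorts after "346" as text. The bounds of a range query therefore need the same padding that was applied at indexing time.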
