

(10%) What is stemming? How does it affect recall and precision? Why?


(10%) Why are skip pointers not useful for queries of the form x OR y?


(5%) Assume a biword index. Give an example of a document that will be returned for the query “New York University” but is actually a false positive that should not be returned.


(12%) Suggest two indexing methods for supporting wildcard queries, and discuss the advantages and disadvantages of each method.


(10%) Compute the edit distance between “border” and “aboard”. Write down the 7×7 array of distances between all prefixes.


Variable byte codes
a. (5%) How do they work?
b. (5%) If the base unit is 4 bits, encode 54.
c. (5%) If the base unit is 8 bits, decode 00000110 10100110.


γ codes
a. (5%) How many bits are needed to encode a number N?
b. (10%) Decode the following sequences, if possible:
   i. 111001011110001110111000
   ii. 111101101111010
c. (6%) We say that the γ code is prefix-free. What does prefix-free mean?


(12%) Why is internal sorting not suitable for sorting a huge amount of data (data that cannot fit into main memory)? Suggest a sorting method that solves this problem.


(10%) For n=2 and 1≦T≦30, perform a step-by-step simulation of Logarithmic Merge. Create a table that shows, for each point in time at which T=2*k tokens have been processed (1≦k≦15), which of the four indexes I0, ..., I3 are in use. The first three columns of the table are given below.

T     2   4   6
I3    0   0   0
I2    0   0   0
I1    0   0   1
I0    0   1   0