
1. (10%) Recommend a query processing order for (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes), given the following postings list sizes:

   Term            Postings size
   eyes            213312
   kaleidoscope    8700
   marmalade       107913
   skies           271658
   tangerine       4665
   trees           516812

2. (16%) Are the following statements true or false?
   a. In a Boolean retrieval system, stemming never lowers precision.
   b. In a Boolean retrieval system, stemming never lowers recall.
   c. Stemming increases the size of the vocabulary.
   d. Stemming should be invoked at indexing time, but not while processing a query.

3. (8%) Assume a biword index. Give an example of a document that would be returned for the query “New York University” but is actually a false positive that should not be returned.

4. (5%) If you want to search for he*lo in a permuterm wildcard index, on which key(s) would you do the lookup?

5. Variable byte codes
   a. (5%) How does it work?
   b. (6%) If the base unit is 4 bits, please encode 49.
   c. (6%) If the base unit is 8 bits, please decode 10000110 00000110 10001001.

6. γ codes
   a. (18%) Decode the following sequences, if possible:
      i.   111001011110000110011010
      ii.  1001110101111110011
      iii. 11110010111101010
   b. (8%) We say that γ codes are prefix-free. What does prefix-free mean?

7. (10%) What is the idf of a term that occurs in every document? Compare this with the use of stop word lists.

8. Consider the following table.
   a. (10%) Assume there are 4 documents in the database, and given that
      Please calculate the cosine similarity between Doc1 and Doc4.


   b. (6%) Calculate the Jaccard coefficient between Doc1 and Doc4.

   Term        Doc1   Doc2   Doc3   Doc4
   internet      10     15      0      1
   keep           5      0     15     60
   law            6      2      0      1
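For question 1, one common heuristic (sketched here as an illustration, not as the official answer) is to estimate the size of each OR-group by the sum of its two postings sizes, which is an upper bound on the size of the union, and then process the AND-ed groups in increasing order of that estimate. The dictionary copies the sizes listed in question 1; `estimated_size` and `processing_order` are hypothetical helper names.

```python
# Postings sizes as listed in question 1.
postings_size = {
    "eyes": 213312,
    "kaleidoscope": 8700,
    "marmalade": 107913,
    "skies": 271658,
    "tangerine": 4665,
    "trees": 516812,
}

# The query as a conjunction of OR-groups.
query = [("tangerine", "trees"), ("marmalade", "skies"), ("kaleidoscope", "eyes")]


def estimated_size(group):
    """Upper bound on |t1 OR t2|: a union is never larger than the sum."""
    return sum(postings_size[term] for term in group)


def processing_order(groups):
    """Intersect the (estimated) smallest OR-group first."""
    return sorted(groups, key=estimated_size)


if __name__ == "__main__":
    for group in processing_order(query):
        print(" OR ".join(group), estimated_size(group))
```

Summing is conservative: the true size of an OR-group is smaller whenever the two terms co-occur in some documents, but the exact union size is only known after the two postings lists have been merged.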
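For question 3, a biword index answers a longer phrase query such as “New York University” as the Boolean AND of its biwords, “new york” AND “york university”, so any document containing both pairs matches even if the full phrase never occurs. The sketch below shows this with a made-up example document; `tokens`, `biwords`, `biword_match` and `phrase_match` are hypothetical helpers, and tokenization is simplified to lowercase runs of letters.

```python
import re


def tokens(text):
    """Simplified tokenizer: lowercase runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def biwords(text):
    """Set of consecutive word pairs, i.e. the biword index keys for text."""
    words = tokens(text)
    return set(zip(words, words[1:]))


def biword_match(query, doc):
    """Biword query processing: every biword of the query must occur in doc."""
    return biwords(query) <= biwords(doc)


def phrase_match(query, doc):
    """Exact phrase check on the token sequence, for comparison."""
    q, d = tokens(query), tokens(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical example document: it contains "new york" and "york university"
# as separate pairs but never the phrase "new york university".
doc = "She left New York in May to teach at York University in Toronto."

if __name__ == "__main__":
    print(biword_match("New York University", doc))  # True: a false positive
    print(phrase_match("New York University", doc))  # False
```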
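For question 4, a permuterm index stores every rotation of term$, where $ marks the end of the term. A query with a single `*` is rotated so that the `*` moves to the end, and the remaining string becomes a prefix lookup key. The sketch below implements that rotation; the small vocabulary and all function names are hypothetical, and queries with more than one `*` are deliberately rejected.

```python
def permuterm_rotations(term):
    """All rotations of term + '$', i.e. the permuterm keys stored for term."""
    t = term + "$"
    return [t[i:] + t[:i] for i in range(len(t))]


def permuterm_lookup_key(query):
    """Rotate a one-'*' query so '*' is last, then drop '*' to get the key."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    t = query + "$"
    star = t.index("*")
    rotated = t[star + 1:] + t[:star + 1]  # e.g. he*lo$ -> lo$he*
    return rotated[:-1]                    # strip the trailing '*'


def wildcard_lookup(query, vocabulary):
    """Terms that have some rotation starting with the lookup key."""
    key = permuterm_lookup_key(query)
    return sorted(term for term in vocabulary
                  if any(r.startswith(key) for r in permuterm_rotations(term)))


if __name__ == "__main__":
    print(permuterm_lookup_key("he*lo"))                              # lo$he
    print(wildcard_lookup("he*lo", ["hello", "helo", "hero", "halo"]))  # ['hello', 'helo']
```

A real permuterm index would do the prefix search in a B-tree or sorted dictionary rather than scanning every rotation of every term; the linear scan here only keeps the example short.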
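For question 5, a variable byte code splits a number into units of the base-unit size. In each unit one bit is a flag and the remaining bits carry payload, most significant group first. The sketch below assumes the convention used in Manning, Raghavan and Schütze's *Introduction to Information Retrieval*, where the flag bit is 1 only on the last unit of a number; some texts invert the flag, so treat this as an assumption. `vb_encode_number`, `vb_decode` and the `unit_bits` parameter are hypothetical names; `unit_bits` covers both the 4-bit and the 8-bit cases.

```python
def vb_encode_number(n, unit_bits=8):
    """Variable byte encode one non-negative integer.

    Each unit carries (unit_bits - 1) payload bits, most significant group
    first. Assumed convention: the high (flag) bit is 1 only on the last unit.
    Returns one bit string per unit.
    """
    if n < 0:
        raise ValueError("VB codes encode non-negative integers")
    base = 1 << (unit_bits - 1)          # 8 for 4-bit units, 128 for 8-bit
    groups = []
    while True:
        groups.insert(0, n % base)       # prepend the lowest payload group
        if n < base:
            break
        n //= base
    groups[-1] += base                   # set the flag bit on the last unit
    return [format(g, f"0{unit_bits}b") for g in groups]


def vb_decode(bits, unit_bits=8):
    """Decode whitespace-separated VB units back into a list of integers."""
    base = 1 << (unit_bits - 1)
    numbers, n, pending = [], 0, False
    for unit in bits.split():
        if len(unit) != unit_bits:
            raise ValueError(f"unit {unit!r} is not {unit_bits} bits long")
        value = int(unit, 2)
        if value < base:                 # flag bit 0: more units follow
            n = n * base + value
            pending = True
        else:                            # flag bit 1: last unit of a number
            numbers.append(n * base + (value - base))
            n, pending = 0, False
    if pending:
        raise ValueError("input ends in the middle of a number")
    return numbers


if __name__ == "__main__":
    print(vb_encode_number(49, unit_bits=4))                     # ['0110', '1001']
    print(vb_decode("10000110 00000110 10001001", unit_bits=8))  # [6, 777]
```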
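For question 6, assuming the codes are Elias γ codes, each codeword is the length of the offset in unary (that many 1s followed by a 0) and then the offset itself, i.e. the binary form of the number with its leading 1 removed. The decoder below reads codewords left to right and returns `None` when the input ends partway through a codeword, which is one way to handle “if possible”. `gamma_encode`, `gamma_decode` and `is_prefix_free` are hypothetical names; the last helper checks the prefix-free property directly, namely that no codeword is a prefix of a different codeword, which is what lets a concatenation be split without separators.

```python
def gamma_encode(n):
    """Elias γ code of n >= 1: len(offset) ones, a 0, then the offset."""
    if n < 1:
        raise ValueError("γ codes are defined for n >= 1")
    offset = bin(n)[3:]            # binary of n without '0b' and the leading 1
    return "1" * len(offset) + "0" + offset


def gamma_decode(bits):
    """Decode concatenated γ codes; return None if bits is not well formed."""
    numbers, i = [], 0
    while i < len(bits):
        length = 0
        while i < len(bits) and bits[i] == "1":   # unary length part
            length += 1
            i += 1
        if i == len(bits):                         # unary part never ended
            return None
        i += 1                                     # skip the terminating '0'
        offset = bits[i:i + length]
        if len(offset) < length:                   # offset cut short
            return None
        numbers.append(int("1" + offset, 2))       # restore the leading 1
        i += length
    return numbers


def is_prefix_free(codes):
    """True if no codeword is a proper prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


if __name__ == "__main__":
    for seq in ["111001011110000110011010",
                "1001110101111110011",
                "11110010111101010"]:
        print(seq, gamma_decode(seq))
    print(is_prefix_free([gamma_encode(n) for n in range(1, 65)]))
```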
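For question 7, under the usual definition idf_t = log(N / df_t), a term with df_t = N gets idf log 1 = 0, so its tf-idf weight is 0 in every document. The sketch below assumes base 10 (the exam does not fix a base, and the base does not change the zero), and the function name `idf` is hypothetical.

```python
import math


def idf(df, n_docs):
    """Inverse document frequency log10(N / df), defined for 0 < df <= N."""
    if not 0 < df <= n_docs:
        raise ValueError("document frequency must satisfy 0 < df <= N")
    return math.log10(n_docs / df)


if __name__ == "__main__":
    n = 4
    for df in range(1, n + 1):
        print(df, round(idf(df, n), 3))
```

One difference worth keeping in mind for the comparison: a stop list removes the term from the index entirely, whereas a zero idf keeps its postings (so it can still be used, e.g. in phrase queries) while giving it no weight in ranking.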
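For question 8a, the weighting introduced by “given that” is missing from the source, so the sketch below assumes raw term counts from the table as vector components. If a different weighting (for example log-frequency or tf-idf) was intended, the vectors would need to be transformed before calling `cosine`. `tf` and `cosine` are hypothetical names.

```python
import math

# Term counts from the table in question 8, in the order internet, keep, law.
tf = {
    "Doc1": [10, 5, 6],
    "Doc2": [15, 0, 2],
    "Doc3": [0, 15, 0],
    "Doc4": [1, 60, 1],
}


def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(round(cosine(tf["Doc1"], tf["Doc4"]), 3))  # 0.415 with raw counts
```

Each of the three terms occurs in three of the four documents, so a per-term idf factor would be the same for every term; as a common scale factor it would leave this cosine unchanged, though a nonlinear tf transform would not.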
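For question 8b, the Jaccard coefficient compares sets, so term frequencies play no role. The sketch below assumes each document is represented by the set of terms with a non-zero count in the table; `TERMS`, `COUNTS`, `term_set` and `jaccard` are hypothetical names.

```python
TERMS = ["internet", "keep", "law"]
COUNTS = {
    "Doc1": [10, 5, 6],
    "Doc4": [1, 60, 1],
}


def term_set(doc):
    """Terms that occur at least once in doc, according to the table."""
    return {t for t, c in zip(TERMS, COUNTS[doc]) if c > 0}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(jaccard(term_set("Doc1"), term_set("Doc4")))
```

Because the set view discards counts, it cannot register that “keep” dominates Doc4 but not Doc1; that is one reason a set-based score and the frequency-weighted cosine in 8a can disagree.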

