(10%) Recommend a query processing order for (tangerine OR trees) AND (marmalade OR skies) AND (kaleidoscope OR eyes) given the following postings list sizes:

Term          Postings size
eyes          213312
kaleidoscope  8700
marmalade     107913
skies         271658
tangerine     4665
trees         516812

(16%) Are the following statements true or false?
a. In a Boolean retrieval system, stemming never lowers precision.
b. In a Boolean retrieval system, stemming never lowers recall.
c. Stemming increases the size of the vocabulary.
d. Stemming should be invoked at indexing time, but not while processing a query.

(8%) Assume a biword index. Give an example of a document that will be returned for the query "New York University" but is actually a false positive that should not be returned.

(5%) If you want to search for he*lo in a permuterm wildcard index, what key(s) would you do the lookup on?

Variable byte codes
a. (5%) How do they work?
b. (6%) If the base unit is 4 bits, encode 49.
c. (6%) If the base unit is 8 bits, decode 10000110 00000110 10001001.

Gamma codes
a. (18%) Decode the following sequences, if possible:
   i. 111001011110000110011010
   ii. 1001110101111110011
   iii. 11110010111101010
b. (8%) We say a code is prefix-free. What does prefix-free mean?

(10%) What is the idf of a term that occurs in every document? Compare this with the use of stop word lists.

Consider the following table.
a. (10%) Assume there are 4 documents in the database, and given that
Please calculate cosine similarity between Doc1 and Doc4.
b. (6%) Calculate the Jaccard coefficient between Doc1 and Doc4.
Doc1