Text Mining


occurrence of a particular event, while text mining techniques may be used to look for an explanation of the event. Text mining can also be used to identify implicit connections, and this is where its largely untapped potential value for businesses lies. Research applying text mining techniques to this area is encouraging.

In [4], Bernstein et al. analyze co-occurrence-based association rules that relate different companies. Their analysis covers over 22,000 business news stories. They first use information extraction software (ClearForest [11]) to extract the set of company names from the text. They then apply disambiguation techniques to this set to identify all the unique companies; for example, H.P. and Hewlett Packard are names for the same company. A graph structure is used to visualize the model they generate: each node in the graph represents a company, and an edge represents a co-occurrence-based association between two companies. To eliminate random associations, they link two companies only if the strength of their association is above a minimum support threshold. From this graph they were able to identify hubs, which represent dominant companies in different industries.

They also used the vector space model (from IR) to represent companies as weighted link vectors. The cosine similarity score between a company vector and the average industry vector serves as an estimate of how related a company is to its industry. Additionally, the similarity between different average industry vectors gives a measure of how closely related the industries are to each other. For example, they found that the computer software industry and the computer hardware industry were fairly closely related. Although this research did not reveal any new knowledge, as acknowledged by the authors, we can use these techniques to explore relationships other than
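
The pipeline described above (co-occurrence edges filtered by a minimum support threshold, hubs, weighted link vectors, and cosine similarity against an average industry vector) can be sketched roughly as follows. This is a minimal illustration, not the implementation from [4]: the company names, the industry assignments, the use of raw co-occurrence counts as both support and link weight, and all function names are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each story is the set of already disambiguated
# company names it mentions. Names and industries are invented.
stories = [
    {"AlphaSoft", "BetaChips"},
    {"AlphaSoft", "BetaChips", "GammaSoft"},
    {"AlphaSoft", "GammaSoft"},
    {"BetaChips", "DeltaChips"},
    {"AlphaSoft", "DeltaChips"},
    {"GammaSoft", "EpsilonAir"},
]
industry = {
    "software": ["AlphaSoft", "GammaSoft"],
    "hardware": ["BetaChips", "DeltaChips"],
}

def cooccurrence_edges(stories, min_support):
    """Count stories mentioning both companies; keep pairs at or above the threshold."""
    counts = Counter()
    for names in stories:
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

def link_vectors(edges):
    """Represent each company as a sparse vector of weighted links to its neighbours."""
    vectors = {}
    for (a, b), weight in edges.items():
        vectors.setdefault(a, {})[b] = weight
        vectors.setdefault(b, {})[a] = weight
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def average_vector(vecs):
    """Component-wise mean of a list of sparse vectors."""
    total = Counter()
    for v in vecs:
        total.update(v)  # Counter.update adds the weights
    n = len(vecs)
    return {k: w / n for k, w in total.items()} if n else {}

# Assumption: support is the raw co-occurrence count, with a threshold of 2.
edges = cooccurrence_edges(stories, min_support=2)
vectors = link_vectors(edges)

# Hubs: here simply the companies with the most surviving links.
degree = {company: len(links) for company, links in vectors.items()}
hub = max(degree, key=degree.get)

# Companies that lost all their edges get an empty vector.
industry_avg = {
    name: average_vector([vectors.get(c, {}) for c in members])
    for name, members in industry.items()
}
company_to_industry = cosine(vectors["AlphaSoft"], industry_avg["software"])
industry_to_industry = cosine(industry_avg["software"], industry_avg["hardware"])

print("edges:", edges)
print("hub:", hub)
print("AlphaSoft vs. software:", round(company_to_industry, 3))
print("software vs. hardware:", round(industry_to_industry, 3))
```

Raising `min_support` prunes more of the weak, possibly accidental links, at the cost of leaving rarely mentioned companies with no edges at all; other association strength measures could be substituted for the raw count without changing the rest of the sketch.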

