International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 07 Issue: 01 | Jan 2020
p-ISSN: 2395-0072
www.irjet.net
Text-Based Domain and Image Categorization of Google Search Engine using Conceptual Clustering Divya Soni1, Preetesh Purohit2 1Student,
Swami Vivekanand College of Engineering, Indore, India Professor, Swami Vivekanand College of Engineering, Indore, India ----------------------------------------------------------------------***--------------------------------------------------------------------2Associate
Abstract - The search engine like google provides a record of results that show a list of ranked output. The ranked output is not considered user-relevant. Query suggestion system is an option for this problem. But still, users are always forced to shift through the long ordered list of documents retrieved by the search engines. However, categorization is one option to solve this problem. The proposed system performs categorization based on several relevant conceptual information and predictive modelling. Proposed work acquaints an impressive approach that takes the conceptual preferences of users to get topics of the snippet with the help of analyzed data. To achieve this goal, an online technique is introduced that analyzes web-content of the generated results to find appropriate concepts and uses it to recognize related results. A personalized conceptual clustering algorithm is also used to generate a decision tree of clusters of the query data. With the help of this tree, the proposed system can identify classes for the web pages easily which provide relevant topical results to the user. Key Words: Web Mining, Text Mining, Clustering, Web Page Recommendation, Search Engine, Query Logs, Domain Categorization. 1. INTRODUCTION Search engines are web portals for finding knowledge on world wide web. Search engines index a large portion of the web and store the information in databases. Unlike a web directory, the search engine carries operations automatically. The underlying technique of search engines is information retrieval. The obvious question in information retrieval is how to find the relevant documents for each user query. This is the main issue that tries to address [1]. Knowledge mining is a procedure of mining usable information from the crude information. The information is a term that is principal to the information that is required by way of a data miner or utility. Extraction of the similar text from a raw set of text is the generation of text data mining. Clearly, textual information does not follow any similar pattern in text data, an unstructured technique and labelling of text data is tricky initiation. Therefore, several applications make use of the classification and clustering techniques for categorizing knowledge. Finding high quality labelled document is crucial for any search engine since it can be used to estimate the search
© 2020, IRJET
|
Impact Factor value: 7.34
|
engine quality and efficiency. The basic idea of proposed work is established on concepts and its similarities extracted from the search queries submitted by the user, the web content, and the click data. To identify user preference, the personalized clustering technique is applied which exploit these click-through data: A user clicks a link in search results if the result has relevant web content in which the user interested. Moreover, click data can be gathered easily without the effective excess load on users, thus providing a low-cost means to capture user’s interest [2][3][4]. 2. BACKGROUND World Wide Web contains a huge amount of knowledge and data, some of the knowledge is distributed using the contents of web pages and some of the information is not directly gained from the direct web. Therefore, in order to recover the meaningful and essential knowledge from the web, a domain known as web mining is introduced. Web mining utilizes the web data and the mining techniques. The web mining is divided in three main key domains according to the kind of data processing. The content of web page is analyzed using content mining, web access log is analyzed under web usages mining and the connectivity of the web pages are analyzed under the structure mining. Recommendation systems are the most essential application of web mining and most of the recommender systems typically generate a list of recommendations in two ways content-based filtering or through collaborative. Recommendation system algorithm based on content filtering [5] is the algorithm that works with content based profiles that are created at the beginning from the users’ preferences. This profile contains information about a user and his interesting topics. Topics are based on how many times a user retrieve the same results for the related query. In this process, the search engine analyzes the results looks for similarities that were previously patently ranked by the user with the results he did not ranked. Those results that are chiefly similar to the patently ranked ones, will be recommended to the user. Collaborative filtering Algorithm [6][7] recommendation system was one of the most popular researched technique of recommendation systems since that technique was described and stated by H. Varian and P. Resnick in 1997. [8][9] collaborative filtering is based on community data analysis which find the users from their shared appreciations in the community [10]. If multiple users have similar or nearly same ranked results in common, then those users have similar interest in that result [11].
ISO 9001:2008 Certified Journal
|
Page 193