GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016
e-ISSN: 2455-5703
Ranking of Document Recommendations from Conversations using Probabilistic Latent Semantic Analysis

¹P. Velvizhi, ²S. Aishwarya, ³R. Bhuvaneswari
¹,²,³Department of Computer Science & Engineering
¹,²,³K.L.N. College of Engineering, Pottapalayam, Sivagangai 630612, India

Abstract
Information retrieval from documents is usually performed through text search. Nowadays, efficient search is achieved through mining techniques. In this work, speech is recognized in order to search for documents. A group of conversations is recorded and processed using an Automatic Speech Recognition (ASR) technique. The system converts speech to text using the FISHER tool, and the resulting conversations are stored in a database. The formulation of implicit queries proceeds in two stages: extraction and clustering. The domain of the conversations is structured through topic modeling, and keywords are extracted from each topic with high probability. Clustering the keywords of a set covers all the recommended topics, so that the document recommendation for each topic is precise. The Probabilistic Latent Semantic Analysis (PLSA) technique is used to rank the retrieved documents with weighted keywords, which reduces noise while searching a topic. Enforcing both relevance and diversity ensures effective document retrieval. The text documents are converted back to speech using the e-Speak tool, so that the conversations finally retrieved are the ones required.

Keywords: Keyword Extraction, Topic Modeling, Word Frequency, PLSA, Document Retrieval
I. INTRODUCTION

A large amount of information is available as documents, databases or multimedia resources, but users' current activities often do not lead them to initiate a search for that information. We therefore adopt a just-in-time retrieval system, which spontaneously recommends documents by analyzing users' current activities. Here, the activities are recorded as conversations, for instance the conversations recorded in a meeting. A real-time Automatic Speech Recognition (ASR) technique is used to construct implicit queries for document retrieval. Documents are recommended from the web or from a local repository.

A just-in-time retrieval system must construct implicit queries from conversations, which contain a much larger number of words than a typical query. For instance, suppose four people must make a list of all the items required to survive in the mountains. A short fragment of 120 seconds of conversation, containing 600 words, is recorded. It is then split according to the variety of domains it covers, such as 'Job', 'Plan' or 'Business'. The multiplicity of topics and speech disfluencies produce noise in the ASR output. The main objective is to maintain multiple records about users' information needs.

In this paper, the first step is to extract a relevant and diverse set of keywords. The keywords are then clustered into topic-specific queries, which are ranked by importance. This topic-based clustering technique decreases the effect of ASR errors and increases the diversity of document recommendations. For instance, a word-frequency method retrieves Wikipedia pages such as 'Job', 'Hiring' and 'Interview', whereas users would prefer a set such as 'Job', 'Plan' and 'Business'.

Relevance and diversity can be enforced in three stages. First, keywords are extracted, which retrieves the most frequently used words. Second, one or several implicit queries are built, referred to as single and multiple queries. A single query tends to retrieve irrelevant documents, whereas multiple queries maintain the diversity constraint. Ranking the results is the final step.
Ranking reorders the document list recommended to users.

Previous methods for formulating implicit queries rely on word-frequency weights. Other methods perform keyword extraction using topical similarity, but they do not set a topic diversity constraint. Here, keywords are extracted from the ASR output so that users' information needs are covered and the number of irrelevant words is reduced. Once the keywords are extracted, they are clustered to build topically separated queries, which are run independently rather than as a single topically mixed query. The results are then ranked and organized.

The paper is organized as follows. Section II-A reviews just-in-time retrieval systems and the policies used for query formulation. Section II-B discusses keyword extraction methods. Section III presents the proposed technique for the formulation of implicit queries. Section IV introduces the data sets and describes how keyword sets and the documents they retrieve are compared using crowdsourcing. Section V presents the experimental results on keyword extraction and clustering.
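The three stages named in the Introduction (keyword extraction, formulation of topically separated queries, and ranking of the merged results) can be illustrated with a minimal sketch. It is not the method of this paper. The stopword list, the hand-written word-to-topic lexicon `TOPIC_OF` (a stand-in for a learned topic model such as PLSA), and the round-robin merge are all assumptions made for illustration, as are the function names.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list and word-to-topic lexicon (illustration only;
# a real system would learn topics from data, e.g. with PLSA).
STOPWORDS = {"the", "a", "to", "we", "for", "and", "of", "is", "in", "need"}
TOPIC_OF = {
    "job": "employment", "hiring": "employment", "interview": "employment",
    "plan": "planning", "schedule": "planning",
    "business": "commerce", "market": "commerce",
}

def extract_keywords(transcript, top_n=5):
    """Word-frequency baseline: the top_n most frequent non-stopwords."""
    words = [w.strip(".,?!").lower() for w in transcript.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(top_n)

def cluster_into_queries(weighted_keywords):
    """Group keywords by topic; each group becomes one implicit query."""
    clusters = defaultdict(list)
    for word, weight in weighted_keywords:
        clusters[TOPIC_OF.get(word, "other")].append((word, weight))
    # Order the queries by total keyword weight, most important first.
    return sorted(clusters.items(), key=lambda kv: -sum(w for _, w in kv[1]))

def merge_results(result_lists):
    """Interleave per-query result lists round-robin, skipping duplicates."""
    merged, seen = [], set()
    depth = max((len(r) for r in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

transcript = "We need a job plan. The job interview and the hiring plan for business."
queries = cluster_into_queries(extract_keywords(transcript))
for topic, keywords in queries:
    print(topic, keywords)
```

Running each topical query separately and interleaving the result lists is one simple way to keep every topic represented in the final list. A single mixed query gives no such guarantee, since its results can be dominated by the most frequent topic.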
All rights reserved by www.grdjournals.com