Lossless Representation of High Utility Itemset Using APRIORI Algorithm

GRD Journals | Global Research and Development Journal for Engineering | International Conference on Innovations in Engineering and Technology (ICIET) - 2016 | July 2016

e-ISSN: 2455-5703

1S. Rashmi 2A. Selvaraj 3K. Pavithra 4S. Sundareswari
1,2,3,4 Student, Department of Information Technology, K.L.N College of Engineering

Abstract

Mining high utility item sets (HUIs) has become an important data mining task. However, too many HUIs may degrade mining performance. To make the mining task more efficient, the Apriori algorithm is used. In this method, a refined database is built that makes the search easier. This in turn serves as a compact and lossless representation of HUIs.

Keywords- Mining High Utility Item sets (HUIs), Apriori algorithm

I. INTRODUCTION

Frequent item set mining (FIM) is a fundamental research topic in data mining. The original paper uses a frequent pattern tree structure for data mining. A tree structure consumes more memory and space, which is a significant disadvantage. In practice, the traditional model of FIM may discover a large number of frequent but low-revenue item sets and lose the information on valuable item sets that have low selling frequencies. These problems arise because FIM treats all items as having the same importance.

Condensed representations successfully reduce the number of item sets found. In this paper, we address these challenges by proposing a condensed and meaningful representation of HUIs named closed high utility item sets (CHUIs), which integrates the concept of closed item sets into high utility item set mining. Owing to a new structure named the utility unit array, the proposed representation is lossless: it allows all HUIs and their utilities to be recovered efficiently. The proposed representation is also compact. Experiments show that it reduces the number of item sets by several orders of magnitude, especially for datasets containing long high utility item sets.

In high utility item set mining, each item has a weight (e.g. unit profit) and can appear more than once in each transaction (e.g. purchase quantity). The utility of an item set represents its importance. If its utility is no less than a user-specified minimum utility threshold, the item set is called a high utility item set (HUI); otherwise, it is called a low utility item set. High utility item set mining has a wide range of applications, such as website click stream analysis, cross marketing in retail stores, mobile commerce environments and biomedical applications.

The original dataset consists of weather data related to storm events. The dataset must first be pre-processed. In this stage all noisy data is removed.
Only the data that is useful for processing is retained. To this end, the original dataset is provided as input and the noisy data is removed manually, which yields the pre-processed dataset. The pre-processed dataset is then provided as input to the next step, in which the dataset is loaded and processed. The availability of the file is checked first: if the file is found it is processed; otherwise a warning is issued. The Apriori algorithm is then applied in this stage. The state codes are verified and the number of frequent sets is determined. The frequent sets are thus prepared based on the number of occurrences and are provided as output using the attribute number.
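
The loading and frequent-set steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the one-transaction-per-line, comma-separated file layout, the function names `load_transactions` and `apriori`, the `min_support` parameter and the sample state codes and event types are all hypothetical and introduced here only for illustration.

```python
import os
import warnings
from itertools import combinations


def load_transactions(path):
    """Return one item set per non-empty line, or None if the file is missing.

    Assumed layout (hypothetical): comma-separated items, one transaction per line.
    """
    if not os.path.isfile(path):
        # The file is not available: issue a warning instead of processing.
        warnings.warn(f"dataset file not found: {path}")
        return None
    with open(path, encoding="utf-8") as handle:
        rows = (line.strip() for line in handle)
        return [
            frozenset(field.strip() for field in row.split(",") if field.strip())
            for row in rows
            if row
        ]


def apriori(transactions, min_support):
    """Map every frequent item set to its support count (level-wise Apriori)."""
    # Level 1: count the occurrences of each single item.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-sets into k-set candidates.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: scan the transactions once per candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


if __name__ == "__main__":
    # Hypothetical records: a state code together with storm event types.
    sample = [
        frozenset(["TX", "hail"]),
        frozenset(["TX", "hail", "wind"]),
        frozenset(["OK", "wind"]),
        frozenset(["TX", "wind"]),
    ]
    for itemset, support in apriori(sample, min_support=2).items():
        print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from exhaustive enumeration: a candidate is counted only if all of its smaller subsets are already known to be frequent, so the number of database scans per level stays small.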
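
Returning to the definition of a high utility item set given earlier in this section, a small worked sketch may make the threshold test concrete. The text does not spell out how utility is computed, so the sketch assumes the usual definition from the high utility mining literature: the utility of an item set in a transaction is the sum of quantity times unit profit over its items, and its utility in the database is the sum over all transactions that contain it. The profit table, the transactions, the function names and the value of `min_utility` are made-up illustrative values, not data from this paper.

```python
# Hypothetical unit profits (the "weight" of each item).
SAMPLE_UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}

# Hypothetical transactions: item -> purchase quantity.
SAMPLE_TRANSACTIONS = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"b": 1, "c": 4},
]


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the item set."""
    total = 0
    for quantities in transactions:
        if all(item in quantities for item in itemset):
            total += sum(quantities[item] * unit_profit[item] for item in itemset)
    return total


def is_high_utility(itemset, transactions, unit_profit, min_utility):
    """An item set is a HUI if its utility is no less than the threshold."""
    return itemset_utility(itemset, transactions, unit_profit) >= min_utility


if __name__ == "__main__":
    for candidate in ({"a"}, {"a", "b"}, {"c"}):
        utility = itemset_utility(candidate, SAMPLE_TRANSACTIONS, SAMPLE_UNIT_PROFIT)
        high = is_high_utility(candidate, SAMPLE_TRANSACTIONS, SAMPLE_UNIT_PROFIT, 10)
        print(sorted(candidate), utility, high)
```

With these sample values the item set {a} has utility 15, while its superset {a, b} has utility 9. Utility is therefore not anti-monotone in the way support is, which is why frequency-based pruning cannot be applied to utility values directly.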
