January 2016

Page 1

ISSN (ONLINE) : 2045 -8711 ISSN (PRINT) : 2045 -869X

INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY & CREATIVE ENGINEERING

JANUARY 2016 VOL- 6 NO - 1

NOVEMBER 2015 VOL- 5 NO - 11

@IJITCE Publication @IJITCE Publication


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016

UK: Managing Editor International Journal of Innovative Technology and Creative Engineering 1a park lane, Cranford London TW59WA UK E-Mail: editor@ijitce.co.uk Phone: +44-773-043-0249 USA: Editor International Journal of Innovative Technology and Creative Engineering Dr. Arumugam Department of Chemistry University of Georgia GA-30602, USA. Phone: 001-706-206-0812 Fax:001-706-542-2626 India: Editor International Journal of Innovative Technology & Creative Engineering Dr. Arthanariee. A. M Finance Tracking Center India 66/2 East mada st, Thiruvanmiyur, Chennai -600041 Mobile: 91-7598208700

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016

www.ijitce.co.uk

IJITCE PUBLICATION

INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY & CREATIVE ENGINEERING Vol.6 No.1 January 2016

www.ijitce.co.uk

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016

From Editor's Desk Dear Researcher, Greetings! Research article in this issue discusses about motivational factor analysis. Let us review research around the world this month. Engineers are Developing Robotic Spacecraft to Assist Satellite Repairs in Orbit NASA engineers are developing robotic spacecraft to help service and repair satellites in distant orbits. NASA is developing and demonstrating technologies to service and repair satellites in distant orbits. Robotic spacecraft likely operated with joysticks by technicians on the ground would carry out the hands-on maneuvers, not human beings using robotic and other specialized tools, as was the case for spacecraft like the low-Earth-orbiting Hubble Space Telescope. Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behaviour at a Web site. The user logs are collected by the Web server. Typical data includes IP address, page reference and access time. Commercial application servers have significant features to enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs. New kinds of events can be defined in an application, and logging can be turned on for them thus generating histories of these specially defined events. It must be noted many end applications require a combination of one or more of the techniques applied in the categories above. Web usage mining by itself does not create issues, but this technology when used on data of personal nature might cause concerns. It has been an absolute pleasure to present you articles that you wish to read. We look forward to many more new technologies related research articles from you and your friends. We are anxiously awaiting the rich and thorough research papers that have been prepared by our authors for the next issue. Thanks, Editorial Team IJITCE

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016

Editorial Members Dr. Chee Kyun Ng Ph.D Department of Computer and Communication Systems, Faculty of Engineering,Universiti Putra Malaysia,UPMSerdang, 43400 Selangor,Malaysia. Dr. Simon SEE Ph.D Chief Technologist and Technical Director at Oracle Corporation, Associate Professor (Adjunct) at Nanyang Technological University Professor (Adjunct) at ShangaiJiaotong University, 27 West Coast Rise #08-12,Singapore 127470 Dr. sc.agr. Horst Juergen SCHWARTZ Ph.D, Humboldt-University of Berlin,Faculty of Agriculture and Horticulture,Asternplatz 2a, D-12203 Berlin,Germany Dr. Marco L. BianchiniPh.D Italian National Research Council; IBAF-CNR,Via Salaria km 29.300, 00015 MonterotondoScalo (RM),Italy Dr. NijadKabbaraPh.D Marine Research Centre / Remote Sensing Centre/ National Council for Scientific Research, P. O. Box: 189 Jounieh,Lebanon Dr. Aaron Solomon Ph.D Department of Computer Science, National Chi Nan University,No. 303, University Road,Puli Town, Nantou County 54561,Taiwan Dr. Arthanariee. A. M M.Sc.,M.Phil.,M.S.,Ph.D Director - Bharathidasan School of Computer Applications, Ellispettai, Erode, Tamil Nadu,India Dr. Takaharu KAMEOKA, Ph.D Professor, Laboratory of Food, Environmental & Cultural Informatics Division of Sustainable Resource Sciences, Graduate School of Bioresources,Mie University, 1577 Kurimamachiya-cho, Tsu, Mie, 514-8507, Japan Dr. M. Sivakumar M.C.A.,ITIL.,PRINCE2.,ISTQB.,OCP.,ICP. Ph.D. Project Manager - Software,Applied Materials,1a park lane,cranford,UK Dr. Bulent AcmaPh.D Anadolu University, Department of Economics,Unit of Southeastern Anatolia Project(GAP),26470 Eskisehir,TURKEY Dr. SelvanathanArumugamPh.D Research Scientist, Department of Chemistry, University of Georgia, GA-30602,USA.

Review Board Members Dr. Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials,CSIRO Process Science & Engineering Private Bag 33, Clayton South MDC 3169,Gate 5 Normanby Rd., Clayton Vic. 3168, Australia Dr. Zhiming Yang MD., Ph. D. Department of Radiation Oncology and Molecular Radiation Science,1550 Orleans Street Rm 441, Baltimore MD, 21231,USA Dr. Jifeng Wang Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign Urbana, Illinois, 61801, USA Dr. Giuseppe Baldacchini ENEA - Frascati Research Center, Via Enrico Fermi 45 - P.O. Box 65,00044 Frascati, Roma, ITALY. Dr. MutamedTurkiNayefKhatib Assistant Professor of Telecommunication Engineering,Head of Telecommunication Engineering Department,Palestine Technical University (Kadoorie), TulKarm, PALESTINE.

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016 Dr.P.UmaMaheswari Prof &Head,Depaartment of CSE/IT, INFO Institute of Engineering,Coimbatore. Dr. T. Christopher, Ph.D., Assistant Professor &Head,Department of Computer Science,Government Arts College(Autonomous),Udumalpet, India. Dr. T. DEVI Ph.D. Engg. (Warwick, UK), Head,Department of Computer Applications,Bharathiar University,Coimbatore-641 046, India. Dr. Renato J. orsato Professor at FGV-EAESP,Getulio Vargas Foundation,São Paulo Business School,RuaItapeva, 474 (8° andar),01332-000, São Paulo (SP), Brazil Visiting Scholar at INSEAD,INSEAD Social Innovation Centre,Boulevard de Constance,77305 Fontainebleau - France Y. BenalYurtlu Assist. Prof. OndokuzMayis University Dr.Sumeer Gul Assistant Professor,Department of Library and Information Science,University of Kashmir,India Dr. ChutimaBoonthum-Denecke, Ph.D Department of Computer Science,Science& Technology Bldg., Rm 120,Hampton University,Hampton, VA 23688 Dr. Renato J. Orsato Professor at FGV-EAESP,Getulio Vargas Foundation,São Paulo Business SchoolRuaItapeva, 474 (8° andar),01332-000, São Paulo (SP), Brazil Dr. Lucy M. Brown, Ph.D. Texas State University,601 University Drive,School of Journalism and Mass Communication,OM330B,San Marcos, TX 78666 JavadRobati Crop Production Departement,University of Maragheh,Golshahr,Maragheh,Iran VineshSukumar (PhD, MBA) Product Engineering Segment Manager, Imaging Products, Aptina Imaging Inc. Dr. Binod Kumar PhD(CS), M.Phil.(CS), MIAENG,MIEEE HOD & Associate Professor, IT Dept, Medi-Caps Inst. of Science & Tech.(MIST),Indore, India Dr. S. B. Warkad Associate Professor, Department of Electrical Engineering, Priyadarshini College of Engineering, Nagpur, India Dr. doc. Ing. RostislavChoteborský, Ph.D. Katedramateriálu a strojírenskétechnologieTechnickáfakulta,Ceskázemedelskáuniverzita v Praze,Kamýcká 129, Praha 6, 165 21 Dr. Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials,CSIRO Process Science & Engineering Private Bag 33, Clayton South MDC 3169,Gate 5 Normanby Rd., Clayton Vic. 3168 DR.ChutimaBoonthum-Denecke, Ph.D Department of Computer Science,Science& Technology Bldg.,HamptonUniversity,Hampton, VA 23688 Mr. Abhishek Taneja B.sc(Electronics),M.B.E,M.C.A.,M.Phil., Assistant Professor in the Department of Computer Science & Applications, at Dronacharya Institute of Management and Technology, Kurukshetra. (India). Dr. Ing. RostislavChotěborský,ph.d, Katedramateriálu a strojírenskétechnologie, Technickáfakulta,Českázemědělskáuniverzita v Praze,Kamýcká 129, Praha 6, 165 21

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016 Dr. AmalaVijayaSelvi Rajan, B.sc,Ph.d, Faculty – Information Technology Dubai Women’s College – Higher Colleges of Technology,P.O. Box – 16062, Dubai, UAE Naik Nitin AshokraoB.sc,M.Sc Lecturer in YeshwantMahavidyalayaNanded University Dr.A.Kathirvell, B.E, M.E, Ph.D,MISTE, MIACSIT, MENGG Professor - Department of Computer Science and Engineering,Tagore Engineering College, Chennai Dr. H. S. Fadewar B.sc,M.sc,M.Phil.,ph.d,PGDBM,B.Ed. Associate Professor - Sinhgad Institute of Management & Computer Application, Mumbai-BangloreWesternly Express Way Narhe, Pune - 41 Dr. David Batten Leader, Algal Pre-Feasibility Study,Transport Technologies and Sustainable Fuels,CSIRO Energy Transformed Flagship Private Bag 1,Aspendale, Vic. 3195,AUSTRALIA Dr R C Panda (MTech& PhD(IITM);Ex-Faculty (Curtin Univ Tech, Perth, Australia))Scientist CLRI (CSIR), Adyar, Chennai - 600 020,India Miss Jing He PH.D. Candidate of Georgia State University,1450 Willow Lake Dr. NE,Atlanta, GA, 30329 Jeremiah Neubert Assistant Professor,MechanicalEngineering,University of North Dakota Hui Shen Mechanical Engineering Dept,Ohio Northern Univ. Dr. Xiangfa Wu, Ph.D. Assistant Professor / Mechanical Engineering,NORTH DAKOTA STATE UNIVERSITY SeraphinChallyAbou Professor,Mechanical& Industrial Engineering Depart,MEHS Program, 235 Voss-Kovach Hall,1305 OrdeanCourt,Duluth, Minnesota 55812-3042 Dr. Qiang Cheng, Ph.D. Assistant Professor,Computer Science Department Southern Illinois University CarbondaleFaner Hall, Room 2140-Mail Code 45111000 Faner Drive, Carbondale, IL 62901 Dr. Carlos Barrios, PhD Assistant Professor of Architecture,School of Architecture and Planning,The Catholic University of America Y. BenalYurtlu Assist. Prof. OndokuzMayis University Dr. Lucy M. Brown, Ph.D. Texas State University,601 University Drive,School of Journalism and Mass Communication,OM330B,San Marcos, TX 78666 Dr. Paul Koltun Senior Research ScientistLCA and Industrial Ecology Group,Metallic& Ceramic Materials CSIRO Process Science & Engineering Dr.Sumeer Gul Assistant Professor,Department of Library and Information Science,University of Kashmir,India Dr. ChutimaBoonthum-Denecke, Ph.D Department of Computer Science,Science& Technology Bldg., Rm 120,Hampton University,Hampton, VA 23688

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016 Dr. Renato J. Orsato Professor at FGV-EAESP,Getulio Vargas Foundation,S찾o Paulo Business School,RuaItapeva, 474 (8째 andar)01332-000, S찾o Paulo (SP), Brazil Dr. Wael M. G. Ibrahim Department Head-Electronics Engineering Technology Dept.School of Engineering Technology ECPI College of Technology 5501 Greenwich Road Suite 100,Virginia Beach, VA 23462 Dr. Messaoud Jake Bahoura Associate Professor-Engineering Department and Center for Materials Research Norfolk State University,700 Park avenue,Norfolk, VA 23504 Dr. V. P. Eswaramurthy M.C.A., M.Phil., Ph.D., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 007, India. Dr. P. Kamakkannan,M.C.A., Ph.D ., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 007, India. Dr. V. Karthikeyani Ph.D., Assistant Professor of Computer Science, Government Arts College(Autonomous), Salem-636 008, India. Dr. K. Thangadurai Ph.D., Assistant Professor, Department of Computer Science, Government Arts College ( Autonomous ), Karur - 639 005,India. Dr. N. Maheswari Ph.D., Assistant Professor, Department of MCA, Faculty of Engineering and Technology, SRM University, Kattangulathur, Kanchipiram Dt - 603 203, India. Mr. Md. Musfique Anwar B.Sc(Engg.) Lecturer, Computer Science & Engineering Department, Jahangirnagar University, Savar, Dhaka, Bangladesh. Mrs. Smitha Ramachandran M.Sc(CS)., SAP Analyst, Akzonobel, Slough, United Kingdom. Dr. V. Vallimayil Ph.D., Director, Department of MCA, Vivekanandha Business School For Women, Elayampalayam, Tiruchengode - 637 205, India. Mr. M. Moorthi M.C.A., M.Phil., Assistant Professor, Department of computer Applications, Kongu Arts and Science College, India PremaSelvarajBsc,M.C.A,M.Phil Assistant Professor,Department of Computer Science,KSR College of Arts and Science, Tiruchengode Mr. G. Rajendran M.C.A., M.Phil., N.E.T., PGDBM., PGDBF., Assistant Professor, Department of Computer Science, Government Arts College, Salem, India. Dr. Pradeep H Pendse B.E.,M.M.S.,Ph.d Dean - IT,Welingkar Institute of Management Development and Research, Mumbai, India Muhammad Javed Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin 9, Ireland Dr. G. GOBI Assistant Professor-Department of Physics,Government Arts College,Salem - 636 007 Dr.S.Senthilkumar Post Doctoral Research Fellow, (Mathematics and Computer Science & Applications),UniversitiSainsMalaysia,School of Mathematical Sciences, Pulau Pinang-11800,[PENANG],MALAYSIA. Manoj Sharma Associate Professor Deptt. of ECE, PrannathParnami Institute of Management & Technology, Hissar, Haryana, India RAMKUMAR JAGANATHAN Asst-Professor,Dept of Computer Science, V.L.B Janakiammal college of Arts & Science, Coimbatore,Tamilnadu, India

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016 Dr. S. B. Warkad Assoc. Professor, Priyadarshini College of Engineering, Nagpur, Maharashtra State, India Dr. Saurabh Pal Associate Professor, UNS Institute of Engg. & Tech., VBS Purvanchal University, Jaunpur, India Manimala Assistant Professor, Department of Applied Electronics and Instrumentation, St Joseph’s College of Engineering & Technology, Choondacherry Post, Kottayam Dt. Kerala -686579 Dr. Qazi S. M. Zia-ul-Haque Control Engineer Synchrotron-light for Experimental Sciences and Applications in the Middle East (SESAME),P. O. Box 7, Allan 19252, Jordan Dr. A. Subramani, M.C.A.,M.Phil.,Ph.D. Professor,Department of Computer Applications, K.S.R. College of Engineering, Tiruchengode - 637215 Dr. SeraphinChallyAbou Professor, Mechanical & Industrial Engineering Depart. MEHS Program, 235 Voss-Kovach Hall, 1305 Ordean Court Duluth, Minnesota 55812-3042 Dr. K. Kousalya Professor, Department of CSE,Kongu Engineering College,Perundurai-638 052 Dr. (Mrs.) R. Uma Rani Asso.Prof., Department of Computer Science, Sri Sarada College For Women, Salem-16, Tamil Nadu, India. MOHAMMAD YAZDANI-ASRAMI Electrical and Computer Engineering Department, Babol"Noshirvani" University of Technology, Iran. Dr. Kulasekharan, N, Ph.D Technical Lead - CFD,GE Appliances and Lighting, GE India,John F Welch Technology Center,Plot # 122, EPIP, Phase 2,Whitefield Road,Bangalore – 560066, India. Dr. Manjeet Bansal Dean (Post Graduate),Department of Civil Engineering,Punjab Technical University,GianiZail Singh Campus,Bathinda -151001 (Punjab),INDIA Dr. Oliver Jukić Vice Dean for education,Virovitica College,MatijeGupca 78,33000 Virovitica, Croatia Dr. Lori A. Wolff, Ph.D., J.D. Professor of Leadership and Counselor Education,The University of Mississippi,Department of Leadership and Counselor Education, 139 Guyton University, MS 38677

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL. 6 NO.1 JANUARY 2016

Contents A Comparative analysis on Boundary-based Classification Techniques for Outlier Detection in WDBC Datasets D.Rajakumari, Dr.S.Pannirselvam……………….……………………………….…………………………………….[322]

www.ijitce.co.uk


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

A Comparative analysis on Boundary-based Classification Techniques for Outlier Detection in WDBC Datasets D.Rajakumari Ph.D Research Scholar, Department of Computer Science, Erode Arts & Science College (Autonomous), Erode, Tamil Nadu, India. Email: rsrajakumarid@gmail.com Dr.S.Pannirselvam Associate Professor & Head, Department of Computer Science, Erode Arts & Science College (Autonomous), Erode, Tamil Nadu, India. Email: pannirselvam08@gmail.com Abstract— Record and sample of large size in data mining requires the outlaying observation detection (Outlier Detection (OD)), since they carry the necessary information. The large size and diversity introduces the limitations in outlier detection techniques. The classifiers involved in machine learning algorithms deteriorate the OD performance, since they are sensitive to noise, irrelevant features. This paper discusses the influence of Triangular Boundary-based Classification (TBC) and the Weighing Based Feature Selection and Monotonic Classification (WFSMC) on the Wisconsin Diagnosis Breast Cancer (WDBC) dataset for an effective outlier prediction. The imputation, weight computation, and ordinal feature selection prior to TBC predicts the relevant features. The normal distribution function-based triangular area support boundary region analysis to provide treatment or precautions to the patients. The points nearer to the boundary region lead to misclassification. Hence, the inclusion of monotonic constraints in the classification phase improves their accuracy. The comparative analysis between the TBC and WFSMC regarding accuracy, precision, and recall proves the effectiveness of WFSMC in real-time data mining applications.

Keywords- Data mining, Monotonic classification, Ordinal Feature selection, Outlier Detection, Weight Computation, Wisconsin Diagnosis Breast Cancer (WDBC)

1. INTRODUCTION Outlier prediction plays a significant role in data mining applications namely, fault identification and diagnosis. A multi-disciplinary effort for extracting the knowledge of data refers to data mining. The accuracy of knowledge extraction is affected due to the presence of outliers in the data set and inability to classify data records near the boundary. Generally outliers are categorized as erroneous or real. Real outliers are defined as the observations whose actual values are different from the observed value for the remaining data. Erroneous outliers are the observations that are distorted due to the misreporting errors occur during the classification process. Both the outliers exert more influence on the classification results. Hence reliable detection methods are required for the identification of outliers. Depending on the existence of labels, the outlier detection process is classified as supervised [1], Semi-supervised [2], and unsupervised [3] methods. Unsupervised methods are widely used for outlier detection,

322

since other categories require accurate and representative labels that are really expensive. Selection of relevant attributes improves the performance of data mining and avoids the over fitting. Feature selection is an essential part in successful data mining. The formation of subset of original features on the basis of specific criterion is feature selection process [4], [5]. Multi- view learning algorithms are applicable to analyse the presence of correlated and complemented information on data. But, the selection of discriminating features via multi-view learning on single view data is the challenging task. They utilizes the three information namely, data cluster centre, similarity between data and correlation between different views. Approximation of cluster labels introduced the noise due to the discrete nature leads to misguidance in feature selection. Robust spectral learning employs in unsupervised feature selection for the improvement of robustness. In general, the data in high dimensional and a large space are labelled. The simultaneous explore of labelling and un-labelling of data difficult by using the un-supervised feature selection approaches. The computation of feature score is complex for high dimensionality. Semi-supervised algorithm based on noise insensitive trace ratio criterion analyses the dimensionality reduction. The semi-supervised selection process utilizes the special label propagation method in order to explore the distribution of labelled and unlabelled data. But, the semi-supervised approaches, introduces the small-labelled sample problem in which labelled data are small and unlabelled data are large. Semi-supervised approaches are sensitive to noise from the constraints set is used. Since it urges to implement techniques that can effectively handle the outliers, by previously eliminating the outlier from the training data or by minimizing their consequences if training data is not reliable. This paper proposes a novel triangular boundary based classification approach for the effective prediction of outliers. The use of semi-supervised feature selection of high dimensional data introduces the challenge that the presence of irrelevant features affects the speed and accuracy. The occurrence of unreliable inferences during the uninformative feature selection introduces the label noise problem. The evaluation mechanism effectively performed data analysis. The feature selection process categorized into three processes based on evaluation mechanism namely, filter, wrapper and embedded. Filter based approaches consider the redundant


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

variable without the knowledge about the interaction between the variables. Wrapper based models consider the interaction between the variables. But, they suffer two problems such as over fitting risks and high computational time. Embedded methods combined the advantages of both filter and wrapper based models. The main contributions of the proposed approach are • A Weighing based Feature Selection and Monotonic Classification including the imputation methods and ordinal classification methods, for the effective feature selection. • The feature weighing scheme based on imputation improves the performance of selection and classification. • A novel triangular boundary based classification approach including the training and testing algorithms, for the effective prediction of outliers. • The comparative analysis between TBC and WFSMC regarding accuracy, precision, and recall on Wisconsin Diagnosis Breast Cancer (WDBC) dataset demonstrates the efficiency. 2. Related Work This section describes the conventional works related to Outlier Detection (OD), influence of feature selection algorithms on OD. Due to the high-dimensionality, OD was a challenging task. Radovanovic et al. [6] demonstrated the distance-based outlier methods that produced more contrasting outlier scores in the high dimensional data by re-examining the reverse nearest neighbours. Angiulli et al. [7] introduced a distributed method for distance-based OD in very large data sets. They realized the efficiency and scalability improvement through experimental results with increasing node size. Pham and Pagh [8] suggested a novel random projection-based technique to estimate the angle-based outlier factor for all data points. Kriegel et al. [9] proposed a novel outlier detection model was introduced to detect the outliers that differ from the normal instances by considering the combinations of different subsets of attributes. Albanese et al. [10] proposed a Rough Outlier Set Extraction method for the theoretic representation of the outlier set by using the rough set approximations. The time consumption for an effective OD was more. Kim et al. [11] utilized the kd-tree indexing and an approximated k-nearest neighbour (ANN) search algorithm to reduce the computation time of the density-based outlier detection. Hence, the local outlier was detected effectively within a short time. The selection of high-contrast projection for maximum accuracy was difficult. Keller et al. [12] proposed a novel subspace search method was proposed for selecting the high contrast subspaces for the density-based outlier ranking for quality improvement. The determination of most likelihood outliers was difficult in the categorical data. Suri et al. [13] performed clustering and ranking to determine the set of most likely outliers. The computational complexity of the proposed algorithm was not affected by the number of outliers to be detected. An accurate detection of spatial outliers was the difficult process due to the high-dimensionality. Cai et al. [14] addressed the high-dimensional problems of spatial attributes and outliers with irregular features and detected accurately by iterative self-organizing map approach. The simultaneous evaluation of mean and variance was the necessary task in OD validation. Buzzi-Ferraris et al. [15] developed the new technique for correctly OD and

simultaneous evaluation. They combined density based and partitioning clustering method was presented for data streaming process. The high-dimensionality is the major problem observed in conventional OD techniques. The feature selection was an effective technique and an important step for dimension reduction in data mining applications. The irrelevant attributes were filtered before data mining process. Padungweang et al [16] proposed an unsupervised feature selection based on Fourier transform of the probability density function (PDF). The discrimination score extended the data orientation. The time complexity also employed in this category. Traditional feature selection methods were developed to detect the single view data. Feng et al. [17] proposed an Adaptive Unsupervised Multiuser Feature Selection (AUMFS) utilized cluster centre, similarity and correlation. Sparse regression models employed in AUMFS to predict the cluster labels. The noise and irrelevant features affected the estimation of graph laplacian and approximation of cluster labels introduced the noise. Shi et al [18] proposed the Robust Spectral learning based Feature Selection (RSFS) for the improvement of robustness in sparse spectral regression. The multimedia and web based applications handled high dimensional data. The limited number of labelled data leads to the development of semi-supervised feature selection algorithm. The semi-supervised algorithm effectively handled both labelled and unlabelled data. Liu et al [19] analysed the noise insensitive trace criterion for dimensionality reduction. They utilized the special label propagation method explored the distribution of labelled and unlabelled data. The feature selection in data mining extends the applications to video recognition. The supervised approaches were informal in the identification of relevant features due to limited number of labelled videos. Han et al [20] presented the Semi-Supervised Feature Selection via Spline Regression (S2FS2R) that combined the discriminative information and geometric structure. The speed and accuracy of semi-supervised approaches were affected by the irrelevant features. The diagnosis of disease and early prediction of stages of cancer required the quality improvement in feature selection process. Sharma et al [21] designed the data mining model by using Probabilistic Neural Network (PNN). The PNN based selection model improved the accuracy and effectiveness of the treatment. The discovery of knowledge patterns in multiclinical categorization was an important process. Jacob et al [22] analysed the various feature selection algorithms namely, fisher filtering, runs filtering, stepwise discriminant analysis and RELIEF on error rate minimization. The comparative study improved the decision making process in classification. The simultaneous tracking of multi-level expressions of genes in the clinical data performed by using microarray. The introduction of reduction technique in feature selection and cluster analysis improved the performance. Zhu et al [23] proposed hybrid Case Based Reasoning (CBR) in order to reduce the redundant features. They selected the minimal set of features from the problem domain and the redundant ones were reduced by using the neighbourhood set algorithm. The feature weight factor was calculated by using RELIEF algorithm. But, the existence of uncertainty in feature weight calculation lead to poor evaluation. Sun et al [24] solved the uncertainty problem by novel mean-variance model based feature selection

323


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

algorithm. The mean and variance at discriminative instant considered in the feature weight estimation led the stable and accurate results. Several complex issues namely, class-specific nature, unbalanced data set handling, complexities in multilabelled data and noise were addressed. Yilmaz et al [25] extended the RELIEF algorithm into RELIEF-MM for multimodal fusion. The representation and reliable capabilities and the discrimination capability were utilized in weight function estimation. The deterioration of feature selection and classification performance occurs due to the relevant or irrelevant feature and noise in the traditional algorithms. Hence, this paper investigates the Weighing based Feature Selection and Monotonic Classification (WFSMC) to improve the performance. 3. Triangular Boundary Based Classification The novel triangular boundary based classification approach includes training algorithm and testing algorithms for boundary layer computation of class labels. The training dataset for the class label c is defined as

(5) (6) th

The distance for i Instance for class label is computed by using the equation (7) The upper and lower boundary layer for each class present in the training dataset are formulated by using normal distribution. These values helps to predict the accurate class for each instance. Overall Mean and Deviation are computed by using the equations given below (8)

(9)

(1)

(10) Where the single ith instance

represents the ‘m’

number of features is expressed as (2) The triangular area is computed to find the relationship between the each features present in the instance

. In the

triangular matrix main diagonal values are must be zero. Upper diagonal and lower diagonal triangular matrix values are same. In case of any updating in the upper triangular value, it will affect the lower diagonal value. Triangular average for the class is calculated as (3) The covariance between the features in the triangular matrix is used to find the linear changes of all features. Covariance Matrix for the class c is given as

In the Training algorithm, the dataset X containing ‘C’ number of class labels is considered. The input dataset containing ‘c’ number of class labels and ‘n’ number of instances is obtained. Each instance contains ‘m’ number of features. The boundary layer prediction is built through the normal distribution function, density estimation of the distance between individual instance in the training dataset and the expectation of the ‘z’ number of training records. The mean of elements ‘u’ and ‘v’ is computed by using the equation (6). To find the normal distribution function, there is a need to compute the mean and deviation of the distance of all instance containing ‘c’ class labels by using equation (5) respectively. The distribution of the distance is described by mean and deviation of the cth class. Training Algorithm Input: Output: Procedure: Step 1: Compute Triangular Area Step 2: For I = 1, 2 … z do Step 3: Step 4: End For

(4) The standard deviation between the triangular values of two instances is defined as

Step 5: Compute Step 6: Compute Covariance Matrix Step 7: For I = 1, 2 … z do Step 8: Compute

324


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

Step 9: End For Step 10: Compute Step 11:

Imputation method Input: Original dataset (DS), Imputation method (I)

Compute

Step 12: Return

Output: Estimated dataset (

The boundary layer for each class label is computed by Values are taken from the using equation (10). normal distribution function of each class from the training algorithm. is a constant value varies from 0.5 to1.5. From this variance, prediction of class label varies from 91% to 98% in association with the selection of different values of . Thus if the distance is calculated for testing instance and the respective value is greater than or lesser than the boundary layer. It is considered as an outlier and the other class label for the appropriate instance is predicted.

ͳǤ Set =null; ʹǤ For each example eg in DS ͵Ǥ . ͶǤ For each feature =estimate the new value for ͷǤ ͸Ǥ ͹Ǥ ͺǤ end

Testing Algorithm Input: TX, Output: Procedure: Step 1: Compute Triangular Area Step 2: For I = 1, 2 … z do Step 3:

)

Initially, the estimated dataset value is assumed to null. is estimated for each example ( ) by Then, new value using imputation method. The process of estimation of new values performed until the whole dataset is processed. The estimated new dataset holds the conditioned distributions for specific feature. The non-parametric test is performed to evaluate the changes of features as a similarity measure of two distributions (

). In general, the non-parametric with the

test for two distribution functions

Step 4: For c= 1, 2 … C do Step 5: Compute Step 6:

described by

corresponding indicator functions

(11) between two functions by The statistic measure ( Kolmogorov – Smirnov statistic such that the degree of superiority (sup) of feature changes is follows: (12)

Step 7: Step 8: End if Step 9: End For Step 10: End For

(13)

In the testing phase, the triangular area of the testing instance to be calculated by using equation (3). Then, the distance between the triangular area for testing instance and the mean triangular value is computed by using equation (8). The computed distance is compared with each class boundary layer, and the class label is stored in the respective satisfied boundary layer class. 4. Weighing-Based Feature Selection and Monotonic Classification The proposed Weighing based Feature Selection and Monotonic Classification includes the imputation methods, ordinal feature selection, and monotonic constraints are explained in this section. A. Imputation of Dataset The initial process of proposed WFSMC is weighing algorithm consists of two phases namely, imputation of dataset ) and computation of weight. The new estimated dataset ( calculated from the original dataset (DS) by using imputation algorithm as follows:

- set of features in original dataset -set of features in the imputed dataset (

and . The values

of statistic measure in the ranges from 0 to 1, i.e. . The feature with low value of has small influence on similarity function and the feature with high value has high influence on similarity function. The statistic transformed to weight by using following equation (14) (14) The weight factor computation used to highlight the influence of changed features on distance estimation in the selection process. B. Ordinal Feature Selection The margin based feature selection involved in this paper. The distance between the instance and distance border refers to margin. Based on the decision, margin provides the geometric measure of weighting of feature subset as shown in Fig. 1.

325


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

attributes. The ordinal decision system contains three subsets. ) for and for The k nearest misses ( ( (

Fig. 1 Large margin The ordinal based feature selection performed on the basis of monotonic constraints. The ordinal decision system contains , attributes (At), decisions finite set of instances and the domain of attributes (

are

computed.

Similarly

are computed for

nearest

hits

. Hence, there are

two margins namely upward and down ward margin is constructed. The decision boundary is defined by using the combination of upward and downward margins as shown in fig. 2. The features with large values are termed as poor estimated features by the imputation method. Higher weight is assigned to it since they are preferred for distance estimation.

. In

monotonic model, the domain and attributes follows the structural order. The nested decision structure is described as follows: (15) In general, the function refers the monotone for all instances in original (X) and estimated dataset (Y) satisfies the constraint as follows: (16) The weight updating rule in (14) combined with the monotonic constraint of k-nearest neighbours selects the ordinal based feature selection Ordinal feature selection (With constraint) For t=1: T Select the instance from the space in random manner Compute the nearest hit for each sample and Compute the nearest miss for each sample and For i=1: I Update the weight factor by using equation (14) End End Initially, the instance from the space is selected. Then, the nearest hit and miss for each instances are identified. Then the weight factor is updated by using the equation (14) for all the features (I). The computation of distances for each instance x requires the O (mI) operations. The operations are doubled O ) for entire space (S=m). Hence, ordinal based feature ( selection is more suitable in terms of computational complexity. C. Monotonic Classification The monotonic decision system contains N ordered classes. The order of the decision system arranged in the range as . For a given instance x, the instance with decision value instance with

provided the better features and the

Fig. 2 Margin for monotonic classification The features with low values are easily estimated by imputation methods. Lower weights are assigned to these features since they are not preferred for distance calculation. 5. Comparative Analysis This section explains about the performance analysis results of the proposed Weighing based Feature Selection and Monotonic Classification (WFSMC). Our proposed approach is tested by using the WDBC dataset and results are compared with the Triangular Boundary-based Classification (TBC) regarding the parameters of no. of attributes, accuracy, precision, and recall. A. WDBC dataset The Wisconsin Diagnostic Breast Cancer (WDBC) contains various attributes namely, diagnosis, ID number and real valued features. There are ten real valued features namely, radius, area, perimeter, smoothness, texture, compactness, concave points, concavity, symmetry and fractal dimension computed from digitized image of breast mass. The WDBC dataset are multi-variant dataset that comprises 569 instances and 32 attributes. WDBC describes the characteristics of cellnuclei in an image. B. No. of Attributes The variation of attributes with the boundary-based classification algorithms as in Table 1 shows that the WFSMC utilizes less number of attributes. The attributes required for TBC algorithm is 15 and for WFSMC 12. The weight computation and the ordinal feature selection in WFSMC reduce the attributes by 20 % compared to TBC. TABLE 1 No. of Attributes Vs. Classification Algorithms Algorithm No. of attributes TBC 15 WFSMC 12

provided the worse features with respect to 326


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

C. Accuracy The accuracy variation with the boundary-based classification algorithms as in Table 2 shows that the WFSMC offers an accurate OD. TABLE 2 Accuracy Vs. Classification Algorithms Algorithm Accuracy (%) TBC 97.54 WFSMC 98.54 The accuracy of TBC algorithm are 97.54 % and for WFSMC 98.54 %. The weight computation and the ordinal feature selection in WFSMC improve the accuracy by 1 % compared to TBC. D. Precision The ratio of the number of true positives to the total number of true positives and false positives refers to precision. The status of precision determines the false positive values. The high value of precision indicates the low value of false positives. TABLE 3 Precision Vs. Classification Algorithms Algorithm Precision TBC 0.9541 WFSMC 0.9631 Table 3 shows the visual representation of precision analysis with the boundary-based classification algorithms. The precision of TBC algorithm are 0.9541 and for WFSMC 0.9631. The weight computation and the ordinal feature selection in WFSMC improve the precision by 0.94 % compared to TBC. E. Recall The sensitivity or true positive rate defines the recall. Improving the recall can often decrease the precision, because it gets harder to be precise with the increase in the sample space. TABLE 4 Precision Vs. Classification Algorithms Algorithm Recall TBC 0.9811 WFSMC 0.9925 Table 4 shows the graphical representation of recall analysis with the boundary-based classification algorithms. The recall of TBC algorithm are 0.9811 and for WFSMC 0.9925. The weight computation and the ordinal feature selection in WFSMC improve the recall by 1.15 % compared to TBC. 6. CONCLUSION This paper discussed the problems such as diverse large size dataset availability, classifier sensitivity of noise and irrelevant features in the Outlier Detection (OD) techniques. This paper compared the OD performance of Triangular Boundary-based Classification (TBC) and Weighing-based Feature Selection and Monotonic Classification algorithm on WDBC dataset. The ordinal feature selection employment prior to triangular boundary area isolated the relevant features from the irrelevant

features. The boundary region analysis by normal distribution function in TBC supported an effective treatment to the patients. Then, the inclusion of monotonic constraints in classification phase improved the accuracy. The comparative analysis between TBC and WFSMC regarding the performance parameters of accuracy, precision, and recall showed the effective OD in real-time data mining applications.

[1] [2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

327

8. REFERENCES C. C. Aggarwal, "Supervised outlier detection," in Outlier Analysis, ed: Springer, 2013, pp. 169-198. A. Daneshpazhouh and A. Sami, "Entropy-based outlier detection using semi-supervised approach with few positive examples," Pattern Recognition Letters, vol. 49, pp. 77-84, 2014. K. Noto, C. Brodley, and D. Slonim, "FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection," Data mining and knowledge discovery, vol. 25, pp. 109-133, 2012. S. Beniwal and J. Arora, "Classification and feature selection techniques in data mining," International Journal of Engineering Research & Technology (IJERT), vol. 1, 2012. H. Liu, H. Motoda, R. Setiono, and Z. Zhao, "Feature Selection: An Ever Evolving Frontier in Data Mining," FSDM, vol. 10, pp. 4-13, 2010. M. Radovanovic, A. Nanopoulos, and M. Ivanovic, "Reverse nearest neighbors in unsupervised distancebased outlier detection," IEEE Transactions on Knowledge and Data Engineering, vol. 27, pp. 13691382, 2015. F. Angiulli, S. Basta, S. Lodi, and C. Sartori, "Distributed strategies for mining outliers in large data sets," IEEE Transactions on Knowledge and Data Engineering, vol. 25, pp. 1520-1532, 2013. N. Pham and R. Pagh, "A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data," in Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp. 877885. H. Kriegel, P. Kroger, E. Schubert, and A. Zimek, "Outlier detection in arbitrarily oriented subspaces," in IEEE 12th International Conference on Data Mining (ICDM), 2012 2012, pp. 379-388. A. Albanese, S. K. Pal, and A. Petrosino, "Rough sets, kernel set, and spatiotemporal outlier detection," IEEE Transactions on Knowledge and Data Engineering, vol. 26, pp. 194-207, 2014. S. Kim, N. W. Cho, B. Kang, and S.-H. Kang, "Fast outlier detection for very large log data," Expert Systems with Applications, vol. 38, pp. 9587-9596, 2011. F. Keller, E. MĂźller, and K. BĂśhm, "HiCS: high contrast subspaces for density-based outlier ranking," in IEEE 28th International Conference on Data Engineering (ICDE), 2012 2012, pp. 1037-1048. N. R. Suri, M. N. Murty, and G. Athithan, "An algorithm for mining outliers in categorical data through ranking," in 12th International Conference on


INTERNATIONAL JOURNAL OF INNOVATIVE TECHNOLOGY AND CREATIVE ENGINEERING (ISSN:2045-8711) VOL.6 NO.1 JANUARY 2016

[14]

[15]

[16]

[17]

[18]

[19]

[20]

[21]

[22]

[23]

[24]

[25]

Hybrid Intelligent Systems (HIS), 2012 2012, pp. 247-252. Q. Cai, H. He, and H. Man, "Spatial outlier detection based on iterative self-organizing learning model," Neurocomputing, vol. 117, pp. 161-172, 2013. G. Buzzi-Ferraris and F. Manenti, "Outlier detection in large data sets," Computers & chemical engineering, vol. 35, pp. 388-390, 2011. P. Padungweang, C. Lursinsap, and K. Sunat, "A discrimination analysis for unsupervised feature selection via optic diffraction principle," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, pp. 1587-1600, 2012. Y. Feng, J. Xiao, Y. Zhuang, and X. Liu, "Adaptive unsupervised multi-view feature selection for visual concept recognition," in Computer Vision–ACCV 2012, ed: Springer, 2013, pp. 343-357. L. Shi, L. Du, and Y.-D. Shen, "Robust Spectral Learning for Unsupervised Feature Selection," in IEEE International Conference on Data Mining (ICDM), 2014 2014, pp. 977-982. Y. Liu, F. Nie, J. Wu, and L. Chen, "Efficient semisupervised feature selection with noise insensitive trace ratio criterion," Neurocomputing, vol. 105, pp. 12-18, 2013. Y. Han, Y. Yang, Y. Yan, Z. Ma, N. Sebe, and X. Zhou, "Semisupervised feature selection via spline regression for video semantic recognition," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, pp. 252-264, 2015. N. Sharma and H. Om, "Usage of Probabilistic and General Regression Neural Network for Early Detection and Prevention of Oral Cancer," The Scientific World Journal, vol. 2015, 2015. S. G. Jacob and R. G. Ramani, "Discovery of Knowledge Patterns in Clinical Data through Data Mining Algorithms: Multi-class Categorization of Breast Tissue Data," International Journal of Computer Applications (IJCA), vol. 32, pp. 46-53, 2011. G.-N. Zhu, J. Hu, J. Qi, J. Ma, and Y.-H. Peng, "An integrated feature selection and cluster analysis techniques for case-based reasoning," Engineering Applications of Artificial Intelligence, vol. 39, pp. 1422, 2015. Y. Sun, X. Lou, and B. Bao, "A novel relief feature selection algorithm based on mean-variance model," Journal of Information & Computational Science, vol. 8, pp. 3921-3929, 2011. T. Yilmaz, A. Yazici, and M. Kitsuregawa, "RELIEFMM: effective modality weighting for multimedia information retrieval," Multimedia systems, vol. 20, pp. 389-413, 2014.

328


NOVEMBER 2015 VOL- 5 NO - 11

@IJITCE Publication @IJITCE Publication


Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.