Abstract -
Fraud App Detection using Machine Learning
Dr. Pallam Ravi1 , Anirudh Bhandari2, Avusula Poojitha3 , Baki Harish41Assistant professor, Dept. of Computer Science and Engineering, Anurag Group of Institutions Ghatkesar, Telangana, India

*2,3,4 Student, Dept. of Computer Science and Engineering, Anurag Group of Institutions Ghatkesar, Telangana, India ***
It's more important than ever to be informed in order to determine which mobile applications are safe and which are not as a result of the increase in the number of mobile applications used in everyday life. It is impossible to pass judgment. Our system is basedonfourparametersthatinclude ratings, reviews, in-app purchases, and Contains ads to predict. Our system compares three models Decision Tree classifier, Logistic Regression, and Naïve Bayes. These models were further analyzed on four parameters of F1 score, Recall, Precision, and Accuracy. The algorithmwhichgivesthehigher accuracy will be considered for our model.
Key Words: Frauddetection,Decisiontree,NaïveBayes, Logisticregression,Playstore,userrating.
1. INTRODUCTION
Fraudappdetectionusingmachinelearningistheprocessof usingstatisticalalgorithmsandpredictivemodelstoidentify andpreventfraudulentactivitiesinmobileapplications.As mobile apps become increasingly popular, fraudsters are findingnewwaystoexploitvulnerabilitiesintheseappsto commitvarioustypesoffraud,includingaccounttakeover, identitytheft,andfinancialfraud.Inordertopreservetheir currentsuccess,businesses,andsoftwaredevelopersputa lot of time and attention into finding clients. This strong competitionstemsfromtheneedtodemonstratethequality oftheirproducts.Userreviewsandupdatesoneachprogram thatisavailablefordownloadarecrucialfactors.Thiscanbe astrategyforengineerstoidentifytheirflawsandintegrate themintothedesignofanewproductthatmeetstheneeds ofthepeople.UnderthetreesAppcreatorstypicallymajorly supporttheirappsand,asaresult,managetherankinthe App Store, as opposed to depending on conventional marketing techniques. This is frequently achieved by increasingthenumberofdownloadsandauditsutilizingsocalled"botbots"or"waterarmies."
Machinelearningcanbeusedtodetectfraudulentactivities inmobileappsbyanalyzing patternsindata, suchasuser behavior, device information, and network traffic. These patterns can be used to build models that can identify unusual or suspicious behavior and flag it for further investigation.
Thecapabilityofmachinelearningforfraudappdetectionto continuouslylearnandadapttonewtypesoffraudisoneof its main benefits. As fraudsters come up with new tactics, machine learning models can be trained to identify these newpatternsandpreventfraudulentactivitiesbeforethey cancausesignificantharm.
Overall, fraud app detection using machine learning is a critical tool for protecting users and businesses from the financialandreputationaldamagecausedbyfraudinmobile applications.
2. LITERATURE SURVEY
[1]DetectionofFraudRankingforMobileAppsUsingIP AddressRecognitionTechnique
Themobilebusinessisexpandingquickly,whichhasledto anincreaseinthenumberofmobileappsavailable.Withthe app stores on Google Play and Apple, users can choose to downloadprogramsforafeeorforfree.Therankingofan app is important since highly rated apps have a higher chance of being discovered by consumers than apps with lowerrankings.Someappdeveloperscheattoincreasetheir app's ranking to get a high rating. Thus, a ranking fraud detection system is required to spot rankings obtained fraudulently.Wedesignandbuildarankingfrauddetection system to find fraudulent rankings. The system compiles data about the app and user reviews and stores it in a database. With this information, the reviews are preprocessed,andsentimentalanalysisisdone.Theresultsare evaluatedincomparisontotheapp'srating,whichisusedto spotrakingfraud.
[2]Detectionoffraudappsusingsentimentanalysis
Thesedays,themajorityof ususemobiledevicesrunning AndroidoriOS,andwefrequentlymakeuseoftheplaystore orappstorefunctionality.Bothmarketplacesofferalarge selectionofsoftware,howeverunfortunatelysomeofthose applicationsarefraudulent.Theseapplicationscandamage phonesand steal data. Hence, for storeusers to recognize suchprograms,theymustbelabeled.We,therefore,propose awebapplicationthatwillmanagethedata,feedback,and applicationevaluation.Asaresult,itwillbeeasiertotellifan applicationisfraudulentornot.Severalapplicationscanbe processedatonceusingtheonlineapplication.Also,auser
can not always find reliable or honest product reviews online. As a result, the admin will assess the reviews and comments,makingiteasyfortheadmintodecidewhether theapplicationishonestordishonest.
[3]Detectionofmobileapplicationsleakingsensitivedata

The most common and frequently used gadget among individualsisasmartphone.Theyhostawidevarietyofdata kinds,bothpublicandprivate.Thedevelopmentofmalicious programstostealpersonalinformationfromsmartphonesa recent problem. Personal information stored on these devices, such as SMS messages, contacts, videos, GPS coordinates, etc., therefore needs a unique security mechanismtopreventhackersfromstealingit.Whenmobile applicationsattempttoaccesssensitivedatausingAndroid permissions,thekeyissueisthedisclosureofthatdata.This conditionindicatesthatsensitivedatawilllikelyleakfrom these mobile applications. Thus, there is a need for some remedies,especiallyinlightofsmartphonedataleaks.The twoobjectivesofthisworkaretoidentifyapplicationsthat leak sensitive data and to analyze program if they have malevolent intentions or not. A sensitive data leakage avoidancemethodforAndroidsystemswassuggestedinthis study.Tofindmobileapplicationsleakingcriticaldata,the J48 classification system was utilized. Also, the K-Means clusteringalgorithmwasusedtodeterminewhetherornot trustedmobileapplicationsobtainedfromtheGooglePlay Store share any similarities with malicious mobile applications.
3. PROPOSED SYSTEM
We propose a system that would identify such fake applications on the play or app store. We can acquire the probability of determining whether an app is fake or not, thereforewepresentasystemthatusesfourfeaturesthat arein-apppurchases,containsads,ratings,andreviewsto determinetheprobabilityofanappwhetherit'sscamming itsconsumersornot.Thesolepurposeofthegivenproposed system is majorly to review the fraud detection of google play store applications and to use the four-parameter methodstodifferentiatecertainfraudulentapplicationsor commonly referredtoasspamapplications.Experimental analysisisperformedondifferenttypesofmethodologyin the proposed manner for the detection of fraud or fake applications.Oursystemwillreceivefraudwithfourtypesof evidence, such as ad-based ratings, in-app purchases, and evidence-basedreviews.Inaddition,thedevelopment-based integrationapproachincorporatesallfouraspectstodetect fraud.Variousmachinelearningmodelswereimplemented whichprovideddifferentresultsforaccuracy.Byanalysis, we found that our given proposed method provides 85% accuracycomparedtootheralgorithms.Whileindependent thinkingstillexists,thedecisiontreesectionperformsbetter compared to other models such as the recession and the naïve Bayes. It is an intuitive algorithm for separation problems.Itisareliablereal-timeguess,asetbackproblem.
Decisiontreescanmanagenon-lineardatasetseffectively.It plays a role in decision-making in various fields of life, including engineering, social planning, business, and even law.

Fig -1:Thearchitectureoftheproposedsystem
3.1 ALGORITHMS
a. NaïveBayesAlgorithm
The Naive Bayes algorithm, a supervised learning algorithm,usestheBayestheoremtosolveclassification problems.Itisespeciallyhelpfulforclassifyingcontent thatincludestop-notchtrainingdatabases.ItsClassifier is one of the most straightforward and effective programmingalgorithms,aidinginthedevelopmentof machine-learningmodelsthatcanproducepredictions morequickly.Itisamethodofpotentialseparationthat foretells the result based on the object's potential. Emotional analysis, spam filtering, and article classificationareamongadditionaltypicalusesforthe NaiveBayesalgorithm.Asaresult,itisthemostwidely utilizedfixfortext-sharingissues.
b. LogisticRegression
Logistic regression apart from its name, is a classificationmodelevenafterhavingregressioninthe name. Logistic regression is defined as a process of modeling the probability of a distinct outcome when given an input variable. It is a type of classification modelandisveryeasytoemphasizeanditalsoachieves reallygoodexecutionwithlinearlydistinctclasses.This kindofanalysiscanhelpinpredictingthelikelihoodof an event happening or a choice being made. It is a considerably used algorithm for classification in the industry. The logistic regression model is a statistical methodology for binary classification which can be generalized to multiclass classification. It is a considerably useful analysis method for various classification problems, determining whether a new samplefitsbetterintoacategory.
c. DecisionTreeAlgorithm
Both classification and regression are accomplished using a supervised method known as Decision Trees. The decision tree aims to construct a model that predicts the value of the target variable using basic decision ruleslearnedfrom numerous data pieces. By dividingthesourcesetintodifferentsubsetsbasedon thefeaturevaluetest,thedecisiontreecanbelearned Recursive partitioning is the process of iteratively repeating the following operation on every derived subset.Whenthevalueofthesubsetataspecificnode matchesthatofthetargetvariable,therecursionends, or when splitting no longer improves the predictions. Thebuildingofadecisiontreeclassifiertechniquedoes notneedanydomainknowledgeorparametersetting, and because of this, it is appropriate for probing knowledge discovery. High-dimensional data can be handled well by decision trees. In broad terms, the decision tree classifier is considered to have good accuracy.


4. RESULTS
The model achieved an accuracy of 88.7% on the test set, withaprecisionof86.8%,recallof85.4%,andF1scoreof 88.1%.Theseresultsindicatethatthemodelcaneffectively identifyfraudulentapps,withrelativelyfewfalsepositives. Severaldifferentmachinelearningalgorithmsweretested, includinglogisticregression,decisiontrees,andNaiveBayes. TheDecisiontreealgorithmperformedthebest.Theresults ofthisstudydemonstratethatmachinelearningcanbean effective tool for detecting fraudulent apps. The high accuracy,precision,andrecallvaluesindicatethatthemodel cancorrectlyidentifyalargeproportionoffraudulentapps while minimizing the number of false positives. One potentiallimitationofthemodelisthatitwastrainedona relativelysmalldataset,whichmaynotberepresentativeof allpossibletypesoffraudulentapps.Inaddition,thedataset was collected from a single source, and may not be generalizabletootherappmarketplaces.Toaddressthese limitations,futurestudiescoulduselargerandmorediverse datasets,orincorporatedatafrommultiplesources.
5. FUTURE ENHANCEMENTS
1. Continuous learning: Instead of training the machinelearningmodelonastaticdataset,itcanbe continuouslyupdatedwithnewdatatoimproveits performanceovertime.Thiscanbeachievedusing techniquessuchasonlinelearningorreinforcement learning.
2. Ensemble methods: Ensemble methods involve combining multiple machine learning models to improve their performance. This could involve combining different algorithms or using multiple modelswithdifferentsetsoffeatures.
3. ExplainableAI:Onelimitationofmachinelearning models is that they can be difficult to interpret. Using explainable AI techniques, such as LIME or SHAP, can help to provide insights into how the model is making its predictions, and identify potentialbiasesorareasforimprovement.
4. Incorporating more data sources: The machine learningmodelcanbetrainedonawiderrangeof datasources,includingsocialmedia,webpages,and app reviews, to identify patterns and behaviours associatedwithfraudulentapps.
Overall, these enhancements can help to improve the accuracyandeffectivenessofthemachinelearningmodelfor detecting fraudulent apps and ensure that it remains effectiveindetectingnewtypesoffraudastheyemerge
6. CONCLUSION
In the Era of growing technology, the threat to security is alsobecomingamajorissue,andapartofit,todaywehave several apps listed on the Google app stores which also includevariousfraudappsthatareathreattousers’privacy anddata.Inourmodel,wehaveworkedpreciselytodetect fraudulent software using 4 parameters including scales, reviewscores,in-apppurchases,andcontentadditions.We compared the accuracy of the three algorithms when the resolutiontreeappearedtobe85%higher.Theframework

is measurable and can be expanded to further proven domain-based fraudulent evidence. Demonstrated the effectiveness of the proposed system, algorithm detection measurement,andstandardizationoflevelfraudoperations. This can be used effectively to rate fraudulent play store applicationsinplaystoreforsecuritypurposes.
7. REFERENCES
1. Esther Nowroji, Vanitha, “Detection Of Fraud Ranking For Mobile App Using IP Address RecognitionTechnique”,vol.4.
2. Javvaji Venkataramaiah, Bommavarapu Sushen, Mano. R, Dr. GladispushpaRathi, “An enhanced mining leading session algorithm for fraud app detectioninmobileapplications”
3. S.R.Srividhya, S.Sangeetha – “A Methodology to DetectFraudAppsUsingSentimentAnalysis”
4. Keerthana. B, Sivashankari.K and Shaistha Tabasum.S,“DetectingMalwaresandSearchRank Fraud in Google Search Using Rabin Karp Algorithm”,IJARSE,7(02),2018,pp.504-527.
5. ShashankBajaj,NikhilNigam,PriyaVandana,Srishti Singh, “Detection of fraud apps using sentiment analysis”, International Journal of Innovative ScienceandResearchTechnology.
6. Harpreet Kaur, Veenu Mangat and Nidhi, ― “A SurveyofSentimentAnalysistechniques”
7. InternationalconferenceonI-SMAC(IoTinSocial, Mobile, Analytics and Cloud) (I-SMAC), 2017, pp. 921
8. JingWan,MufanLiu,JunkaiYiandXuechaoZhang, “Detecting Spam Webpages through Topic and Semantics Analysis”, IEEE Global Summit onComputerandInformationTechnology(GSCIT), 2015,pp.83-92.
9. Navdeep Singh, Prashant Kr. Pandey and Mr.Srinivasan, ― “Improved Discovery of Rating Fake for Cellular Apps”, IEEE International ConferenceonScienceTechnologyEngineeringand Management(ICONSTEM),2016,pp.135-140.
10. Weiman Wang, Restricted Boltzmann Machine. GitHub. Aug 2017. [Online] Availablehttps://github.com/aaxwaz/Frauddetection-usingdeep learning/blob/master/rbm/rbm.py.
11. Dubey Veena, G. D. (2016). Sentiment Analysis Based on Opinion Classification Techniques: A Survey.InternationalJournalofAdvancedResearch inComputerScienceandSoftwareEngineering,5358.
12. Ranking fraud Mining personal context-aware preferencesformobileusers.H.Zhu,E.Chen,K.Yu, H.Cao,H.Xiong,andJ.Tian.InDataMining(ICDM), 2012 IEEE 12th International Conference on, pages1212–1217,2012.
13. NandimathJyoti,K.B.(2017).EfficientlyDetecting andAnalyzingSpamReviewsUsingLiveDataFeed. InternationalResearchJournalofEngineeringand Technology(IRJET),1421-1424.

14. Detecting product review spammers using rating behaviors.E.-P.Lim,V.-A.Nguyen,N.Jindal,B.Liu, and H. W. Lauw In Proceedings of the 19th ACM international conference on Information and knowledgemanagement,CIKM‟10,pages939–948, 2013.
15. Detection for mobile apps H. Zhu, H. Xiong, Y. Ge, andE.Chen.Aholisticview.InProceedingsofthe 22ndACMinternationalconferenceonInformation andknowledgemanagement,CIKM‟13,2013