CREDIT CARD FRAUD DETECTION IN R

Page 1

IN R

Nowadays credit card frauds are drastically increasing in number as compared to earlier times. Criminals are using fake identity and various technologies to trap the users and get the money out of them. Therefore, it is very essential to find a solution to these types of frauds. In this project we will be designing a method or model to detect fraudulent activity in credit cardtransactions.As thetechnology ischanging andbecomingmoreand moreadvancedday byday,it isbecoming more and more difficult to track the behavior and pattern of criminal activities. Through this project we will be able to provide a solution that can make use of technologies such as Machine Learning and Data Visualization using R. Hence, easingtheprocessofdetectionoffraudulentcardtransactions.

This is a very relevant problem that demands the attention of communities such as machine learn ing and data science wherethesolutiontothisproblemcanbeautomated.Frauddetectionin volvesmonitoringtheactivitiesofpopulationsof usersinordertoestimate,perceiveoravoidob jectionablebehavior,whichconsistoffraud,intrusion,anddefaulting. This problem is particularly challenging from the perspective of learning, as it is characterized by various factors such as class imbalance. The number of valid transactions far outnumber fraudu lent ones. Also, the transaction patterns often changetheirstatisticalpropertiesoverthecourseoftime. These are not the only challenges in the implementation of a real world fraud detection system, however. In real world examples,themassivestreamofpaymentrequestsisquicklyscannedbyautomatictoolsthatdeterminewhichtransactions toauthorize.

Tocollectdatafromatrustedsourceandanalyzethedata.

Tomonitortheactivitiesofthepopulationofusersinordertoperceiveoravoidobjectionablebehavior.

1.3 Motivation

FRAUD

Aryan1 , Chandana

Yelamancheli2 , Karthik Kumar Reddy Kota3 , Pujayant Kumar4 , Nithesh Derin Joan O5 , Kunta Prasanth Kumar6 ***

As we aim at detecting fraudulent transactions, we would be getting a Dataset from a reliable source and then trainingit,andtestingitusingvariousmethodsandalgorithmsinR.WewilldevelopMachineLearningalgorithms,whichwill enableustobeabletoanalyzealargerdatasetandtheoneprovidedtous.Thenwiththeapplicationofprocessingofsomeof theattributeprovidedwhichcanfindaffectedfrauddetectioninviewingthegraphicalmodelofDataVisualization.

Tofindoutthepreventivemeasurestopreventsuchfraudulentpracticesinfuture.

CREDIT CARD DETECTION

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page582

1.1.1IntroductionBackground

1.2 Objective

Thebasicobjectivesoftheprojectsarelistedbelow:

TovisualizetheongoingtrendinsuchfraudsbyusingadvancedvisualizingtoolssuchasR.

Abstract:

Tostudytheunauthorizedandunwanted‘fraud’increditcardtransactions.

Rishav Sowmya.

Tuned analytical server with most optimal model for fraud detection Limited to only machine learning approaches

3. Literature Survey

Hardware Requirements Computer

To detect credit card transactionsfraudulent

Project Resource Requirements 2.1

2. So†ware andvariousR andlibrariesrelatingtoML 2.2 with8GB+RA

Deep learning detectingfraudin credit card transactions utilized a high scalabilityimbalanceproblemsdetectionfraudpastenvironmentcomputingdistributedperformance,cloudtonavigatecommonsuchasclassand

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page583

Requirements Rsoftware,

Comparable resultsto machine learning approaches Xuan,S.,Liu,G., Li,Z.,Zheng,L.,Wang,S., &Jiang,C.[4] Randomforest for credit card fraud To detect credit card fraudulent Used two kinds randomof forests are used Only tested on pertainingdatasets to

3.1 Background Inthissection,wehaveconductedaliteraturesurveyonvariousresearchpapersdealingwithpredictionoffraudlent credit card transactions. Accordingly, research papers [1 15] are reviewed and analyzed based on various approaches and methodologiesused.

packages

Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare,S.A [2] Used f naïve kbayes,nearestneighbor and regressionlogistic on highly skewed credit card frauddata Credit card fraud detection using machine techniqueslearning naïve bayes, k nearestneighbor get accuracy as high as97.92% and97.69% Logistic regression hasanaccuracy of 54.86% Roy, A., Sun, J., Mahoney, R., Alonzi, L., Adams, S., & Beling, P [3] Used ANN resultsparametersvariouscomputingbypoweredcloudandfinetunesforbetter

3.2 LiteratureAuthorsreview Method Purpose Advantages Disadvantages

Patil, S., Nemade, V., & Soni, P. K. [1] Proposedinterfacin g of SAS with curvesROCDescisionk.frameworHadoopUsedtrees,

Addressed the variousBig Data Methods,tools and technologies that can be applied, Complex prediction model Gamon,M.[7] Used SVMlargewith feature reductionwithcombinationvectorsinfeature To study classificationSentiment noisyandfeedbackcustomerondatadealwithdata

detection ForestusingtransactionsRandom to train the behavior featuresof normal and transactionsabnormal china

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072

Achieved high accuracy n data that Learning.Supervisedannotatorchallengesclassificationpresentevenforahumani.e. Complex prediction model Leppäaho, E., Ammad ud din, M., & Kaski,S.[8] Using packageGFAin R for CCFD GFA:explorato ry analysis of multipledata In depth analysis of GFA Package which provides a full pipelinefor factor analysis Does not explain GFAsapplicationsto reallifescenarios sources with group factor analysis of multiple data sources that are represented as matrices with samplescooccurring Andrienko, G., Andrienko,N.,Drucker,S.,Fekete,J. D.,Fisher, D., Idreos, S., ... & Stonebraker, M [9] Studies various related concept s nformat ion Visualiz ation, Human Comput er Interacti on, MachineLearnin g, Data manage ment & Mining, and Comput er Graphic Tostudy AnalyticsVisualizationApplicationsEmergingChallengesResearchFutureandinBigDataand

Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He Guelton, L., & Caelen, O. [5]

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page584

Authors have made a MemoryShortandForestbetweencomparisonRandom(RF)LongTerm(LSTM)

Thereporthas been drafted by the contributions of communitiesdiverseindustry,academiascientistsdistinguishedfourteenfromandandrelated Complex to understand

To study and compare LSTMand RandomForeston problemCCFD Concludes to use a combinationoftwo Concluded their study with a discussion on both andpractical scientific challenges that remainunsolved Elgendy, N., & Elragal, A. [6] Used Big Data Analytics for CCFD To detect credit card fraud using Big Data Analytics

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | 585

To study Credit card fraud detectionbigusingdata analytic parallelizationIntroduced of the auto associative neural networkin architecturehybridthe to achieve speed up Complexprediction model classificati on in computatioSpark nal framework

Kamaruddin, S., &Ravi,V[10] Usedhybridarchitecture of classforNeuralAutoOptimizatiSwarmParticleonandAssociativeNetworkone

Page

Maniraj,S&Saini,Aditya & Ahmed, Shadab & Sarkar,Swarna[11] Used deployment of multiple algorithmsdetectionanomalyassuch Local Outlier Factor and Isolation Forest algorithm To study Credit card fraud detection using LearningMachine ScienceDataand incorrectMinimized fraud detection or false positives comaprableAccuracy to exsitingmodels

Applies FraudCreditmethodsLearningMachinevariousforCardDetection

Varmedja, Dejan & Karanovic,Mirjana & Sladojevic, Srdjan & Arsenovic, Marko & Anderla,Andras[12] Used PerceptronMultilayerNaiveRandomRegression,LogisticoversamplitechniqueSMOTEwasusedforng.UsedForest,Bayesand

Maniraj,S&Saini,Aditya & Ahmed, Shadab & Sarkar,Swarna.[13] Used deployment of multiple algorithmsdetectionanomalyassuch OutlierLocal Factor and To study Credit card fraud detection using techniquesLearningMachine incorrectMinimized fraud detection or false positives comaprableAccuracy to exsitingmodels

Resultsshow that each canalgorithmbeused for credit card accuracydetectionfraudwithhigh modelProposedcan beforused detection of otherirregularities.

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page586

4. Proposed Methodology

Throughourextensivestudywefoundthatcreditcard frauddetectiondata arehighly imbalanced.Beforeconductingany kindofpredictiononittheimbalanceneedtobedealtwith.Alotofmachinelearningaswellasdeeplearningmodelsgives similarresultsifnotbetter.

Used complicated for prediction and dealing with class imbalance

Designed and assessed a novel learning strategy that addresseseffectively class conceptimbalance, latencyverificationanddrift,

Isolation algorithmForest Andrea Dal Pozzolo; Giacomo Boracchi; Olivier Caelen; Cesare Alippi; Gianluca Bontempi.[14]

Used combinatio n of supervisedvarious and unsupervis ed methods for credit detectionfraudcard

To propose a hybrid technique that improvetechniquesunsupervisedsupervisedcombinesandtothe detection.accuracyindeedefficientcombinationthatresultsExperimentalshowtheisanddoesimprovetheofthe Complex prediction model accuracydetectionfraud

4.1 Proposed Architecture

WeproposetomakeaCreditCardFraudDetectionSysteminRlanguagebymakinguseofMachineLearningandadvanced R concepts. We would be incorporating various algorithms like Decision Tress, Artificial Neural Networks, Logistic RegressionandGradientBoostingClassifier.Inordertocarryoutthetaskofcreditcardfrauddetection,wewillbemaking useofaCreditCardTransactionsdatasetconsistingofamixoffraudaswellasnon fraudulenttransactions.

3.3 Summary

4.2 Method Used Wehavereferredvariousresearchpaperstoidentifythevariouscomponentsthatmightberequiredinourproject.Wealso referredfewwebsitestounderstandthevariouspackagesandlibrariesinRthataretobeused.

F. Carcillo, Y.A. Le Borgne,O.Caelen,Y. Kessaci,F. Oblé, G.Bontempi[15]

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Detections driftinareal world streamdata

Proposed, a formalization of the fraud problemdetection that describesrealistically the conditionsoperating of Fraud tested their research on more than 75 andunbalanceimpactdemonstratedtransactionsmillionandtheofclassconcept

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page587 5. Implementation Details and User Manuals 5.1 Implementation Details and User Manual Step 1: GettingTheDataSet: > library(ranger) > library(caret) > library(data.table) > creditcard_data < read.csv("/Users/kreet/Desktop/academic\ files/sem\ 4/DATA\ VISUALIZATION/project/Credit card dataset/creditcard.csv")>creditcard_data Time V1 V2 V3 V4 V5 V6 1 0 1.359807134 7.278117e 022.5363467381.3781552243 3.383208e 014.623878e 01 2 01.1918571112.661507e 010.1664801130.44815407856.001765e 02 8.236081e 02 3 1 1.358354062 1.340163e+00 1.7732093430.3797795930 5.031981e 011.800499e+00 4 1 0.966271712 1.852260e 01 1.792993340 0.8632912750 1.030888e 021.247203e+00 5 2 1.1582330938.777368e 011.5487178470.4030339340 4.071934e 019.592146e 02 6 2 0.4259658849.605230e 011.141109342 0.16825207984.209869e 01 2.972755e 02 7 41.2296576351.410035e 010.0453707741.20261273671.918810e 012.727081e 01 8 7 0.6442694421.417964e+001.074380376 0.49219901859.489341e 014.281185e 01 9 7 0.8942860822.861572e 01 0.113192213 0.27152613012.669599e+003.721818e+00109 0.3382617521.119593e+001.044366552 0.22218727674.993608e 01 2.467611e 01 11101.449043781 1.176339e+000.913859833 1.3756666550 1.971383e+00 6.291521e 01 12100.3849782156.161095e 01 0.874299703 0.09401862602.924584e+003.317027e+00 13101.249998742 1.221637e+000.383930151 1.2348986877 1.485419e+00 7.532302e 01 14111.0693735882.877221e 010.8286127272.7125204296 1.783980e 013.375437e 01 1512 2.791854766 3.277708e 011.6417501611.7674727439 1.365884e 018.075965e 01 1612 0.7524170433.454854e 012.057322913 1.4686432984 1.158394e+00 7.784983e 02 17121.103215435 4.029621e 021.2673320891.2890914696 7.359972e 012.880692e 01 1813 0.4369050719.189662e 010.924590774 0.72721905369.156787e 01 1.278674e 01 19 14 5.401257663 5.450148e+001.1863046311.73623880013.049106e+00 1.763406e+00 20 15 1.492935977 1.029346e+00 0.454794734 1.4380258799 1.555434e+00 7.209611e 01 21160.694884776 1.361819e+001.0292210400.8341592992 1.191209e+001.309109e+00 22170.9624960703.284610e 01 0.1714790542.10920406771.129566e+001.696038e+00

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page588 23181.1666163825.021201e 01 0.0673003142.26156923954.288042e 018.947352e 02 24180.2474911282.776656e 011.185470842 0.0926025499 1.314394e+00 1.501160e 01 2522 1.946525131 4.490051e 02 0.405570068 1.01305733702.941968e+002.955053e+00 2622 2.074294672 1.214818e 011.3220206300.41000751422.951975e 01 9.595372e 01 27231.1732846103.534979e 010.2839050651.1335633179 1.725772e 01 9.160537e01 28231.322707269 1.740408e 010.4345550310.5760376524 8.367580e 01 8.310834e 01 2923 0.4142888109.054373e 011.7274529441.47347126667.442741e 03 2.003307e 01 30231.059387115 1.753192e 011.2661296431.1861099547 7.860018e 015.784353e 01 31 24 1.2374290306.104258e 02 0.3805258800.7615641114 3.597707e 01 4.940841e 01 32 25 1.1140085958.554609e 02 0.4937024871.3357599851 3.001886e 01 1.075378e 02 33 26 0.5299122848.738916e 01 1.3472473290.14545667664.142089e 011.002231e 01 34 26 0.5299122848.738916e 01 1.3472473290.14545667664.142089e 011.002231e 01 35 26 0.5353877638.652678e 01 1.3510762880.14757547454.336802e 018.698294e 02 36 26 0.5353877638.652678e 01 1.3510762880.14757547454.336802e 018.698294e 02 37 27 0.2460459494.732669e 01 1.6957375540.2624114880 1.086641e 02 6.108359e 01 38 27 1.4521872791.765124e+00 0.6116685411.1768249842 4.459799e 012.468265e 01

It gives us the standard deviation in the amount column Step 3: We will be scaling our data by using the scale() function in R, in order to removeany extreme values which might hinder in the functioningofourmodel.Thescalingfunctionhelpsstandardizethedata,bystructuringthemaccordingtoaspecificrange.

> creditcard_data$Amount=scale(creditcard_data$Amount)Thiswillscaleourdataset.>NewData=creditcard_data[, c(1)]

> sd(creditcard_data$Amount)

Givesusthedimensionofthedataset[1]28480731

> var(creditcard_data$Amount)Itgivesthevarianceoftheamountcolumn

Givesthefirst6dataentriesinthedataset>tail(creditcard_data,6)Givesthelast6entriesinthe dataset.

> head(NewData)

ThisgivesusthesummaryofthedatasetStatistically.>names(creditcard_data)This commandwilltellusaboutthenamesofthecolumnsinthedataset

……………..andsoon Step 2: understandingthestructureoftheDataset

Thisisusedinordertorecheckourmodelafterscaling

> dim(creditcard_data)

> summary(creditcard_data$Amount)

> head(creditcard_data,6)

> Logistic_Model=glm(Class~.,test_data,family=binomial())

> plot(Logistic_Model)

> dim(test_data)

Step 4: Now,afterwehavescaledourdata,itisreadyfortraining.So,nowwewillbeextractingtwosetsofdatafromthe existingdata,onewillbetrain_data,andtheotherwillbetest_data.>library(caTools)

> dim(train_data)

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072

Thisisusedtotransferalltheelementsindata_samplewhichhavea valueofdata_sample=true. > test_data=subset(NewData,data_sample==FALSE)

Thisisusedtotransferalltheelementsindata_samplewhichhavea valueofdata_sample=false.

ToplottheLogistic_modelvalues

> data_sample = sample.split(NewData$Class,SplitRatio=0.80)

Step 5: Inthisstep,wewillbeperformingLogicalregression.TheLogisticRegressiondeterminestheextenttowhichthere is a linear relationship between a dependent variable and one or more independent variables. In terms of output, linear regression will giveus a trendline plottedamongst a set ofdata points. So,in our project, wehave used it determinethe relationshipbetweenfraudornotfraud.

> train_data=subset(NewData,data_sample==TRUE)

> set.seed(123) Itgeneratesrandomnumbers

Itisusedtocheckthedimensionsofthetrainingdata.[1]227846 30

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Itisusedtoheckthedimensionsofthetestdataset[1]5696130

ItisusedtogenerateaBinomialLinearRegressionModel.

> summary(Logistic_Model)

(a)PredictedValuesv/sResiduals (b)TheoreticalQuantitiesv/sStd.DeviationResiduals

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page589

This function is used in orderto split the dataset into two datasetsintheratio0.8:0.2

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified

Research

Journal | Page590

International Journal of Engineering Technology (IRJET) e ISSN: 2395 0056

(c) PredictedValuesv/ssqrt.Std.DeviationResiduals (d)Leverageglmv/sStd.PearsonResid. Then, we have to assess the performance of our model, so we use it to delineate the ROC curve(Receiver Optimistic >Characterisitics).library(pROC)>lr.predict<predict(Logistic_Model,test_data,

and

probability = TRUE) This is used in order to predict more values on the basisofthecurrentvalues.

> auc.gbm=roc(test_data$Class,lr.predict,plot=TRUE,col="blue")ItisusedtobuildtheROCcurve.

(a)SpecificityvsSensitivity

Step 6: In this step, we have considered using Decision tress in order to plot the outcomes of the decision. Through the outcomeswewillbeabletofigureoutwhichclasstheobjectbelongsto.TherpartlibraryisusedforRecursivepartitioning forclassification,regressionandsurvivaltrees.

Certified Journal | Page591

> library(rpart.plot) > decisionTree_model< rpart(Class~.,creditcard_data,method='class')

> probability< predict(decisionTree_model,creditcard_data,type='prob')

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008

> library(rpart)

> rpart.plot(decisionTree_model)

> plot(ANN_model)

Step7:Nowusingthe ‘neuralnet’library wewill bemakinga ANN(Artificial Neural Network)model which will beableto learn the various patterns, study the history of our dataset and be able to perform the classification of the input data. In ANN,weneedtosetathresholdofvalues,sowehavesetthethresholdas0.5,soallthevaluesabove0.5willbemarkedas 1,andbelowwillbe0.

> library(neuralnet) > ANN_model=neuralnet(Class~.,train_data,linear.output=FALSE)

> predicted_val< predict(decisionTree_model,creditcard_data,type='class')

Research

> library(gbm,quietly=TRUE)

of Engineering and Technology

©

Certified Journal | Page592

> resultANN=predANN$net.result

Boosting.Thisisusedforperformingclassificationandregressiontasks.It comprises of many weak decision trees in its models. All the trees when combined/put together form a strong model of GradientBoosting.

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008

> predANN=compute(ANN_model,test_data)//theparametersarex=Atable,andname=Nameofthetableonthedatabase.

> StepresultANN=ifelse(resultANN>0.5,1,0)8:Finally,wewillbeperformingGradient

> system.time(+model_gbm< gbm(Class~ +,distribution="bernoulli" +,data=rbind(train_data,test_data) ++,n.trees=500 +,interaction.depth=3 +,n.minobsinnode=100 +,shrinkage=0.01 +,bag.fraction=0.5 +,train.fraction=nrow(train_data)/ +(nrow(train_data)

International Journal (IRJET) e ISSN: 2395 0056

> model.influence=relative.influence(model_gbm,n.trees=gbm.iter,sort.=TRUE) plot(model_gbm) =predict(model_gbm,newdata=test_data,n.trees=gbm.iter)

>

> gbm_test

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page593 +nrow(test_data)) +) + user)systemelapsed708.94810.340836.577 > gbm.iter=gbm.perf(model_gbm,method="test")

5.2 Result and Analysis

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page594

> gbm_aucCall:roc.default(response=test_data$Class,predictor=gbm_test,plot=TRUE,col = Data:"red")gbm_testin56863controls(test_data$Class0)<98cases(test_data$Class1).Areaunderthecurve:0.9552

ThecurrentFraudDetectionSystemcanbeexpandedbyaddingmorewaystosecurethedatabyaddingextensiveMachine Learning Applications and Techniques. So, in the near future, we will be going over more research papers in order to understandmoretechniqueswhichcanbeappliedinordertomakethecurrentmodelmoreefficient.

6. Conclusion and Future Work

It can be concluded that the development of a Credit Card Fraud Detection is a very essential thing for any Bank or organization,inordertokeeptrackifanyfraudulentactivitiesaretakingplaceus ingitscustomer’screditcards.Andthis canbeperformedusingMachineLearningTechniques.

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072

TheresultasobservedfromtheexecutionoftheabovecodesisthatwecandevelopaCreditCardFraudDetectionsystem fortheBanksoftheworld,inordertoavoidFraudulentTransactions.Therequirementstoperformthetask,aretheusage of various R libraries, which can be used to under stand the dataset obtained, and what are the various Attributes that it contains,thenknowinghowtoManipulatethedatausingDataManipulationtechniques,followedbytheknowledgeofhow tomodeladatainRi.e.creatingtheTestandTraindatafromtheoriginaldataset.ThenmakingaLogisticRegressionmodel in order to be able to tell if fraud/not fraud. And also learning how to use Decision Trees in R. Then the creation of a ANN(ArtificialNeuralNetwork)modelhelpsustolearnthevariousdatapatternsinvolvedinthedataset.Thenhavingused the Gradient Boosting Method we can perform the classification and regression tasks. So, basically, an ad vanced Fraud DetectionSystemcanbedevelopedusingMachineLearningtechniques.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Future Work

> gbm_auc=roc(test_data$Class,gbm_test,plot=TRUE,col="red")Settinglevels:control=0,case=1

6.2

6.1 Conclusion

[8] Leppäaho,E.,Ammad ud din,M.,&Kaski,S.(2017).GFA:exploratoryanalysisofmultipledatasourceswithgroup factoranalysis.TheJournalofMachineLearningResearch,18(1),12941298.

[11] Maniraj,S& Saini,Aditya & Ahmed, Shadab&Sarkar,Swarna. (2019).CreditCard FraudDetectionusingMachine LearningandDataScience.InternationalJournalofEngineeringResearchand.08.10.17577/IJERTV8IS090031.

[10] Kamaruddin, S., & Ravi, V. (2016, August). Credit card fraud detection using big data analytics: use of PSOAANN basedone classclassification.InProceedingsoftheInternationalConferenceonInformaticsandAnalytics(pp.1 8).

[2] Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. (2017, October). Credit card fraud detection using machine learningtechniques:Acomparativeanalysis.In2017International ConferenceonComputingNetworkingandInformatics (ICCNI)(pp.1 9).IEEE.

[3] Roy,A.,Sun,J.,Mahoney,R.,Alonzi,L.,Adams,S.,&Beling,P.(2018,April).Deeplearningdetectingfraudincredit cardtransactions.In2018SystemsandInformationEngineeringDesignSymposium(SIEDS)(pp.129 134).IEEE.

[9] Andrienko, G., Andrienko, N., Drucker, S., Fekete, J. D., Fisher, D., Idreos, S., ... & Stonebraker, M. Big Data VisualizationandAnalytics:FutureResearchChallengesandEmergingApplications.

[13] Maniraj,S& Saini,Aditya & Ahmed, Shadab&Sarkar,Swarna.(2019).CreditCard FraudDetectionusingMachine LearningandDataScience.InternationalJournalofEngineeringResearchand.08.10.17577/IJERTV8IS090031

6.3 References

[4] Xuan,S.,Liu,G.,Li,Z.,Zheng,L.,Wang,S.,&Jiang,C.(2018,March).Randomforestforcreditcardfrauddetection.In 2018IEEE15thInternationalConferenceonNetworking,SensingandControl(ICNSC)(pp.1 6).IEEE.

[15] F. Carcillo, Y.A. Le Borgne, O. Caelen, Y. Kessaci, F. Oblé, G. BontempiCombining unsupervised and supervised learningincreditcardfrauddetectionInf.Sci.(Ny).(2019),10.1016/j.ins.2019.05.042

[5] Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P. E., He Guelton, L.,& Caelen, O. (2018). Sequence classificationforcredit cardfrauddetection.ExpertSystemswithApplications,100,234 245.

[7] Gamon, M. (2004, August). Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis. In Proceedings of the 20th international conference on Computational Linguistics (p. 841).AssociationforComputationalLinguistics.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 02 | Feb 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page595

[6] Elgendy,N.,&Elragal,A.(2014,July).Bigdataanalytics:aliteraturereviewpaper.InIndustrialConferenceonData Mining(pp.214 227).Springer,Cham.

[14] Varmedja, Dejan & Karanovic, Mirjana & Sladojevic, Srdjan & Arsenovic, Marko& Anderla, Andras. (2019). Credit CardFraudDetection MachineLearningmethods.1 5.10.1109/INFOTEH.2019.8717766.

[1] Patil,S.,Nemade,V., &Soni, P.K.(2018).Predictivemodellingforcreditcardfraud detectionusingdata analytics. Procediacomputerscience,132,385 395.

[12] Varmedja, Dejan & Karanovic, Mirjana & Sladojevic, Srdjan & Arsenovic, Marko& Anderla, Andras. (2019). Credit CardFraudDetection MachineLearningmethods.1 5.10.1109/INFOTEH.2019.8717766.

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.