Multiple Disease Prediction System

Page 1

3Saloni Shah, Dept. of Information Technology Engineering, Atharva College of Engineering

Multiple Disease Prediction System

***

1.INTRODUCTION

Alotofanalysisoverexistingsystemsinthehealthcare industry considered only one disease at a time. For example,onesystemisusedtoanalysediabetes,another is used to analyse diabetes retinopathy, and another system is used to predict heart disease. Maximum systems focus on a particular disease. When an organization wants to analyse their patient’s health reports then they have to deploy many models. The approachintheexistingsystemisusefultoanalyseonly particular diseases. In multiple diseases prediction systemausercananalysemorethanonediseaseona single website. The user doesn’t need to traverse differentplacesinordertopredictwhetherhe/shehasa particulardiseaseornot.Inmultiplediseasesprediction system, the user needs to select the name of the particulardisease,enteritsparametersandjustclickon submit.Thecorrespondingmachinelearningmodelwill beinvokedanditwouldpredicttheoutputanddisplayit onthescreen.

Ankush Singh1 , Ashish Yadav2 , Saloni Shah3 , Prof. Renuka Nagpure4

2Ashish Yadav, Dept. of Information Technology Engineering, Atharva College of Engineering

Inthisdigitalworld,dataisanasset,andenormousdatawas generated in all the fields. Data in the healthcare industry consists of all the information related to patients. Here a general architecturehasbeen proposedfor predictingthe disease in the healthcare industry. Many of the existing modelsareconcentratingononediseaseperanalysis.Like oneanalysisfordiabetesanalysis,oneforcanceranalysis, oneforskindiseaseslikethat.Thereisnocommonsystem present that can analyze more than one disease at a time. Thus, we are concentrating on providing immediate and accurate disease predictions to the users about the symptomstheyenteralongwiththediseasepredicted.So, weareproposingasystemwhichusedtopredictmultiple diseases by using Django. In this system, we are going to analyzeDiabetes,Heart,andmalariadiseaseanalysis.Later manymorediseasescanbeincluded.Toimplementmultiple disease prediction systems we are going to use machine learningalgorithms,andDjango.Pythonpicklingisusedto save the behavior of the model. The importance of this systemanalysisisthatwhileanalyzingthediseasesallthe parameters which cause the disease is included so it is possible to detect the disease efficiently and more accurately. The final model's behavior will be saved as a pythonpicklefile.

Abstract - Machinelearning andArtificial Intelligenceare playingahugeroleintoday’sworld.Fromself drivingcars tomedicalfields,wecanfindthemeverywhere.Themedical industrygeneratesahugeamountofpatientdatawhichcan beprocessedinalotofways.So,withthehelpofmachine learning, we have created a Prediction System that can detectmorethanonediseaseatatime.Manyoftheexisting systemscanpredictonlyonediseaseatatimeandthattoo with lower accuracy. Lower accuracy can seriously put a patient’s health in danger. We have considered three diseasesfornowthatareHeart,Liver,andDiabetesandin thefuture,manymorediseasescanbeadded.Theuserhas toentervariousparametersofthediseaseandthesystem woulddisplaytheoutputwhetherhe/shehasthediseaseor not.Thisprojectcanhelpalotofpeopleasonecanmonitor thepersons’conditionandtakethenecessaryprecautions thusincreasingthelifeexpectancy

Key Words: Diabetes, Heart, Liver, Knn, Random forest, XGBoost.

1Ankush Singh, Dept. of Information Technology Engineering, Atharva College of Engineering

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page1697

Inmultiplediseaseprediction,itispossibletopredict more than one disease at a time. So the user doesn’t needtotraversedifferentsitesinordertopredictthe diseases. We are taking three diseases that are Liver, Diabetes, and Heart. . As all the three diseases are correlatedtoeachother.Toimplementmultipledisease

Manyoftheexistingmachinelearningmodelsforhealth care analysis are concentrating on one disease per analysis.Forexamplefirstisforliveranalysis,onefor canceranalysis,oneforlungdiseaseslikethat.Ifauser wantstopredictmorethanonedisease,he/shehasto gothroughdifferentsites.Thereisnocommonsystem whereoneanalysiscanperformmorethanonedisease prediction. Some of the models have lower accuracy which can seriously affect patients’ health. When an organization wants to analyse their patient’s health reports,theyhavetodeploymanymodelswhichinturn increasesthecostaswellastimeSomeoftheexisting systemsconsiderveryfewparameterswhichcanyield falseresults.

1.3Proposed system

1.2 Problem system

4Prof.Renuka Nagpure, Dept. of Information Technology Engineering, Atharva College of Engineering Maharashtra, India

1.1 Description

3.1FunctionalANALYSISRequirement

1.Accordingtothe paperfocusesaboutas diabetesisoneof the dangerous diseasesin the world , it can cause many varietiesofdisorderswhich includesblindnessetc.Inthis papertheyhaveusedmachinelearningtechniquestofind out diabetes disease as it is easy and flexible to forecast whether the patient has illness or not Their aim of this analysiswastoinventasystemthatcanhelpthepatientto detect the diabetes disease of the patient with accurate results Heretheyusedmainly4mainalgorithmsDecision Tree,NaïveBayes,andSVMalgorithmsandcomparedtheir accuracywhichis85%,77%,77.3%respectively Theyalso used ANN algorithm after the training process to see the reactionsofthenetworkwhichstateswhetherthediseaseis classifiedproperlyornot.Heretheycomparedtheprecision recallandF1scoresupportadaccuracyofallthemodels[1]

4 DESIGN

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page1698

analyses we are going to use machine learning algorithmsandDjango.Whentheuserisaccessingthis API,theuserhastosendtheparametersofthedisease along with the disease name. Django will invoke the corresponding model and returns the status of the patient.

FigureDesignNo4.1:

Thewebsitewillproviderangeofthevaluesduring thepredictionofthedisease. Thewebsiteshouldbereliableandconsistent.

Block Diagram

3. SYSTEM

4.1Architecture

Thesystemallowsthepatienttopredictthedisease Theuseraddstheinputfortheparticulardisease and based on the trained model of the user input theoutputwillbedisplayed.

2.Themainaimofthepaperis,asheartplaysanimportant roleinlivingorganisms.Sothediagnosisandpredictionof heartrelateddiseaseshouldbeperfectandcorrectbecause itisverycrucialwhichcancausedeathcasesrelatedtoheart .SoMachinelearningandArtificialIntelligencesupportsin predictinganykindofnaturalevents.Sointhispaperthey calculateaccuracyofmachinelearningforpredictingheart disease using k nearest neighbor ,decision tree, linear regression and SVM by using UCI repositor dataset for trainingandtesting.Theyalsocomparedthealgorithmand their accuracy SVM 83 %,Decision tree 79%,Linear regression78%,k nearestneighbour87%[2].

3.2 Non Functional Requirement

Inthefigureno4.1wehaveexperimentedonthreediseases that is heart,daibetes and liver as these are correlated to each other.The first step is to the dataset for heart disease,daibetesdiseaseandliverdiseasewehaveimported the UCI dataset,PIMA dataset and Indian liver dataset respectively.Once we have imported the dataset then visualization of each inputed data takes place.After visualization pre processing of data takes place wher we checkforoutliers,missingvaluesandalsoscalethedataset thenontheupdateddatasetwesplitthedataintotraining andtesting.Nextisonthetrainingdatasetwehadapplied knn,xgboost and random forest algorithm and applied knowledge on the classified algorithm using testing dataset.After applying knowledge we will choose the algorithm with the best accuracy for each of the disease .Then we build a pickle file for all the disease and then integratedthepicklefilewiththedjangoframeworkforthe outputofthemodelonthewebpage

3. The system defines that liver diseases is causing high number of deaths in India and is also considered as a life threatingdiseaseintheworld. Asitisdifficulttodetectthe liver disease at early stage .So using automated program usingmachinelearningalgorithmswecandetecttheliver diseaseaccurately.Theyused andcomparedSVM,Decision TreeandRandomforestalgorithm andmeasuresprecision, accuracyandrecallmetricsforquantitativemeasurement Theaccuracyare95%,87%,92%respectively[3].

2. LITERATURE REVIEW

5.1.2. Random Forest Algorithm

Step-3: Thenchoosing thenumberNfordecisiontreesthat youwanttobuild.

Step 5: Finding thepredictionsofeachdecisiontree,and assigning thenewdatapointstothecategorythatwinsthe majorityvotes.

Step 5:Thenassignthenewpointtothecategory having maximum number of neighbours. For example Category A has highest number of neighboursowewillassignthenewdatapoint tocategoryA.

Step7: Thenwecanpredictthe residualvaluesusingthe DecisionTreeyouconstructed.

whereρisthelearningrate.

TheworkingofXGBoostalgorithmareasfollows:

Step-4:Repeatingstep1and2.

Step3:Calculating thesimilarityscoreusingformula:

Step8:Thenewsetofresidualsiscalculatedas:

Step 2: Then we will find the Euclidean distance betweenthepoints.Itiscalculatedbytheas:

Step5:ApplyingsimilarityscorewecalculateInformation gain.Informationgainhelp tofindthedifference between old similarity and new similarity and tells how much homogeneity is achieved by splitting the node at a given point.Itiscalculatedbytheformula:

Step 6:SofinallyourKnnmodelisready.

Step 4:Thencountthenumber ofthedatapointsin eachcategory.Forexamplefound threevalues forCategoryAandtwovaluesforcategoryB.

where,Hessianisequaltonumberofresiduals;Gradient2= squared sum of residuals; λ is a regularization

5.1.3. XGBoost Algorithm

Step 2: After selecting k data points then building the decision trees associated with the selected data points (Subsets).

55.1Algorithm.1.1.KnnAlgorithm

RandomForestworkingispossibleintwophases,firstisto createtherandomforestbymerging Ndecisiontree,and secondismakingprediction foreachtreecreatedinthefirst Thephase.workingoftherandomforestisasfollows:

Step9:Thengobacktostep1andrepeattheprocessforall thetrees.

Step 1:StarttoselecttheKvalue forexamplek=5

Step6:Creating thetreeofdesiredlengthusingtheabove methodpruningandregularizationcan bedonebyplaying withtheregularizationhyperparameter.

Step 2: Then for the first tree, we have to compute the averageoftargetvariableaspredictionand thencalculating the residuals using the desired loss function and then for subsequent trees the residuals come from prediction that wasthereinprevioustree.

Step1:Firstlycreatingasingleleaftree.

Step 1: FirstlyitwillselectrandomKdatapointsfromthe trainingset.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page1699 4.2User Interface Design Figure No4.2: Graphical User Interface

5. IMPLEMENTATION

TheworkingoftheK NNalgorithmisasfollowed:

Step 3: Then we will calculate the Euclidean distanceofthe nearestneighbour.

Stephyperparameter.4:Applyingsimilarityscoreweselecttheappropriate node.Thehigherthesimilarityscoremorethehomogeneity.

68% 1. Error Message on inputting the incorrect value: Figure No:6.1:Error Message 2.Diabetes Disease : Figure No 6 2:Diabetes Disease Input Data Figure No 6.3:Diabetes Disease Output Result 3.Heart FigureDisease:No 6.4:Heart Disease Input Data

73%

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page1700

88%

89%

Table No 6 3:Liver

Table No 6.2:Hear

RandomForest

Inthesystemdiabetesdiseasepredictionmodel usedknn algorithm,heartdiseaseusesthexgboostalgorithmandliver uses the random forest algorithm as these gave the best accuracy accordingly. There when the patient adds the parameteraccordingtothediseaseitwillshowwhetherthe patienthasadiseaseornotaccordingtothediseaseselected. The parameters will show the range of the values needed andifthevalueisnotbetweentherangeorisnotvalidoris emptyitwillshowthewarningsignthataddacorrectvalue.

85%

Table No 6 1:Diabetes

RandomForest

ALGORITHM

RandomForest

6. RESULT

ACCURACY FOR EACH DISEASE:

ALGORITHM

KNN

XGBoost

77%

XGBoost

ALGORITHM Diabetes

Disease

Heart

Disease

Disease

Liver

[3] A.Sivasangari, Baddigam Jaya Krishna Reddy,AnnamareddyKiran,P.Ajitha,”DiagnosisofLiver DiseaseusingMachineLearningModels” 2020Fourth International Conference on I SMAC (IoT in Social, Mobile,AnalyticsandCloud)(I SMAC)

We sincere thank to our college Atharva College of Engineeringforgivingusaplatformtoprepareaprojecton thetopic"Multiple Disease Prediction System"andwould liketothankourprincipal Dr.ShrikantKallurkarforgiving ustheopportunitiesandtimetoconductandresearchon the subject. We are sincerely grateful for Prof. Renuka Nagpure as our guide and Prof. Deepali Maste, Headof Information Technology Department, for providing help during our research, which would have seemed difficult without their motivation, constant support, and valuable suggestion. Moreover, the complication of this research paperwouldhavebeenimpossiblewithouttheco operation, suggestion,andhelpofourfriendsandfamily

[2] ArchanaSingh,RakeshKumar,“HeartDiseasePrediction Using Machine Learning Algorithms”, 2020 IEEE, InternationalConferenceonElectricalandElectronics Engineering(ICE3)

[1] Priyanka Sonar, Prof. K. JayaMalini,” DIABETES PREDICTIONUSINGDIFFERENTMACHINELEARNING APPROACHES”,2019IEEE,3rdInternationalConference on Computing Methodologies and Communication (ICCMC)

In the future we can add more diseases in the existingAPI. Wecantrytoimprovetheaccuracyofpredictionin ordertodecreasethemortalityrate.

8. FUTURE SCOPE

The main objective of this project was to create a system that would predict more than one disease and do so with highaccuracy.Becauseofthisprojecttheuserdoesn’tneed to traverse different websites which saves time as well. Diseasesifpredictedearlycanincreaseyourlifeexpectancy aswellassaveyoufromfinancialtroubles.Forthispurpose, we have used various machine learning algorithms like RandomForest,XGBoost,andKnearestneighbor(KNN)to achievemaximumaccuracy.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page1701 Figure No 6.5:Heart Disease Output Result 4.Liver Disease:FigureNo 6.6:Liver Disease Input Data Figure No 6.7:Liver Disease Output Result

Trytomakethesystemuser friendlyandprovidea chatbotfornormalqueries.

ACKNOWLEDGEMENT

REFERENCES

7. CONCLUSION

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.