Machine Learning-Based System for Predicting Multiple Diseases



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072


Department of Information Science & Engineering, Bangalore Institute of Technology, Bengaluru-560004.

Abstract - The Multiple Disease Detection Software aims to address crucial healthcare accessibility challenges and to support the early diagnosis of chronic diseases, which remain leading causes of death worldwide. Overburdened healthcare systems, population growth, and the COVID-19 pandemic have worsened delays in diagnosis and treatment, increasing the need for a solution such as this software. The system uses Bio-Inspired Algorithms, Machine Learning (ML), and Deep Learning (DL) to deliver accurate and dependable predictions of the probability of specific chronic illnesses from user-provided health data. The Django framework provides scalability and ease of use, while optimization techniques improve precision and efficacy. By allowing people to assess their own health, the program relieves pressure on healthcare infrastructure and promotes early intervention, making basic medical screening accessible to everyone, including people in remote areas. The ultimate goal of the platform is to lower mortality rates, foster universal healthcare accessibility, and contribute to better public health outcomes.

Key Words: Machine Learning, Deep Learning, Chronic Disease Prediction, Alzheimer's disease, cardiovascular conditions, breast cancer, lung cancer, early detection.

1. INTRODUCTION

Alzheimer's disease is a degenerative brain disorder and the leading cause of dementia, characterized by memory loss, cognitive decline, and behavioral changes. It progresses over time, leading to brain cell death, and no cure is currently available. Breast cancer, which originates in the glandular tissue, can progress from localized to invasive forms, but early diagnosis and combined treatments significantly improve outcomes.

Diabetes mellitus, a metabolic disorder, leads to elevated blood glucose levels due to various factors; its forms range from chronic diseases such as type 1 and type 2 diabetes to reversible ones such as prediabetes and gestational diabetes. Coronary artery disease occurs when plaque builds up in the arteries, reducing blood flow and potentially resulting in heart attacks or strokes; its causes and manifestations differ between males and females. Lung cancer, primarily associated with smoking, remains the leading cause of cancer deaths, but quitting smoking helps reduce the risk.

Parkinson's disease affects brain function, causing tremors, stiffness, and coordination issues, with symptoms worsening over time. Risk factors include age and genetic predisposition.

The Multiple Disease Detection Software enables users to input their health data and receive predictions for chronic diseases, addressing challenges related to healthcare accessibility and diagnostic delays. Developed with Machine Learning, Deep Learning, and Bio-Inspired Algorithms, it is integrated with Django to offer an AI-driven platform for early detection and improved healthcare outcomes. Deep Learning, a subset of AI, plays an important role in disease diagnosis by processing and analyzing medical data to find patterns, contributing to the advancement of medical science.

2. LITERATURE SURVEY

[1] Diagnosis of Parkinson’s Disease Using Artificial Neural Network. Authors: Anila M and Dr. G Pradeepini, IEEE

This paper focuses on analyzing voice patterns to aid in the diagnosis of Parkinson's disease. Five machine learning models (ANN, Random Forest, KNN, SVM, and XGBoost) are compared on error rate and other performance metrics to select the best model. The primary limitation of the study is its use of an ANN with only two hidden layers, which is appropriate for small datasets but inadequate for highly complex data. Additionally, the reliance on a single feature selection method limited the effectiveness of dimensionality reduction.

[2] Machine Learning-Based Approaches for Prediction of Alzheimer’s Disease. Author: Arvind Kumar Tiwari, Publication: IEEE

This paper uses the minimum redundancy maximum relevance (mRMR) feature selection algorithm to identify key features that can predict the onset of Alzheimer's disease. With 20 selected features, the study obtained an accuracy of 90.3%, a precision of 90.2%, a Matthews correlation coefficient of 0.73, and a ROC value of 0.96, significantly higher than bagging, boosting, rotation forest, and SVM-based machine learning models.


[3] A Deep Learning Method for Predicting the Progression of Lung Cancer. Authors: Afzal Hussain Shahid and Maheshwari Prasad Singh, Publication: IEEE

This research proposed a deep neural network (DNN) model to predict lung cancer progression using a reduced input feature space. PCA was integrated to improve the DNN model's efficiency. The model was tested on real-world datasets from UCI, with a focus on improving performance by incorporating additional data points.

[4] Prediction of Heart Disease Using Machine Learning

Authors: Aditi Gavhane, Gouthami Kokkula, Isha Panday, and Prof. Kailash Devadkar. Publication: Applied Artificial Intelligence

The authors developed a multi-layer perceptron model to predict heart disease using CAD technology. Wide adoption of the system could increase disease awareness and potentially reduce the heart disease mortality rate.

[5] Application of Machine Learning in Disease Prediction

Authors: Pahulpreet Singh Kohli and Shriya Arora, IEEE

Various Machine Learning Algorithms: This paper explores multiple machine learning algorithms for disease prediction. Logistic regression achieved an accuracy of 87.1% for heart disease, support vector machines reached 85.71% for diabetes prediction, and the AdaBoost classifier achieved 98.57% for breast cancer prediction. These algorithms therefore prove effective for healthcare predictive analytics.

3. EXISTING SYSTEM

Current methods for chronic disease detection are mostly based on traditional medical approaches such as consultations, tests, and scans. While these approaches are accurate, they are time-consuming and expensive, and therefore often inaccessible in rural areas. Moreover, each disease demands different diagnostic instruments, which leads to fragmented approaches and delays in detection. Overburdened health systems, especially during pandemics, worsen this situation, so many patients report late and suffer poorer outcomes.

Advantages:

1. Proven Reliability: Traditional methods provide accurate and trusted diagnostic results for individual diseases.

2. Advanced Technology: Techniques like imaging and blood tests enable precise disease detection.

3. Specialized Expertise: Direct involvement of healthcare professionals ensures personalized diagnosis and care.

Disadvantages:

• Time and Cost Intensive: Diagnosis processes are lengthy and often prohibitively expensive.

• Limited Accessibility: Healthcare facilities are difficult to access in remote or resource-limited regions.

• Fragmented Approach: Existing methods focus on individual diseases rather than offering multi-disease detection.

• Delayed Diagnosis: Dependence on periodic check-ups leads to late detection of chronic diseases.

• Strain on Healthcare Systems: High demand, especially during pandemics, overburdens infrastructure and causes inefficiencies.

4. PROBLEM STATEMENT

This paper proposes Multiple Disease Detection Software for the healthcare industry that predicts six chronic diseases: Alzheimer's Dementia, Heart Disease, Breast Cancer, Diabetes, Parkinson's Disease, and Lung Cancer. The software relies on multiple algorithms to predict these diseases, making it possible to compare which is the most precise and effective for diagnosing each one. Users can enter their health data into the platform and receive predictions about their risk of these diseases. The system is based on Machine Learning, Deep Learning, and Bio-Inspired Algorithms to improve prediction quality and enable early detection. This integrated approach not only helps identify potential health risks but also addresses issues such as delayed diagnosis and restricted access to healthcare. Through an accessible early-detection platform, the software gives users an opportunity to assess their health and take preventive measures before these diseases develop.

5. PROPOSED SYSTEM

The Multiple Disease Detection Software is a health prediction system that predicts several chronic diseases: Alzheimer's dementia, breast cancer, heart disease, diabetes, Parkinson's disease, and lung cancer. The system provides an easy-to-use platform for users to input their health data and receive predictions of their likelihood of developing any of these diseases. It combines various machine learning and deep learning algorithms to improve prediction accuracy and support dependable early detection.


Figure 5: Sequence Diagram

For Alzheimer’s Dementia, the system uses a 2D CNN model to predict the severity of the condition from MRI scans. An intuitive user interface allows users to upload 2D MRI images for analysis.

For Breast Cancer prediction, the system uses ensemble learning algorithms such as RandomForestClassifier and ExtraTreesRegressor, together with Genetic Algorithms and PCA, to improve the accuracy and performance of its predictions. Like the rest of the multi-disease prediction tool, this module is designed to aid early diagnosis and prevention.

For heart disease, the system uses an ensemble learning approach improved with Ant Colony Optimization (ACO). It employs five ensemble algorithms: AdaBoost, Bagging, Random Forest, Gradient Boosting, and Extra Trees. ACO, an optimization technique inspired by the foraging behavior of ants, is applied to tune each classifier for the best prediction accuracy. ACO assigns pheromone values to the routes that ants travel, guiding the search toward better solutions over many iterations. The result is a highly accurate, optimized model for predicting heart disease from the dataset.
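The pheromone mechanism described above can be sketched in a few lines. This is a minimal, simplified ACO for choosing discrete hyperparameter values, not the authors' implementation: the names `aco_search` and `toy_score`, the candidate grid, and the evaporation rate `rho` are hypothetical, and `toy_score` stands in for what would, in practice, be the cross-validated accuracy of one of the five ensemble classifiers.

```python
import random

def aco_search(options, score, n_ants=10, n_iters=20, rho=0.3, seed=0):
    """Simplified ACO over discrete hyperparameter choices (illustrative sketch).

    options: dict mapping hyperparameter name -> list of candidate values
    score:   callable(config) -> float, higher is better; assumed to be, e.g.,
             the cross-validated accuracy of one ensemble classifier
    rho:     evaporation rate (hypothetical default)
    """
    rng = random.Random(seed)
    # One pheromone level per (hyperparameter, candidate value); all equal at first.
    tau = {name: [1.0] * len(vals) for name, vals in options.items()}
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            # Each ant builds a configuration; values with more pheromone are likelier.
            choice = {name: rng.choices(range(len(vals)), weights=tau[name])[0]
                      for name, vals in options.items()}
            cfg = {name: options[name][i] for name, i in choice.items()}
            s = score(cfg)
            trails.append((choice, s))
            if s > best_score:
                best_cfg, best_score = cfg, s
        # Evaporation: every trail fades, so poor early choices are gradually forgotten.
        for name in tau:
            tau[name] = [(1.0 - rho) * t for t in tau[name]]
        # Deposit: each ant reinforces the values it chose in proportion to its score.
        for choice, s in trails:
            for name, i in choice.items():
                tau[name][i] += max(s, 0.0)
    return best_cfg, best_score

# Hypothetical search space and a toy objective that peaks at (100, 4).
grid = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8]}

def toy_score(cfg):
    return 1.0 - abs(cfg["n_estimators"] - 100) / 200 - abs(cfg["max_depth"] - 4) / 10

best, best_s = aco_search(grid, toy_score)
print(best, best_s)
```

Evaporation keeps early, possibly lucky choices from dominating, while score-proportional deposits bias later ants toward configurations that scored well.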

The modules for Parkinson's Disease and Lung Cancer employ machine learning algorithms to predict each condition with good accuracy from distinct data inputs. Parkinson's Disease prediction is based on voice analysis, a common method for detecting speech disorders, which affect about 90% of patients with Parkinson's. Features extracted from voice recordings include pitch, jitter, shimmer, and Mel-frequency cepstral coefficients (MFCCs). These features are optimized with ACO, which identifies the most relevant ones for classification. Using the optimized feature set, a Support Vector Machine (SVM) classifies each subject as either healthy or affected by Parkinson's Disease. The SVM works by finding the hyperplane that maximizes the margin of separation between the two classes of data points.
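The margin-maximizing idea can be illustrated with a small linear SVM trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style update). This is a sketch under stated assumptions: the two features stand in for hypothetical, already-scaled acoustic measurements such as jitter and shimmer, labels are assumed to be -1 (healthy) and +1 (Parkinson's), and the ACO feature-selection step is omitted. A production system would more likely use a library SVM, possibly with a kernel.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic sub-gradient descent.

    X: list of feature vectors; y: labels in {-1, +1}.
    Returns weights w; the last weight multiplies a constant 1 (the bias).
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]   # constant feature absorbs the bias term
    w = [0.0] * len(Xa[0])
    t = 0
    idx = list(range(len(Xa)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)        # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            # Regularization shrinks w; a smaller ||w|| means a wider margin 2/||w||.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:             # inside the margin or misclassified: hinge loss active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w

def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical, already-scaled (jitter, shimmer) pairs.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```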

The system predicts lung cancer from various medical data, including the patient's age, gender, smoking history, and CT scan results. Data preprocessing handles missing values and scales features to prepare the data for training. Classification is done with the XGBoost algorithm, which uses gradient boosting to improve prediction accuracy: XGBoost builds a series of decision trees, each new tree designed to correct the errors of the previous ones. Key hyperparameters, including the learning rate (eta), tree depth (max_depth), and sampling proportions (subsample and colsample_bytree), are fine-tuned to improve overall performance. After training, the XGBoost model estimates the probability of lung cancer from the data provided by the patient.
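The idea that each new tree corrects what the ensemble still gets wrong can be shown with plain gradient boosting on the log loss, using one-split decision stumps. This is not XGBoost itself (XGBoost also uses second-order gradient information and regularized tree growth); it is a hypothetical, dependency-free sketch in which `eta` plays the role of the learning rate and a stump corresponds to `max_depth=1`. The two features (age, pack-years) and the data are made up for illustration.

```python
import math

def fit_stump(X, r):
    """Least-squares decision stump: the single split that best fits residuals r."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[j] <= thr]
            right = [ri for x, ri in zip(X, r) if x[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lv, rv)
    _, j, thr, lv, rv = best
    return lambda x: lv if x[j] <= thr else rv

def boost(X, y, n_trees=50, eta=0.3):
    """Binary gradient boosting on log loss; y in {0, 1}. Returns a probability function."""
    trees = []
    F = [0.0] * len(X)                          # raw scores (log-odds), start at 0
    for _ in range(n_trees):
        p = [1 / (1 + math.exp(-f)) for f in F]
        residual = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        tree = fit_stump(X, residual)           # new tree targets the remaining error
        trees.append(tree)
        F = [f + eta * tree(x) for f, x in zip(F, X)]
    return lambda x: 1 / (1 + math.exp(-sum(eta * t(x) for t in trees)))

# Hypothetical (age, pack-years) records with lung cancer labels.
X = [[30, 0], [45, 5], [50, 10], [60, 30], [65, 40], [70, 50]]
y = [0, 0, 0, 1, 1, 1]
model = boost(X, y)
print(model([68, 45]), model([35, 2]))
```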

5.1 Alzheimer’s Disease Prediction

For Alzheimer's Disease prediction, data cleaning involves imputing missing cognitive test scores and genetic data, while outliers in MRI data are handled using the interquartile range (IQR). Feature selection uses correlation analysis and mutual information to retain relevant features. Feature encoding applies label encoding for binary categories and one-hot encoding for other categorical data. Feature scaling standardizes continuous features such as cognitive scores and age. Techniques such as SMOTE are used to address imbalanced data. Finally, dimensionality reduction with PCA or t-SNE reduces the feature size while retaining key information.
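IQR-based outlier handling and standardization, as listed above, might look like the following minimal sketch. The helper names are hypothetical, and the choice to clip (winsorize) outliers to Tukey's 1.5 × IQR fences, rather than drop them, is an assumption.

```python
import statistics

def iqr_clip(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]

def standardize(values):
    """Z-score standardization: zero mean, unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

# Hypothetical measurements with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 100]
clipped = iqr_clip(data)
print(clipped)
print(standardize(clipped))
```

`statistics.quantiles` defaults to its "exclusive" method, so the quartiles can differ slightly from NumPy's default linear interpolation. `standardize` divides by the population standard deviation, as scikit-learn's `StandardScaler` does.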

Figure 5.1: Alzheimer’s Disease Prediction

5.2 Breast Cancer Prediction

The ensemble learning algorithms initially applied were ExtraTreesRegressor and RandomForestClassifier, followed by hyperparameter tuning with GridSearchCV. The model was further tuned using a Sequential model from Keras (TensorFlow), with PCA to reduce dimensionality. Finally, an accuracy of 96.5% was obtained using an MLPClassifier combined with a Genetic Algorithm (GA) run for 50 generations. The GA is an optimization technique inspired by the principles of genetics and natural selection. A population of candidate solutions undergoes recombination and mutation over generations. Each candidate is assigned a fitness value, and fitter solutions are favored in reproduction, mirroring Darwin's "survival of the fittest." This iterative process steers the population toward better solutions with each generation.
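The GA loop described above (fitness evaluation, selection, crossover, and mutation, repeated for 50 generations) can be sketched as follows. Everything here is illustrative: each individual is assumed to be a bit mask over features; tournament selection, one-point crossover, and elitism are design choices not stated in the paper; and the toy fitness (agreement with a made-up `target` mask) stands in for the validation accuracy of the MLPClassifier.

```python
import random

def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Bit-string GA; returns the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_pop = [ranked[0][:]]                  # elitism: keep the best individual
        while len(next_pop) < pop_size:
            # Tournament selection: fitter candidates are more likely to reproduce.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_bits)         # one-point crossover (recombination)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history

# Hypothetical "ideal" feature mask used only to define a toy fitness.
target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

def fit(mask):
    return sum(m == t for m, t in zip(mask, target))

best, history = genetic_algorithm(fit, n_bits=len(target))
print(fit(best), history[:5])
```

Carrying the best individual over unchanged (elitism) guarantees that the best fitness never decreases from one generation to the next.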

5.3 Diabetes Prediction

For diabetes prediction, data cleaning involves imputing missing glucose and insulin values using the mean/median or KNN, and detecting and handling outliers. Feature selection is performed using a correlation matrix and the Chi-square test. Feature encoding includes label encoding for binary categories and one-hot encoding for multi-class categories. Feature scaling standardizes or normalizes continuous features such as glucose and BMI. To address imbalanced data, techniques such as SMOTE or random undersampling are used. Dimensionality reduction with PCA is applied if needed to reduce feature size.
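SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The function below is a minimal, dependency-free sketch of that idea with hypothetical names and data; in practice a library implementation such as `SMOTE` from imbalanced-learn (default `k_neighbors=5`) would normally be used.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x (excluding x itself).
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Hypothetical (scaled glucose, scaled BMI) values for the minority class.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5], [1.2, 2.2]]
new_points = smote(minority, n_new=10, k=2)
print(new_points[:3])
```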

5.4 Heart Disease Prediction

For heart disease prediction, missing values are handled during data cleaning through mean/median imputation or KNN, and outliers are detected with the IQR or Z-score. Features are selected with a correlation matrix and the Chi-square test. Binary features are label-encoded and other categorical variables are one-hot encoded. Feature scaling standardizes or normalizes continuous features such as age and cholesterol. Oversampling and undersampling techniques, including SMOTE, are employed to address imbalanced data. Finally, PCA is applied to reduce the number of features.
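The Chi-square test used for feature selection compares the observed counts in a contingency table with the counts expected if the feature and the class were independent; a larger statistic suggests a stronger association. The sketch below computes Pearson's statistic for a hypothetical table (rows: values of a binary feature, columns: no disease / disease); the counts are invented for illustration.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (rows: feature values, cols: classes)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2

# Hypothetical counts: rows = feature absent/present, columns = no disease/disease.
table = [[10, 20], [30, 40]]
print(round(chi_square(table), 4))
```

Ranking features by this statistic and keeping the top ones is one simple selection rule. For 2×2 tables, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default, so its statistic would come out smaller than this uncorrected value.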

5.5 Parkinson's Disease Prediction

For Parkinson's Disease prediction, voice data cleaning includes noise reduction by spectral subtraction or Wiener filtering and silence removal by energy-based thresholding. Features are extracted using Mel-frequency Cepstral Coefficients (MFCCs), jitter, shimmer, and the Harmonics-to-Noise Ratio (HNR). Min-Max scaling or Z-score standardization is used to normalize the features. PCA-based dimensionality reduction helps minimize the feature space and prevent overfitting. Missing values were imputed using either the mean or KNN, and the targets were label-encoded, with healthy subjects labeled 0.
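Two of the voice features named above, jitter and shimmer, measure cycle-to-cycle variation in the voice. The sketch below uses the common "local" definitions (the mean absolute difference between consecutive values, divided by the mean value) and assumes the pitch periods and peak amplitudes have already been extracted by a pitch tracker; the function names and sample values are hypothetical. Min-Max scaling is included because it is one of the normalization options listed.

```python
def local_jitter(periods):
    """Local jitter: mean |T[i] - T[i+1]| divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Local shimmer: mean |A[i] - A[i+1]| divided by the mean peak amplitude."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

def min_max(values):
    """Rescale values linearly to [0, 1] (assumes max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical pitch periods (seconds) and peak amplitudes from one recording.
periods = [0.010, 0.011, 0.010]
amps = [1.0, 0.9, 1.0]
print(local_jitter(periods), local_shimmer(amps))
print(min_max([2, 4, 6]))
```

Both measures are often reported as percentages (the ratio multiplied by 100).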

Figure 5.2: Breast Cancer Prediction
Figure 5.3: Diabetes Prediction
Figure 5.4: Heart Disease Prediction


5.6 Lung Cancer Prediction

Data preprocessing includes handling missing values by mean/median imputation or KNN, and detecting outliers with the Z-score or IQR. Feature selection uses correlation matrices and Chi-square tests to select important features. Feature encoding uses one-hot or label encoding for categorical variables. Feature scaling uses Min-Max scaling or standardization. For imbalanced data, SMOTE or undersampling is used. Lastly, PCA or LDA performs dimensionality reduction to limit the influence of irrelevant features.
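Median imputation and Z-score outlier detection, two of the preprocessing steps above, are sketched below with hypothetical helper names. Missing entries are assumed to be encoded as `None`, and the threshold of 3 standard deviations is a conventional choice rather than one stated in the paper.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]

def zscore_outliers(values, threshold=3.0):
    """Return indices whose |z-score| exceeds the threshold."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sd > threshold]

# Hypothetical patient ages with one missing entry, and a feature with one extreme value.
print(impute_median([40, None, 60, 50]))
values = [50] * 20 + [200]
print(zscore_outliers(values))
```

With the population standard deviation, no point in a sample of size n can have |z| larger than sqrt(n - 1), so a threshold of 3 cannot flag anything when n <= 10; the IQR rule is often preferred for small samples.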

5.7 Feature Extraction

Key features for each disease are extracted and optimized to improve prediction accuracy. Alzheimer's prediction uses metrics such as hippocampal volume, Breast Cancer relies on the mean radius, and Heart Disease focuses on cholesterol and blood pressure, all refined with ACO. Diabetes features include glucose and BMI, Parkinson's uses acoustic features such as MFCCs, and Lung Cancer combines imaging metrics with clinical data, using PCA and Gradient Boosting for efficient classification.
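PCA, which several of the modules described above use to compact the feature set, finds the directions of greatest variance. The sketch below finds the first principal component by power iteration on the sample covariance matrix and projects the data onto it; the function names and toy data are hypothetical, and a real pipeline would use a library routine that returns several components.

```python
import math

def first_principal_component(X, iters=100):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in X) / (n - 1)
          for b in range(d)] for a in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(X, v):
    """Scores of the mean-centred data along direction v (one reduced feature)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [sum((x - m) * c for x, m, c in zip(row, means, v)) for row in X]

# Hypothetical two-feature data lying on a line.
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
v = first_principal_component(X)
print(v, project(X, v))
```

Power iteration fails if the starting vector is orthogonal to the leading eigenvector or if all samples are identical (zero covariance); library implementations such as scikit-learn's `PCA` use a singular value decomposition instead.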

5.8 Classification Techniques

The classification methods adopted by the Multiple Disease Detection Software draw on machine learning, deep learning, and bio-inspired algorithms, and the selected techniques vary by disease type. Alzheimer's uses a 2D CNN to classify images; Breast Cancer uses the ensemble methods Random Forest and Extra Trees. Heart Disease prediction relies on optimized ensemble techniques, namely Gradient Boosting and AdaBoost improved through ACO. For Parkinson's, an SVM classifies patients using acoustic features. For Lung Cancer, high accuracy is obtained with XGBoost. Diabetes is detected with traditional classifiers: Logistic Regression and Gradient Boosting.

6. ALGORITHMS

1. 2D Convolutional Neural Network (CNN): Used for Alzheimer’s Dementia prediction by analyzing MRI scans to classify the severity of the condition.

2. ACO, combined with Ensemble Methods like AdaBoost, Bagging, Random Forest, Gradient Boosting, and Extra Trees, is applied to Heart Disease Prediction to optimize classifier performance using behavior-inspired search techniques.

3. Logistic Regression and Decision Tree Classifiers: Used for Diabetes prediction to classify risk levels based on patient data.

4. Support Vector Machine (SVM) optimized with Ant Colony Optimization (ACO): Used for Parkinson’s Disease prediction by analyzing voice features and selecting the most relevant ones.

5. XGBoost (Extreme Gradient Boosting): Used for Lung Cancer prediction to improve classification accuracy through iterative tree-based learning.
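The core operation inside the 2D CNN in item 1 is a sliding-window filter. The sketch below shows one forward pass through a single hand-set block (valid cross-correlation, ReLU, then 2 × 2 max pooling) on a tiny synthetic image. In a trained network, such as one built from Keras `Conv2D` layers, the kernel weights are learned, and MRI inputs are far larger; the function names and values here are illustrative only.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the per-channel operation of a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(feature_map):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool2x2(fm):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    return [[max(fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1])
             for j in range(0, len(fm[0]) - 1, 2)]
            for i in range(0, len(fm) - 1, 2)]

# Synthetic 4x4 image with a vertical edge, and a hand-set vertical-edge kernel.
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]
fm = relu(conv2d_valid(image, kernel))
print(fm)               # the edge shows up as a column of strong responses
print(max_pool2x2(fm))
```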

Data Augmentation Techniques

Data augmentation techniques improve performance by increasing the variability of the data, which in turn helps reduce overfitting and class imbalance. For instance, Alzheimer's Dementia MRI images are rotated, flipped, and zoomed. Dataset balancing through SMOTE is applied to conditions such as Diabetes and Heart Disease. Noise injection makes Parkinson's Disease prediction more robust by simulating voice recordings collected in real-world conditions. Geometric transformations, such as cropping and scaling, are used to augment histopathological images for improved Breast Cancer detection.
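The augmentations mentioned above (flipping, rotation, and cropping for images, and noise injection for audio) can be illustrated with a few list-based helpers. Images are assumed to be 2-D lists of pixel values and audio a 1-D list of samples; the helper names, the target signal-to-noise ratio, and the seeds are hypothetical.

```python
import random

def hflip(img):
    """Mirror an image left to right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(r) for r in zip(*img[::-1])]

def random_crop(img, h, w, seed=0):
    """Cut a random h x w window out of the image."""
    rng = random.Random(seed)
    top = rng.randint(0, len(img) - h)
    left = rng.randint(0, len(img[0]) - w)
    return [row[left:left + w] for row in img[top:top + h]]

def add_noise(signal, snr_db, seed=0):
    """Add Gaussian noise so that signal power / noise power matches snr_db."""
    rng = random.Random(seed)
    power = sum(s * s for s in signal) / len(signal)
    noise_sd = (power / 10 ** (snr_db / 10)) ** 0.5
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

img = [[1, 2], [3, 4]]
print(hflip(img), rot90(img))
print(random_crop([[4 * i + j for j in range(4)] for i in range(4)], 2, 2))
sig = [0.0, 1.0, 0.0, -1.0] * 25   # hypothetical voice-like waveform
noisy = add_noise(sig, snr_db=20)
```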

Figure 5.5: Parkinson's Disease Prediction
Figure 5.6: Lung Cancer Prediction


7. RESULTS

Figure 7.1: Result on desktop application

The Multi-Disease Prediction System has a user-friendly interface divided into two major sections: Data Input & Prediction, and Result Display. In the Data Input & Prediction section, patients can upload or enter their medical records or imaging files. The system then preprocesses the data, extracts relevant features, and applies machine learning models to evaluate the risk of Alzheimer's, heart disease, and cancer. The Result Display section shows the predictions and their corresponding confidence scores, accompanied by charts or medical images. Results can be downloaded for further analysis or reporting.

8. CONCLUSION

“Respiratory Sound Classification Using Machine Learning” demonstrates the power of machine learning algorithms, particularly Convolutional Neural Networks (CNNs), in classifying respiratory sounds such as wheezes, crackles, and normal sounds. The system leverages advanced feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCCs), to transform raw audio data into usable features for classification. Using a CNN model, the project classifies audio files into their respective categories with high accuracy, providing a reliable tool for the early detection of respiratory conditions.

These techniques hold significant potential in the medical field, particularly for the remote monitoring of patients with chronic respiratory diseases such as asthma and COPD. Automatic classification allows a much shorter diagnosis time, giving the medical team more time to respond to critical cases. It also provides a non-invasive and cost-effective method for assessing respiratory health. The modularity of the project makes it easy to adapt to diverse healthcare environments, so it is both scalable and flexible for future applications.

Overall, the project not only contributes to the healthcare sector by improving diagnostic efficiency but also serves as a foundation for further exploration in the domain of medical sound classification using machine learning.

9. FUTURE ENHANCEMENT

Future enhancements for the Multiple Disease Detection Software could include integrating advanced deep learning models such as 3D CNNs and Transformers for better medical image analysis. Combining multi-modal data, such as genetic and lifestyle information, would improve prediction accuracy. Expanding the system to real-time, cloud-based platforms would enable faster, scalable predictions. Incorporating explainable AI (XAI) would enhance model transparency, fostering trust among clinicians and patients. Additionally, expanding datasets to include diverse populations and rare diseases, along with continuous learning capabilities, would keep the system up to date. Lastly, improving user interfaces and integrating telemedicine could enhance accessibility and adoption in clinical settings.

10. REFERENCES

[1] Gadekallu, T.R., Alazab, M., Kaluri, R., Maddikunta, P.K.R., Bhattacharya, S. and Lakshmanna, K., 2021. Hand gesture classification using a novel CNN-crow search algorithm. Complex & Intelligent Systems, 7, pp. 1855-1868.

[2] Surendar, P., 2021. Diagnosis of lung cancer using hybrid deep neural network with adaptive sine cosine crow search algorithm. Journal of Computational Science, 53, p. 101374.

[3] Kavitha, R., Jothi, D.K., Saravanan, K., Swain, M.P., Gonzáles, J.L.A., Bhardwaj, R.J. and Adomako, E., 2023. Ant Colony Optimization-Enabled CNN Deep Learning Technique for Accurate Detection of Cervical Cancer. BioMed Research International, 2023.

[4] Masud, M., Singh, P., Gaba, G.S., Kaur, A., Alroobaea, R., Alrashoud, M. and Alqahtani, S.A., 2021. CROWD: crow search and deep learning based feature extractor for classification of Parkinson’s disease. ACM Transactions on Internet Technology (TOIT), 21(3), pp. 1-18.

[5] Sayed, G.I., Hassanien, A.E. and Azar, A.T., 2019. Feature selection via a novel chaotic crow search algorithm. Neural Computing and Applications, 31, pp. 171-188.
