
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072
Padmanabha J, Anagha Kashyap, C Akhilesh Reddy, Karthika G, Keerthika Shetty
Department of Information Science & Engineering, Bangalore Institute of Technology, Bengaluru-560004.
Abstract - The Multiple Disease Detection Software aims to address crucial healthcare accessibility challenges and enhance the early diagnosis of chronic diseases that remain leading causes of death worldwide. Overburdened healthcare systems, population growth, and the COVID-19 pandemic have worsened delays in diagnosis and treatment, intensifying the need for such a solution. The system utilizes Bio-Inspired Algorithms, Machine Learning (ML), and Deep Learning (DL) to deliver accurate and dependable predictions of the probability of specific chronic illnesses from user-provided health data. The Django framework provides scalability and user-friendliness, while optimization techniques improve precision and efficacy. The program relieves pressure on healthcare infrastructure by allowing people to self-assess their health and promotes early intervention, making basic medical care accessible to everyone, including those in remote areas. The ultimate goal of the platform is to lower mortality rates, foster universal healthcare accessibility, and contribute to better public health outcomes.
Key Words: Machine Learning, Deep Learning, Chronic Disease Prediction, Alzheimer's disease, cardiovascular conditions, breast cancer, lung cancer, and early detection.
Alzheimer's disease is a degenerative brain disorder and the leading cause of dementia, characterized by memory loss, cognitive decline, and behavioral changes. It progresses over time, leading to brain cell death, with no cure currently available. Similarly, breast cancer, originating in the glandular tissue, can progress from localized to invasive forms, but early diagnosis and combined treatments significantly improve outcomes.
Diabetes mellitus, a metabolic disorder, leads to elevated blood glucose levels due to various factors, and ranges from chronic forms such as type 1 and type 2 diabetes to reversible ones such as prediabetes and gestational diabetes. Coronary artery disease occurs due to the buildup of plaque in the arteries, leading to reduced blood flow and potentially resulting in heart attacks or strokes; its causes and manifestations differ between males and females. Lung cancer, primarily associated with smoking, remains the leading cause of cancer deaths, but quitting smoking helps reduce the risk.
Parkinson's disease affects brain function, causing tremors, stiffness, and coordination issues, with symptoms worsening over time. Risk factors include age and genetic predispositions.
The Multiple Disease Detection Software enables users to input their health data and receive predictions for chronic diseases, addressing challenges related to healthcare accessibility and diagnostic delays. Developed with Machine Learning, Deep Learning, and Bio-Inspired Algorithms, it is integrated with Django to offer an AI-driven platform for early detection and improved healthcare outcomes. Deep Learning is an AI subset that has been playing a very important role in disease diagnosis by processing and analyzing medical data to find patterns, contributing to the advancement of medical science.
[1] Diagnosis of Parkinson’s Disease Using Artificial Neural Network. Authors: Anila M and Dr. G Pradeepini, IEEE
This paper primarily focuses on analyzing voice patterns to aid in the diagnosis of Parkinson's disease. Five machine learning models, namely ANN, Random Forest, KNN, SVM, and XGBoost, are compared to select the best model in terms of error rate and performance metrics. The primary limitation of the study lies in the use of an ANN with only two hidden layers, which is appropriate for small datasets but inadequate for handling high data complexity. Additionally, the reliance on a single feature selection method restricted the effectiveness of dimensionality reduction.
[2] Machine Learning-Based Approaches for Prediction of Alzheimer’s Disease. Author: Arvind Kumar Tiwari, Publication: IEEE
This paper uses minimum redundancy maximum relevance feature selection algorithms to identify key features that can predict the onset of Alzheimer's disease. The study found that, with 20 selected features, it obtained an accuracy of 90.3%, a precision of 90.2%, a Matthews correlation coefficient of 0.73, and an ROC value of 0.96, which is significantly higher than bagging, boosting, rotation forest, and SVM-based machine learning models.
[3] A Deep Learning Method for Predicting the Progression of Lung Cancer. Authors: Afzal Hussain Shahid and Maheshwari Prasad Singh, Publication: IEEE
This research proposed a deep neural network (DNN) model to predict lung cancer disease progression using a reduced input feature space. PCA was integrated to enhance the DNN model's efficiency. The model was tested on real-world datasets from UCI, with a focus on improving performance by incorporating additional data points.
[4] Prediction of Heart Disease Using Machine Learning
Authors: Aditi Gavhane, Gouthami Kokkula, Isha Panday, and Prof. Kailash Devadkar. Publication: Applied Artificial Intelligence
The authors developed a multi-layer perceptron model to predict heart diseases using CAD technology. The system's widespread use could increase disease awareness and potentially reduce the heart disease mortality rate.
[5] Application of Machine Learning in Disease Prediction
Authors: Pahulpreet Singh Kohli and Shriya Arora, IEEE
Various Machine Learning Algorithms: This paper explores multiple machine learning algorithms for disease prediction. Logistic regression achieved an accuracy of 87.1% for heart disease, support vector machines reached 85.71% in diabetes prediction, and the AdaBoost classifier achieved an accuracy of 98.57% in predicting breast cancer. These algorithms therefore prove effective for healthcare predictive analytics.
The current methods used for chronic disease detection are mostly based on traditional medical approaches, such as consultations, tests, and scans. While these approaches are accurate, they are time-consuming and expensive, and therefore often inaccessible in rural areas. Moreover, each disease demands different diagnostic instruments, which fragments the diagnostic process and delays detection. Further, overburdened health systems, especially during pandemics, worsen this situation, causing many patients to present late and end up with undesirable outcomes.
Advantages:
1. Proven Reliability: Traditional methods provide accurate and trusted diagnostic results for individual diseases.
2. Advanced Technology: Techniques like imaging and blood tests enable precise disease detection.
3. Specialized Expertise: Direct involvement of healthcare professionals ensures personalized diagnosis and care.
Disadvantages:
1. Time and Cost Intensive: Diagnosis processes are lengthy and often prohibitively expensive.
2. Limited Accessibility: Healthcare facilities are challenging to access in remote or resource-limited regions.
3. Fragmented Approach: Existing methods focus on individual diseases rather than offering multi-disease detection.
4. Delayed Diagnosis: Dependence on periodic check-ups leads to late detection of chronic diseases.
5. Strain on Healthcare Systems: High demand, especially during pandemics, overburdens infrastructure, causing inefficiencies.
A proposal has been presented for the healthcare industry regarding Multiple Disease Detection Software, which predicts six chronic diseases: Alzheimer's Dementia, Heart Disease, Breast Cancer, Diabetes, Parkinson's Disease, and Lung Cancer. The software applies several algorithms to predict these diseases, making it easier to compare which is more precise and effective for diagnosing multiple diseases. Users can enter their health data into the platform and receive predictions about their risk of these diseases. The system is based on Machine Learning, Deep Learning, and Bio-Inspired Algorithms to improve predictions and early detection of diseases. This integrated approach not only aids in identifying potential health risks but also attempts to address issues such as delayed diagnosis and restricted access to healthcare. Through an accessible early-detection platform, the software gives users an opportunity to assess their health and take preventive measures before these diseases develop.
The Multiple Disease Detection Software is a health prediction system that predicts several chronic diseases, including Alzheimer's dementia, breast cancer, heart disease, diabetes, Parkinson's disease, and lung cancer. The system provides an easy-to-use platform for users to input their health data and receive predictions regarding their likelihood of developing any of these diseases. It combines various machine learning and deep learning algorithms to improve prediction accuracy and support dependable early detection.
Figure 5: Sequence Diagram
For Alzheimer’s Dementia, the system uses a 2D CNN model to predict the severity of the condition based on MRI scans. The system is built with an intuitive user interface that allows users to upload 2D MRI images for analysis.
In the case of Breast Cancer prediction, the system uses ensemble learning algorithms like Random Forest Classifier and Extra Trees Regressor with Genetic Algorithms and PCA for enhancing the accuracy and performance of predictions. It is a multi-disease prediction tool designed to aid in early diagnosis and prevention through software.
For heart disease, the system uses an ensemble learning approach enhanced with Ant Colony Optimization (ACO). It utilizes five ensemble algorithms: AdaBoost, Bagging, Random Forest, Gradient Boosting, and Extra Trees. ACO, an optimization technique inspired by foraging ants, is applied to tune each classifier for the best prediction accuracy. ACO assigns pheromone values to the routes that ants travel, thus directing the search toward better solutions over many iterations. The result is a highly accurate, optimized model for predicting heart disease from a dataset.
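The pheromone mechanism can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `aco_select`, the binary include/exclude encoding, the parameter defaults, and the caller-supplied `fitness` function are all assumptions made for illustration. Each ant builds a candidate by sampling choices in proportion to pheromone, trails evaporate every iteration, and the best ant of the iteration reinforces the choices it made.

```python
import random


def aco_select(n_items, fitness, n_ants=10, n_iters=30, evaporation=0.2, seed=0):
    """Toy ACO over binary choices (hypothetical helper, not from the paper).

    pheromone[i][0] is the trail for "exclude item i",
    pheromone[i][1] is the trail for "include item i".
    """
    rng = random.Random(seed)
    pheromone = [[1.0, 1.0] for _ in range(n_items)]
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        ants = []
        for _ in range(n_ants):
            # Each ant samples every choice in proportion to its pheromone.
            subset = []
            for i in range(n_items):
                p_include = pheromone[i][1] / (pheromone[i][0] + pheromone[i][1])
                subset.append(1 if rng.random() < p_include else 0)
            score = fitness(subset)
            ants.append((score, subset))
            if score > best_score:
                best_score, best_subset = score, subset
        # Evaporation: old trails fade so early choices do not dominate forever.
        for trail in pheromone:
            trail[0] *= 1.0 - evaporation
            trail[1] *= 1.0 - evaporation
        # Deposit: the best ant of this iteration reinforces its own choices.
        it_score, it_subset = max(ants, key=lambda ant: ant[0])
        for i, choice in enumerate(it_subset):
            pheromone[i][choice] += max(it_score, 0.0)
    return best_subset, best_score
```

In a setting like the one described, `fitness` would typically be a cross-validated accuracy of a classifier trained with the candidate configuration; that choice is also an assumption here.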
The system for predicting Parkinson's Disease and Lung Cancer employs advanced machine learning algorithms to predict the conditions with good accuracy based on distinct data inputs. In the case of Parkinson's Disease prediction, the system is based on voice analysis, a common method used in detecting speech disorders, which affect about 90% of patients with Parkinson's. Features are extracted from voice recordings including pitch, jitter, shimmer, and Mel-frequency cepstral coefficients (MFCCs). The optimization of these features is done using the ACO technique, which will identify the most pertinent features in classification. Using the optimized feature set, Support Vector Machine will classify the subjects as either healthy or affected by Parkinson's Disease. The SVM algorithm works by maximizing the margin of separation between two classifications of the data points along the best-suited hyperplane.
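As an illustration of two of the voice features named above, the sketch below computes jitter and shimmer using the common "local" definition: the mean absolute difference between consecutive cycles divided by the mean value. The paper does not state which variant it uses, so this definition, the function name `local_perturbation`, and the assumption that glottal periods and peak amplitudes have already been extracted are all illustrative.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    Applied to glottal periods this gives local jitter; applied to
    per-cycle peak amplitudes it gives local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Hypothetical measurements for a short voiced segment.
periods_s = [0.0100, 0.0102, 0.0099, 0.0101]   # cycle lengths in seconds
amplitudes = [0.80, 0.78, 0.82, 0.79]          # peak amplitude per cycle

jitter = local_perturbation(periods_s)
shimmer = local_perturbation(amplitudes)
```

Higher values indicate a less stable voice, which is why these measures are often used as inputs to a classifier such as an SVM.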
The system predicts lung cancer based on various medical data, including the age of the patient, gender, smoking history, and results from a CT scan. Data preprocessing is applied to handle missing values and scale features to prepare the data for training the model. Classification is done using the XGBoost algorithm, taking the benefits of gradient boosting to boost the accuracy of predictions. XGBoost builds a series of decision trees, where each new tree is designed to correct the errors of the previous one. Key hyperparameters, including the learning rate (eta), tree depth (max_depth), and sample proportions (subsample and colsample_bytree), are fine-tuned to improve overall performance. Upon completion of the training process, the XGBoost model estimates the probability of lung cancer occurring, based on the data provided by the patient.
For Alzheimer's Disease prediction, data cleaning involves imputing missing cognitive test scores and genetic data, while outliers in MRI data are handled using IQR. Feature selection is done through correlation analysis and mutual information to retain relevant features. Feature encoding applies label encoding for binary categories and one-hot encoding for other categorical data. Feature scaling standardizes continuous features like cognitive scores and age. To address imbalanced data, techniques like SMOTE are used. Finally, dimensionality reduction with PCA or t-SNE helps reduce feature size while retaining key information.
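The idea that each new tree corrects the errors of the previous ones can be sketched in plain Python. This is a simplified illustration of gradient boosting with squared error and one-split trees (stumps) on a single feature; it is not XGBoost itself, which adds regularization, second-order gradients, and row and column subsampling. The names `fit_stump`, `boost`, and `predict` are hypothetical, and `eta` plays the role of the learning rate mentioned above.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on a 1-D feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, eta=0.3):
    """Fit stumps one after another, each on the residuals left by the model so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each correction by eta before adding it to the running prediction.
        preds = [p + eta * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, eta, stumps


def predict(model, x):
    """Sum the base value and every shrunken stump correction for input x."""
    base, eta, stumps = model
    return base + sum(eta * (lm if x <= t else rm) for t, lm, rm in stumps)
```

A smaller `eta` makes each tree's correction more cautious, which usually requires more rounds but tends to generalize better; this is the trade-off that tuning the learning rate addresses.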
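The IQR-based outlier handling can be sketched as follows. The source does not say whether outliers are removed or capped, so clipping to the conventional fences at 1.5 times the IQR is an assumption, as are the function names and the sample values.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Cap values that fall outside the IQR fences (one possible way to handle outliers)."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements with one extreme reading.
volumes = [10, 11, 12, 13, 14, 15, 100]
cleaned = clip_outliers(volumes)
```

Clipping keeps the row in the dataset, which matters when samples are scarce; dropping the row instead is an equally valid design choice when the extreme value is likely a measurement error.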
5.2 Breast Cancer Prediction
The Ensemble Learning algorithms initially applied were Extra Trees Regressor and Random Forest Classifier, followed by hyperparameter tuning with GridSearchCV. The model was tuned further using the Sequential model from Keras (TensorFlow), along with PCA for dimensionality reduction. Eventually, an accuracy of 96.5% was obtained using MLPClassifier along with a Genetic Algorithm (GA) run for 50 generations. The GA is an optimization technique inspired by the principles of genetics and natural selection. A population of candidate solutions undergoes recombination and mutation over generations. Each candidate is assigned a fitness value, so fitter solutions are favored in reproduction, mirroring Darwin's "Survival of the Fittest" theory. This iterative process steers each successive generation toward better solutions.
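The GA loop described here can be sketched in a few lines. This is an illustrative toy rather than the authors' code: candidates are bit strings, and tournament selection, one-point crossover, bit-flip mutation, and keeping the best individual (elitism) are assumed design choices. The `fitness` function is supplied by the caller; only the default of 50 generations comes from the text.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50,
                      mutation_rate=0.05, seed=0):
    """Evolve bit strings toward higher fitness (hypothetical helper)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def tournament():
        # Fitter of two random candidates wins the right to reproduce.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(population, key=fitness)
        children = [elite[:]]  # elitism: the best candidate always survives
        while len(children) < pop_size:
            parent1, parent2 = tournament(), tournament()
            cut = rng.randint(1, n_bits - 1)           # one-point recombination
            child = parent1[:cut] + parent2[cut:]
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]                # bit-flip mutation
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy usage: maximize the number of 1-bits in a 16-bit string.
best = genetic_algorithm(sum, n_bits=16)
```

When a GA is paired with a classifier, each bit string would typically encode a feature mask or a set of hyperparameters, with a validation score as the fitness; that mapping is an assumption, since the paper does not specify it.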
5.3 Diabetes Prediction
For diabetes prediction, data cleaning involves imputing missing values for glucose and insulin using mean/median or KNN, and detecting and handling outliers. Feature selection is performed using a correlation matrix and Chi-square test. Feature encoding includes label encoding for binary categories and one-hot encoding for multi-class categories. Feature scaling standardizes or normalizes continuous features like glucose and BMI. To address imbalanced data, techniques like SMOTE or random undersampling are used. Dimensionality reduction is applied with PCA if needed to reduce feature size.
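The core step of SMOTE, creating a synthetic minority sample by interpolating between a real sample and one of its nearest minority-class neighbours, can be sketched as below. This is a simplified illustration, not a replacement for a library implementation; the function name `smote`, its defaults, and the sample data are assumptions.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to the neighbour
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Hypothetical minority-class rows: [scaled glucose, scaled BMI].
minority_rows = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
new_rows = smote(minority_rows, n_new=5)
```

Because every synthetic point lies on a segment between two real minority samples, the features should be scaled before applying the distance computation; otherwise a feature with a large numeric range dominates the neighbour search.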
In data cleaning for heart disease prediction, missing values are handled through mean/median imputation or KNN. Outliers are detected through IQR or Z-score. Features are selected with a correlation matrix and Chi-square test. Features are encoded with label encoding if the feature is binary and one-hot encoding for other categorical variables. Feature scaling standardizes or normalizes continuous features like age and cholesterol. Techniques like oversampling and undersampling, including SMOTE, are employed to address imbalanced data. Finally, PCA is applied to reduce features.
For the prediction of Parkinson's Disease, voice data cleaning includes noise reduction by spectral subtraction or Wiener filtering and silence removal by energy-based thresholding. Feature extraction is done using Mel-frequency Cepstral Coefficients (MFCCs), jitter, shimmer, and Harmonics-to-Noise Ratio (HNR). Normalization is done with Min-Max scaling or Z-score standardization. PCA-based dimensionality reduction helps minimize the feature space and prevent overfitting. Missing values are imputed using either the mean or KNN method, and the targets are label-encoded, with healthy subjects labeled 0 and PD patients given a separate label.
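Energy-based silence removal can be sketched as follows: the signal is cut into short frames, the mean squared amplitude of each frame is computed, and frames whose energy falls below a fraction of the loudest frame are dropped. The frame length, the relative threshold of 10%, and the function name are illustrative assumptions; real recordings would use frames of roughly tens of milliseconds.

```python
def remove_silence(samples, frame_len=4, rel_threshold=0.1):
    """Keep only frames whose short-time energy reaches rel_threshold * peak energy."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    energies = [sum(s * s for s in frame) / len(frame) for frame in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:
        return []  # empty or all-zero signal: nothing to keep
    kept = []
    for frame, energy in zip(frames, energies):
        if energy >= rel_threshold * peak:
            kept.extend(frame)
    return kept


# Hypothetical signal: silence, speech, near-silence, speech.
signal = ([0.0] * 4 + [0.5, -0.5, 0.4, -0.4]
          + [0.01, -0.01, 0.0, 0.0] + [0.6, -0.6, 0.5, -0.5])
voiced = remove_silence(signal)
```

A relative threshold adapts to the recording level, whereas a fixed absolute threshold would need retuning for every microphone.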
5.6 Lung Cancer Prediction
Data preprocessing includes handling missing values by mean/median imputation or KNN and outlier detection by Z-score or IQR. Feature selection is done by using correlation matrices and Chi-square tests to select important features. Feature encoding uses one-hot or label encoding for categorical variables. Feature scaling uses Min-Max scaling or standardization. In case of imbalanced data, SMOTE or undersampling is used. Lastly, PCA or LDA reduces irrelevant features by dimensionality reduction.
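The two scaling options mentioned above differ in what they preserve, which the following sketch makes concrete. Min-Max scaling maps values to the range [0, 1], while standardization (Z-score) centers them at zero with unit standard deviation. The function names and sample ages are illustrative assumptions.

```python
import statistics


def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum becomes 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [50, 60, 70]              # hypothetical patient ages
scaled = min_max_scale(ages)
z_scores = standardize(ages)
```

Min-Max scaling is sensitive to a single extreme value, so it is usually applied after outlier handling; standardization is the more common choice before PCA.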
5.7 Feature Extraction
Key features of multiple diseases are extracted and optimized to enhance prediction accuracy. Alzheimer's uses metrics like hippocampal volume, Breast Cancer relies on mean radius, and Heart Disease focuses on cholesterol and blood pressure, all refined with ACO. Features of Diabetes are glucose and BMI, Parkinson's uses acoustic features like MFCCs, and Lung Cancer uses imaging metrics combined with clinical data, using PCA and Gradient Boosting for efficient classification.
The classification methods adopted for the Multiple Disease Detection Software use machine learning, deep learning, and bio-inspired algorithms. The selected techniques vary by disease type. For example, Alzheimer's uses a 2D CNN for classifying images; Breast Cancer follows the ensemble methods Random Forest and Extra Trees. Heart Disease predictions rely on optimized ensemble techniques, namely Gradient Boosting and AdaBoost improved through ACO. For Parkinson's, an SVM classifies patients with the help of acoustic features. In Lung Cancer prediction, high accuracy is obtained with XGBoost. Diabetes is detected with traditional classifiers: Logistic Regression and Gradient Boosting.
1. 2D Convolutional Neural Network (CNN): Used for Alzheimer’s Dementia prediction by analyzing MRI scans to classify the severity of the condition.
2. ACO, combined with Ensemble Methods like AdaBoost, Bagging, Random Forest, Gradient Boosting, and Extra Trees, is applied to Heart Disease Prediction to optimize classifier performance using behavior-inspired search techniques.
3. Logistic Regression and Decision Tree Classifiers: Used for Diabetes prediction to classify risk levels based on patient data.
4. Support Vector Machine (SVM) optimized with Ant Colony Optimization (ACO): Used for Parkinson’s Disease prediction by analyzing voice features and selecting the most relevant ones.
5. XGBoost (Extreme Gradient Boosting): Used for Lung Cancer prediction to improve classification accuracy through iterative tree-based learning.
Data Augmentation Techniques
Data augmentation techniques improve performance by enriching the variability of the data, which in turn helps mitigate overfitting and class imbalance. For instance, Alzheimer's Dementia MRI images are rotated, flipped, and zoomed. Dataset balancing is applied to conditions such as Diabetes and Heart Disease through SMOTE. Noise injection makes Parkinson's Disease prediction more robust by simulating voice recordings collected in real-world conditions. Geometric transformations, such as cropping and scaling, are used to enhance histopathological images for improved Breast Cancer detection.
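A few of these augmentations can be sketched without any imaging library by treating an image as a list of pixel rows and a voice clip as a list of samples. The function names, the Gaussian noise model, and the noise level `sigma` are assumptions for illustration; a production pipeline would normally rely on an image or audio library instead.

```python
import random


def flip_horizontal(image):
    """Mirror each pixel row left to right."""
    return [row[::-1] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def add_noise(signal, sigma=0.01, seed=0):
    """Add zero-mean Gaussian noise to every audio sample."""
    rng = random.Random(seed)
    return [sample + rng.gauss(0.0, sigma) for sample in signal]


# Hypothetical 2x2 image and a short silent clip.
image = [[1, 2],
         [3, 4]]
flipped = flip_horizontal(image)
rotated = rotate_90(image)
noisy = add_noise([0.0, 0.0, 0.0, 0.0])
```

Augmented copies should be generated from the training split only, so that no transformed version of a test image leaks into training.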
Figure 7.1: Result on desktop application
The Multi-Disease Prediction System has a user-friendly interface, which is divided into two major sections: Data Input & Prediction and Result Display. In the Data Input & Prediction section, patients can upload or input their medical records or imaging files. The system then preprocesses the data, extracts relevant features, and applies machine learning models to evaluate the risk of Alzheimer's, heart disease, and cancer. The Result Display section contains the predictions and their corresponding confidence scores, accompanied by charts or medical images. Results can be downloaded for further analysis or reporting.
“Respiratory Sound Classification Using Machine Learning” demonstrates the power of machine learning algorithms, particularly Convolutional Neural Networks (CNNs), in classifying respiratory sounds such as wheezes, crackles, and normal sounds. The system leverages advanced feature extraction techniques, including Mel Frequency Cepstral Coefficients (MFCCs), to transform raw audio data into usable features for classification. By utilizing a CNN model, the project successfully classifies audio files into respective categories with high accuracy, providing a reliable tool for the early detection of respiratory conditions.
These techniques hold significant potential in the medical field, particularly for the remote monitoring of patients with chronic respiratory diseases, such as asthma and COPD. Automatic classification capabilities allow for a much shorter diagnosis time, thus providing the medical team with more timely responses to critical cases. It also supplies a non-invasive and cost-effective method for assessing respiratory health. The modularity of the project facilitates easy adaptation to diverse healthcare environments, thereby rendering it both scalable and flexible for future applications.
Overall, the project not only contributes to the healthcare sector by improving diagnostic efficiency but also serves as a foundation for further exploration in the domain of medical sound classification using machine learning.
Future enhancements for the Multiple Disease Detection Software could include integrating advanced deep learning models like 3D CNNs and Transformers for better medical image analysis. Combining multi-modal data, such as genetic and lifestyle information, would improve prediction accuracy. Expanding the system to real-time, cloud-based platforms would enable faster, scalable predictions. Incorporating explainable AI (XAI) would enhance model transparency, fostering trust among clinicians and patients. Additionally, expanding datasets to include diverse populations and rare diseases, along with continuous learning capabilities, would keep the system up-to-date. Lastly, improving user interfaces and telemedicine integration could enhance accessibility and adoption in clinical settings.