International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 07 | Jul 2025 www.irjet.net p-ISSN: 2395-0072

Cardiovascular Risk Prediction

Er. Jagmeet Singh1, Er. Charandeep Singh2

1Jagmeet Singh, Dept. of Computer Science and Engineering, Baba Farid College of Engineering and Technology, Bathinda, Punjab

2Assistant Professor Charandeep Singh Bedi, Dept. of Computer Science and Engineering, Baba Farid College of Engineering and Technology, Bathinda, Punjab

Abstract - Cardiovascular disease (CVD) impairs the function of the heart and blood vessels, often resulting in death or physical paralysis. Early and automated detection of CVD is therefore important for saving lives. Although various studies have pursued this goal, there is still room for improving performance and reliability. This study contributes in this direction. It implements two reliable machine learning techniques, multi-layer perceptron (MLP) and K-nearest neighbour (K-NN), to detect CVD using data from the publicly available University of California Irvine repository. Model performance is enhanced by removing outliers and attributes with null values. Experimental results show that the MLP model achieves a detection accuracy of 82.47% and an area-under-the-curve (AUC) value of 86.41%, outperforming the K-NN model. Therefore, the MLP model is recommended for automatic CVD detection. Additionally, the proposed methodology can be applied to the detection of other diseases and can be validated using other standard datasets.

Key Words: MLP, AUC, CVD, K-NN.

1. INTRODUCTION

Health is the most prominent aspect of everyone's life. However, numerous factors, such as work stress, unhealthy lifestyles, and psychological strain, together with external influences like hazardous work environments, inadequate health services, and pollution, contribute to millions of people worldwide developing chronic ailments like cardiovascular diseases (CVD). These conditions badly affect the heart and blood vessels, leading to death or disability. Recent reports show that a significant proportion of human deaths are due to CVD [1,2]. Conditions associated with CVD include hypertension, hyperlipidaemia, thromboembolism, and coronary heart disease; all are responsible for heart failure, with hypertension known to be a primary cause [3]. In 2012, 7.4 million people died from coronary heart disease, while 6.7 million died from stroke [4]. According to the World Health Organization, nearly 17 million people die annually from CVDs, accounting for approximately 31% of global deaths. Early diagnosis of CVD can potentially save countless lives. However, diagnosing and treating patients at early stages poses a noticeable challenge for cardiologists.

Traditional CVD risk-assessment models often assume a linear relationship between each risk factor and CVD outcomes. These models tend to simplify complex relationships, including those among numerous risk factors with non-linear interactions. It is essential to incorporate multiple risk factors accurately and examine the nuanced correlations between these factors and outcomes. To date, no large-scale study has utilized routine clinical data and machine learning (ML) for prognostic CVD assessment. This study aims to determine whether ML can enhance cardiovascular risk prediction accuracy in primary care populations and to identify which ML algorithms yield the most concise and effective results.

1.1 Objectives

1. Develop a novel cardiovascular risk prediction model using advanced machine learning techniques, integrating comprehensive patient data for improved accuracy.

2. Validate and compare the model's performance with existing standards, ensuring robustness and generalizability through rigorous testing and evaluation.

3. Enhance model interpretability to facilitate clinical integration, providing clear insights into risk factors for personalized patient care.

1.2 Data Overview

The dataset is from an ongoing cardiovascular study on residents of the town of Framingham, Massachusetts. The classification goal is to predict whether the patient has a 10-year risk of future coronary heart disease (CHD). The dataset includes data on over 4,000 individuals, with 15 characteristics describing each patient.

1.3 Data Cleaning

Prior to EDA, cleaning the data is important since it gets rid of any duplicate information that can have an impact on the results. The Education, heartRate, BPMed, totChol, BMI, cigsPerDay, and glucose columns have missing or null values. I filled the missing values in these columns using a KNN imputer. For the sex and is_smoking columns, I replaced the categorical values with numerical values (yes to 1 and no to 0, and F to 1 and M to 0).
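The cleaning steps above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's `KNNImputer` and a tiny made-up table; the sample values and the choice of `n_neighbors=2` are assumptions for the example, not settings reported in this study.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Tiny made-up sample; the values are illustrative, not taken from the study data.
df = pd.DataFrame({
    "age": [39, 46, 48, 61, 46, 43],
    "sex": ["M", "F", "M", "F", "F", "F"],
    "is_smoking": ["no", "no", "yes", "yes", "yes", "no"],
    "cigsPerDay": [0, 0, 20, 30, np.nan, 0],
    "totChol": [195, 250, 245, np.nan, 285, 228],
    "BMI": [26.97, 28.73, np.nan, 28.58, 23.10, 30.30],
    "glucose": [77, 76, 70, 103, 85, np.nan],
})

# Map the two binary categorical columns to integers, as described in the text.
df["sex"] = df["sex"].map({"F": 1, "M": 0})
df["is_smoking"] = df["is_smoking"].map({"yes": 1, "no": 0})

# Fill each missing entry with the mean of that column over the k nearest rows.
# n_neighbors=2 is an arbitrary choice for this small example.
imputer = KNNImputer(n_neighbors=2)
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(imputed.isna().sum().sum())  # number of missing values left after imputation
```

The categorical columns are mapped to numbers before imputation because the imputer measures distances between rows and therefore needs numeric input.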

1.4 Exploratory Data Analysis

Most people smoke between 0 and 10 cigarettes a day. The risk of coronary heart disease rises with age up to age 63 and then declines beyond that. Men have a higher propensity for developing coronary heart disease (CHD) than women. Women, in contrast, are at a higher risk of stroke, which often occurs at an older age. Compared to non-smokers, smokers have a higher chance of developing coronary heart disease.

1.5 Feature Engineering

Insights using feature engineering:

Impact of heart rate on the target variable: People with high heart rates have a high risk of having CHD in the next 10 years.

Impact of cholesterol level on the target variable: People with high cholesterol levels have a high risk of having CHD in the next 10 years.

Impact of age on the target variable: Older people have a high risk of having CHD in the coming 10 years.

Impact of Body Mass Index on the target variable: People with high body mass index have a high risk of having CHD in the next 10 years.

Impact of systolic blood pressure on the target variable: People with high systolic blood pressure have a high risk of having CHD in the next 10 years.

Impact of glucose on the target variable: People with high glucose have a high risk of having CHD in the next 10 years.

1.6 Data Preparation

High correlation was observed between:

1. CigsPerDay and is_smoking

2. SysBP and PrevalentHype

3. DiaBP and SysBP

SysBP and DiaBP were combined into a new feature, pulse pressure.

The CigsPerDay, is_smoking, SysBP, PrevalentHype, and DiaBP columns were dropped from the dataset.

Columns such as BMI bucket, pulse pressure, and age bucket were added. The dependent column to be predicted is the target variable named “TenYearCHD”.
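The preparation steps above might look like the sketch below. The text does not say how SysBP and DiaBP were combined or where the bucket boundaries lie, so this example assumes the usual definition of pulse pressure (systolic minus diastolic) and uses hypothetical bucket edges and labels; the column names `pulse_pressure`, `age_bucket`, and `BMI_bucket` are also illustrative.

```python
import pandas as pd

# Made-up rows; column names follow the text, values are illustrative.
df = pd.DataFrame({
    "age": [39, 52, 67],
    "BMI": [22.5, 27.8, 33.1],
    "SysBP": [120.0, 150.0, 180.0],
    "DiaBP": [80.0, 95.0, 110.0],
    "CigsPerDay": [0, 20, 5],
    "is_smoking": [0, 1, 1],
    "PrevalentHype": [0, 1, 1],
    "TenYearCHD": [0, 0, 1],
})

# Assumption: pulse pressure is systolic minus diastolic blood pressure.
df["pulse_pressure"] = df["SysBP"] - df["DiaBP"]

# Hypothetical bucket edges; the study does not report the ones it used.
df["age_bucket"] = pd.cut(
    df["age"], bins=[0, 40, 50, 60, 120],
    labels=["<=40", "41-50", "51-60", ">60"],
)
df["BMI_bucket"] = pd.cut(
    df["BMI"], bins=[0, 18.5, 25, 30, 100],
    labels=["underweight", "normal", "overweight", "obese"],
)

# Drop the highly correlated source columns listed in the text.
df = df.drop(columns=["CigsPerDay", "is_smoking", "SysBP", "PrevalentHype", "DiaBP"])

# Separate the features from the target variable.
X = df.drop(columns="TenYearCHD")
y = df["TenYearCHD"]
```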

Steps involved:

1. Exploratory Data Analysis:

After loading the dataset, we performed exploratory data analysis by comparing our target variable, the 10-year risk of future coronary heart disease (CHD), with the other independent variables. This process helped us figure out various segments and relationships among the target and the independent variables. It gave us a better idea of how each feature behaves relative to the target variable.

Figure No. 1: Impact of heart rate on the target variable

Figure No. 2: Impact of systolic blood pressure on the target variable

Figure No. 3: Impact of cholesterol level on the target variable

2. Null Values Treatment:

This dataset contains a large number of null values, which might reduce our accuracy; hence, we used a KNN imputer to perform missing-value imputation and processed the data to remove outliers.
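The text does not state which rule was used to remove outliers. As one common possibility, the sketch below assumes the interquartile-range (IQR) rule with the conventional multiplier of 1.5; the helper name `drop_outliers_iqr` and the sample values are hypothetical.

```python
import pandas as pd

def drop_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep only rows whose value in `column` lies within k * IQR of the quartiles."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]

# Made-up cholesterol readings with one extreme value.
df = pd.DataFrame({"totChol": [195, 210, 225, 240, 255, 600]})
kept = drop_outliers_iqr(df, "totChol")
print(kept["totChol"].tolist())
```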

Figure No. 4: People who smoke between 0 and 10 cigarettes per day

Figure No. 5: Histogram plot for the age of people in this dataset

Encoding of categorical columns

We used One Hot Encoding to produce binary integers of 0 and 1 to encode our categorical features, because categorical features in string format cannot be understood by the machine and need to be converted to numerical format.
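One-hot encoding can be illustrated with pandas' `get_dummies`. This is a minimal sketch; the column name `age_bucket` and its labels are hypothetical stand-ins for whichever string-valued columns were encoded in the study.

```python
import pandas as pd

# Made-up rows with one string-valued categorical column.
df = pd.DataFrame({
    "age_bucket": ["<=40", "51-60", ">60", "51-60"],
    "BMI": [22.5, 27.8, 33.1, 25.4],
})

# Each category becomes its own 0/1 column; the original column is removed.
encoded = pd.get_dummies(df, columns=["age_bucket"], dtype=int)
print(encoded.columns.tolist())
```

Each row ends up with exactly one of the new indicator columns set to 1, which is what lets a numerical model consume the category.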

Figure No. 6: Representation of data of BP, cholesterol and BMI

Feature Selection

In this step, we used algorithms like the ExtraTree classifier to check the importance of each feature, that is, which features are more important to our model and which are less important.
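Ranking features by importance with an extra-trees model could be sketched as below, assuming scikit-learn's `ExtraTreesClassifier`. The data are synthetic and built so that only `age` drives the target; the feature names, sample size, and `n_estimators=200` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic features; "noise" carries no information about the target.
X = pd.DataFrame({
    "age": rng.integers(30, 70, n),
    "totChol": rng.normal(235, 40, n),
    "noise": rng.normal(0, 1, n),
})
# Synthetic target driven only by age, so age should rank first.
y = (X["age"] + rng.normal(0, 3, n) > 55).astype(int)

model = ExtraTreesClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Importances sum to 1; larger values mean the feature was more useful for splitting.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```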

Next, Chi2 was used for categorical features and ANOVA for numerical features to select the best features, which we will use further in our model.

Fitting different models

For modelling, we tried various classification algorithms such as:

1. Logistic Regression

2. SVM Classifier

3. Random Forest Classifier

4. GridSearchCV
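A comparison of the listed approaches might be set up as in the sketch below, using scikit-learn on synthetic data. Note that `GridSearchCV` is a hyperparameter search wrapper rather than a classifier in its own right; the text does not say which estimator it tuned, so an SVM is assumed here. The parameter grid, the train/test split, and the synthetic data are likewise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the prepared feature matrix.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Assumption: the grid search tunes the SVM's C parameter, scored on recall.
    "grid_search_svm": GridSearchCV(
        make_pipeline(StandardScaler(), SVC()),
        param_grid={"svc__C": [0.1, 1, 10]},
        scoring="recall",
        cv=3,
    ),
}

# Fit every model and record its recall on the held-out test split.
recalls = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    recalls[name] = recall_score(y_test, model.predict(X_test))

print(recalls)
```

Recall is used as the comparison metric because it measures the share of actual positive cases that a model identifies.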


SMOTE boosting for features

To solve the issue of class imbalance, SMOTE boosting was used to over-sample the minority class observations.
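The over-sampling idea behind SMOTE can be shown with a small NumPy sketch. This is a simplified illustration of the interpolation step only, not the implementation used in the study and not the full SMOTEBoost algorithm (which combines over-sampling with boosting). The function name `smote_oversample` and all parameter values are hypothetical; in practice a library such as imbalanced-learn would typically be used.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)          # starting samples
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                             # weight in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Made-up imbalanced data: 20 majority rows and 5 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(20, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

synthetic = smote_oversample(X_minor, n_new=15, k=3)
X_bal = np.vstack([X_major, X_minor, synthetic])
y_bal = np.concatenate([np.zeros(20), np.ones(5), np.ones(15)])
print(X_bal.shape, int(y_bal.sum()))
```

Over-sampling of this kind is normally applied to the training split only, so that the test data still reflect the original class balance.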

2. CONCLUSIONS

The patient's health may suffer greatly as a result of this. Data were resampled because they weren't balanced. High accuracy can be achieved with imbalanced data; however, in these situations, recall, precision, and F1 score must be considered. A KNN imputer was used for missing-value imputation, and the data were processed to remove outliers. To solve the issue of class imbalance, SMOTE boosting was used to over-sample the minority class observations. Newer elements like age bucket, BMI bucket, and pulse pressure, which helped to explain the separation in the risk, were created using the information from EDA. Due to the parametric relationship in the data, a logistic regression model was implemented, and it was successful in achieving a Recall of 74.5%. Even though the recall score for SVM was 81.9%, SVM is not an interpretable model, thus I chose an interpretable model for this situation. All measures, including Precision, Recall, Accuracy, and F1 score, were evaluated for each model. Based on this analysis, Logistic regression can identify positive cases with a 74.5% Recall. Using a decision tree, positive cases may be predicted with a Recall of 49%. With the help of Random Forest, positive cases may be predicted with a 49% Recall. Using a Support Vector Machine, positive cases can be predicted with an 81.9% Recall. Using GridSearchCV, positive cases can be predicted with 81.9% Recall.
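The point about accuracy on imbalanced data can be made concrete with a short worked example. The sketch below uses made-up labels in which 15% of cases are positive (a hypothetical figure) and a trivial model that always predicts the negative class; the function name `classification_metrics` is illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# 85 negative and 15 positive cases; the "model" always predicts negative.
y_true = [0] * 85 + [1] * 15
always_negative = [0] * 100

metrics = classification_metrics(y_true, always_negative)
print(metrics)
```

Here the accuracy is 0.85 even though the recall is 0, since not a single positive case is detected. This is why recall, precision, and F1 score are reported alongside accuracy.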

REFERENCES

[1] Cardiovascular diseases (CVDs). http://www.who.int/newsroom/factsheets/detail/cardiovascular-diseases-(cvds). Accessed on 30/9/2018.

[2] Patel B, Sengupta P. Machine learning for predicting cardiac events: what does the future hold? Expert Rev Cardiovasc Ther. 2020;18(2):77–84.

[3] Baharvand-Ahmadi B, Bahmani M, Zargaran A. A brief report of Rhazes manuscripts in the field of cardiology and cardiovascular diseases. Int J Cardiol. 2016;207:190–1.

[4] Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLOS ONE. 2017;12(4):e0174944.

[5] Yan H, Ye Q, Zhang T, Yu D-J, Yuan X, Xu Y, et al. Least squares twin bounded support vector machines based on L1-norm distance metric for classification. Pattern Recogn. 2018;74:434–47.

[6] Jaworski M, Duda P, Rutkowski L. New splitting criteria for decision trees in stationary data streams. IEEE Trans Neural Netw Learn Syst. 2018;29:2516–29.

[7] Zhang S, Cheng D, Deng Z, Zong M, Deng X. A novel K-NN algorithm with data driven k parameter computation. Pattern Recogn Lett. 2018;109:44–54.

[8] Abdar M, Zomorodi-Moghadam M, Das R, Ting IH. Performance analysis of classification algorithms on early detection of liver disease. Expert Syst Appl. 2017;67:239–51.

[9] Abdar M, Yen NY, Hung JC-S. Improving the diagnosis of liver disease using multilayer perceptron neural network and boosted decision trees. J Med Biol Eng. 2017;10:1–13.

[10] Pławiak P. Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals. Swarm Evol Comput. 2018;39:192–208.

[11] Pławiak P. Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Syst Appl. 2018;92:334–49.

[12] Khozeimeh F, Alizadehsani R, Roshanzamir M, Khosravi A, Layegh P, Nahavandi S. An expert system for selecting wart treatment method. Comput Biol Med. 2017;81:167–75.

[13] Khozeimeh F, Azad FJ, Oskouei YM, Jafari M, Tehranian S, Alizadehsani R, et al. Intralesional immunotherapy compared to cryotherapy in the treatment of warts. Int J Dermatology. 2017;56:474–8.

[14] Alizadehsani R, Abdar M, Jalali SMJ, Roshanzamir M, Khosravi A, Nahavandi S. Comparing the performance of feature selection algorithms for wart treatment selection. Proc. Int. Workshop Future Technol; 2018. p. 6–18.

[15] https://archive.ics.uci.edu/ml/datasets/Heart+Disease.

BIOGRAPHY

Jagmeet Singh, Dept. of Computer Science and Engineering, Baba Farid College of Engineering and Technology, Bathinda, Punjab.
