
International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056
Volume: 12 Issue: 07 | Jul 2025 www.irjet.net p-ISSN:2395-0072
Ruchi Chauda*, Nagendra Kumar, Shobha Rajak
*Research Scholar, Department of CSE, Shri Ram Institute of Science and Technology, Jabalpur, M.P.
**Prof., Department of CSE, Shri Ram Institute of Science and Technology, Jabalpur, M.P.
Abstract- Cardiovascular diseases (CVDs) are the leading cause of mortality worldwide, emphasizing the need for early and accurate diagnosis. This study investigates the impact of various data sampling techniques on the performance of machine learning and deep learning classifiers for cardiovascular disease detection. A publicly available Kaggle dataset consisting of 253,680 records and 22 features was utilized. To address data imbalance, techniques such as Random Undersampling, Random Oversampling, SMOTE, DBSMOTE, Cluster-Based Sampling, and Principal Component Analysis (PCA) were applied. The performance of multiple classifiers including Logistic Regression, Decision Tree, Random Forest, XGBoost, and a custom CNN was evaluated. Results showed that sampling methods significantly influence model accuracy, with DB-SMOTE and PCA consistently improving classifier performance. The CNN achieved the highest accuracy of 90% without sampling and 89.69% with PCA, while XGBoost peaked at 90% using SMOTE. These findings highlight the importance of appropriate sampling in building robust and reliable diagnostic models for CVD detection.
Keywords: Cardiovascular Disease (CVD), Data Imbalance, Data Sampling Techniques, SMOTE, DB-SMOTE, Principal Component Analysis (PCA), Machine Learning, Convolutional Neural Network (CNN), Classification Accuracy.
Cardiovascular disease (CVD) is the leading cause of death worldwide. The World Health Organization (WHO) estimated that 17.9 million people died from CVDs in 2016, representing 31% of all global deaths [1]. Acute coronary syndrome (ACS), a common and serious type of CVD, causes a lack of oxygen to the heart and can lead to unstable angina or myocardial infarction (MI) [2]. Predicting major CVD events to estimate the risk of ACS supports intervention at an early stage [3], [4]. Such events include MI, coronary artery bypass grafting (CABG), re-percutaneous coronary intervention (re-PCI), cardiac death, and non-cardiac death. Several risk scores are used to estimate how serious a patient's condition is, including History, Electrocardiogram (ECG), Age, Risk factors and Troponin (HEART), Thrombolysis in Myocardial Infarction (TIMI), Global Registry of Acute Coronary Events (GRACE) [5], and Framingham [6]. These risk scores have limitations: they consider only a few prognostic factors and were developed long ago. As a result, traditional risk scoring tools cannot predict outcomes well for patients with ACS [5], [7]. It is therefore necessary to consider new prognostic factors for prediction in patients with ACS over time.
For this reason, artificial intelligence (AI) techniques such as machine learning (ML) have recently attracted attention. They can not only enhance performance while discovering multiple important prognostic factors, but also overcome typical data problems such as missing values and outliers. Many ML algorithms, such as logistic regression (LR), support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost), have been used to build risk prediction models [8]–[11]. Even though most individual ML-based models for predicting major adverse cardiac events (MACE) perform well, several challenges remain. First, no single ML algorithm is always better than the others within the same domain. Second, hybrids of different ML algorithms achieve better results than an individual algorithm [12]; in the medical domain, where precise results are needed, a more accurate result means a more efficient diagnosis. Third, the classes of the experimental dataset are imbalanced, so data sampling techniques need to be investigated to address the bias in the class distribution [10], [13], [14].
Therefore, this work proposes an ensemble model for health occurrences in patients with ACS using seven popular ML algorithms (LR, SVM, KNN, DT, RF, XGBoost, and AdaBoost) as base learners while dealing with the class
imbalance problem through data sampling techniques. The KNN-based imputation approach, which has been shown to handle missing values efficiently [15], is applied to fill in missing values when pre-processing the raw dataset. Finally, we propose a stacking ensemble prediction model using seven ML algorithms as base learners to overcome the limitations and improve the performance of the individual ML-based prediction models.
The heart disease prediction model (HDPM) developed by Fitriyani et al. attained an accuracy of 95.9% [16]. Waqar et al. proposed SMOTE-based deep learning to predict heart attacks. The authors used the SMOTE technique to balance the dataset without feature selection. The balanced dataset was used to train and test a deep neural network that predicts the absence or presence of a heart attack, obtaining 96% accuracy [17]. Recently, Ishaq et al. used SMOTE to balance the data distribution and predict patient survival using RF importance ranking [18]. Moreover, some researchers used hybrid methods to detect patients' severity levels. Salari et al. introduced a hybrid method including a genetic algorithm (GA) for feature selection, a modified kNN, and a backpropagation neural network for severity classification. The method obtained 62.1% accuracy [19]. In the same context, Wiharto et al. proposed a hybrid method based on the binary tree (BT) and SVM for predicting the sickness level of heart disease (HD). The study obtained 61.86% accuracy [20]. In contrast, Khateeb et al. applied a re-sampling filter to the data and then used kNN (IBK) to predict the disease. The model obtained accuracy up to 79.20% [21]. Magesh and Swarnalatha developed an optimal feature selection (FS) method based on cluster-based DT learning (CDTL) to find the significant features and then employed RF to predict the severity level. The developed model achieved 89.3% accuracy for severity level predictions [22]. Recently, the authors in [23] developed a decision system for HD severity level prediction based on an ML-based fusion approach. Most studies focused on the importance of HD attributes in selecting the best factors and on outlier removal to enhance ML model performance. However, imbalanced data and hyperparameter tuning can also affect a prediction model's performance, since many ML hyperparameters have different optimal values on different tasks and datasets, mainly for complex ML systems with many hyperparameters and large datasets.
Still, none of the previous studies integrated optimization methods with data balancing and ML techniques for model Hyperparameter Optimization (HPO) to influence the model's prediction performance [24], [25]. This work addresses these problems in existing work on heart failure and develops an ML model to forecast heart failure and predict heart disease severity. We apply SMOTE, undersampling, and oversampling to address the imbalance problem in both forecasting problems. We employ six ML algorithms, Random Forest, Logistic Regression, kNN, Decision Trees (DT), XGBoost, and CNN, to predict heart failure and severity levels. The selected classifiers, together with SMOTE, were optimized using HB to find the best-performing hyperparameters, since the optimal values of ML hyperparameters vary across datasets.
Cardiovascular diseases (CVDs) are the leading cause of mortality worldwide, accounting for nearly 18 million deaths annually, according to the World Health Organization (WHO). Despite medical advancements, a significant number of cases go undetected or are diagnosed at a late stage, often due to inadequate screening tools or imbalanced healthcare datasets.
In real-world medical datasets, class imbalance is a critical issue where the number of healthy cases vastly outnumbers the number of CVD-positive cases. This imbalance severely impacts the performance of machine learning models, causing them to be biased toward the majority class and potentially misclassifying high-risk patients.
To address this, various data sampling techniques like SMOTE, DBSMOTE, and PCA have been developed. However, their effectiveness across different classification models, including both traditional machine learning algorithms and deep learning models like CNN, remains a topic of active research.
This research is motivated by the need to:
Improve early detection of cardiovascular diseases using predictive analytics.
Mitigate class imbalance using advanced sampling strategies.
Compare classifier performance (Decision Tree, XGBoost, Random Forest, Logistic Regression, and CNN) in a healthcare context.
Help build more robust, fair, and accurate models for real-world disease prediction using large-scale datasets like the one sourced from Kaggle.
Ultimately, this study seeks to bridge the gap between data-driven approaches and clinical decision-making, offering valuable insights for both data scientists and healthcare professionals.
Dwivedi et al. tested the performance of different machine learning techniques for the prediction of heart failure [25]. Amin et al. evaluated different data mining techniques and the identification of significant features for predicting heart failure [26]. From experimental results, it was observed that the best
performance of the data mining technique for classification accuracy was 87.4% for heart failure prediction. Recently, Resul et al. achieved a heart failure detection accuracy of 89.01%, sensitivity of 80.95%, and specificity of 95.91% by developing an ensemble of neural network models. Samuel et al. further improved the heart failure prediction accuracy to 91.10% by proposing a novel hybrid decision support system that integrates ANN and the fuzzy analytic hierarchy process (Fuzzy-AHP). Furthermore, Ali et al. improved the heart failure prediction accuracy to 92.22% by developing a novel diagnostic system that hybridized two SVMs. The first SVM was used for selection of important features and the second SVM was used for prediction. Both models were optimized using a new search algorithm. Most recently, Paul et al. further improved the heart failure prediction accuracy to 92.31% by proposing an adaptive weighted fuzzy system ensemble method. However, the HF prediction accuracy still needs considerable improvement. It is important to note that each time, the obtained subset of features is applied to the RF algorithm for classification, and the optimal hyperparameters of RF are found using the grid search algorithm.
The work in [27] uses Kaggle's Stroke Prediction dataset to predict heart stroke where the classes are not balanced. Resampling strategies are used to balance the dataset so that the trained model produces accurate results for all classes of the target variable. The Random Forest classifier yields the better accuracy, 97.9%.
The study in [28] compares five machine learning techniques to determine which one most accurately recognizes heart disease in a patient. Using the UCI Cleveland dataset as a sample, the results show that the Support Vector Machine and K-Nearest Neighbor give the highest accuracy, 85%.
In [29], the authors proposed an ensemble-based model. They used ML models including RF, Logistic Regression, KNN, XGBoost, DT, and SVM. These models were optimized and stacked into an ensemble to form the 'CARDIACX' model, which predicts the risk of heart disease. The models demonstrated robust performance on various metrics such as AUC, PR curve, log loss, Jaccard score, and MCC. The study provides a reliable and transparent framework for early detection of heart disease, providing useful information and enabling patients to receive timely and personalized care.
In [30], two DL models, Ensemble based Cardiovascular Disease Detection Network (EnsCVDD-Net) and Blending based Cardiovascular Disease Detection Network (BlCVDD-Net), are proposed for accurate prediction and classification of CVDs. The results indicate that the proposed model outperforms all base models with 88% accuracy, 88% F1-score, 91% precision, 85% recall, and 777 s execution time. Similarly, with 91% accuracy, 91% F1-score, 96% precision, 86% recall, and 247 s execution time, the proposed work outperforms the state-of-the-art DL models.
The study in [31] examined how to build and compare different CVD prediction models using a large dataset that included biochemical, clinical, and demographic information about each person. During the pre-processing stage, the authors took care to ensure the data's accuracy and quality. They used a variety of machine learning algorithms, such as random forest, logistic regression, support vector machines, and deep learning neural networks, and assessed the models using accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC-ROC). Their findings show that while more sophisticated algorithms, especially deep learning models, perform better at spotting possible instances of CVD, more conventional models such as regression still offer significant predictive power.
The study in [32] uses a dataset of clinical and demographic data to evaluate heart disease prediction with logistic regression and decision tree models. For patients categorized as not having heart disease (Class 0), logistic regression produced a recall of 0.85 (85% of the true events) and a precision of 0.88 (88% of the predictions). With an F1-score of 0.87, the model exhibits balanced performance for this class. For Class 0, the decision tree showed considerably lower precision, recall, and F1-score, with values of 0.83, 0.80, and 0.81, respectively. For patients with heart disease (Class 1), logistic regression yielded excellent results: precision of 0.87, recall of 0.89, and F1-score of 0.88. In Class 1, the decision tree also performed well, achieving an F1-score of 0.85, recall of 0.86, and accuracy of 0.84. With an overall accuracy of 88% against 85%, logistic regression turned out to be generally superior to the decision tree. These results suggest that, since logistic regression better balances accuracy and recall for both groups, it is preferable for clinical use in the diagnosis of heart disease. These findings will be useful to medical practitioners seeking to improve patient care through accurate prediction models.
In such a group of stroke patients, patients with a heart attack are far fewer than patients without one, so the available data is highly imbalanced. In addition, existing work analyzes more medical indicators than heart rate, blood sugar, respiratory rate, and blood pressure. In the clinic, these four medical indicators are common and easy to obtain. The architecture of the proposed model is shown below:

Figure 3.1: Proposed model workflow.
This paper presents work on heart attack prediction in stroke patients based on data balancing and data processing.
In the workflow, the stroke-patient data is analyzed first, and a new data handling algorithm is designed. Related analyses are carried out to clarify the role of each module and to measure complexity. The performance of classical classifiers on heart attack prediction is then investigated using the data-processing results of the proposed algorithm.
3.1 Data Sampling Techniques
1. No Sampling (Original Dataset)
Definition: Uses the dataset in its original form.
Use Case: When the class distribution is already balanced or you are testing a baseline.
Limitation: Can lead to bias if the data is imbalanced.
2. Random Undersampling
Definition: Randomly removes instances from the majority class.
Advantage: Balances classes by reducing the majority.
Limitation: Can discard valuable information, possibly underfitting.
3. Random Oversampling
Definition: Randomly duplicates instances from the minority class.
Advantage: Easy to implement, balances data.
Limitation: Can lead to overfitting, since the same samples are repeated.
4. Cluster-based Sampling
Definition: Applies clustering (e.g., K-Means) to group data, then samples proportionally from clusters.
Advantage: Captures the diversity of data.
Limitation: Cluster quality impacts sampling effectiveness.
5. SMOTE (Synthetic Minority Oversampling Technique)
Definition: Synthesizes new examples by interpolating between minority class samples.
Advantage: Reduces overfitting compared to random oversampling.
Limitation: Risk of creating ambiguous samples or noise.
6. DBSMOTE (Density-Based SMOTE)
Definition: Enhances SMOTE by focusing on dense areas of the minority class and avoiding noise.
Advantage: More robust than SMOTE; avoids overlapping with the majority class.
Limitation: More complex and computationally heavier.
7. PCA (Principal Component Analysis)
Definition: Dimensionality reduction technique that projects data onto principal components.
Usage in Sampling: Helps balance class variance or reduce noise before sampling.
Advantage: Enhances model speed, possibly improves accuracy.
Limitation: May lose some information (explained variance).
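To make the resampling definitions above concrete, the following is a minimal pure-Python sketch of random undersampling, random oversampling, and a simplified SMOTE-style interpolation. It is an illustration only and not the implementation used in this study: the function names (`group_by_class`, `random_undersample`, `random_oversample`, `smote_like`), the toy data, and the parameter `k` are assumptions introduced here, and `smote_like` assumes a binary problem with at least two minority rows.

```python
import random
from collections import Counter


def group_by_class(X, y):
    """Map each class label to the list of feature rows that carry it."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        for row in rng.sample(rows, target):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    target = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority_label, k=2, seed=0):
    """Add synthetic minority rows interpolated between nearby minority rows."""
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    # Binary case: number of rows needed so the minority matches the majority.
    n_new = (len(y) - len(minority)) - len(minority)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])  # one of the k nearest minority rows
        gap = rng.random()  # position on the segment between the two rows
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out


# Toy data (not the Kaggle dataset): 7 majority rows (label 0), 3 minority rows (label 1).
X = [[float(i), float(i % 3)] for i in range(10)]
y = [0] * 7 + [1] * 3

print(Counter(random_undersample(X, y)[1]))
print(Counter(random_oversample(X, y)[1]))
print(Counter(smote_like(X, y, minority_label=1)[1]))
```

Because each synthetic row lies on a segment between two existing minority rows, it stays inside the region spanned by the minority class, which is why SMOTE repeats fewer identical samples than random oversampling but can still place points in ambiguous areas.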
Step 1: Read the dataset.
Step 2: Normalize the numerical values of the data.
Step 3: Delete the null values and duplicate records.
Step 4: Train the six classifiers one by one.
Step 5: Test the classifiers and predict the results.
Step 6: Apply Random Undersampling to the dataset.
Step 7: Repeat steps 4 and 5.
Step 8: Apply Cluster Centroid Sampling to the dataset.
Step 9: Repeat steps 4 and 5.
Step 10: Apply Random Oversampling to the dataset.
Step 11: Repeat steps 4 and 5.
Step 12: Apply SMOTE to the dataset.
Step 13: Repeat steps 4 and 5.
Step 14: Apply DBSMOTE to the dataset.
Step 15: Repeat steps 4 and 5.
Step 16: Apply PCA to the dataset.
Step 17: Repeat steps 4 and 5.
Step 18: End of algorithm.
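The repeated "apply a sampling technique, then train and test every classifier" pattern in the steps above can be sketched as a nested loop. The code below is a hedged illustration, not the authors' implementation: `split`, `no_sampling`, `random_undersampling`, `NearestCentroid`, and `run_protocol` are hypothetical names, the nearest-centroid model is only a stand-in for the classifiers named in the paper, and the data is synthetic. It also assumes that only the training portion is resampled so that the test set keeps its original class distribution; the paper does not state where the split happens.

```python
import random


def split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the row indices and split them into a train and a test part."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    return [X[i] for i in tr], [y[i] for i in tr], [X[i] for i in te], [y[i] for i in te]


def no_sampling(X, y):
    """Baseline: return the training data unchanged."""
    return X, y


def random_undersampling(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    target = min(len(rows) for rows in groups.values())
    pairs = [(row, label) for label, rows in groups.items()
             for row in rng.sample(rows, target)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


class NearestCentroid:
    """Hypothetical stand-in classifier: predict the class with the closest mean."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def closest(row):
            return min(self.centroids,
                       key=lambda c: sum((a - b) ** 2
                                         for a, b in zip(row, self.centroids[c])))
        return [closest(row) for row in X]


def run_protocol(X, y, samplers, classifiers):
    """For each sampler: rebalance, train every classifier, and test it."""
    X_tr, y_tr, X_te, y_te = split(X, y)
    accuracy = {}
    for s_name, sampler in samplers.items():
        X_bal, y_bal = sampler(X_tr, y_tr)  # assumption: only training rows are resampled
        for c_name, make_clf in classifiers.items():
            preds = make_clf().fit(X_bal, y_bal).predict(X_te)
            accuracy[(s_name, c_name)] = sum(p == t for p, t in zip(preds, y_te)) / len(y_te)
    return accuracy


# Synthetic, imbalanced toy data (not the Kaggle dataset): 90 negatives, 10 positives.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)] + \
    [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]
y = [0] * 90 + [1] * 10

results = run_protocol(
    X, y,
    samplers={"none": no_sampling, "random_undersampling": random_undersampling},
    classifiers={"nearest_centroid": NearestCentroid},
)
for key, acc in sorted(results.items()):
    print(key, round(acc, 3))
```

Adding another sampling technique or classifier to the comparison only requires a new entry in the corresponding dictionary, which mirrors how the steps repeat the same train-and-test stage after each technique.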
We saw a need to improve on current studies in this field and analysed previous models to determine what might be lacking. We then devised a solution that applies data balancing techniques that might
reshape the current datasets and provide better results, making the system suitable for practical implementation.
The classification performance can be evaluated in terms of accuracy, precision, recall, and F1-score. The table below compares the results of all classifiers before and after applying the data balancing techniques.
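As a reminder of how these four measures relate, the sketch below computes them from the counts of true and false positives and negatives for a binary problem. The function name `binary_metrics` and the toy labels are assumptions made for illustration and are not taken from the experiments. The example also shows why accuracy alone can mislead on imbalanced data: a classifier that always predicts the majority class on a 9:1 split reaches 90% accuracy while its recall for the positive class is zero.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels (not from the paper): 9 healthy cases and 1 positive case.
y_true = [0] * 9 + [1]
always_negative = [0] * 10
print(binary_metrics(y_true, always_negative))
```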
Table 4.1: Accuracy comparison before data balancing techniques.
Table 6.1: Accuracy comparison chart.
From the above chart, it can be seen that the CNN classifier performs better than the other classifiers in terms of accuracy.
Now, we will apply Random Undersampling for data balancing and compare the accuracy.
Table 4.2: Accuracy comparison after random undersampling.
From the above chart, we can conclude that the accuracy of all the classifiers decreases after random undersampling except for the decision tree, whose accuracy increases.
After random undersampling, we apply Cluster Centroid sampling and check the results. Table 4.3 below shows the accuracy comparison of all the classifiers after Cluster Centroid sampling. From the results, we can conclude that after Cluster Centroid sampling, CNN has the better performance.
Table 4.3: Accuracy comparison after Cluster Centroid undersampling.
After undersampling, we test our results with oversampling. Table 4.4 below shows the accuracy comparison of the classifiers after Random Oversampling for data balancing.
Table 4.4: Accuracy after Random Oversampling.
From the above chart, it is clear that the Random Forest classifier has better results than the other classifiers after Random Oversampling. Now, we will apply SMOTE and compare the results of all the classifiers. Table 4.5 below shows the accuracy comparison after SMOTE.
Table 4.5: Accuracy comparison after SMOTE.
After SMOTE, we apply DBSMOTE and test our results with all six classifiers. Table 4.6 below shows the accuracy comparison after DBSMOTE.
Table 4.6: Accuracy after DBSMOTE.

Figure 4.3: Final accuracy comparison for XGBoost model.

Figure 4.2: Final accuracy comparison for CNN model.
Results of the various classifiers are compared across the different data sampling techniques used. The classifiers assign the data to two classes, "Heart attack" and "Not attack". The classifiers used are Random Forest, Logistic Regression, Decision Tree, XGBoost, and CNN. From the results, we can conclude that Decision Tree with Random Undersampling has the best accuracy. The CNN classifier has better performance after PCA. The Random Forest classifier does not show much deviation across data sampling techniques. Logistic Regression and XGBoost also show better results with PCA. So, the Principal Component Analysis sampling technique shows better results than the other sampling techniques for the heart attack prediction system.
The main motivation of this thesis is to provide insight into detecting and curing heart disease using machine learning techniques. For this thesis, data were collected from Kaggle datasets. All attributes are numeric-valued. These attributes are fed into Random Forest, Logistic Regression, KNN, Decision Tree, XGBoost, and CNN, of which CNN gave the better result, with the highest accuracy most of the time after data balancing techniques were applied. Valid performance is achieved using the CNN algorithm in diagnosing heart diseases, and it can be further improved by increasing the number of attributes.
Thus, in an environment similar to that of the dataset used, if all the features are preprocessed so that they acquire a normal distribution and all the data balancing techniques are applied, CNN is a good choice for obtaining a robust prediction model. Such models provide valuable assistance to society in the healthcare management domain.
1. Cardiovascular Diseases (CVDs). Accessed: Apr. 15, 2021. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds).
2. E. A. Amsterdam, N. K. Wenger, R. G. Brindis, D. E. Casey, T. G. Ganiats, D. R. Holmes, A. S. Jaffe, H. Jneid, R. F. Kelly, M. C. Kontos, and G. N. Levine, "2014 AHA/ACC guideline for the management of patients with non–ST-elevation acute coronary syndromes: A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines," J. Amer. College Cardiol., vol. 64, no. 24, pp. 139–228, Dec. 2014.
3. K. E. Kip, K. Hollabaugh, O. C. Marroquin, and D. O. Williams, "The problem with composite end points in cardiovascular studies: The story of major adverse cardiac events and percutaneous coronary intervention," J. Amer. College Cardiol., vol. 51, no. 7, pp. 701–707, Feb. 2008.
4. D. Hu, W. Dong, X. Lu, H. Duan, K. He, and Z. Huang, "Evidential MACE prediction of acute coronary syndrome using electronic health records," BMC Med. Informat. Decis. Making, vol. 19, no. S2, pp. 9–17, Apr. 2019, doi: 10.1186/s12911-019-0754-7.
5. J. M. Poldervaart, M. Langedijk, B. E. Backus, I. M. C. Dekker, A. J. Six, P. A. Doevendans, A. W. Hoes, and J. B. Reitsma, "Comparison of the GRACE, HEART and TIMI score to predict
major adverse cardiac events in chest pain patients at the emergency department," Int. J. Cardiol., vol. 227, pp. 656–661, Jan. 2017, doi: 10.1016/j.ijcard.2016.10.080.
6. R. B. D'Agostino, M. J. Pencina, J. M. Massaro, and S. Coady, "Cardiovascular disease risk assessment: Insights from Framingham," Global Heart, vol. 8, no. 1, pp. 11–23, 2013, doi: 10.1016/j.gheart.2013.01.001.
7. J.-M. Kwon, K.-H. Jeon, H. M. Kim, M. J. Kim, S. Lim, K.-H. Kim, P. S. Song, J. Park, R. K. Choi, and B.-H. Oh, "Deep-learning-based risk stratification for mortality of patients with acute myocardial infarction," PLoS ONE, vol. 14, no. 10, Oct. 2019, Art. no. e0224502.
8. W. Chang, Y. Liu, Y. Xiao, X. Yuan, X. Xu, S. Zhang, and S. Zhou, "A machine-learning-based prediction method for hypertension outcomes based on medical data," Diagnostics, vol. 9, no. 4, p. 178, Nov. 2019, doi: 10.3390/diagnostics9040178.
9. M. Saqlain, B. Jargalsaikhan, and J. Y. Lee, "A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing," IEEE Trans. Semicond. Manuf., vol. 32, no. 2, pp. 171–182, Mar. 2019, doi: 10.1109/TSM.2019.2904306.
10. K. Davagdorj, J. S. Lee, V. H. Pham, and K. H. Ryu, "A comparative analysis of machine learning methods for class imbalance in a smoking cessation intervention," Appl. Sci., vol. 10, no. 9, p. 3307, May 2020, doi: 10.3390/app10093307.
11. S. W. A. Sherazi, Y. J. Jeong, M. H. Jae, J.-W. Bae, and J. Y. Lee, "A machine learning–based 1-year mortality prediction model after hospital discharge for clinical patients with acute coronary syndrome," Health Informat. J., vol. 26, no. 2, pp. 1289–1304, Jun. 2020, doi: 10.1177/1460458219871780.
12. S. W. A. Sherazi, J.-W. Bae, and J. Y. Lee, "A soft voting ensemble classifier for early prediction and diagnosis of occurrences of major adverse cardiovascular events for STEMI and NSTEMI during 2-year follow-up in patients with acute coronary syndrome," PLoS ONE, vol. 16, no. 6, Jun. 2021, Art. no. e0249338, doi: 10.1371/journal.pone.0249338.
13. Q. Zou, S. Xie, Z. Lin, M. Wu, and Y. Ju, "Finding the best classification threshold in imbalanced classification," Big Data Res., vol. 5, pp. 2–8, Sep. 2016, doi: 10.1016/j.bdr.2015.12.001.
14. G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, "Learning from class-imbalanced data: Review of methods and applications," Expert Syst. Appl., vol. 73, pp. 220–239, May 2017, doi: 10.1016/j.eswa.2016.12.035.
15. O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R. B. Altman, "Missing value estimation methods for DNA microarrays," Bioinformatics, vol. 17, no. 6, pp. 520–525, 2001, doi: 10.1093/bioinformatics/17.6.520.
16. N. L. Fitriyani, M. Syafrudin, G. Alfian, and J. Rhee, "HDPM: An effective heart disease prediction model for a clinical decision support system," IEEE Access, vol. 8, pp. 133034–133050, 2020.
17. M. Waqar, H. Dawood, H. Dawood, N. Majeed, A. Banjar, and R. Alharbey, "An efficient SMOTE-based deep learning model for heart attack prediction," Sci. Program., vol. 2021, pp. 1–12, Mar. 2021.
18. A. Ishaq, S. Sadiq, M. Umer, S. Ullah, S. Mirjalili, V. Rupapara, and M. Nappi, "Improving the prediction of heart failure patients' survival using SMOTE and effective data mining techniques," IEEE Access, vol. 9, pp. 39707–39716, 2021.
19. N. Salari, S. Shohaimi, F. Najafi, M. Nallappan, and I. Karishnarajah, "A novel hybrid classification model of genetic algorithms, modified K-nearest neighbor and developed Backpropagation neural network," PLoS ONE, vol. 9, no. 11, Nov. 2014, Art. no. e112987.
20. W. Wiharto, H. Kusnanto, and H. Herianto, "Performance analysis of multiclass support vector machine classification for diagnosis of coronary heart diseases," 2015, arXiv:1511.02352.
21. N. Khateeb and M. Usman, "Efficient heart disease prediction system using K-nearest neighbor classification technique," in Proc. Int. Conf. Big Data Internet Thing (BDIOT), 2017, pp. 21–26.
22. G. Magesh and P. Swarnalatha, "Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction," Evol. Intell., vol. 14, no. 2, pp. 583–593, Jun. 2021.
23. H. B. Kibria and A. Matin, "The severity prediction of the binary and multi-class cardiovascular disease: A machine learning-based fusion approach," Comput. Biol. Chem., vol. 98, Jun. 2022, Art. no. 107672.
24. S. Shin, B. Ko, and H. So, "Noncontact thermal mapping method based on local temperature data using deep neural network regression," Int. J. Heat Mass Transf., vol. 183, Feb. 2022, Art. no. 122236.
25. A. K. Dwivedi, "Performance evaluation of different machine learning techniques for prediction of heart disease," Neural Comput. Appl., vol. 29, no. 10, pp. 685–693, 2018.
26. M. S. Amin, Y. K. Chiam, and K. D. Varathan, "Identification of significant features and data mining techniques in predicting heart disease," Telematics Inform., vol. 36, pp. 82–93, Mar. 2019.
27. N. S. R. Ambati, S. H. Singara, S. S. Konjeti and S. C, "Performance Enhancement of Machine Learning Algorithms on Heart Stroke Prediction Application using Sampling and Feature Selection Techniques," 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 2022, pp. 488-495, doi: 10.1109/ICAISS55157.2022.10011040.
28. K. Setiawan, Jonathan, P. S. Beratha, M. S. Anggereainy and A. Kurniawan, "Classification Prediction of Heart Disease Using Machine Learning Techniques," 2023 4th International Conference on Artificial Intelligence and Data Sciences (AiDAS), IPOH, Malaysia, 2023, pp. 86-90, doi: 10.1109/AiDAS60501.2023.10284676.
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page762

29. S. Santhosh, K. Chadaga, R. V. Arjunan and S. D'Souza, "Cardiac Clarity: Harnessing Machine Learning for Accurate Heart-Disease Prediction," in IEEE Access, vol. 13, pp. 97529-97544, 2025, doi: 10.1109/ACCESS.2025.3573760.
30. H. Khan, N. Javaid, T. Bashir, M. Akbar, N. Alrajeh and S. Aslam, "Heart Disease Prediction Using Novel Ensemble and Blending Based Cardiovascular Disease Detection Networks: EnsCVDD-Net and BlCVDD-Net," in IEEE Access, vol. 12, pp. 109230-109254, 2024, doi: 10.1109/ACCESS.2024.3421241.
31. V. Vision Paul and J. A. I. S. Masood, "Exploring Predictive Methods for Cardiovascular Disease: A Survey of Methods and Applications," in IEEE Access, vol. 12, pp. 101497-101505, 2024, doi: 10.1109/ACCESS.2024.3430898.
32. R. Bhuria and S. Gupta, "Comparative Evaluation of Logistic Regression and Decision Tree Models for Heart Disease Prediction Using Clinical and Demographic Data," 2024 3rd International Conference for Advancement in Technology (ICONAT), GOA, India, 2024, pp. 1-5, doi: 10.1109/ICONAT61936.2024.10774642.