Integrative Approach to PCOS Detection Using Machine Learning and Convolutional Neural Networks



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072


¹Student, Delhi Pharmaceutical Sciences and Research University, New Delhi, India

¹Student, Delhi Pharmaceutical Sciences and Research University, New Delhi, India

¹Student, Delhi Pharmaceutical Sciences and Research University, New Delhi, India

Abstract - This study developed two models for Polycystic Ovary Syndrome (PCOS) detection using clinical and imaging data. A stacking ensemble model achieved 93% accuracy (Area Under the Curve [AUC] 0.99), identifying key predictors such as follicle counts, Body Mass Index (BMI), and hormonal markers. A Convolutional Neural Network (CNN) achieved 92.72% accuracy (AUC 0.89) with zero misclassifications, ensuring reliable ultrasound image classification. Advanced feature selection methods and a Streamlit-based interface enable real-time diagnostics, supporting early detection and improved clinical outcomes.

Key Words: Polycystic Ovary Syndrome (PCOS), Diagnostic Models, Machine Learning, Prevalence, Reproductive Health

1. INTRODUCTION

Polycystic Ovary Syndrome (PCOS) is a multifaceted endocrine disorder that affects 4% to 20% of women of reproductive age globally, with prevalence varying based on diagnostic criteria and geographic location [1, 2]. First described in 1935 by Stein and Leventhal, it is characterized by a wide spectrum of reproductive, metabolic, and psychological symptoms, including infertility, hyperandrogenism, menstrual irregularities, insulin resistance, obesity, and mood disorders such as anxiety and depression [3, 4].

Long-term complications of PCOS include type 2 diabetes, cardiovascular disease, endometrial cancer, and obstructive sleep apnea, underscoring its public health significance [4, 5]. Despite these health concerns, up to 70% of cases remain undiagnosed globally due to the disorder's heterogeneous presentation and overlapping symptoms with other endocrine conditions [6].

Globally, PCOS affects approximately 8–13% of women of reproductive age, with significant variations in prevalence across populations due to differences in diagnostic criteria and study methodologies [7].

In India, the prevalence of PCOS ranges from 3.7% to 22%, highlighting regional and demographic disparities [8]. For example, a community-based study in Mumbai reported a prevalence of 22.5% using the Rotterdam criteria [9], while a pilot study in Tamil Nadu found an 18% prevalence among adolescent females, with higher rates in urban areas compared to rural regions [10].

Conversely, research in Lucknow observed a prevalence of only 3.7% using the NIH criteria among women aged 18–25 with menstrual irregularities and hirsutism [11]. Similarly, a study in Andhra Pradesh reported a 9.13% prevalence based on the Rotterdam criteria [12].

These variations are often influenced by the choice of diagnostic framework, lifestyle factors, and healthcare access [13, 14].

The pathophysiology of PCOS involves a complex interplay of hormonal, genetic, and environmental factors. Hyperandrogenism, characterized by elevated androgen levels, disrupts normal ovarian function and follicular development, leading to anovulation and polycystic ovarian morphology [15, 16]. Insulin resistance is another critical factor, as hyperinsulinemia exacerbates androgen production while reducing sex hormone-binding globulin (SHBG) levels, amplifying symptoms like hirsutism and acne [17, 18].

Genetic predispositions, such as mutations in FSHR, LHCGR, INSR, and THADA genes, further contribute to PCOS, along with epigenetic changes resulting from prenatal androgen exposure and elevated maternal anti-Müllerian hormone (AMH) levels [19][20][21].

Environmental and lifestyle factors play a significant role in the onset and progression of PCOS. High-calorie diets, sedentary behaviour, and exposure to endocrine-disrupting chemicals (EDCs) have been linked to increased prevalence, particularly in urban populations [22, 12].

Geographical and socioeconomic disparities further influence symptom severity and healthcare access. Indian women, for instance, frequently present with unique clinical features, such as higher incidences of insulin resistance, acanthosis nigricans, and thyroid dysfunction, necessitating tailored diagnostic and therapeutic approaches [14, 9][23].

The diagnostic framework for PCOS has evolved significantly over the decades. The Rotterdam criteria, established in 2003, are the most widely used today and require the presence of at least two of the following: oligo-anovulation, clinical or biochemical hyperandrogenism, and polycystic ovarian morphology (PCOM) detected via ultrasonography, after excluding other disorders [24][25]. Advances in imaging techniques, such as high-frequency transvaginal ultrasonography, have improved diagnostic accuracy, though debates regarding follicle count thresholds and their clinical significance persist [26, 27].

Management of PCOS emphasizes a multidisciplinary approach, combining pharmacological treatments such as metformin, oral contraceptives, and anti-androgens with lifestyle modifications targeting weight loss and insulin sensitivity [28][4]. Dietary changes and regular exercise have demonstrated significant improvements in metabolic and reproductive outcomes. Additionally, psychological support is essential, given the psychosocial burden of PCOS and its impact on quality of life and treatment adherence [29].

Despite advancements in understanding and managing PCOS, significant gaps remain in elucidating its underlying mechanisms and addressing diagnostic delays. Future research should prioritize exploring novel biomarkers, the role of epigenetics, and the long-term effects of early interventions. This study seeks to provide a comprehensive overview of PCOS, emphasizing its prevalence, pathophysiology, diagnostic challenges, and management strategies while advocating for holistic and personalized approaches to improve outcomes for women affected by this complex disorder [30].

In our study, we accomplished the following key objectives:

• Utilized a publicly available Kaggle dataset combining clinical and ultrasound data to address PCOS detection comprehensively.

• Applied diverse analytical techniques to identify key predictors and optimize feature selection for improved model accuracy.

• Developed a hybrid approach with tailored models for clinical and imaging data, achieving robust and reliable detection results.

• Enhanced image processing with advanced methods to extract critical diagnostic patterns for effective classification.

• Created a user-friendly interface enabling seamless data input, analysis, and real-time predictions to streamline diagnostics.

2. RELATED WORKS

Artificial intelligence (AI) has significantly advanced PCOS diagnosis using machine learning (ML) and deep learning (DL) models. Gopalakrishnan et al. achieved 93.82% accuracy with SVM [31], while Nilofer et al.'s IFFOA-ANN model reached 97.5% accuracy through adaptive clustering [32]. Hosain et al.'s PCONet CNN attained 98.58% accuracy [33], and Maheshwari and Tiwari achieved 99.7% using Wavelet-Enhanced CNNs [34]. Khanna et al. combined Harris Hawks Optimization with XGBoost for high performance [35], and Danaei Mehr and Polat achieved 98.89% accuracy with Random Forest and feature selection [36].

These AI-driven approaches address PCOS diagnosis challenges by integrating clinical and imaging data, enabling early detection and better healthcare outcomes, particularly in high-prevalence regions like India.

3. MATERIALS AND METHODS

3.1 Inclusion Criteria

• Women of reproductive age (18–44 years).

3.2 Exclusion Criteria

• Men, children, non-pubescent girls, records with missing clinical data, and low-quality ultrasound images.

3.3 Dataset characteristics

Two datasets were utilized, sourced from Kaggle:

3.3.1 Clinical Laboratory Data: Comprising 2,000 records with 44 attributes, including demographic, hormonal, and lifestyle factors relevant to PCOS diagnosis.

3.3.2 Pelvic Ultrasound Images: A total of 4,400 augmented grayscale images, categorized as "infected" (PCOS) and "not infected" (healthy), used for image-based classification.

Table 1 - Parameters of Dataset

| Parameter | Normal Reference Range | Typical Values in PCOS |
|---|---|---|
| Age (years) | Reproductive age: 15–45 | Commonly diagnosed in late adolescence to early 30s |
| Weight (kg) | Varies based on height and body composition | Often overweight or obese, but can occur at normal weight |
| Height (cm) | Varies based on genetics and nutrition | Generally unaffected by PCOS |
| Body Mass Index (BMI) | 18.5–24.9 | Often ≥25, indicating overweight or obesity |
| Blood Group | A, B, AB, O | No direct association with PCOS |
| Pulse Rate (bpm) | 60–100 | Typically within normal range |
| Respiratory Rate (breaths/min) | 12–20 | Generally normal in PCOS patients |
| Hemoglobin (Hb) (g/dL) | Women: 12.0–15.5 | Usually within normal range |
| Menstrual Cycle Regularity | Regular cycles every 21–35 days | Often irregular or absent due to anovulation |
| Cycle Length (days) | 21–35 | May be prolonged (>35 days) or absent |
| Marital Status (years) | - | Not directly related to PCOS |
| Pregnancy Status (Y/N) | - | Leading cause of infertility due to anovulation |
| Number of Abortions | 0 | Increased risk of miscarriage |
| β-HCG (mIU/mL) | Non-pregnant: <5 | Elevated if pregnant; not directly related to PCOS |
| FSH (mIU/mL) | 3–10 | Normal or low levels |
| LH (mIU/mL) | 2–10 | Elevated; LH:FSH ratio >2:1 common |
| FSH/LH Ratio | Approximately 1:1 | Often >2:1 |
| Hip Circumference (inch) | Varies based on body composition | Increased in overweight/obese individuals |
| Waist Circumference (inch) | Women: <35 | Often increased, indicating central obesity |
| Waist-to-Hip Ratio | Women: <0.85 | Often ≥0.85, indicating central obesity |
| TSH (mIU/L) | 0.4–4.0 | Usually within normal range |
| AMH (ng/mL) | 1.0–4.0 | Elevated (>4.0), reflecting increased follicle count |
| PRL (ng/mL) | 4.0–23.0 | Usually normal; mild elevation possible |
| Vitamin D3 (ng/mL) | 30–100 | Deficiency common |
| Progesterone (ng/mL) | Follicular: 0.2–1.5; Luteal: 1.7–27 | Low due to anovulation |
| RBS (mg/dL) | <140 | May be elevated due to insulin resistance |
| Weight Gain (Y/N) | - | Common, especially central obesity |
| Hirsutism (Hair Growth) (Y/N) | - | Common due to hyperandrogenism |
| Skin Darkening (Y/N) | - | May indicate insulin resistance |
| Hair Loss (Y/N) | - | Androgenic alopecia can occur |
| Pimples (Y/N) | - | Common due to elevated androgens |
| Fast Food Consumption (Y/N) | - | High intake may exacerbate insulin resistance |
| Regular Exercise (Y/N) | - | Lack of exercise can worsen metabolic parameters |
| BP (Systolic/Diastolic) (mmHg) | <120/80 | May be elevated |

3.4 Data Preprocessing

For the clinical dataset, missing values were imputed with the median after cleaning and converting data to numeric formats. Outliers were removed using Z-scores, and irrelevant features were dropped to retain significant predictors. Data normalization was performed using MinMaxScaler, ensuring all features were scaled between 0 and 1. The Min-Max normalization formula scales a feature to the range [0, 1] using:

$$X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where X is the original value of the feature, and X_min and X_max are the minimum and maximum values of that feature.

Figure 1 - Dimensions of Ultrasound Image Dataset
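The clinical preprocessing steps described above (median imputation, Z-score outlier removal, Min-Max scaling) can be sketched in plain Python. This is a minimal illustration rather than the authors' pipeline: the Z-score cutoff of 3, the helper names, and the sample BMI values are assumptions, since the paper does not state them.

```python
from statistics import mean, median, pstdev

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def zscore_keep_mask(values, limit=3.0):
    """Return True for each value whose |z-score| is within the limit.

    The cutoff of 3 is an assumed, commonly used default.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [True] * len(values)
    return [abs((v - mu) / sigma) <= limit for v in values]

def min_max_scale(values):
    """Scale values to [0, 1] with X' = (X - X_min) / (X_max - X_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical BMI column with one missing entry.
bmi = [22.0, None, 31.5, 27.0, 24.5]
filled = impute_median(bmi)
mask = zscore_keep_mask(filled)
kept = [v for v, keep in zip(filled, mask) if keep]
scaled = min_max_scale(kept)
```

In a real pipeline the scaling parameters would normally be computed on the training split only and then reused on the validation split, so that no information leaks from the held-out data.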

Ultrasound images underwent preprocessing to enhance clarity and extract meaningful features. CLAHE (Contrast Limited Adaptive Histogram Equalization) was applied to improve image contrast, making subtle structures like ovarian follicles more visible while preventing noise over-amplification. Bilateral filtering reduced noise while preserving important edges in the images, followed by edge detection using adaptive thresholding and Sobel operators, which highlighted changes in pixel intensity to reveal key structural features like cyst boundaries. Key visual patterns were then identified using ORB (Oriented FAST and Rotated BRIEF) descriptors, which detect unique points in the image and encode them into numerical features. These descriptors were grouped into clusters using KMeans (n_clusters=50), a method that organizes similar patterns into "visual words." This process, known as the Bag-of-Visual-Words (BoVW) representation, converted each image into a histogram representing the frequency of these visual words, enabling machine learning models to analyze them effectively. To further enhance model robustness, augmentation techniques such as rotation, zoom, and flipping were applied, increasing dataset diversity and improving the model's ability to generalize across varied image scenarios.
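The final BoVW step, turning one image's descriptors into a visual-word histogram, can be sketched as follows. This is an illustrative sketch under stated assumptions: the descriptors and cluster centres are tiny hypothetical 2-D vectors (real ORB descriptors are 32-byte vectors and the study used 50 clusters), the function names are invented for the example, and normalising the histogram is an assumption the paper does not confirm.

```python
def nearest_center(descriptor, centers):
    """Index of the cluster centre closest to the descriptor (squared Euclidean)."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i]))

def bovw_histogram(descriptors, centers):
    """Normalised frequency of each visual word for a single image."""
    counts = [0] * len(centers)
    for desc in descriptors:
        counts[nearest_center(desc, centers)] += 1
    total = sum(counts)
    if total == 0:
        # An image with no detected keypoints yields an all-zero histogram.
        return [0.0] * len(centers)
    return [c / total for c in counts]

# Hypothetical cluster centres ("visual words") and one image's descriptors.
centers = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
descriptors = [(1.0, 1.0), (9.0, 9.5), (0.5, 0.2), (1.0, 9.0)]
hist = bovw_histogram(descriptors, centers)
```

Every image maps to a vector of the same length (one entry per visual word) regardless of how many keypoints were detected, which is what lets a conventional classifier consume the result.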

3.5 Feature extraction methods

Feature extraction employed Apriori association, correlation analysis, and mutual information, reducing 44 parameters to essential features and enhancing PCOS detection model training.

3.5.1 Association Rule Mining: This data mining technique identifies hidden patterns among features, where the presence of one feature predicts another. The Apriori algorithm was used to uncover significant associations between PCOS diagnosis and some features, ensuring the retention of impactful predictors.
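The metrics that rank an association rule (support, confidence, lift, conviction) can be computed directly from itemsets. The sketch below is a hedged illustration on ten hypothetical patient records; the item names and the `rule_metrics` helper are assumptions, and it evaluates a single rule rather than running the full Apriori search.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction for the rule antecedent -> consequent."""
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n == 0 or n_ante == 0 or n_cons == 0:
        raise ValueError("rule is undefined on this data")
    support = n_both / n
    confidence = n_both / n_ante
    cons_support = n_cons / n
    lift = confidence / cons_support
    # Conviction is infinite when the rule never fails.
    conviction = float("inf") if confidence == 1 else (1 - cons_support) / (1 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}

# Hypothetical binarised records (item names are illustrative only).
transactions = [
    {"high_follicle_R", "PCOS"},
    {"high_follicle_R", "PCOS", "high_BMI"},
    {"high_follicle_R", "PCOS"},
    {"high_follicle_R"},
    {"PCOS"},
    {"high_BMI"},
    {"high_BMI"},
    set(), set(), set(),
]
metrics = rule_metrics(transactions, {"high_follicle_R"}, {"PCOS"})
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate, which is why lift rather than confidence alone is usually used to judge whether a rule is informative.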

3.5.2 Correlation Analysis: Pearson's correlation coefficient (r) was used to evaluate linear relationships between features and the target variable ("PCOS (Y/N)"). A correlation threshold ensured the inclusion of features with moderate or stronger correlations, while negative correlations were retained so that inverse relationship patterns were not dismissed.
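Threshold-based selection on Pearson's r can be sketched as below. The threshold of 0.25, the feature names, and the toy values are all assumptions for illustration (the threshold actually used is not recoverable from the text); comparing the absolute value of r keeps strongly negative features as well as positive ones.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, target, threshold):
    """Keep features whose |r| with the target meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson_r(values, target)) >= threshold]

# Hypothetical toy data: 1 = PCOS, 0 = no PCOS.
target = [1, 1, 1, 0, 0, 0]
columns = {
    "follicle_count": [15, 12, 14, 5, 6, 4],        # strongly positive
    "height_cm": [160, 150, 155, 155, 150, 160],    # uncorrelated
    "regular_cycle": [0, 0, 1, 1, 1, 1],            # negative
}
selected = select_features(columns, target, threshold=0.25)
```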

3.5.3 Mutual Information (MI): MI captured both linear and non-linear dependencies between features and the target variable, identifying intricate relationships missed by correlation analysis.
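For discrete variables, MI can be computed from joint and marginal frequencies, as in this minimal sketch. It is an assumption-laden illustration, not the authors' estimator: continuous features such as hormone levels would first need binning or a nearest-neighbour estimator, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(x, y):
    """MI in nats between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a, b) * log( p(a, b) / (p(a) * p(b)) ) over observed pairs.
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# A feature identical to the target carries maximal information (ln 2 here).
mi_dependent = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
# A feature independent of the target carries none.
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike Pearson's r, this quantity is zero only when the feature and target are statistically independent, so it also rewards non-linear dependence.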

3.6 Data Splitting

An 80:20 split was applied to divide the dataset into training and validation sets. This ensured that 80% of the data was utilized for model training, allowing the model to learn patterns, while the remaining 20% was reserved for validation to assess generalization on unseen data. The split was performed randomly with a fixed random state for reproducibility, ensuring both subsets were representative of the overall dataset.
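A reproducible random 80:20 split can be sketched with the standard library alone. The helper name is hypothetical, the records are stand-in integers, and the seed of 42 mirrors the fixed seed reported later in the paper; whether the original split was stratified by class is not stated, so this sketch does not stratify.

```python
import random

def train_val_split(records, val_fraction=0.2, seed=42):
    """Shuffle indices with a fixed seed and hold out val_fraction for validation."""
    indices = list(range(len(records)))
    random.Random(seed).shuffle(indices)
    n_val = int(round(len(records) * val_fraction))
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(records) if i not in val_idx]
    val = [r for i, r in enumerate(records) if i in val_idx]
    return train, val

# Stand-in for the 2,000 clinical records.
records = list(range(2000))
train, val = train_val_split(records)
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state used elsewhere in a program.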

3.7 Model training

For the clinical dataset, a stacking ensemble model was created using Extra Trees, AdaBoost, CatBoost, XGBoost, and LightGBM as base learners, with linear regression as the meta-classifier. Each algorithm contributed its strengths: Extra Trees identified key features, AdaBoost improved weak learners, CatBoost worked well with categorical data, and XGBoost and LightGBM offered high speed and accuracy for large datasets. The model was trained using 5-fold cross-validation, ensuring reliable and consistent performance across the data.

For the ultrasound dataset, a Convolutional Neural Network (CNN) was designed with three convolutional layers, each paired with batch normalization, max pooling, and dropout to extract features and minimize overfitting. Feature maps were flattened and passed through a dense layer with 128 units, followed by a final sigmoid layer for classification. The model, optimized with the Adam optimizer and binary cross-entropy loss, effectively classified PCOS and non-PCOS images.

Figure 2 - Ultrasound Images after preprocessing techniques

3.8 Model Performance Analysis

Both the ensemble stacking and CNN models were evaluated using comprehensive metrics to ensure robust performance:

3.8.1 Classification Report: Provided metrics such as precision, recall, F1-score, and support, highlighting the model's performance balance across different classes.

3.8.2 Confusion Matrix: Illustrated true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts, offering a detailed view of classification outcomes.

3.8.3 Accuracy:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

This metric measures the proportion of correctly classified instances out of the total instances.
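The metrics in Sections 3.8.1 to 3.8.3 all derive from the four confusion-matrix counts. The sketch below shows the arithmetic; the counts are hypothetical and the function name is an assumption, not taken from the study's code.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 for the positive class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a 100-sample validation set.
report = classification_metrics(tp=40, tn=50, fp=5, fn=5)
```

Because accuracy pools both classes, it can look healthy even when recall on the positive (PCOS) class is weak, which is why the per-class report is read alongside it.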

The models were evaluated using ROC-AUC, which measures a model's discriminative ability. The ROC curve (Receiver Operating Characteristic) visualizes the trade-off between True Positive Rate (Sensitivity) and False Positive Rate across thresholds, while AUC (Area Under the Curve) quantifies the overall performance, with values closer to 1 indicating better classification.
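AUC has an equivalent ranking interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following sketch computes it that way on hypothetical labels and scores; it is a pairwise illustration suited to small samples, not the routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities for six validation samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Here one positive sample (score 0.4) is ranked below one negative sample (score 0.5), so 8 of the 9 pairs are ordered correctly.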

4. EXPERIMENTAL RESULTS

4.1 Model 1: Clinical Data

The model achieved an accuracy of 93%, with:

• Class 0: Precision = 90%, Recall = 100%, F1-score = 95%

• Class 1: Precision = 99%, Recall = 81%, F1-score = 89%

The ensemble stacking model outperformed individual classifiers like ExtraTrees (90.22%), AdaBoost (89.90%), and CatBoost (99.02%).

4.2 Model 2: Ultrasound Images

The model recorded an accuracy of 92.72%, with:

• Negative Class Precision = 97.83%

• Positive Class Precision = 96.25%

The model was trained for 50 epochs, with training and validation loss curves showing convergence and accuracy curves confirming minimal overfitting.

5. DISCUSSION

The study developed two robust models for PCOS detection using clinical data and ultrasound images, with advanced feature extraction techniques. The integration of association rule mining, correlation analysis, and mutual information identified critical predictors, enhancing feature selection and model performance.

5.1 Feature Extraction Methods

Feature extraction played a pivotal role in isolating the most relevant predictors for PCOS detection.

5.1.1 Association Rule Mining: Association rules revealed significant relationships between clinical features and PCOS diagnosis.

The association rules highlight significant predictors for PCOS detection. Follicle count in the right ovary (support: 0.1635, confidence: 59.18%, lift: 1.61) and the left ovary (support: 0.153, confidence: 58.17%, lift: 1.58) are key diagnostic markers. Additionally, the strong association between BMI and weight (support: 0.136, confidence: 60.73%, lift: 3.43) underscores their role in body composition anomalies linked to PCOS. Together with the high conviction values (e.g., 6.69 for Rule 1), these findings underline the interconnection of hormonal imbalance, follicle count, and body composition in PCOS pathology.

5.1.2 Correlation Matrix Analysis: The correlation analysis identified highly relevant features:

• Follicle No. (R) (r = 0.63) and Follicle No. (L) (r = 0.59) demonstrated the strongest positive correlations with PCOS, aligning with the condition's hallmark symptom of increased ovarian follicles.

• Hair Growth (Y/N) (r = 0.48), Skin Darkening (Y/N) (r = 0.46), and Weight Gain (Y/N) (r = 0.43) showed moderate correlations, indicating their relevance as clinical indicators of hyperandrogenism and insulin resistance.

• Features such as Pimples (Y/N) (r = 0.28) and Cycle (R/I) (r = 0.39) emphasized their importance in capturing irregular menstrual cycles and androgen-driven symptoms.

• Weakly correlated features like height (r = 0.07), pulse rate (r = 0.08), and Vitamin D3 levels (r = 0.06) were excluded to streamline the dataset, ensuring improved model clarity and efficiency.

Figure 3 - Snapshot of Association Rules

5.1.3 Mutual Information (MI): Mutual information analysis highlighted hormonal markers as key predictors: FSH (MI = 0.38), LH (MI = 0.31), and TSH (MI = 0.30).

These results confirm the importance of elevated LH and altered FSH levels in PCOS diagnosis, alongside abnormal TSH levels, which may reflect thyroid dysfunction, a frequent comorbidity in PCOS. By prioritizing high-MI features and excluding those with minimal impact, feature selection was further optimized.

5.2 Model 1: Clinical Data

The stacking ensemble model demonstrated exceptional performance on the clinical dataset, achieving an accuracy of 93%. Key results include:

• Class 0: Precision = 90%, Recall = 100%, F1-score = 95%

• Class 1: Precision = 99%, Recall = 81%, F1-score = 89%

The confusion matrix confirmed the model's effectiveness, with only 2 false positives and 3 false negatives, ensuring a strong balance between precision and recall. Additionally, the ROC curve achieved an AUC of 0.99, reflecting excellent discriminative power across thresholds. The model's training utilized 5-fold cross-validation in the VS Code IDE with a fixed seed value of 42, ensuring consistent and reproducible outcomes. The ensemble approach, combining ExtraTrees, AdaBoost, CatBoost, and gradient boosting models, effectively generalized across the clinical data, providing reliable diagnostic accuracy.

Figure 4 - Correlation Matrix with Threshold Value

Figure 5 - Mutual Information scores of Parameters

Figure 6 - Consolidated Classification report and accuracies of Ensemble Model

Figure 7 - Confusion Matrix of Ensemble Model

5.3 Model 2: Ultrasound Images

The CNN model achieved an accuracy of 92.72% in classifying ultrasound images into PCOS and non-PCOS categories after training for 50 epochs. Results include:

• Confusion Matrix: The model achieved zero misclassifications, with 781 true positives and 1141 true negatives, underscoring its reliability in distinguishing between infected and non-infected cases.

• Training vs Validation Accuracy: The training accuracy stabilized near 99%, while the validation accuracy gradually aligned by epoch 30, confirming minimal overfitting and strong model generalization.

• ROC Curve: The ROC curve achieved an AUC of 0.89, computed during an initial training phase of 10-11 epochs, reflecting strong discriminative capability early in training, further refined over additional epochs.

The CNN was implemented using TensorFlow with DNN optimization enabled, on a system equipped with an Intel i5 (12th Gen) processor and 16 GB RAM, ensuring efficient computation. The automated feature extraction by CNNs eliminated manual intervention, allowing the model to capture complex patterns within ultrasound images for accurate classification.

Association rule mining, correlation analysis, and mutual information identified key predictors for PCOS detection. The stacking model achieved an AUC of 0.99 for clinical data, while the CNN achieved an AUC of 0.89 for ultrasound images, providing accurate and reliable diagnostic solutions.

Figure 8 - ROC of Ensemble Model

Figure 9 - Confusion Matrix of CNN Model

Figure 10 - Loss and Accuracy Graph of CNN Model

Figure 11 - CNN training vs validation Accuracy in 50 Epochs

Figure 12 - AUC of CNN in 11 Epochs

5.4 Comparison with Previous Studies

This study aligns with and extends prior research on PCOS detection using ML and DL models. The stacking ensemble model achieved an accuracy of 93% and AUC of 0.99, lower than Danaei Mehr's Random Forest (98.89%) [36] and Khanna's STACK-2 model (98% with AUC = 1) [35]. However, this study's emphasis on association rule mining and feature selection enhanced interpretability, identifying follicle numbers, BMI, weight, and hormonal markers as critical predictors.

For ultrasound images, the CNN achieved an accuracy of 92.72% and AUC of 0.89, comparable to prior works such as Gopalakrishnan's SVM (93.82%) [31] and Hosain's PCONet (98.58%) [33]. Despite slightly lower accuracy, zero misclassifications and minimal overfitting highlight the CNN's reliability and generalization.

By integrating ML and DL models and focusing on explainability, this study offers a robust framework for PCOS detection, balancing accuracy with practical reliability.

Table 2 - Comparison with Related Works

| Study | Methodology | Accuracy (%) | AUC | Key Features or Insights |
|---|---|---|---|---|
| Danaei Mehr and Polat | Random Forest with embedded feature selection | 98.89 | - | Highlighted the importance of reducing feature redundancy. |
| Khanna et al. | STACK-2 with Salp Swarm Optimization (SSA) | 98.00 | 1.00 | Near-perfect classification performance. |
| This Study | Stacking ensemble (ExtraTrees, AdaBoost, CatBoost, Gradient Boosting) | 93.00 | 0.99 | Identified follicle counts, BMI, weight, LH, FSH, and TSH as critical predictors. |
| Hosain et al. | Custom PCONet CNN | 98.58 | - | Focused on ultrasound image classification. |
| Gopalakrishnan et al. | SVM with feature extraction and preprocessing | 93.82 | - | High accuracy on ultrasound image classification. |
| This Study | CNN with automated feature extraction | 92.72 | 0.89 | Zero misclassifications; reliable and generalized performance. |
| Rahman et al. | Random Forest and AdaBoost | 94.00 | - | Demonstrated the value of mutual information in feature selection. |
| Nilofer et al. | ANN with adaptive clustering | 97.50 | - | Highlighted the power of integrated models for follicle classification. |

CONCLUSION

This study successfully developed and validated two robust models for PCOS detection, integrating machine learning and deep learning approaches. The stacking ensemble model achieved high accuracy (93%) and an AUC of 0.99 for clinical data, identifying critical predictors such as follicle counts, BMI, weight, and hormonal markers (LH, FSH, and TSH). Similarly, the CNN model achieved an accuracy of 92.72% and an AUC of 0.89 for ultrasound images, with zero misclassifications and minimal overfitting, demonstrating its reliability in image-based classification.

The integration of association rule mining, correlation analysis, and mutual information enhanced feature selection, ensuring a balance between performance and interpretability. These findings align with and extend prior research, offering a comprehensive framework for PCOS diagnosis by combining clinical and imaging data.

Additionally, the creation of a Streamlit-based user interface makes the application accessible, enabling real-time predictions and seamless interaction for clinicians and researchers. This study contributes to advancing AI-driven solutions for women's health, paving the way for accurate, early detection of PCOS and improved patient outcomes.

Figure 13 - Snapshot of Streamlit app for PCOS Detection

Figure 14 - Snapshot of Streamlit app for PCOS Detection

ACKNOWLEDGMENTS

Funding:nil

Ethical Statement

The authors are responsible for the work's content and will address any concerns about its accuracy or integrity.

Conflicts of Interest

The authors declare no conflicts of interest related to this work.

Authors' Contributions

All authors contributed significantly to the research, including conceptualization, data analysis, model development, and manuscript preparation. Each author has reviewed and approved the final version of the manuscript.

Data Availability Statement: The clinical laboratory dataset and pelvic ultrasound images used in this study are publicly available on Kaggle: Polycystic Ovary Syndrome (PCOS) Dataset

REFERENCES

[1] Azziz R, Carmina E, Chen Z, Dunaif A, Laven JS, Legro RS, Lizneva D, Natterson-Horowitz B, Teede HJ, Yildiz BO, "Polycystic ovary syndrome," Nat. Rev. Dis. Primers, vol. 2, Aug. 2016, p. 16057, doi:10.1038/nrdp.2016.57.

[2] Dong J, Rees DA, "Polycystic ovary syndrome: pathophysiology and therapeutic opportunities," BMJ Med., vol. 2, no. 1, Oct. 2023, p. e000548, doi:10.1136/bmjmed-2023-000548.

[3] Legro RS, Arslanian SA, Ehrmann DA, Hoeger KM, Murad MH, Pasquali R, Welt CK, "Diagnosis and treatment of polycystic ovary syndrome: an Endocrine Society clinical practice guideline," J. Clin. Endocrinol. Metab., vol. 98, no. 12, Dec. 2013, pp. 4565–4592, doi:10.1210/jc.2013-2350.

[4] Tay CT, Mousa A, Vyas A, Pattuwage L, Tehrani FR, Teede H, "2023 international evidence-based polycystic ovary syndrome guideline update: insights from a systematic review and meta-analysis on elevated clinical cardiovascular disease in polycystic ovary syndrome," J. Am. Heart Assoc., vol. 13, no. 16, Aug. 2024, p. e033572, doi:10.1161/JAHA.123.033572.

[5] Goodarzi MO, Dumesic DA, Chazenbalk G, Azziz R, "Polycystic ovary syndrome: etiology, pathogenesis, and diagnosis," Nat. Rev. Endocrinol., vol. 7, no. 4, Apr. 2011, pp. 219–231, doi:10.1038/nrendo.2010.217.

[6] Singh S, Pal N, Shubham S, Sarma DK, Verma V, Marotta F, Kumar M, "Polycystic Ovary Syndrome: Etiology, Current Management, and Future Therapeutics," J. Clin. Med., vol. 12, no. 4, 2023, p. 1454, doi:10.3390/jcm12041454.

[7] World Health Organization, "Polycystic Ovary Syndrome," WHO Fact Sheet, Jan. 2025, accessed Jan. 6, 2025. Available: https://www.who.int/news-room/fact-sheets/detail/polycystic-ovary-syndrome

[8] Bharali MD, Rajendran R, Goswami J, Singal K, Rajendran V, "Prevalence of Polycystic Ovarian Syndrome in India: A Systematic Review and Meta-Analysis," Cureus, vol. 14, no. 12, Dec. 2022, p. e32351, doi:10.7759/cureus.32351.

[9] Joshi B, Mukherjee S, Patil A, Purandare A, Chauhan S, Vaidya R, "A cross-sectional study of polycystic ovarian syndrome among adolescent and young girls in Mumbai, India," Indian J. Endocrinol. Metab., vol. 18, no. 3, May 2014, pp. 317–324, doi:10.4103/2230-8210.131162.


[10] Balaji S, Amadi C, Prasad S, Bala Kasav J, Upadhyay V, Singh AK, Surapaneni KM, Joshi A, "Urban rural comparisons of polycystic ovary syndrome burden among adolescent girls in a hospital setting in India," Biomed. Res. Int., vol. 2015, 2015, p. 158951, doi:10.1155/2015/158951.

[11] Gill H, Tiwari P, Dabadghao P, "Prevalence of polycystic ovary syndrome in young women from North India: A community-based study," Indian J. Endocrinol. Metab., vol. 16, Suppl. 2, Dec. 2012, pp. S389–S392, doi:10.4103/2230-8210.104104.

[12] Nidhi R, Padmalatha V, Nagarathna R, Amritanshu R, "Prevalence of polycystic ovarian syndrome in Indian adolescents," J. Pediatr. Adolesc. Gynecol., vol. 24, no. 4, Aug. 2011, pp. 223–227, doi:10.1016/j.jpag.2011.03.002.

[13] Patel S, "Polycystic ovary syndrome (PCOS), an inflammatory, systemic, lifestyle endocrinopathy," J. Steroid Biochem. Mol. Biol., vol. 182, Sep. 2018, pp. 27–36, doi:10.1016/j.jsbmb.2018.04.008.

[14] Ganie MA, Vasudevan V, Wani IA, Baba MS, Arif T, Rashid A, "Epidemiology, pathogenesis, genetics & management of polycystic ovary syndrome in India," Indian J. Med. Res., vol. 150, no. 4, Oct. 2019, pp. 333–344, doi:10.4103/ijmr.IJMR_1937_17.

[15] Carmina E, Lobo RA, "Polycystic ovary syndrome: Arguably the most common endocrinopathy is associated with significant morbidity in women," J. Clin. Endocrinol. Metab., vol. 84, no. 6, Jun. 1999, pp. 1897–1899, doi:10.1210/jcem.84.6.5803.

[16] Rosenfield RL, Ehrmann DA, "The pathogenesis of polycystic ovary syndrome: The hypothesis of PCOS as functional ovarian hyperandrogenism revisited," Endocr. Rev., vol. 37, no. 5, Oct. 2016, pp. 467–520, doi:10.1210/er.2015-1104.

[17] Baptiste CG, Battista MC, Trottier A, Baillargeon JP, "Insulin and hyperandrogenism in women with polycystic ovary syndrome," J. Steroid Biochem. Mol. Biol., vol. 122, no. 1–3, Oct. 2010, pp. 42–52, doi:10.1016/j.jsbmb.2009.12.010.

[18] Gonzalez F, "Inflammation in polycystic ovary syndrome: Underpinning of insulin resistance and ovarian dysfunction," Steroids, vol. 77, no. 4, Mar. 2012, pp. 300–305, doi:10.1016/j.steroids.2011.12.003.

[19] Zhao H, Zhao Y, Li T, Li M, Li J, Li R, Liu P, Yu Y, Qiao J, "Metabolism alteration in follicular niche: The nexus among intermediary metabolism, mitochondrial function, and classic polycystic ovary syndrome," Free Radic. Biol. Med., vol. 86, 2015, pp. 295–307, doi:10.1016/j.freeradbiomed.2015.05.013.

[20] Escobar-Morreale HF, "Polycystic ovary syndrome: Definition, aetiology, diagnosis and treatment," Nat. Rev. Endocrinol., vol. 14, no. 5, May 2018, pp. 270–284, doi:10.1038/nrendo.2018.24.

[21] Day F, Karaderi T, Jones MR, Meun C, He C, Drong A, Kraft P, Lin N, Huang H, Broer L, et al., "Large-scale genome-wide meta-analysis of polycystic ovary syndrome suggests shared genetic architecture for different diagnosis criteria," PLoS Genet., vol. 14, no. 12, Dec. 2018, p. e1007813, doi:10.1371/journal.pgen.1007813.

[22] Diamanti-Kandarakis E, "PCOS in adolescents," Best Pract. Res. Clin. Obstet. Gynaecol., vol. 24, no. 2, Apr. 2010, pp. 173–183, doi:10.1016/j.bpobgyn.2009.09.005.

[23] Sinha U, Sinharay K, Saha S, Longkumer TA, Baul SN, Pal SK, "Thyroid disorders in polycystic ovarian syndrome subjects: A tertiary hospital-based cross-sectional study from Eastern India," Indian J. Endocrinol. Metab., vol. 17, no. 2, Mar.–Apr. 2013, pp. 304–309, doi:10.4103/2230-8210.109714.

[24] Fauser BC, Tarlatzis BC, Rebar RW, Legro RS, Balen AH, Lobo R, et al., "Consensus on women's health aspects of polycystic ovary syndrome (PCOS)," Hum. Reprod., vol. 27, no. 1, Jan. 2012, pp. 14–24, doi:10.1093/humrep/der396.

[25] Jonard S, Robert Y, Cortet-Rudelli C, Pigny P, Decanter C, Dewailly D, "Ultrasound examination of polycystic ovaries: Is it worth counting the follicles?" Hum. Reprod., vol. 18, no. 3, Mar. 2003, pp. 598–603, doi:10.1093/humrep/deg115.

[26] Dewailly D, Andersen CY, Balen A, Broekmans F, Dilaver N, Fanchin R, et al., "The physiology and clinical utility of anti-Mullerian hormone in women," Hum. Reprod. Update, vol. 20, no. 3, May–Jun. 2014, pp. 370–385, doi:10.1093/humupd/dmt062.

[27] Christ JP, Gunning MN, Fauser BCJM, "Implications of the 2014 Androgen Excess and Polycystic Ovary Syndrome Society guidelines on polycystic ovarian morphology for polycystic ovary syndrome diagnosis," Reprod. Biomed. Online, vol. 35, no. 4, Oct. 2017, pp. 480–483, doi:10.1016/j.rbmo.2017.06.022.

[28] Moran LJ, Teede HJ, "Metabolic features of the reproductive phenotypes of polycystic ovary syndrome," Hum. Reprod. Update, vol. 15, no. 4, Jul.–Aug. 2009, pp. 477–488, doi:10.1093/humupd/dmp008.

[29] Dokras A, Witchel SF, "Are young adult women with polycystic ovary syndrome slipping through the healthcare cracks?" J. Clin. Endocrinol. Metab., vol. 99, no. 5, May 2014, pp. 1583–1585, doi:10.1210/jc.2013-4190.

[30] Norman RJ, Dewailly D, Legro RS, Hickey TE, "Polycystic ovary syndrome," Lancet, vol. 370, no. 9588, Aug. 2007, pp. 685–697, doi:10.1016/S0140-6736(07)61345-2.


[31] Gopalakrishnan C, Iyapparaja M, "Multilevel thresholding-based follicle detection and classification of polycystic ovary syndrome from the ultrasound images using machine learning," Int. J. Syst. Assur. Eng. Manag., 2021, doi:10.1007/s13198-021-01203-x.

[32] Nilofer M, Ahmed S, Pradeep C, "Improved fuzzy firefly optimization algorithm for ANN in PCOS diagnosis," Neural Comput. Appl., 2021, doi:10.1007/s00521-021-06095-x.

[33] Hosain AKMS, Mehedi MHK, Kabir IE, "PCONet: A convolutional neural network architecture to detect polycystic ovary syndrome (PCOS) from ovarian ultrasound images," arXiv, 2022, doi:10.48550/arXiv.2210.00407.

[34] Maheshwari S, Tiwari P, "PCOS-WaveConvNet: A wavelet convolutional neural network for polycystic ovary syndrome detection using ultrasound images," 9th Int. Conf. Inf. Technol. Trends (ITT), 2023.

[35] Khanna VV, Chadaga K, Sampathila N, Prabhu S, Bhandage V, Hegde GK, "A distinctive explainable machine learning framework for detection of polycystic ovary syndrome," Appl. Syst. Innov., vol. 6, no. 2, 2023, p. 32, doi:10.3390/asi6020032.

[36] Danaei Mehr H, Polat H, "Diagnosis of polycystic ovary syndrome through different machine learning and feature selection techniques," Health Technol., vol. 12, 2021, doi:10.1007/s12553-021-00613-y.
