

This project aimed to predict drug resistance in cancer patients using a multi-omics integration approach, combining gene expression, mutation profiles, protein abundance, methylation data, and miRNA expression. The goal was to build a machine learning model that accurately classifies resistant vs. responsive tumor samples and uncovers key biological drivers of resistance.
• Retrieved matched multi-omics data from TCGA for chemotherapy-treated breast and lung cancer patients.
• Normalized data using z-score standardization.
• Applied batch effect correction using ComBat for cross-platform consistency.
• Combined five data modalities:
⚬ Gene expression
⚬ Mutation scores
⚬ Protein abundance
⚬ DNA methylation
⚬ miRNA profiles
• Dimensionality reduction using PCA, followed by feature selection with mutual information ranking.
• Trained multiple models: Random Forest, SVM, Gradient Boosting, and XGBoost.
• Used 5-fold cross-validation and grid search for hyperparameter tuning.
• Final ensemble model chosen based on F1-score and AUC.
• AUC score: 0.94 for the ensemble model on test data.
• Key predictors: mutation scores and gene expression of top resistance-associated genes (e.g., ABCB1, TP53, EGFR).
• Achieved high sensitivity and specificity in separating resistant vs. sensitive samples.
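The modeling steps listed above can be sketched with scikit-learn. This is a minimal, hypothetical sketch on synthetic data from `make_classification`, not the project's code: the number of PCA components (30), the number of selected features (20), the parameter grids, and the use of a soft-voting ensemble are all assumptions. XGBoost is left out to keep the example dependency-free; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to `models` with its own grid.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the integrated multi-omics matrix (rows = samples, 1 = resistant).
X, y = make_classification(n_samples=300, n_features=60, n_informative=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def tuned(estimator, grid):
    """PCA -> mutual-information ranking -> classifier, tuned by 5-fold grid search."""
    pipe = Pipeline([
        ("pca", PCA(n_components=30, random_state=0)),
        ("select", SelectKBest(partial(mutual_info_classif, random_state=0), k=20)),
        ("clf", estimator),
    ])
    param_grid = {f"clf__{name}": values for name, values in grid.items()}
    return GridSearchCV(pipe, param_grid, cv=cv, scoring="roc_auc")


models = {
    "rf": tuned(RandomForestClassifier(random_state=0), {"n_estimators": [100, 200]}),
    "svm": tuned(SVC(probability=True, random_state=0), {"C": [0.1, 1.0, 10.0]}),
    "gb": tuned(GradientBoostingClassifier(random_state=0), {"learning_rate": [0.05, 0.1]}),
}
for name, search in models.items():
    search.fit(X_train, y_train)
    print(f"{name}: CV AUC = {search.best_score_:.3f}, best = {search.best_params_}")

# Soft-voting ensemble of the tuned pipelines (averages predicted probabilities).
ensemble = VotingClassifier(
    [(name, search.best_estimator_) for name, search in models.items()], voting="soft"
)
ensemble.fit(X_train, y_train)
auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
f1 = f1_score(y_test, ensemble.predict(X_test))
print(f"ensemble: test AUC = {auc:.3f}, F1 = {f1:.3f}")
```

Keeping PCA and the mutual-information step inside the pipeline means they are refit within every cross-validation fold, so held-out folds never influence feature selection. Candidates are compared on cross-validated scores (`best_score_`), which keeps the final test-set AUC an unbiased estimate.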
Identified potential biomarkers of resistance involved in:
⚬ Drug efflux (ABC transporters)
⚬ Apoptosis dysregulation
⚬ DNA damage response
Each row is a patient sample; columns represent standardized values across five omics layers.
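A matrix like the one in this heatmap can be assembled by standardizing each omics layer and concatenating the layers column-wise. The sketch below is hypothetical: the helper names, the `layer:feature` column prefix, and the synthetic data are assumptions. In particular, `adjust_batches` is a simplified per-batch location/scale adjustment standing in for ComBat, not a reimplementation of it; in practice ComBat from the R `sva` package (or a Python port) would be used.

```python
import numpy as np
import pandas as pd


def zscore(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize each column to mean 0 and SD 1 (constant columns become 0)."""
    sd = df.std(ddof=0).replace(0.0, 1.0)
    return (df - df.mean()) / sd


def adjust_batches(df: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Crude location/scale batch adjustment: re-standardize each column within each batch.
    Unlike ComBat, there is no empirical-Bayes shrinkage and no protected covariates."""
    parts = [zscore(part) for _, part in df.groupby(batch)]
    return pd.concat(parts).loc[df.index]


def integrate(layers: dict[str, pd.DataFrame], batch: pd.Series) -> pd.DataFrame:
    """Keep samples present in every layer, standardize and batch-adjust each layer,
    prefix columns with the layer name, and concatenate: one row per patient sample."""
    common = sorted(set.intersection(*(set(df.index) for df in layers.values())))
    blocks = [
        adjust_batches(zscore(df.loc[common]), batch.loc[common]).add_prefix(f"{name}:")
        for name, df in layers.items()
    ]
    return pd.concat(blocks, axis=1)


# Synthetic example: 12 patients, two batches, five layers of varying width.
rng = np.random.default_rng(0)
patients = [f"P{i:02d}" for i in range(12)]
widths = {"expr": 5, "mut": 3, "prot": 4, "meth": 6, "mirna": 2}
layers = {
    name: pd.DataFrame(rng.normal(size=(12, w)), index=patients,
                       columns=[f"f{j}" for j in range(w)])
    for name, w in widths.items()
}
batch = pd.Series(["A"] * 6 + ["B"] * 6, index=patients)

X = integrate(layers, batch)
print(X.shape)  # (12, 20)
```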
Gene expression and mutation scores were the most impactful features used by the model.
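A per-layer ranking like this one can be obtained by computing feature importances on the original input columns and summing them within each omics layer. This is a hedged sketch on synthetic data: it uses scikit-learn's `permutation_importance`, and the `layer:feature` column naming and the `importance_by_layer` helper are hypothetical. Because permutation importance shuffles the model's inputs, it also works for a pipeline that applies PCA internally, whereas impurity-based importances of a model trained on principal components would describe components rather than genes.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def importance_by_layer(model, X: pd.DataFrame, y, n_repeats: int = 10) -> pd.Series:
    """Mean permutation importance per input column, summed within each omics layer.
    Assumes columns are named '<layer>:<feature>'."""
    result = permutation_importance(model, X, y, n_repeats=n_repeats, random_state=0)
    per_feature = pd.Series(result.importances_mean, index=X.columns)
    return per_feature.groupby(lambda col: col.split(":")[0]).sum().sort_values(ascending=False)


# Synthetic data in which only the expression and mutation layers carry signal.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "expr:g1": rng.normal(size=n),
    "mut:g2": rng.integers(0, 2, size=n).astype(float),
    "prot:p1": rng.normal(size=n),
    "meth:c1": rng.normal(size=n),
    "mirna:m1": rng.normal(size=n),
})
y = (X["expr:g1"] + 1.5 * X["mut:g2"] + rng.normal(scale=0.5, size=n) > 0.75).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
layer_importance = importance_by_layer(model, X_test, y_test)
print(layer_importance)
```

Importances are computed on held-out data so that layers the forest merely overfits do not appear important.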
AUC = 0.94 reflects strong model discrimination between resistant and non-resistant samples.
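The AUC can be read as the probability that a randomly chosen resistant sample receives a higher score than a randomly chosen non-resistant one. The sketch below computes it directly from that definition and checks it against `sklearn.metrics.roc_auc_score`; it also derives sensitivity and specificity from a confusion matrix. The toy labels and scores and the 0.5 decision threshold are assumptions for illustration (label 1 = resistant).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def auc_by_ranking(y_true, scores) -> float:
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    diff = scores[y_true][:, None] - scores[~y_true][None, :]  # all positive/negative pairs
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def sensitivity_specificity(y_true, scores, threshold: float = 0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_pred = (np.asarray(scores) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return float(tp / (tp + fn)), float(tn / (tn + fp))


y = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # 1 = resistant (toy labels)
s = np.array([0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2])  # toy model scores

print(f"AUC = {auc_by_ranking(y, s):.4f}")  # AUC = 0.9375
print(f"sklearn AUC = {roc_auc_score(y, s):.4f}")  # sklearn AUC = 0.9375
print(sensitivity_specificity(y, s))  # (0.75, 0.75)
```

Here 15 of the 16 resistant/non-resistant pairs are ranked correctly, giving 15/16 = 0.9375. Unlike sensitivity and specificity, the AUC does not depend on the chosen threshold.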
This project implemented an integrative bioinformatics workflow that combines multi-omics data, which captures biological heterogeneity, with machine learning prediction. The model is both accurate (AUC = 0.94 on test data) and biologically interpretable, offering candidate biomarkers and pathways for further experimental validation.