Prediction of pIC50 Values for the Acetylcholinesterase (AChE) using QSAR Model

Page 1

Prediction of pIC50 Values for the Acetylcholinesterase (AChE) using QSAR Model

Words:

1.INTRODUCTION

ChEMBL provided a large nonredundant data set of 5,103 compoundswithpublishedIC50valuesagainstAChE,which were used in a quantitative structure activity relationship (QSAR) analysis to learn more about the origins of their bioactivity.AChEinhibitorsweredescribedusingasetof12 fingerprint descriptors, and prediction models were built usingrandomforest Following that, the substructure fingerprint count was thoroughlyexaminedinordertogainusefulinformationon theinhibitoryactivityofAChEinhibitors.Thisinformation can be applied in a variety of ways. Our paper`s goal is to predict pIC50 values using the SMILES notation and the ChEMBLIDasinput.

Abstract acetylcholine,catalyzesacetylcholinesteraseagainsttoInaccuratelyintroduced.thisdisadvantagedrugtheinteractionscomActivitycomputationalThesignificantlyconsumptionfieldapproacheslearningdrugoftheresultsgenerated,targeteddrugdevelopmentandnoveldevelopmentapproachesnowtypicallyincludemachineanddeeplearningalgorithms.Machinelearninghavebeenusedtodevelopdrugcandidatesintheofdrugdiscoveryanddevelopment.Asaresult,theandoveralltimespentondrugdevelopmentwasreduced.developmentofpowerfuldirectandindirectapproachessuchasQuantitativeStructureRatio(QSAR)hasincreasedthepotentialofpoundstobecomedrugs.Thepredictionofdrugtargetinmedicinehasasignificantimpact.Ithelpsindevelopmentofnewdrugsforvariousdiseases.Traditionalinteractionswithtargetshasmanydrawbacks.Themainisthatitiscostlyandtimeconsuming.Tosolveproblem,anewapproachtomachinelearninghasbeenMachinelearningapproachescanbeusedtoandefficientlypredictdrugtargetinteractions.thispaper,wewillbeusingrandomforestregressionmodeltrainourmachinetopredictthepIC50valuesofnoveldrugstheAlzheimer'sdiseasetargetprotein,(AChE),whichisanenzymethatthebreakdownoftheneurotransmitterwhichisnecessaryforcognitionandmemory.

Shobhana A Khedekar1 , Nidhi S Mhatre2 , Raju Mendhe3

1.1 Dataset

ChEMBLisacarefullyselecteddatabaseofbiologicallyactive compounds for medicinal use. It combines chemical data, biologicalactivitydata,andgeneticdatatohelptransform genomic data into effective new drugs. The human AChE inhibitor data set (subject ID CHEMBL220) was collected usingtheChEMBL20databasecontaining7549compounds. Atotalof5,103uniquecanonicalsmileswereobtainedafter removingduplicatecanonical smilesand entries with null canonicalsmiles. Model cleaning and building were done using Jupyter Notebook.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page160

Toimprovetheefficiency,effectiveness,andquality

Key Random Forest Regression Model, QSAR model,pIC50

1Student, Computer Engineering Department, Mumbai University, Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India

2Student, Computer Engineering Department, Mumbai University, Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India

3Assistant Professor, Computer Engineering Department, Mumbai University, Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India

***

Fig -1:WorkflowDatasetPreparation

1.2 Description of inhibitors

Fig

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page161

Severalstudiesoftheuseoffingerprintstomodelbiological activityhaveinvestigatedperformancedifferencesbetween typesoffingerprints.RinikerandLandrum(2013)compared and evaluated the performance of predictive models built usingRDKit2Dfingerprintdescriptors. Saltswereremovedpriortodescriptorcomputationusing thebuilt inPaDELDescriptorsoftwarefunction.The (NumHAAcceptors).and(LogP),(MW),descriptorsPaDELDescriptorsoftwarealsocalculatedthefourmolecularusedtodefineLipinski'slaw:molecularweightlogarithmoftheoctanol/waterpartitioncoefficientnumberofhydrogenbonddonors(NumHDonors),numberofhydrogenbonds.BondReceptors 2 LipinskiDescriptors

3.1 Multivariate Analysis

This study builds a regression model that predicts a continuousresponsevariable(e.g.,pIC50)asafunctionofa predictorvariable(egofingerprintdescriptor).

3. QSAR MODELING

Aboveisaflowchartoftheresearchworkflow.Briefly,this includes a large scale QSAR model for the prediction and assessmentofAChEinhibitionperformedinaccordancewith OECDguidelines. (i)Aclearmethodoflearning; (ii)specificapplicationdomainsoftheQSARmodel; (iii)Useacceptablemeasuresofquality,sustainabilityand (iv)predictability.MechanicalanalysisoftheQSARmodel.

The development of machine learning algorithms has becomeanimportanttoolintheprocessofdrugdiscovery. Quantitative Structure Activity Relationships (QSAR) currentlyusesavarietyofmachinelearningtools todevelop QSARmodels.Usingmachinelearningalgorithms, 2DQSAR analysis explores quantitative relationships between moleculardescriptorsandbiologicalactivities.Established QSAR models based on machine learning techniques can helptounderstandthestructuralrequirementsneededto developnovelcompoundswithimprovedbiologicalactivity

Supervised training is the process of training a model on labeledtrainingdata,whichcan beusedtopredictunknown or future data. This study builds a regression model that predicts a continuous response variable (i.e., pIC50) as a functionofapredictorvariable(i.e.,fingerprintdescriptors).

Fig 3 FlowchartofQSARModel

A random forest (RF) classifier is an ensemble classifier madeupofmultipledecisiontrees.Simplyput,themainidea ofRFisthatinsteadofbuildingadeepdecisiontreewithan ever increasingnumberofnodesthatmaybesusceptibleto dataoverfittingandovertraining,itcreatesmultipletrees to minimize variance rather than maximize accuracy. As a

Certified Journal | Page162

Fig -6 PlotofexperimentalpIC50valuesversuspredicted pIC50values Fig 7 ModelScore

Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008

result,theresultisnoisierthantheresultofawell trained decisiontree,butisgenerallyreliableandrobust. Fig 4 CanonicalSmileswithpIC50valuesandChemblID

Oncethepredictionismade,theusercandownloaditin.csv fileformatanduseitforfurtherinvestigation

4. DEPLOYMENT OF MODEL

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

QSAR Model Validation

Fig

TheperformanceoftheQSARmodelswasevaluatedusing fourstatisticalparameters:Pearson'scorrelationcoefficient (r),rootmeansquarederror(RMSE),meansquarederror (MSE)andcoefficientofdetermination(r2) 5 ModelEvaluation

3.2

The model was then converted to a pickle file for further distribution. An application using various features of Streamlitiscreated.Usersarepromptedtouploadatextfile containingstandardSmilesand ChEMBLids.

Model validation is a critical step that should be taken to ensurethatafittedmodelcanaccuratelypredictresponses forfutureorunknownsubjects.

[2] AykulS,Martinez HackertE.Determinationofhalf maximalinhibitoryconcentrationusingbiosensor based protein interaction analysis. Anal Biochem. 2016 Sep 1;508:97 103. doi: 27365221;10.1016/j.ab.2016.06.025.Epub2016Jun27.PMID:PMCID:PMC4955526.

[3] Bosc, N., Atkinson, F., Felix, E. et al. Large scale comparison of QSAR and conformal prediction methodsandtheirapplicationsindrugdiscovery. J Cheminform 11, 4 (2019). https://doi.org/10.1186/s13321 018 0325 4

5. CONCLUSIONS

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page163 Fig 8 FlowchartofFront end Fig 9 Front endSidebartouploadfile Fig 10 PredictionsofpIC50values

The goal of this project is to create applications that help medical researchers in the early stages of drug discovery, where different methods have been studied and carefully Finally,integrated.webuilt

[1] “How do drugs for Alzheimer's disease work?” Alzheimer's Society, 22 December 2021, alzdementia/treatments/drugs/howhttps://www.alzheimers.org.uk/aboutdodrugsheimersdiseasework.Accessed1March2022

aQSARmodelthatcanpredict pIC50values fornewdrugs usingfingerprintdescriptors.Wehopethat theresultsofthisstudywillbeageneralguidelineforthe developmentofnovelAChEinhibitors.. REFERENCES

[6] M. Stitou, H. Toufik, M. Bouachrine, H. Bih and F. Lamchouri, "Machine learning algorithms used in Quantitativestructure activityrelationshipsstudies as new approaches in drug discovery," 2019 InternationalConferenceonIntelligentSystemsand AdvancedComputingSciences(ISACS),2019,pp.1 8,doi:10.1109/ISACS48493.2019.9068917.

[4] Yap,ChunWei.(2011).PaDEL Descriptor:AnOpen SourceSoftwaretoCalculateMolecularDescriptors and Fingerprints. Journal of computational chemistry.32.1466 74.10.1002/jcc.21707.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 09 Issue: 03 | Mar 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified

[5] Kuz'min,Victor&Polishchuk,Pavel&Artemenko, Anatoly&Andronati,Sergey.(2011).Interpretation ofQSARModelsBasedonRandomForestMethods. Molecular Informatics. 30. 10.1002/minf.201000173.

[7] Bhure, Soham, et al. “Drug Generation Using GenerativeModels.”vol.08,no.10|Oct2021,p.8, V8I1097.pdf.https://www.irjet.net/archives/V8/i10/IRJET

Journal | Page164

Turn static files into dynamic content formats.

Create a flipbook
Issuu converts static files into: digital portfolios, online yearbooks, online catalogs, digital photo albums and more. Sign up and create your flipbook.
Prediction of pIC50 Values for the Acetylcholinesterase (AChE) using QSAR Model by IRJET Journal - Issuu