TELKOMNIKATelecommunication,Computing,ElectronicsandControl
Vol.21,No.3,June2023,pp.584∼591
ISSN:1693-6930,DOI:10.12928/TELKOMNIKA.v21i3.23416
❒ 584
TELKOMNIKATelecommunication,Computing,ElectronicsandControl
Vol.21,No.3,June2023,pp.584∼591
ISSN:1693-6930,DOI:10.12928/TELKOMNIKA.v21i3.23416
❒ 584
MdGulzarHussain1,2,BabeSultana1,MahmudaRahman1,MdRashidulHasan1
1DepartmentofComputerScienceandEngineering,FacultyofScienceandEngineering,GreenUniversityofBangladesh, Dhaka,Bangladesh
2DepartmentofComputerApplicationTechnology,SchoolofComputerScienceandArtificialIntelligence,Changzhou University,Changzhou,Jiangsu,China
ArticleInfo
Articlehistory:
ReceivedFeb21,2022
RevisedOct29,2022
AcceptedNov12,2022
Keywords:
Banglanews
Banglatextclassification
Logisticregression
Naturallanguageprocessing
Newsclassification
Supportvectormachine
Intheinformationage,Banglanewsarticlesontheinternetarefast-growing.For organizing,everynewssitehasaparticularstructureandcategorization.Newsarticleclassificationisamethodtodetermineadocument’sclassificationbasedonvariouspredefinedcategories.ThisresearchdiscussestheclassificationofBanglanews articlesontheonlineplatformandtriestomakeconstructivecomparisonusingseveralclassificationalgorithms.ForBanglanewsarticlesclassification,termfrequencyinversedocumentfrequency(TF-IDF)weightingandcountvectorizerhavebeenused asafeatureextractionprocess,andtwocommonclassifiersnamedsupportvectormachine(SVM)andlogisticregression(LR)employedforclassifyingthedocuments.It isclearthattheaccuracyoftheexperimentalresultsbyapplyingSVMis84.0%and LRis81.0%fortwelvecategoriesofnewsarticles.Inthisresearchwork,whenwe havemadecomparisontworenownedclassificationalgorithmsappliedontheBangla newsarticles,LRwasoutperformedbySVM.
Thisisanopenaccessarticleunderthe CCBY-SA license.
CorrespondingAuthor:
MdGulzarHussain
DepartmentofComputerScienceandEngineering,FacultyofScienceandEngineering GreenUniversityofBangladesh,Dhaka,Bangladesh
Email:gulzar.ace@gmail.com
Textdataisthemostcomprehensivesourceofinformationbutduetoitsunstructurednature,itis difficultandtimeconsumingtodrawinsightsfromit.Advancementinmachinelearning(ML)andnatural languageprocessing(NLP)aremakingiteasiertoanalyzetextdatathroughtextclassification.Textclassificationtechniquesareusedtoorganize,structureandcategorizetextdataforanalysisofsentiment,topic labelling,spamdetection,intentdetectionandsoon[1].Thisstudyappliestextclassificationtechniquesto Banglanewsarticlesasthenewsarticlesarethemostcommonformoftextonline.Manystudieshavebeen conductedontextclassification,anddifferentclassificationtechniquessuchasrules-basedmethod,decision trees,k-nearestneighbors(KNN),naiveBayes,logisticregression,supportvectormachines(SVM),neural networks(NN)andsoonhavebeendeveloped[2].Studiesintheliteratureofvariousresearchpapershave shownthatresearchershavefocusedonclassifyingtextsindifferentlanguages,suchasEnglishandArabic andsoon[3],[4]however,theamountofanalysisontheBengalilanguageismuchless.In[5]haveusedmachinelearningalgorithmstocategorizeBanglanewspaperarticlesintofivedistinctgroups.Theyusedlogistic regression,SVM,andmulti-layerneuralnetworktodoso.Multi-layerdenseneuralnetworkapproachhasa
Journalhomepage: http://telkomnika.uad.ac.id
accuracyof95.50%whichishighercomparedtoothermodels.Thereisarepresentation[6]ofthebehavior ofleast-squaresSVMs,twinSVMs,andleast-squarestwinSVM(LS-TWSVM)classifiersonthenewsdata isshowntohandlemulti-categorydata.AndtheirperformanceevaluationshowedthatLS-TWSVMisthe bestofallthreewith92.96%accuracy.Maisha etal. [7]performsentimentanalysisonBanglanewsusing ”pipeline”classalongwithsixstate-of-the-artsupervisedMLalgorithmswhichincludesdecisiontree(DT), multinomialnaiveBayes(MNB),k-nearestneighbor(KNN),logisticregression(LR),randomforest(RF)and lagrangiansupportvectormachine(LSVM).Randomforestalgorithmoutstandsallotheralgorithmssecuring 98%accuracyinpercentagesplitmethod.Thismethodologycategorizesthecontenteitheraspositiveornegative.RabbinnovandKobilov[8]usedSVM,decisiontreeclassifier,randomforest,LR,andMNBamongsix othermachine-learningalgorithmstoconductmulti-classtextcategorizationofinternetUzbeknewsarticles. ForSVM,theyusedradialbasisfunction(RBFSVM)whichgavethebestaccuracy(86.88%)performance ofotherclassifiers.Severalsupervisedmachinelearningaswellasdeeplearningalgorithmsforcategorizing Bengalinewsdocumentsarediscussedin[9].AnevermethodforclassifyingBanglatextualcontenthasbeen developedby[10].Thedeeplearningrecurrentneuralnetworks(RNN)-basedattentionlayerandtheRNN withBiLSTMachievedaccuracyratesof97.72%and86.56%,respectively.In[11]proposedamodelstructure namedtheDCLSTM-MLPmodelforthecategorizationofnewstextdocuments,anideaofacustomizedalgorithmthatcombinesdeeplearningalgorithmssuchaslongtermshortmemory(LSTM),convolutionalneural network(CNN),andmulti-layerperceptron(MLP).Byapplyingthismodelstructuretheyhavetriedtosolve theproblemsoftextuallength,thecomplexityofextractingfeaturesfromnewscontentandcategorizingnews texteffectivelyandachieved94.82%accuracy.However,themainissueoftheirpaperisthatthenumberof samplesissmall,andthedistributionofdifferenttypesofnewsisunequal,resultinginthemodel’snarrow effectiveness.
Kowsher etal.of[12]usedseveralwordembeddingmethodstoincorporatethetextfromBangla newspaperdata,aswellasmachinelearningalgorithmstocategorizetheincorporatedtext.Adeeprecurrent neuralNetworkwasusedtoprovideanewmethodofevaluatingBanglanewsarticlesin[13].Thedeep recurrentneuralnetworkfeaturingBi-LSTMobtained98.33%inBengalitextcategorization,whichisgreater thanpreviouswell-knownclassificationtechniques.Forclassifyingtextordocuments,manyofsupervised techniquesareusedsuchasKNN,naiveBayes(NB),DT,n-grams,neuralnetworks(NNet).Butaccordingto previousresearchliteraturereviewsSVM[14]ismostfrequentlyusedclassifieralgorithm.In[15]proposed anewsarticleclassificationmodelframeworkbasedondeephybridlearningandcomparedittotraditional textclassificationtodemonstratethesuperiorityofnetworknewstextclassification,anditoutperformsthe standardizedmethodforclassifyingnewstextsintermsofoverallperformance.In[16]showedacomparative analysisamongDT,KNN,NB,andRocchio’salgorithmwheretheirstudiessaidthatSVMoutperformsbetter thanallotherclassifiers.AsidefromtheEnglishdocument,therehasalsobeenextensiveresearchonother languages.In[17]focusedontheclassificationofself-createdIndonesiannewscorpus,whichincludesfour separatecategoriesand472Indonesiannewspaperarticlesfromavarietyofsources.Theyproducedfivemodels withtenepocheachusing377datafortrainingand95testingdata,withCNNhavingthehighestaccuracyrate atabout90.74percentcategorizingIndonesiannewsdata.In[18]conductedonanIndonesiannewscorpus foundthatthecombinationofTF-IDFandMNBbeatsotherclassificationmodelssuchasmultivariateBernoulli naiveBayes(BNB)andSVMwithanaccuracyof85%.IntermsofArabictextclassification,in[19]Arabic medicaltextdocumentsareclassifiedandauthorsusedanrule-basedclassifierforclassificationofArabictext andhavinganaccuracyof90.6%.Theyusedthreeclassificationalgorithm:majorityvoting,ordereddecision list,andK-NNwhichareusedtovalidatethemodel.
AmongallthecoupleworkscoveredintheBangla,Chy etal. [20]workedonBanglalanguagenews classificationandtheyusednaiveBayesclassifier.ButtheproblemisthatinthenaiveBayesmodelifthe testingdatasetcontainsacategoricalvariableofacategorythatwasnotpartofthetrainingdataset,itwill giveitzeroprobabilityandbeunabletomakeanypredictions.Nahar etal. [21]demonstratedacomparative analysisutilizingnaiveBayesclassifier,SVM,andneuralnetworkstofilterBanglasportsandpoliticalnews onlyononlinenetworksfromtextdata.MandalandSen[22]showedthecomparisonperformanceevaluationof the4supervisedlearningtechniques:decisiontree,naiveBayes,k-nearestneighbor,andSVMfortheBengali textcategorization.TheyusedTF-IDFtocompriseafeaturevectorusing1000webdocumentswhichcontains 22,218words.Closetoourresearchwork,byusingTF-IDFasafeatureselectionapproachandSVMasa classifier,theyattainedaclassificationaccuracyof92.57percentfor12categoriesofBanglatextfiles.They used3191textsamplespercategorywherewehavetriedtouse11770articlesoftop12categories(showed
ComparisonanalysisofBanglanewsarticlesclassificationusing...(MdGulzarHussain)
inTable1)from20differentcategories.AndbestofourknowledgetheyusedonlySVMwherewehaveused SVMandlogisticregression(LR)andLRisprimarilyusedtoclassifyobservationsintoadiscretenumber ofcategorieswithrapidclassificationofunknownrecords.Inthisresearchpaper,wehavetriedtoshowa comparativeperformanceanalysisusingSVMandLRwhichcanbringrighteouspathforfutureresearchers whoarewillingtoworkwithBanglalanguageinthisera.
Inthisstudy,LRandSVMareusedtoclassifytheBanglanewsarticlesasthestudy[23]showsthat thelogisticregressionoutperformsothertechniquessuchasrandomforestandknearestneighboursalgorithm. ItclassifiestheBanglanewsarticlesandlabelthemtoacertaintopicbasedontheircontents.Itextractsfeatures fromthecorpususingtheTF-IDFfeaturevectorsinceitisenabledtotheextractionofrelevantfeaturesaswell asremovingcommonwords.
Theaimofnewsclassificationseemstoallocatecategoriesasperthecontentofanewsarticle.PreprocessingofBanglanewsarticlesandfeaturesetextractionarealsoneededpriortotrainingandmodelconstructionfordocumentclassification,likeEnglishtextclassification.TheoverallBanglanewsarticleclassificationprocesshavebeenusedinthisexperimentasillustratedinFigure1.
Despitethelimitedamountofthedata,Hossain etal. [24]discoveredthattext-graphconvolutional neural(GCN)accomplishedbetterthanGRU-LSTM,BiLSTM,Char-CNN,LSTM,andbidirectionalencoder representationsfromtransformers(BERT)inclassifyingonlineBanglanews.Assofar,thereisascarcityof standarddatasetintheBanglalanguage,soadatasethavebeenpreparedbyscrapingthenewsarticlesfrom variouselectronicnewssitessuchas’https://www.prothomalo.com/’.Atthetimeofscrapping,thearticleswere labeledwiththeircategories.Toassesstherecommendationresults,wehavecollectedaround12.5Klabeled newsarticlesconsistingof20categories.Amongthemtop12categoriesareconsideredforthisresearchwork whichhasmentionedinTable1.
Preprocessingisperformedtominimizethenoiseinthetext,whichhelpstoincreasetheclassifier’s efficacy.Thepreprocessingstepsclearthetextdataandpreparingitfortheartificiallearningmodel.Thenthe tokenizationisperformedontextdocumentsandbreakdownthesequenceofcharactersorwordstogetthe features.Sincetextdocumentscontainsentences,likeasequenceofcharacterorwords,togetthefeatures,it shouldbebrokenintotokens.Tokenizationistheprocesstobreakdownthetextintosentencesandthenwords. Tokenizationistheprocessbywhichthetextisdividedintophrasesandthenwords.Aftertokenization, differentsymbolssuchas!,x,¿,¡,$,%,andnumbers,whicharenotveryimportantforclassification,are excluded.Stop-wordsinBanglaarealsoeliminatedatthesameperiod.In[25]isalistofstop-wordsusedin thisresearch.
Thegeneralmethodofconversionofacollectionoftextdocumentstonumericalfeaturevectors isvectorization.Countvectorizertransformsatextdocumentarrayintoatokencountingmatrix:contains tokencountsoccurrencesineachdocument.Asparserepresentationofthenumbersisgeneratedbythis implementation.Ontheotherhand,thepurposeofusingTF-IDFinsteadoftherawfrequenciesofatoken’s occurrenceinagivendocumentistoscaledowntheinfluenceoftokensthatoccurinagivencorpusvery regularly.Andthusempiricallylessinformativecharacteristicsthatoccurinasmallfractionofthetraining corpusareremoved.Itisamethodofdataretrievalthatweightsthefrequencyofwords(TF)andtheinversed documentfrequency(IDF).Everywordhasit’sownTFandIDFratings,respectively.TheTF-IDFweightis theproductofaterm’sTFandIDFscores.Therarerthewordandviceversa,thehighertheTF-IDFscore.
Inthisstudy,twocommonclassifiers:theSVMclassifierwithalinearkernelandthelogisticregressionhavebeenused.Intextclassification,SVMhasbeenusedeffectively.Andlogisticregressionisa statisticalmethodofdataanalysisinwhichoneormorevariablesareusedtodeterminetheresult.
Inessence,SVMisasupervisedmachinelearningapproachknownasabinaryclassifier.In[25] useshyper-planetoclassifydataintotwotypes.Supportvectorsareclosertothe-hyper-planedatapointsthat influencethepositionandorientationofthehyper-plane.Thesupportvectorisusedtooptimizetheperimeter oftheclassifier.Byeliminatingthesupportvectors,thehyper-plane’spositionwillchange.SVMrecognizes thefollowingsignfunctionofequation[26]mathematically,
��(x)=sin(wx + b) (1)
where w isan n weightedvectorin Rn .Bydividingthespace Rn intotwohalf-spaceswiththemaximum margin,SVMfindsthehyper-planein(2).
y = wx + b (2)
Generally,SVMisabinaryclassifier,butthestrategieslikeone-to-oneandone-vs-restcanbeusedto expandintoamulti-classclassifier.InSVM,linearandradialbasisfunctionkernelsareusedtomakedecisions. Sincemosttextsarelinearlyseparable,thelinearkernelispreferredfortextclassification.
Logisticregressionusesalogisticfunctiontoestimateprobabilitiesfortherelationshipbetweenone ormoreindependentvariablesandthedependentvariableofthecategories.Thelogisticfunctionequationis alsoknownasthelogisticcurvewhichisacommon“S”shapedcurvedescribedbythe(3).Thesigmoidcurve isanothernameforthelogisticcurve.
(3)
Where, L isthecurve’shighestvalue, e thebaseofanaturallogarithm(oreuler’snumber), k isthelogistic growthrateorcurvesteepness, v0 isthesigmoidmidpoint’s x-value.
ComparisonanalysisofBanglanewsarticlesclassificationusing...(MdGulzarHussain)
TheperformanceofSVMclassifiersandlogisticregrassiononourdata-setisdiscussedinthissection. Alittleanalysisonthedata-setisprovidedhere.Later,theperformanceanalysisofSVMandLRontheBangla textdataandcomparisonwithsomesimilarworksaredemonstratedinTable4.
Among12.5thousandofarticleswith20categoriesweused11770articlesoftop12categoriesin thisresearchwork.Bar-chartfornumberofnewsarticlesofeachcategoriesaregiveninFigure2.InFigure 3generatedwordcloudbasedonTF-IDFandcountvectorizerfromtotalarticlesaregiven.Thisisgenerated beforerunningthepreprocessingonthedata.
Wecanseemanyunnecessarywordsareavailableinthecloudsandthecloud-basedonTF-IDF isdifferentthanthecloud-basedoncountvectorizervalue.Butsomewordsarehavingaroundthesame importanceforbothTF-IDFandcountvectorizervalue.Wehavesplitthedataintotwoparts,fortrainingthe model,80%ofthedataisusedandtheremaining20%isusedtotesttheperformance.
TELKOMNIKATelecommunComputElControl,Vol.21,No.3,June2023:584–591
In-textcategorization,avarietyofevaluationmetricsareemployed.Theprecision,recall,andFmeasureareamongthemostwidelyusedperformancemeasuresconsideredinourexperiments.Eachcategory’sprecision,recall,andF-measureexperimentaremeasuredforSVMandLR.TheF-measureisaveraged toassessperformancethroughoutcategories.Microaverageandmacroaveragearethetwotypesofaverage valueandthemacroaverageisused.Table2showhowtwoclassifiersperformedonourdata-setcorpusin termsofprecision,recall,andF-measure.OntheotherhandTable3showscomparisonofaccuracy,average precision,averagerecallandaverageF1-scoreresultforSVMandLRclassifier.
Table2.SVMandLRclassifierresultsof12categories
ItisobservedthatSVMwithlinearkernelachievesaccuracyof0.84andLRachieves0.81from Table3andSVMoutperformsLRincaseofaverageprecision,recallandF1-score.ThoughfromTable2it isfoundthatforsomeclassesorcategoriesLRoutperformsSVMincaseofindividualprecision,recalland F1-score.Figure4showstheconfusionmatrixforSVMclassifierwiththelinearkernelandlogisticregression respectively.Table4showsthecomparisonbetweentherecentworksusingSVMclassifierfindingswiththe resultsofthisresearchfinding.Thecomparisonshowsthatourresearchworkutilizingthecombinationof TF-IDFandcountvectorizerfeaturesintheSVMclassifierachievesbetteraccuracythanotherrecentworks whichalsousedSVMintheirresearchworks.
Table3.Accuracy,averageprecision,averagerecallandaverageF1-scoreresultofSVMandLRclassifier AccuracyPrecision(Avg)Recall(Avg)F1-score(Avg)
ISSN:1693-6930
Table4.ComparisonbetweenthisexperimentandrecentsimilarresearchesusingSVM ExperimentsFeatureusedAccuracy
ThisexperimentTF-IDF+Countvectorizer0.84 Islam etal. [27]TrigramTF-IDF0.80 Rahman etal. [28]TF-IDF0.82 Yeasmin etal. [5]TF-IDF0.83
Inthefieldofinformationsystems,textcategorizationisacontentiousissue.Inthispaper,wehave usedSVMandLRclassifiersonowndevelopedcorpusandtheirperformanceismeasured.Theproposed methodologyinthisstudyadvocatestheassumptionthattheBanglalanguagecanindeedberightlyclassified usingSVMandLRwithlimitedresources.Theoutcomeofthesuggestedapproachispromising.Still,more precisioncouldhaveattained.Goodoutcomescouldalsohaveobtainedifwehadbeenabletodealwithall ofthenewscategoriesandutilizemultiplecategorizationmethodstocomewithaconstructiveopinion.Inthe next,we’dliketoaddmorecategoriesandcomparetheresultsusingdifferentclassificationalgorithms.
ThisworkwassupportedinpartbytheCenterforResearch,Innovation,andTransformationofGreen UniversityofBangladesh.
[1] X.Zhou etal., “Asurveyontextclassificationanditsapplications,” WebIntelligence,vol.18,no.3,pp.205-216,2020,doi: 10.3233/WEB-200442.
[2] K.Kowsari,K.J.Meimandi,M.Heidarysafa,S.Mendu,L.Barnes,andD.Brown,“Textclassificationalgorithms:Asurvey,” Information,vol.10,no.4,2019,doi:10.3390/info10040150.
[3] A.Fesseha,S.Xiong,E.D.Emiru,M.Diallo,andA.Dahou,“Textclassificationbasedonconvolutionalneuralnetworksandword embeddingforlow-resourcelanguages:Tigrinya,” Information,vol.12,no.2,2021,doi:10.3390/info12020052.
[4] K.T.Nwet,“Machinelearningalgorithmsformyanmarnewsclassification,” InternationalJournalonNaturalLanguageComputing, vol.8,no.4,pp.17–24,2019,doi:10.5121/ijnlc.2019.8402.
[5] S.Yeasmin,R.Kuri,A.R.M.M.H.Rana,A.Uddin,A.Q.M.S.U.Pathan,andH.Riaz,“Multi-categorybanglanewsclassification usingmachinelearningclassifiersandmulti-layerdenseneuralnetwork,” InternationalJournalofAdvancedComputerScienceand Applications,vol.12,no.5,2021,doi:10.14569/IJACSA.2021.0120588.
[6] P.SaigalandV.Khanna,“Multi-categorynewsclassificationusingsupportvectormachinebasedclassifiers,” SNAppliedSciences, vol.2,2020,doi:10.1007/s42452-020-2266-6.
[7] S.J.Maisha,N.Nafisa,andA.K.M.Masum,“SupervisedmachinelearningalgorithmsforsentimentanalysisofBanglanewspaper,” InternationalJournalofInnovativeComputing,vol.11,no.2,pp.15–23,2021,doi:10.11113/ijic.v11n2.321.
[8] I.M.RabbimovandS.S.Kobilov,“Multi-classtextclassificationofuzbeknewsarticlesusingmachinelearning,”in Journalof Physics:ConferenceSeries,2020,doi:10.1088/1742-6596/1546/1/012097.
[9] M.R.Hossain,S.Sarkar,andM.Rahman,“Differentmachinelearningbasedapproachesofbaselineanddeeplearning modelsforbengalinewscategorization,” InternationalJournalofComputerApplications,vol.176,no.18,pp.10-16,doi: 10.5120/ijca2020920107.
[10] M.Ahmed,P.Chakraborty,andT.Choudhury,“BanglaDocumentCategorizationUsingDeepRNNModelwithAttentionMechanism,” CyberIntelligenceandInformationRetrieval,vol.291,pp.137–147,2021.
[11] M.Zhang,“Applicationsofdeeplearninginnewstextclassification,” ScientificProgramming,vol.2021,2021,doi: 10.1155/2021/6095354.
[12] M.Kowsher,A.Tahabilder,N.J.Prottasha,M.A-Rakib,M.M.Uddin,andP.Saha,“Banglatopicclassificationusingsupervised learning,” ComputationalIntelligenceinPatternRecognition,vol.1349,pp.505–518,2022.
[13] S.RahmanandP.Chakraborty,“Bangladocumentclassificationusingdeeprecurrentneuralnetworkwithbilstm,”in PComputer Science,2021.
[14] C.CortesandV.Vapnik,“Support-vectornetworks,” MachineLearning,vol.20,pp.273–297,1995,doi:10.1007/BF00994018.
[15] N.SunandC.Du,“Newstextclassificationmethodandsimulationbasedonthehybriddeeplearningmodel,” Complexity,vol. 2021,2021,doi:10.1155/2021/8064579.
[16] P.Y.PawarandS.H.Gawande,“Acomparativestudyondifferenttypesofapproachestotextcategorization,” InternationalJournal ofMachineLearningandComputing,vol.2,no.4,pp.423-426,2012.[Online].Available:http://www.ijmlc.org/papers/158C01020-R001.pdf
[17] M.A.Ramdhani,D.S.Maylawati,andT.Mantoro,“Indonesiannewsclassificationusingconvolutionalneuralnetwork,” Indonesian JournalofElectricalEngineeringandComputerScience,vol.19,no.2,pp.1000–1009,2020,doi:10.11591/ijeecs.v19.i2.pp10001009.
[18] R.Wongso,F.A.Luwinda,B.C.Trisnajaya,O.Rusli,Rudy,“NewsarticletextclassificationinIndonesianlanguage,” Procedia ComputerScience,vol.116,pp.137–143,2017,doi:10.1016/j.procs.2017.10.039.
TELKOMNIKATelecommunComputElControl,Vol.21,No.3,June2023:584–591
[19] Q.A.Al-RadaidehandS.S.Al-Khateeb,“Anassociativerule-basedclassifierforarabicmedicaltext,” InternationalJournalof KnowledgeEngineeringandDataMining,vol.3,no.3-4,pp.255–273,2015,doi:10.1504/IJKEDM.2015.074071.
[20] A.N.Chy,M.H.Seddiqui,andS.Das,“Banglanewsclassificationusingnaivebayesclassifier,” 16thInt’lConf.Computerand InformationTechnology,2014,pp.366-371,doi:10.1109/ICCITechn.2014.6997369.
[21] L.Nahar,Z.Sultana,N.Jahan,andU.Jannat,“FilteringBengaliPoliticalandSportsNewsofSocialMediafromTextualInformation,” 20191stInternationalConferenceonAdvancesinScience,EngineeringandRoboticsTechnology(ICASERT),2019,pp.1-6, doi:10.1109/ICASERT.2019.8934605.
[22] A.K.MandalandR.Sen,“Supervisedlearningmethodsforbanglawebdocumentcategorization,” InternationalJournalofArtificial IntelligenceApplications,vol.5,no.5,2014,doi:10.5121/ijaia.2014.5508.
[23] K.Shah,H.Patel,D.Sanghvi,andM.Shah,“Acomparativeanalysisoflogisticregression,randomforestandKNNmodelsforthe textclassification,” AugmentedHumanResearch,vol.5,pp.1–16,2020,doi:10.1007/s41133-020-00032-0.
[24] M.R.Hossain,M.M.Hoque,N.Siddique,andI.H.Sarker,“Bengalitextdocumentcategorizationbasedonverydeepconvolution neuralnetwork,” ExpertSystemswithApplications,vol.184,2021,doi:10.1016/j.eswa.2021.115394.
[25] Genediazjr,“Stopwords-ISO/stopwords-BN:Bengalistopwordscollection,”in Github,Oct2016.Online.[Available]: https://github.com/stopwords-iso/stopwords-bn(Accessed:December20,2022).
[26]
[27]
[28]
J.Kim,J.Yoo,H.Lim,H.Qiu,Z.Kozareva,andA.Galstyan,“Sentimentpredictionusingcollaborativefiltering,”in Proceedings oftheInternationalAAAIConferenceonWebandSocialMedia,vol.7,no.1,pp.685–688,2013,doi:10.1609/icwsm.v7i1.14461.
T.Islam,A.I.Prince,M.M.Z.Khan,M.I.Jabiullah,andM.T.Habib,“Anin-depthexplorationofBanglablogpostclassification,” BulletinofElectricalEngineeringandInformatics,vol.10,no.2,pp.742–749,2021,doi:10.11591/eei.v10i2.2873.
S.Rahman,S.K.Mithila,A.AktherandK.M.Alam,“AnEmpiricalStudyofMachineLearning-basedBanglaNewsClassification Methods,” 202112thInternationalConferenceonComputingCommunicationandNetworkingTechnologies(ICCCNT),pp.1-6, 2021,doi:10.1109/ICCCNT51525.2021.9579655.
MdGulzarHussain wasborninNimnagar,DinajpurCity.HereceivedtheBachelorofSciencedegree(2018)inComputerScienceandEngineeringfromtheGreenUniversityof Bangladesh,Dhaka,Bangladesh.Since2019,hehasbeenaLecturerwiththeComputerScience andEngineeringDepartment,GreenUniversityofBangladesh,Dhaka,Bangladesh.Heiscurrently studyingMScinComputerScienceandTechnologyinChangzhouUniversity,Changzhou,Jiangsu, China.HeisalsoaffiliatedwithIEEEasstudentmemberandIAERasprofessionalmember.HisresearchinterestsincludeArtificialIntelligence,MachineLearning,DeepLearning,NaturalLanguage Processing,TextMining,andTopicModeling.Hecanbecontactedatemail:gulzar.ace@gmail.com.
BabeSultana wasbornin,Cox’sBazar,Bangladesh,in1994.ShereceivedherB.Sc.DegreeinComputerScienceandEngineeringfromGreenUniversityofBangladesh(GUB)in2018.At presentsheisworkingasaLecturer,Dept.ofCSE,GreenUniversityofBangladesh.Also,herpublicationwasabout”Multi-modeProjectSchedulingwithLimitedResourceandBudgetConstraints” publishedinInternationalConferenceonInnovationinEngineeringandTechnology(ICIET)27-28 December,2018andshegotBestPaperAwardandIEEEBestPaperAwardonthisconference. HerresearchinterestsincludeTheoryofOptimization,NaturallanguageProcessing,Renewableand SustainableEnergy.Shecanbecontactedatemail:babecse@gmail.com.
MahmudaRahman recivedherMaster’sinCSfromJahangirnagarUniversityandB.Sc. inCSEfromGreenUniversityofBangladesh.ShehasrecievdViceChancellor’sGoldMedalaward forheroutstandingacademicperformancein4thconvocationofGreenUniversityofBangladesh. ShehasbeenworkingasaLecturer,Dept.ofCSE,GreenUniversityofBangladeshsince2019. HerresearchinterestincludesNaturalLanuageProcessing,MedicalimageprocessingandMachine learning.Shecanbecontactedatemail:aurthi018@gmail.com.
MdRashidulHasan receivedtheBachelorofSciencedegreeinComputerScienceand EngineeringfromGreenUniversityofBangladesh.HewasborninPabna,Bangladeshin1994.His researchinterestincludeMachineLearning,NaturalLanguageProcessing,andDataMining.Hecan becontactedatemail:rashidulhasan777@gmail.com.
ComparisonanalysisofBanglanewsarticlesclassificationusing...(MdGulzarHussain)