

TELKOMNIKA Telecommunication, Computing, Electronics and Control

Vol. 21, No. 3, June 2023, pp. 584–591

ISSN: 1693-6930, DOI: 10.12928/TELKOMNIKA.v21i3.23416


Comparison analysis of Bangla news articles classification using support vector machine and logistic regression

Md Gulzar Hussain1,2, Babe Sultana1, Mahmuda Rahman1, Md Rashidul Hasan1

1Department of Computer Science and Engineering, Faculty of Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh

2Department of Computer Application Technology, School of Computer Science and Artificial Intelligence, Changzhou University, Changzhou, Jiangsu, China

Article Info

Article history:

Received Feb 21, 2022

Revised Oct 29, 2022

Accepted Nov 12, 2022

Keywords:

Bangla news

Bangla text classification

Logistic regression

Natural language processing

News classification

Support vector machine

ABSTRACT

In the information age, Bangla news articles on the internet are growing fast. Every news site organizes its content with a particular structure and categorization. News article classification is a method of assigning a document to one of several predefined categories. This research discusses the classification of Bangla news articles on online platforms and makes a constructive comparison of several classification algorithms. For Bangla news article classification, term frequency-inverse document frequency (TF-IDF) weighting and count vectorizer have been used for feature extraction, and two common classifiers, support vector machine (SVM) and logistic regression (LR), have been employed to classify the documents. The experimental results show an accuracy of 84.0% for SVM and 81.0% for LR on twelve categories of news articles. In this comparison of two renowned classification algorithms applied to Bangla news articles, SVM outperformed LR.

This is an open access article under the CC BY-SA license.

Corresponding Author:

Md Gulzar Hussain

Department of Computer Science and Engineering, Faculty of Science and Engineering, Green University of Bangladesh, Dhaka, Bangladesh

Email: gulzar.ace@gmail.com

1. INTRODUCTION

Text data is the most comprehensive source of information, but due to its unstructured nature it is difficult and time-consuming to draw insights from it. Advancements in machine learning (ML) and natural language processing (NLP) are making it easier to analyze text data through text classification. Text classification techniques are used to organize, structure, and categorize text data for sentiment analysis, topic labelling, spam detection, intent detection, and so on [1]. This study applies text classification techniques to Bangla news articles, as news articles are the most common form of text online. Many studies have been conducted on text classification, and different classification techniques such as rule-based methods, decision trees, k-nearest neighbors (KNN), naive Bayes, logistic regression, support vector machines (SVM), neural networks (NN), and so on have been developed [2]. The literature shows that researchers have focused on classifying texts in different languages, such as English and Arabic [3], [4]; however, the amount of analysis on the Bengali language is much smaller.

The authors of [5] used machine learning algorithms to categorize Bangla newspaper articles into five distinct groups, using logistic regression, SVM, and a multi-layer neural network. The multi-layer dense neural network approach achieved an accuracy of 95.50%, which is higher than that of the other models. The behavior of least-squares SVM, twin SVM, and least-squares twin SVM (LS-TWSVM) classifiers on news data is presented in [6] and shown to handle multi-category data; the performance evaluation showed that LS-TWSVM is the best of the three, with 92.96% accuracy. Maisha et al. [7] performed sentiment analysis on Bangla news using the “pipeline” class along with six state-of-the-art supervised ML algorithms: decision tree (DT), multinomial naive Bayes (MNB), k-nearest neighbor (KNN), logistic regression (LR), random forest (RF), and Lagrangian support vector machine (LSVM). The random forest algorithm outperformed all other algorithms, securing 98% accuracy with the percentage split method. This methodology categorizes the content as either positive or negative. Rabbimov and Kobilov [8] used SVM, decision tree classifier, random forest, LR, and MNB, among six other machine learning algorithms, to conduct multi-class text categorization of internet Uzbek news articles. For SVM, they used the radial basis function kernel (RBF SVM), which gave the best accuracy (86.88%) among the classifiers. Several supervised machine learning as well as deep learning algorithms for categorizing Bengali news documents are discussed in [9]. A newer method for classifying Bangla textual content has been developed by [10]. The deep learning recurrent neural network (RNN)-based attention layer and the RNN with BiLSTM achieved accuracy rates of 97.72% and 86.56%, respectively. The author of [11] proposed a model structure named DCLSTM-MLP for the categorization of news text documents, a customized algorithm that combines deep learning algorithms such as long short-term memory (LSTM), convolutional neural network (CNN), and multi-layer perceptron (MLP). This model structure was applied to address the problems of textual length, the complexity of extracting features from news content, and the effective categorization of news text, and it achieved 94.82% accuracy. However, the main issue of that paper is that the number of samples is small and the distribution of different types of news is unequal, which limits the model's effectiveness.

Journal homepage: http://telkomnika.uad.ac.id

Kowsher et al. [12] used several word embedding methods to represent the text from Bangla newspaper data, as well as machine learning algorithms to categorize the embedded text. A deep recurrent neural network was used to provide a new method of evaluating Bangla news articles in [13]. The deep recurrent neural network featuring Bi-LSTM obtained 98.33% accuracy in Bengali text categorization, which is higher than previous well-known classification techniques. For classifying text or documents, many supervised techniques are used, such as KNN, naive Bayes (NB), DT, n-grams, and neural networks (NNet). According to previous literature reviews, however, SVM [14] is the most frequently used classifier. The authors of [15] proposed a news article classification framework based on deep hybrid learning and compared it with traditional text classification; it outperforms the standard methods for classifying news texts in terms of overall performance. The authors of [16] presented a comparative analysis of DT, KNN, NB, and Rocchio's algorithm, and their study reported that SVM outperforms all the other classifiers. Aside from English documents, there has also been extensive research on other languages. The authors of [17] focused on the classification of a self-created Indonesian news corpus, which includes four separate categories and 472 Indonesian newspaper articles from a variety of sources. They produced five models with ten epochs each, using 377 articles for training and 95 for testing, with CNN achieving the highest accuracy, about 90.74%, in categorizing Indonesian news data. A study [18] conducted on an Indonesian news corpus found that the combination of TF-IDF and MNB beats other classification models such as multivariate Bernoulli naive Bayes (BNB) and SVM, with an accuracy of 85%. In terms of Arabic text classification, Arabic medical text documents are classified in [19], where the authors used a rule-based classifier for Arabic text and obtained an accuracy of 90.6%. They used three classification algorithms, majority voting, ordered decision list, and K-NN, to validate the model.

Among the few works covering Bangla, Chy et al. [20] worked on Bangla news classification using a naive Bayes classifier. The problem is that, in the naive Bayes model, if the test dataset contains a categorical variable of a category that was not part of the training dataset, the model assigns it zero probability and is unable to make a prediction. Nahar et al. [21] demonstrated a comparative analysis utilizing naive Bayes classifier, SVM, and neural networks to filter only Bangla sports and political news from text data on online networks. Mandal and Sen [22] compared the performance of four supervised learning techniques, decision tree, naive Bayes, k-nearest neighbor, and SVM, for Bengali text categorization. They used TF-IDF to build a feature vector from 1000 web documents containing 22,218 words. Close to our research work, using TF-IDF as a feature selection approach and SVM as a classifier, they attained a classification accuracy of 92.57% for 12 categories of Bangla text files. They used 3191 text samples per category, whereas we have used 11770 articles from the top 12 categories (shown in Table 1) out of 20 different categories. To the best of our knowledge, they used only SVM, whereas we have used SVM and logistic regression (LR); LR is primarily used to classify observations into a discrete number of categories, with rapid classification of unknown records. In this research paper, we present a comparative performance analysis of SVM and LR, which can point the way for future researchers who wish to work with the Bangla language.

In this study, LR and SVM are used to classify the Bangla news articles, as the study [23] shows that logistic regression outperforms other techniques such as random forest and the k-nearest neighbours algorithm. The study classifies the Bangla news articles and labels each with a topic based on its contents. It extracts features from the corpus using the TF-IDF feature vector, since this enables the extraction of relevant features as well as reducing the influence of common words.

2. METHOD

The aim of news classification is to allocate categories according to the content of a news article. As with English text classification, preprocessing of Bangla news articles and feature set extraction are needed prior to training and model construction for document classification. The overall Bangla news article classification process used in this experiment is illustrated in Figure 1.

[Figure: flowchart with training and testing branches. Labels: Bangla news article training data, Bangla news article test data, data cleaning and preprocessing, feature selection and extraction (count vectorizer, TF-IDF vectorizer), training dataset, apply classifier algorithms, classifier model, performance evaluation.]
Figure 1. System architecture of the proposed work

2.1. Data collection

Despite the limited amount of data, Hossain et al. [24] found that the text graph convolutional neural network (GCN) performed better than GRU-LSTM, BiLSTM, Char-CNN, LSTM, and bidirectional encoder representations from transformers (BERT) in classifying online Bangla news. So far, there is a scarcity of standard datasets in the Bangla language, so a dataset has been prepared by scraping news articles from various electronic news sites such as 'https://www.prothomalo.com/'. At the time of scraping, the articles were labeled with their categories. To assess the results, we collected around 12.5K labeled news articles covering 20 categories. Among them, the top 12 categories, listed in Table 1, are considered for this research work.

Table 1. Category-wise count of the Bangla news articles
Serial  Category                      Articles
1       Bangladesh news               5029
2       Sports news                   1387
3       North America news            809
4       Entertainment news            750
5       International news            746
6       Economic news                 711
7       Feedback news                 578
8       Other news                    447
9       Citizen news                  414
10      News of distance migrants     393
11      Lifestyle news                264
12      Science and technology news   242

2.2. Data preprocessing

Preprocessing is performed to minimize the noise in the text, which helps to increase the classifier's efficacy. The preprocessing steps clean the text data and prepare it for the machine learning model. Since text documents consist of sentences, that is, sequences of characters or words, they must be broken into tokens to obtain the features. Tokenization is the process of breaking the text down into sentences and then into words. After tokenization, symbols such as !, x, >, <, $, %, and numbers, which are not very important for classification, are excluded. Bangla stop-words are also eliminated at the same stage. The list of stop-words used in this research is available in [25].
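The preprocessing steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function name `preprocess`, the tiny `SAMPLE_STOP_WORDS` set, and the choice of Unicode range are assumptions made for the example, and the sketch splits directly into words rather than into sentences first. In practice the full stop-word list from [25] would be loaded instead of the sample set.

```python
import re
from typing import List, Set

# Illustrative stop-word sample only (an assumption for this sketch);
# the study uses the full Bangla list referenced in [25].
SAMPLE_STOP_WORDS: Set[str] = {"এবং", "ও", "and", "the"}

# Matches every character that is NOT a Bangla letter or vowel sign
# (U+0980 to U+09E3), a Latin letter, or whitespace. Bangla digits
# (U+09E6 to U+09EF), ASCII digits, and punctuation are therefore dropped.
_NOISE = re.compile(r"[^\u0980-\u09E3A-Za-z\s]")


def preprocess(text: str, stop_words: Set[str] = SAMPLE_STOP_WORDS) -> List[str]:
    """Remove symbols and numbers, tokenize on whitespace, drop stop-words."""
    cleaned = _NOISE.sub(" ", text)      # symbols and digits become spaces
    tokens = cleaned.lower().split()     # lower() only affects Latin letters
    return [tok for tok in tokens if tok not in stop_words]


if __name__ == "__main__":
    print(preprocess("The match, and the score: 3-1!"))
    print(preprocess("খেলা এবং রাজনীতি ২০২৩!"))
```

The regular expression keeps the sketch short; a production pipeline would likely also handle zero-width joiners and sentence-level tokenization.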

2.3. Feature extraction

Vectorization is the general method of converting a collection of text documents into numerical feature vectors. Count vectorizer transforms a collection of text documents into a matrix of token counts, containing the number of occurrences of each token in each document. This implementation produces a sparse representation of the counts. On the other hand, the purpose of using TF-IDF instead of the raw frequencies of a token's occurrence in a given document is to scale down the influence of tokens that occur very frequently in a given corpus and are therefore empirically less informative than features that occur in a small fraction of the training corpus. It is an information retrieval method that weights a term by its term frequency (TF) and its inverse document frequency (IDF). Every word has its own TF and IDF scores. The TF-IDF weight is the product of a term's TF and IDF scores. The rarer the word across the corpus, the higher its TF-IDF score, and vice versa.
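To make the difference between raw counts and TF-IDF weights concrete, the following sketch computes both for a toy corpus. It assumes the plain textbook definition, TF as the raw count and IDF as log(N / df); the paper does not state which variant was used, and library implementations often apply smoothing, so the exact numbers here are illustrative only. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def count_vectors(docs: List[List[str]]) -> List[Counter]:
    """Token counts per document (a sparse, dictionary-based representation)."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF weights with TF = raw count and IDF = log(N / df) (assumed variant)."""
    n_docs = len(docs)
    counts = count_vectors(docs)
    doc_freq: Counter = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())   # each document counts once per term
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in doc_counts.items()}
            for doc_counts in counts]


if __name__ == "__main__":
    toy_docs = [
        ["cricket", "match", "news"],
        ["election", "news"],
        ["cricket", "score", "news"],
    ]
    for weights in tf_idf(toy_docs):
        print({term: round(w, 3) for term, w in weights.items()})
```

In this toy corpus the token "news" appears in every document and so receives a weight of zero, while "election", which appears in only one document, receives the largest weight.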

2.4. Classifiers

In this study, two common classifiers have been used: the SVM classifier with a linear kernel and logistic regression. SVM has been used effectively in text classification. Logistic regression is a statistical method of data analysis in which one or more independent variables are used to determine an outcome.

2.4.1. SVM

In essence, SVM is a supervised machine learning approach that acts as a binary classifier. It uses a hyperplane to separate data into two classes [25]. Support vectors are the data points closest to the hyperplane, and they influence its position and orientation. The support vectors are used to maximize the margin of the classifier. If the support vectors are removed, the position of the hyperplane changes. Mathematically, SVM learns the following sign function [26]:

$$f(x) = \operatorname{sign}(w \cdot x + b) \qquad (1)$$

where $w$ is an $n$-dimensional weight vector in $\mathbb{R}^n$ and $b$ is the bias term. By dividing the space $\mathbb{R}^n$ into two half-spaces with the maximum margin, SVM finds the hyperplane in (2).

$$y = w \cdot x + b \qquad (2)$$

Generally, SVM is a binary classifier, but strategies such as one-vs-one and one-vs-rest can be used to extend it to a multi-class classifier. In SVM, linear and radial basis function kernels are used to make decisions. Since most text classification problems are linearly separable, the linear kernel is preferred for text classification.
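The decision rule and the one-vs-rest extension can be illustrated with a short sketch. It covers prediction only, not training, and the weights, biases, and category labels below are hypothetical values chosen for the example rather than anything learned from the paper's data.

```python
from typing import Dict, Sequence, Tuple

LinearModel = Tuple[Sequence[float], float]   # (weight vector w, bias b)


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Signed score w . x + b relative to the separating hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def binary_predict(w: Sequence[float], x: Sequence[float], b: float) -> int:
    """Sign function: +1 on one side of the hyperplane, -1 on the other."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def one_vs_rest_predict(models: Dict[str, LinearModel], x: Sequence[float]) -> str:
    """One binary model per category; pick the category with the largest score."""
    return max(models, key=lambda label: decision_value(models[label][0], x, models[label][1]))


if __name__ == "__main__":
    # Hypothetical weights for two categories over a 2-dimensional feature vector.
    models = {
        "sports": ([1.0, -0.5], 0.0),
        "economy": ([-0.8, 1.2], -0.1),
    }
    print(one_vs_rest_predict(models, [2.0, 0.5]))
    print(one_vs_rest_predict(models, [0.1, 1.5]))
```

With twelve news categories, a one-vs-rest scheme of this kind would hold twelve such (w, b) pairs, one per category.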

2.4.2. Logistic regression

Logistic regression uses a logistic function to estimate probabilities for the relationship between one or more independent variables and the categorical dependent variable. The logistic function, also known as the logistic curve or sigmoid curve, is a common “S”-shaped curve described by (3).

$$f(v) = \frac{L}{1 + e^{-k(v - v_0)}} \qquad (3)$$

where $L$ is the curve's maximum value, $e$ is the base of the natural logarithm (Euler's number), $k$ is the logistic growth rate or steepness of the curve, $v$ is the input value, and $v_0$ is the value of $v$ at the sigmoid's midpoint.
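A small numeric sketch of the logistic function may help in reading its parameters. The function name and the default values L = 1, k = 1, v0 = 0 (the standard sigmoid) are choices made for this example.

```python
import math


def logistic(v: float, L: float = 1.0, k: float = 1.0, v0: float = 0.0) -> float:
    """Logistic curve L / (1 + exp(-k * (v - v0)))."""
    return L / (1.0 + math.exp(-k * (v - v0)))


if __name__ == "__main__":
    # At the midpoint v0 the curve takes exactly half of its maximum value L.
    print(logistic(0.0))
    print(logistic(2.0, L=4.0, k=3.0, v0=2.0))
    # Far from the midpoint the curve saturates towards 0 or L.
    print(logistic(-10.0), logistic(10.0))
```

With L = 1 the output lies between 0 and 1, which is why it can be read as a class probability.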

3. RESULTS AND DISCUSSION

The performance of the SVM and logistic regression classifiers on our dataset is discussed in this section. A brief analysis of the dataset is provided first. Then the performance of SVM and LR on the Bangla text data is analyzed, and a comparison with some similar works is given in Table 4.

3.1. Dataset analysis

Among the 12.5 thousand articles in 20 categories, we used 11770 articles from the top 12 categories in this research work. A bar chart of the number of news articles in each category is given in Figure 2. Figure 3 shows word clouds generated from all the articles based on TF-IDF and count vectorizer values. These were generated before running the preprocessing on the data.

We can see that many unnecessary words appear in the clouds, and that the cloud based on TF-IDF differs from the cloud based on count vectorizer values. However, some words have about the same importance under both TF-IDF and count vectorizer values. We split the data into two parts: 80% of the data is used for training the model and the remaining 20% is used to test its performance.
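The 80/20 split can be sketched as follows. The paper does not say whether the split was random, stratified by category, or seeded, so the shuffled split and the fixed seed here are assumptions made for reproducibility; integer placeholders stand in for the 11770 articles.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # seeded for repeatable splits
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


if __name__ == "__main__":
    articles = list(range(11770))                # placeholders for the articles
    train, test = train_test_split(articles)
    print(len(train), len(test))
```

For 11770 articles this gives 9416 training and 2354 test articles. Given the strong class imbalance in Table 1, a stratified split would be a reasonable alternative.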

Figure 2. Numbers of the Bangla news articles for each category

Figure 3. Word cloud for the articles used based on TF-IDF (left) and count vectorizer (right)

3.2. Performance measures

In text categorization, a variety of evaluation metrics are employed. Precision, recall, and F-measure are among the most widely used performance measures and are the ones considered in our experiments. Precision, recall, and F-measure are computed for each category for both SVM and LR. The F-measure is averaged to assess performance across categories. Micro average and macro average are the two types of average; the macro average is used here. Table 2 shows how the two classifiers performed on our corpus in terms of precision, recall, and F-measure. Table 3 compares the accuracy, average precision, average recall, and average F1-score of the SVM and LR classifiers.
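The per-category measures and the macro average can be written out directly from their definitions. The sketch below is an illustration with hypothetical function names and toy labels; the final lines apply the macro average to the twelve per-category SVM precision values reported in Table 2.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_scores(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Dict[int, Tuple[float, float, float]]:
    """Precision, recall, and F1 for every label seen in y_true or y_pred."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(values: Sequence[float]) -> float:
    """Unweighted mean over categories: every category counts equally."""
    return sum(values) / len(values)


if __name__ == "__main__":
    toy_true = [0, 0, 1, 1, 2, 2]
    toy_pred = [0, 1, 1, 1, 2, 0]
    print(per_class_scores(toy_true, toy_pred))

    # Per-category SVM precision values as reported in Table 2.
    svm_precision: List[float] = [0.89, 0.94, 0.74, 0.88, 0.70, 0.81,
                                  0.69, 0.77, 0.51, 0.59, 0.88, 0.69]
    print(round(macro_average(svm_precision), 2))
```

Because the macro average weights all categories equally, small categories with low scores pull the average down more than they would under a micro average.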

Table 2. SVM and LR classifier results of 12 categories
Categ.  Precision       Recall          F1-score
        SVM     LR      SVM     LR      SVM     LR
0       0.89    0.84    0.96    0.96    0.92    0.90
1       0.94    0.91    0.98    0.98    0.96    0.95
2       0.74    0.78    0.72    0.68    0.73    0.73
3       0.88    0.84    0.87    0.83    0.88    0.84
4       0.70    0.65    0.68    0.61    0.69    0.63
5       0.81    0.80    0.78    0.73    0.79    0.76
6       0.69    0.64    0.74    0.69    0.71    0.66
7       0.77    0.78    0.78    0.76    0.77    0.77
8       0.51    0.50    0.21    0.13    0.29    0.21
9       0.59    0.60    0.49    0.41    0.53    0.49
10      0.88    0.93    0.61    0.49    0.72    0.64
11      0.69    0.68    0.57    0.43    0.62    0.53

Table 3 shows that SVM with a linear kernel achieves an accuracy of 0.84 and LR achieves 0.81, and that SVM outperforms LR in average precision, recall, and F1-score. Table 2 shows, however, that LR outperforms SVM in precision for some categories (2, 7, 9, and 10) and matches it in recall or F1-score for a few others. Figure 4 shows the confusion matrices for the SVM classifier with the linear kernel and for logistic regression, respectively. Table 4 compares the findings of recent works using the SVM classifier with the results of this research. The comparison shows that our work, which combines TF-IDF and count vectorizer features in the SVM classifier, achieves better accuracy than other recent works that also used SVM.

Table 3. Accuracy, average precision, average recall and average F1-score result of SVM and LR classifier
Classifier  Accuracy  Precision (Avg)  Recall (Avg)  F1-score (Avg)
SVM         0.84      0.76             0.70          0.72
LR          0.81      0.75             0.64          0.68

Figure 4. Confusion matrix for SVM (left) and LR (right)

Table 4. Comparison between this experiment and recent similar researches using SVM
Experiment           Feature used                 Accuracy
This experiment      TF-IDF + Count vectorizer    0.84
Islam et al. [27]    Trigram TF-IDF               0.80
Rahman et al. [28]   TF-IDF                       0.82
Yeasmin et al. [5]   TF-IDF                       0.83

4. CONCLUSION

In the field of information systems, text categorization is a contentious issue. In this paper, we have applied SVM and LR classifiers to our own corpus and measured their performance. The results support the assumption that Bangla text can be classified correctly using SVM and LR with limited resources. The outcome of the suggested approach is promising. Still, higher precision could have been attained. Better outcomes could also have been obtained if we had been able to handle all of the news categories and to use multiple categorization methods to reach a constructive conclusion. In future work, we would like to add more categories and compare the results using different classification algorithms.

ACKNOWLEDGEMENT

This work was supported in part by the Center for Research, Innovation, and Transformation of Green University of Bangladesh.

REFERENCES

[1] X. Zhou et al., “A survey on text classification and its applications,” Web Intelligence, vol. 18, no. 3, pp. 205-216, 2020, doi: 10.3233/WEB-200442.

[2] K. Kowsari, K. J. Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown, “Text classification algorithms: A survey,” Information, vol. 10, no. 4, 2019, doi: 10.3390/info10040150.

[3] A. Fesseha, S. Xiong, E. D. Emiru, M. Diallo, and A. Dahou, “Text classification based on convolutional neural networks and word embedding for low-resource languages: Tigrinya,” Information, vol. 12, no. 2, 2021, doi: 10.3390/info12020052.

[4] K. T. Nwet, “Machine learning algorithms for myanmar news classification,” International Journal on Natural Language Computing, vol. 8, no. 4, pp. 17–24, 2019, doi: 10.5121/ijnlc.2019.8402.

[5] S. Yeasmin, R. Kuri, A. R. M. M. H. Rana, A. Uddin, A. Q. M. S. U. Pathan, and H. Riaz, “Multi-category bangla news classification using machine learning classifiers and multi-layer dense neural network,” International Journal of Advanced Computer Science and Applications, vol. 12, no. 5, 2021, doi: 10.14569/IJACSA.2021.0120588.

[6] P. Saigal and V. Khanna, “Multi-category news classification using support vector machine based classifiers,” SN Applied Sciences, vol. 2, 2020, doi: 10.1007/s42452-020-2266-6.

[7] S. J. Maisha, N. Nafisa, and A. K. M. Masum, “Supervised machine learning algorithms for sentiment analysis of Bangla newspaper,” International Journal of Innovative Computing, vol. 11, no. 2, pp. 15–23, 2021, doi: 10.11113/ijic.v11n2.321.

[8] I. M. Rabbimov and S. S. Kobilov, “Multi-class text classification of uzbek news articles using machine learning,” in Journal of Physics: Conference Series, 2020, doi: 10.1088/1742-6596/1546/1/012097.

[9] M. R. Hossain, S. Sarkar, and M. Rahman, “Different machine learning based approaches of baseline and deep learning models for bengali news categorization,” International Journal of Computer Applications, vol. 176, no. 18, pp. 10-16, doi: 10.5120/ijca2020920107.

[10] M. Ahmed, P. Chakraborty, and T. Choudhury, “Bangla Document Categorization Using Deep RNN Model with Attention Mechanism,” Cyber Intelligence and Information Retrieval, vol. 291, pp. 137–147, 2021.

[11] M. Zhang, “Applications of deep learning in news text classification,” Scientific Programming, vol. 2021, 2021, doi: 10.1155/2021/6095354.

[12] M. Kowsher, A. Tahabilder, N. J. Prottasha, M. A-Rakib, M. M. Uddin, and P. Saha, “Bangla topic classification using supervised learning,” Computational Intelligence in Pattern Recognition, vol. 1349, pp. 505–518, 2022.

[13] S. Rahman and P. Chakraborty, “Bangla document classification using deep recurrent neural network with bilstm,” in PComputer Science, 2021.

[14] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, pp. 273–297, 1995, doi: 10.1007/BF00994018.

[15] N. Sun and C. Du, “News text classification method and simulation based on the hybrid deep learning model,” Complexity, vol. 2021, 2021, doi: 10.1155/2021/8064579.

[16] P. Y. Pawar and S. H. Gawande, “A comparative study on different types of approaches to text categorization,” International Journal of Machine Learning and Computing, vol. 2, no. 4, pp. 423-426, 2012. [Online]. Available: http://www.ijmlc.org/papers/158C01020-R001.pdf

[17] M. A. Ramdhani, D. S. Maylawati, and T. Mantoro, “Indonesian news classification using convolutional neural network,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 19, no. 2, pp. 1000–1009, 2020, doi: 10.11591/ijeecs.v19.i2.pp10001009.

[18] R. Wongso, F. A. Luwinda, B. C. Trisnajaya, O. Rusli, Rudy, “News article text classification in Indonesian language,” Procedia Computer Science, vol. 116, pp. 137–143, 2017, doi: 10.1016/j.procs.2017.10.039.


[19] Q. A. Al-Radaideh and S. S. Al-Khateeb, “An associative rule-based classifier for arabic medical text,” International Journal of Knowledge Engineering and Data Mining, vol. 3, no. 3-4, pp. 255–273, 2015, doi: 10.1504/IJKEDM.2015.074071.

[20] A. N. Chy, M. H. Seddiqui, and S. Das, “Bangla news classification using naive bayes classifier,” 16th Int'l Conf. Computer and Information Technology, 2014, pp. 366-371, doi: 10.1109/ICCITechn.2014.6997369.

[21] L. Nahar, Z. Sultana, N. Jahan, and U. Jannat, “Filtering Bengali Political and Sports News of Social Media from Textual Information,” 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), 2019, pp. 1-6, doi: 10.1109/ICASERT.2019.8934605.

[22] A. K. Mandal and R. Sen, “Supervised learning methods for bangla web document categorization,” International Journal of Artificial Intelligence Applications, vol. 5, no. 5, 2014, doi: 10.5121/ijaia.2014.5508.

[23] K. Shah, H. Patel, D. Sanghvi, and M. Shah, “A comparative analysis of logistic regression, random forest and KNN models for the text classification,” Augmented Human Research, vol. 5, pp. 1–16, 2020, doi: 10.1007/s41133-020-00032-0.

[24] M. R. Hossain, M. M. Hoque, N. Siddique, and I. H. Sarker, “Bengali text document categorization based on very deep convolution neural network,” Expert Systems with Applications, vol. 184, 2021, doi: 10.1016/j.eswa.2021.115394.

[25] Genediazjr, “Stopwords-ISO/stopwords-BN: Bengali stopwords collection,” in Github, Oct 2016. [Online]. Available: https://github.com/stopwords-iso/stopwords-bn (Accessed: December 20, 2022).

[26] J. Kim, J. Yoo, H. Lim, H. Qiu, Z. Kozareva, and A. Galstyan, “Sentiment prediction using collaborative filtering,” in Proceedings of the International AAAI Conference on Web and Social Media, vol. 7, no. 1, pp. 685–688, 2013, doi: 10.1609/icwsm.v7i1.14461.

[27] T. Islam, A. I. Prince, M. M. Z. Khan, M. I. Jabiullah, and M. T. Habib, “An in-depth exploration of Bangla blog post classification,” Bulletin of Electrical Engineering and Informatics, vol. 10, no. 2, pp. 742–749, 2021, doi: 10.11591/eei.v10i2.2873.

[28] S. Rahman, S. K. Mithila, A. Akther and K. M. Alam, “An Empirical Study of Machine Learning-based Bangla News Classification Methods,” 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1-6, 2021, doi: 10.1109/ICCCNT51525.2021.9579655.

BIOGRAPHIES OF AUTHORS

Md Gulzar Hussain was born in Nimnagar, Dinajpur City. He received the Bachelor of Science degree (2018) in Computer Science and Engineering from the Green University of Bangladesh, Dhaka, Bangladesh. Since 2019, he has been a Lecturer with the Computer Science and Engineering Department, Green University of Bangladesh, Dhaka, Bangladesh. He is currently studying for an MSc in Computer Science and Technology at Changzhou University, Changzhou, Jiangsu, China. He is also affiliated with IEEE as a student member and IAER as a professional member. His research interests include Artificial Intelligence, Machine Learning, Deep Learning, Natural Language Processing, Text Mining, and Topic Modeling. He can be contacted at email: gulzar.ace@gmail.com.

Babe Sultana was born in Cox's Bazar, Bangladesh, in 1994. She received her B.Sc. degree in Computer Science and Engineering from Green University of Bangladesh (GUB) in 2018. At present she is working as a Lecturer, Dept. of CSE, Green University of Bangladesh. Her publication “Multi-mode Project Scheduling with Limited Resource and Budget Constraints” was published in the International Conference on Innovation in Engineering and Technology (ICIET), 27-28 December 2018, and received the Best Paper Award and the IEEE Best Paper Award at this conference. Her research interests include Theory of Optimization, Natural Language Processing, and Renewable and Sustainable Energy. She can be contacted at email: babecse@gmail.com.

Mahmuda Rahman received her Master's in CS from Jahangirnagar University and B.Sc. in CSE from Green University of Bangladesh. She received the Vice Chancellor's Gold Medal award for her outstanding academic performance at the 4th convocation of Green University of Bangladesh. She has been working as a Lecturer, Dept. of CSE, Green University of Bangladesh since 2019. Her research interests include Natural Language Processing, Medical Image Processing, and Machine Learning. She can be contacted at email: aurthi018@gmail.com.

Md Rashidul Hasan received the Bachelor of Science degree in Computer Science and Engineering from Green University of Bangladesh. He was born in Pabna, Bangladesh in 1994. His research interests include Machine Learning, Natural Language Processing, and Data Mining. He can be contacted at email: rashidulhasan777@gmail.com.
