Survey on Sentiment Analysis of Marathi Speech and Script by IRJET Journal

alargenumberofpeoplehaveaccesstomobilephones.Peopleusemobilephonestocalleach other, text each other and use social Media platforms. This in turn generates a huge amount of data. The data generatedismainlyin2formsi.e.,Speechandtext.Ifa personwantstobuyanyproductonlineor wantstouseany toprocessdataproductAnalysisTherefore,unstructuredreviewswithservice,thenhe/shewillfirstlookuponitsreviewsonline.Beforemakinganydecision,he/shemightaswelldiscussitotherconsumerswhohavealreadyusedtheproductorservice.Hencethedatageneratedbytheusersinformofissentimentrichandvaluableforthecompaniestogaininsightsabouttheirproduct.Butthedataproducedisandcanbeintheformoftext,emoticons,URLs,punctuation,etc.toovasttoanalyzemanually.NaturalLanguageProcess(NLP)isdone.OneofthemainapplicationsofNLPisTextmining.Sentiment(SA)usestextminingandNLPtoprocessthisusercreatedcontent.Similarlywheneveranyuserbuysanyonline,theecommercesellerscalltheuseraskingforfeedback.Thisfeedbackisintheformofvoicedata.ThiscanbethenanalyzedusingNLP.PrimarygoalinthisprocessistocollectdataintheformofSpeechortext,prethisrawdatausingvarioustechniquesandthenusethisdatatogetusefulinsights.Inthispaper,wearegoingperformadetailedanalysisofvariousSentimentanalysismethodologiesacrossvariouslanguages. Intoday’sScenario,we

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page876 Survey on Sentiment Analysis of Marathi Speech and Script Prof. Vina M. Lomte1, Pratik Jadhav2, Onkar Kalshetti3, Sonali Deshmukh4, Arjun Jadhav5 1 5Department of Computer Engineering, RMD Sinhgad School of Engineering, SPPU, India *** Abstract Sentiment Analysis on Voice and textual data is being carried out widely. Sentiment analysis has many applications ranging from business, Government and consumer perspective in this data era. According to Google, India today has a population of 1380 Million. Out of which 825 million users are active on the internet. These huge user bases are distributed across various languages. Out of which Marathi stands to be 3rd most popular language in India. Even then it is the least focused language when talking about sentiment analysis. This is due to absence of proper dataset and lack of research focusing on Marathi text and speech. So, in this study we have studied different methodologies of sentiment analysis across various languages. In this survey paper we have taken comparative study of different techniques and approaches of sentiment analysis on voice data, tweets and textual data.

1. INTRODUCTION Inthismodernworld, canseethatSentimentanalysisoftheEnglishlanguageisbeingdonethemostandhasalotof outcomeseachwithauniqueefficiencyandresult.Butasweknow,withtechnologicaladvancementspeopleareable tolearnmorethan2languages.Inshort,peopleareabletospeakmultiplelanguages,soitbecomesnaturalthateven themachinesthatarelearningshouldlearnallthelanguages.Therehasbeensomegreatdevelopmenteveninother languages like Chinese, Bengali, etc. and all with successful results. There are still certain areas which are not yet entirely focused or successfully researched in the field of sentiment analysis; and those fields are of the local languages, India is a very big country which speaks a lot of languages. There are 22 official languages in India and creating a successful sentiment analysis model for all of them is nearly impossible mostly due to the fact that not enough data is available on the unscheduled languages and that not a lot of people really use the language. A large number of people in India use English words or sentences in their day to day lives. Marathi is a language spoken by the state of Maharashtra and hence is the local language of the state. Hence, that makes Marathi one of the most importantandspokenlanguagesofthecountry.Hence,wehavedecidedtousethisasalocallanguageforourstudy andresearch.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page877 2. Literature Survey Table 1: DeepLiteratureSurveyofCurrentTechnologies NoSR. PublicationDetails Authors TechUsed Accuracy Dataset ResearchGap 1. Applying natural satisfactionanalyseprocessinglanguagetocustomer IEEE AlibasicArmin and Popovic,Tomo Senior Member BeautifulScraping, Soap library Pre processing: flyingrootwordscuttingStemmingWords,RemovingTokenization,Stopthetotheofthatwordfly,seatsseat,etc. Post processing: Python NLTK library 95% 50,000 TripAdvisorreviewsairlinefrom sentiment score is calculated using the EnglishDatasetappearingfrequencyofpositiveandnegativewords.containedreviewsbyNativeSpeakers. 2. A Hybrid Machine ICOSECConferenceIEEESpeechRecognitionEmotionLearningModelforFromSignalsInternational2020 Author)KukanaPoonam(Author)BhartiDeepak(Co Feature Extraction: (Ant O,Optimization)ALLionGFCC Post processing: (MSVM)Classification 97% RAVDESS data 7356containingSpeech(EmotionalsetFiles)tthe Dataset used contains Male: Female VoiceRatioas4:1 3. Deep Learning Based SpeechSystemIdentificationLanguageFrom Athira N P PoornaSS Pre processing: DC and Noise Removal, Pre Spectrogramsemphasis, Post processing: precision, recall, F1 score and confusionmatrix 74% fromlanguagessevendatabaseSpeechofIndianIIITIndic accuracyMarathiDatasetcanbeSpecificallyDesignedforSpeechdatatoachievebetter 4. LanguageforusingSpeechRecognitionHTKToolkitMarathi oreDr.S.M.HandS.ChavanSupriya Pre processing: Segmentation,SpeechCoefficients,LinearEnhancement,SpeechPredictive Feature Extraction: Hidden LPCKitModellingMarkovTool3.4,MFCC, Post processing: Speech 80% 910 speakers.femalemalerecordedsentences,Marathiby3and3 Speech Recognition accuracy can be Improved as Current accuracy is 80%. DatasetSize&varietycanbeIncreased.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page878 Recognition: ANN 5. LanguageforAnalysisSentimentClassClassificationMulticlassandbasedHindi PundlikmPurushottaGawade,AkshayDasare,PrasadGaikwad,GajananKasbekar,PrachiPundlik,aProf.Sumitr ClassifierMode(LM)Languaget(HSWN)rdNeHindiSentiWo textconvertedTopicsvariousacrossfromSpeechesleaderstofiles. Self Learning ontology not Implemented. Works only with text documentswithWordlimitations 6. HMMRecognitionMarathiIsolatedSpokenWordsusing DeshpandeMangeshandSaiSawant processing:PremodelTestingTrainingandthe Processing:Post (HMMs)ModelsMarkovHidden 86.5% 2,000 speakersbywordsMarathispokentennative TToincreaseherecognition performance and fromincreasedsystem,speakerindependenceofthistrainingdatasetneedstobebyusingdataobtainedincreasednumberofspeakers. 7. ProcessingNaturalTaggingPartTechniquesDeepLearningforofSpeechbyLanguage KiwelekarArvindand(Deshmukh)DhumalMrs.RushaliDr. processing:Pre used.areTechniquesDropDroppingEarlyandout Post processing:Pre ScikitPythonNLTK,learn model.Biforandmodellearningthe85%fordeep97%theLSTM Pos tagged Newspapers.MarathiSpeechPartsusingbeenwordswithsentences1500containingcorpus10115hascreatedUnifiedoffrom 8. Urdu Sentiment MethodsLearninglysisAnaWithDeep TSUNHSIENAshraf,NomanAmjad,AmmarLalKhan, processing:Pre nNormalizatioStopWords, Post 82.05% The data of Movie and newsfromUrdureviews.electronictextUrdu negativethatOneofthelimitationsofthisstudyisitincludesonlypositiveandclasses;

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page879 GCHANG Processing: NB,SVM, RF, LSTMandLR,MLP,AdaBoost,1DCNN, websites. 9. A Pattern Based Approach for Multi AnalysisSentimentClass in Twitter BOUAZIZI,MONDHERTOMOAKIOHTSUKI processingPre By removing the URLs, Tokenizationtags, ProcessingPost: Classification,BinaryForest,RandomTernaryClassification 60.2% Set 1 21000:tweets Set 2: 19740tweets The proposed paper hasn’t used the results obtained through Ternary Classification 10. RepublicKAIST,Institute,ResearchElectronicsInformationSpeech.withlearningmultitaskconversionEmotionalvoiceusingTextToandDaejeon,ofKorea Tae Ho Kim LeeSooandSejikChoi,ShinkookCho,Sungjae,ParkYoung processing:Pre algorithmdetectionVoiceactivity :ProcessingPostoderAttention,dec male dataset.(mKETTS)speechTextEmotionalKoreanto In future need to focus on AimprovementofTTSbyVCspronunciationdiffersalso focus on explicit loss to keep down the differencebetweenTTSandVC. 11. DatasetAnalysisSentimentTweetSent:CubeMahaAMarathibased RavirajandKshirsagar,GayatriLikhitkarManaliMandhaneMeetKulkarni,Atharva,,Joshi BERTULMFiTMaxPoolalBiLSTM+GlobCNN,Used:Algorithms,, accuracy2accuracy3class84.13,class92.93 haSentL3CubeMaDatasetAnalysisSentimentMarathi The dataset does not retain any region,thecontextofthetweetssuchastweetingprofile,timeofposting,etc.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page880 12. RecognitionEmotionSpontaneous for Processing,andCommunicationConferenceInternationalWordsMarathiSpokenonSignal M.RanaDeepakP.GaikwadBharatratnaKambleVaibhavV.,, extractionsfeatureDynamiccepstrumscalefreAnalysis,DiscriminateLinearanalysis,ComponentPrincipalMelquencyanalysis,, 83.33% databaseEmotionBerlin 1.The feature vectors sample speech signal are never exactly the same as those provided during the enrollment or training phase for the database2.Ifsameuser.theinferiorisused as an input to the system then incorrect conclusion maybedrawn 13. Lokbhasha:Vachantar A Speech to Text Conversion for Marathi ChechareArchana.V. (ANN)NetworksNeuralArtificial(MFCC),CoefficientCepstralFrequencyMel 87.76% recognitionAsinthispaperisin Marathi it has been performed in Offline and Runtime modes.Theaccuracyofthesystemin runtimeofflinemodeismoreascomparedtomode. 14. Multi AnalysisSentimentClass in Twitter: What if Classification is nottheAnswer OhtsukTomoakiBouazizi,Mondher processing:Pre ExtractionFeatures processing:Post Naïve Bayes 3DichotomiserIterativeclassifierforestRandomClassifier,,classifier 77.4% Dataset is made of TwitterusingcollectedtweetsAPI Need to address the case of tweets with sentiments belonging to different polarities and try to find possible ways to identify these sentiments. 15. MarathiAnalysisDiscourseRuleProcessingNaturalLanguagebasedBasedofText MahendarCKhandale,KalpanaNamrata processing:Pre Analysis,Discourse n:Segmentatio POStagger processing:PostResolutionAnaphora 85.43% tostandardbookBalbharatiMarathifrom1st4thclass This paper highlighted to the issues and rule based structure to resolve GrammeranaphorathediscoursebasedonMarathi

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page881 16. Novel Technique for EvaluationPerformanceNLP:TranslationScriptusing ShindeSharmilaChaudhari,Patil,DarshanaS.B. processing:Pre E approachtranslationqueryissystem,whichconversionMCrossbasedon processing:Post translationHybrid 91% WordNetMarathi Can implement best hybrid system for multi language translation using NLPinfuture 17. ASpell ConversionforBasedMachineIntegratedcheckerLearningSolutionSpeechtoText HasanMd.ArafIslam,Md.AdnaulHasan,Md.ToufiqeHasan,MahmudulH.M processing:Pre TensorFlow n:Segmentatio DeepSpeech processing:Post Peter Nerving’s algorithmsuggestionspellingcorrect r:correctospellapplyingAfterr:correctospellapplyingBefore3054%4567% Dataset:&"Bengali.ai"Dataset:BengaliEnglish"MozillaCommonvoice" Can be focused on developing a fastermachinethelargerandmoreeffectivedatasetforBengalilanguagesothatthecanprocessdatabetterand 18. Kannada Speech to Text CMUConversionUsingSphinx GuptaDeepaAnoopK.G,AravindK.M,ShivakumarT.V, processing:Pre CMUSphinx Extraction:Feature 52),(withparameterNfiltvalue processing:Post UserModel(SingleDependentContextSpeech) Model:dentIndepenContextModel:ntDependeContext95%80% corporaSpeech for createdKannada by HyderabadIIIT These system holds accuracy for Kannada sentences with only four to tenwordlength.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page882 19. SystemSpeech(Marathi)DevanagariCognitiveTextto bhaChandrapraManjareShakil,ShadabShaikhAnil processing:Pre CombinerMapper& for textinput Extraction:Feature OCR)Recognition(CharacterOpticalsystem processing:Post Text Synthesizer(TTS)Speechto creationDatabase can bedoneinMS excel. In future can be focused on the enginethesignalswhichwegethereasoutputcanbethusreverseeredandcanbeuseful as the baseforaTTSengineitself. 20. SystemsStorytellingExpressiveClassificationHindiSentencefor AgrawalSwatiGirdharYukiKirtani,YuktiAgrawal,Shivliand processing:PreRNN LSTM Extraction:Feature Word2Vec processing:Post CNN SVM :accuracyOverall88.1% textsstorieslanguageHindifromlike Birbal”a”“Panchatantr&“Akbar In future can be focused on improving by incorporating intomodesvariousotherdiscourseandtakingtheirsubclassesaccount. 21. LearningLexicononinProductforAnalysisSentimentECommerceReviewsChineseBasedSentimentandDeep SHERRATTR.SIMONJINYINGLIYANG,LI,WANG, processing:PrelexiconSentiment Extraction:Feature CNN model & GRUmodel processing:Post modelSLCABG vector:edtunweighvector:worddweighte93.5%word92.8% fromcollectedbookThedataofreviews “Dangdang” technologycrawlerusingweb SCanbefocusedtostudytheentimentfinenessclassification of text.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page883 22. MethodsLearningUsingAnalysisSentimentUTSA:UrduTextDeep ABBASSYEDMAJID,ABDULNAQVI,UZMAANDALI processing:Pre Data Cleaning Tokenization& Extraction:Feature ModelsEmbeddingWord processing:Post LSTMCNN,BiLSTMModels(LSTDL,ATT,andC) 77.9% Urdu blogs and humsubDunyaExpress,aswebsitesnewssuchBBC,DW,and beInfuturecanfocusedon DL models with more embeddings that provide better beincreasedcontextualinformationalongwithandbalanceddatasetwillexplored 23. SystemTranslationHybridMarathiEnglishWorkAResearchonto PatilDr.SuhasBewoor,MrunalSalunkhe,Pramod ng:Pre-processi TranslationStatistical Processing:Post TranslationHybrid :nPrecisioAverage87.5% BhattacharyaPushpakproducedfromdownloadedonsystemstatisticalbuiltcorpusIITBby Further research can be made in ofofbettertranslationwithapplicationMarathiwordnetordevelopmenttranslingualwordnetforall 24. LANGUAGEOFANALYSISSENTIMENTMARATHI NunesJasonRotiwar,SurabhiPatil,NileemaDeshmukh,Sujata ng:Pre-processi nlemmatizatioandSegregation Processing:Post PolarityAttribute 60r:translatoYandexWith70% SentiwordNetEnglish Further real time update of analysisdirectiondictionarycanbefutureresearchinthefieldofsentimentofMarathilanguage.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page884 3. Algorithmic Survey Table 2: AlgorithmicSurveyofResearchStudies NoSR. PublicationDetails Authors TechUsed Accuracy Dataset ResearchGap Advantages 1. Applying natural satisfactionanalyseprocessinglanguagetocustomer Popovic,andAlibasicArminTomo Senior Member BeautifulScraping, Soap library Pre processing: flyingrootwordscuttingStemmingWords,RemovingTokenization,Stopthetotheofthatwordfly,seatsseat,etc. Postprocessing: Python NLTK library 95% 50,000 TripAdvisorreviewsairlinefrom sentiment score is calculated using the appearingfrequencyofpositive and negative words. Dataset contained reviews by NativeEnglishSpeakers. TrigramsBigramsusingSentimentsAnalyzedand 2. A Hybrid Machine ConferenceIEEESpeechRecognitionEmotionLearningModelforFromSignalsInternational Author)KukanaPoonam(Author)BhartiDeepak(Co Feature Extraction: (Ant O,Optimization)ALLionGFCC Post processing: (MSVM)Classification 97% RAVDESS data 7356containingSpeech(EmotionalsetFiles)tthe Dataset used contains Male: Female Voice Ratioas4:1 samples.thenoisereducesignalspeechqualityincreasedSystemProposedtheoftheandthelevelinuploaded 3. Deep Learning Based SpeechSystemIdentificationLanguageFrom Athira N P PoornaSS confusionF1precision,PostSpectrogramsemphasis,Removal,DCPreprocessing:andNoisePreprocessing:recall,scoreandmatrix 74% fromlanguagessevendatabaseSpeechofIndianIIITIndic Dataset can be Specifically Designed for Marathi Speech data to achievebetteraccuracy layalam.Kannadaaccuracy99%Kannadatheachievedaccuracy99.5%isforBengaliandforMa

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page885 4. Speech Recognition using HTK Toolkit for Marathi Language eDr.S.M.HandorS.ChavanSupriya Pre ANNRecognition:SpeechPostKitModellingHiddenExtraction:FeatureSegmentation,SpeechCoefficients,LinearEnhancement,Speechprocessing:PredictiveMarkovTool3.4,MFCC,LPCprocessing: 80% 910 speakersfemalemalerecordedsentences,Marathiby3and3 Increased.canSize80%.accuracyasbeaccuracyRecognitionSpeechcanImprovedCurrentisDataset&varietybe TechniquesusingachievedaccuracyHighersimple 5. forSentimentClassClassificationMulticlassandbasedAnalysisHindiLanguage PundlikPurushottamGawade,AkshayPrasadGaikwad,GajananKasbekar,PrachiPundlik,Prof.SumitraDasare, ClassifierMode(LM)Languaget(HSWN)eHindiSentiWordN textconvertedTopicsvariousacrossfromSpeechesleaderstofiles. limitationswithdocumentswithWorksImplemented.ontologySelfLearningnotonlytextWord &LMClassifierofCombinedaccuracyGivesOntologies.newbyCanbeusedcreatingbetterwithuseHSWN 6. HMMRecognitionMarathiIsolatedSpokenWordsusing DeshpandeMangeshandSaiSawant Pre-processing: TestingTrainingandthemodel Post Processing: ModelsHiddenMarkov(HMMs) 86.5% 2,000 speakersbywordsMarathispokentennative speakers.numberincreasedfromobtainedusingincreasedtodatasettrainingsystem,ofindependenceandperformancerecognitiontheToincreasespeakerthisneedsbebydataof known.testspeakerswhenobtainedresultsrecognitionthatisphonemes.numberwithperformedrecognitionMarathiisolatedInthiswork,wordislimitedofItobservedgoodarebothanddataare

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page886 7. ProcessingLanguagebyofTechniquesDeepLearningforPartSpeechTaggingNaturalICMIA KiwelekarArvindand(Deshmukh)DhumalMrs.RushaliDr. Pre processing: Early Dropping used.TechniquesDropandoutare Post processing:Pre ScikitPythonNLTK,learn model.Biforandmodellearningthe85%fordeep97%theLSTM Pos tagged Newspapers.MarathiSpeechPartsusingbeenwordswithsentences1500containingcorpus10115hascreatedUnifiedoffrom Pos Newspapers.MarathifromofUnifiedusingbeenwordswithsentences1500containingcorpustagged10115hascreatedPartsSpeech 8. Urdu Sentiment LearninglysisAnaWithDeepMethods GHSIENNomanAmjad,AmmarLalKhan,Ashraf,TSUNCHANG Pre processing: NormalizationStopWords, Post Processing: LSTMLR,MLP,NB,SVM,RF,AdaBoost,1DCNN,and 82.05% The data of Movie and websites.newsfromUrdureviews.electronictextUrdu classes;negativepositiveincludesthatthislimitationsOneoftheofstudyisitonlyand classifiers.allisperformanceitsthisperformerhighesttheclassifierTheofcombinationLR82.05%scorethePaperachievehighestF1ofusingwithfeatures.SVMissecondfortaskandaveragebetterthanother 9. A Pattern Based AnalysisClassApproachforMultiSentimentinTwitter BOUAZIZI,MONDHERTOMOAKIOHTSUKI Pre processing By removing the TokenizationURLs,tags, ProcessingPost: Classification,BinaryRandomForest,TernaryClassification 60.2% Set 1 21000:tweets Set 2: 19740tweets ClassificationTernarythroughobtainedthehaven'tpaperproposedTheusedresults wasdataanalysissentimentmultiobtainedaccuracypotential:someresultsTheobtainedshowtheforclassinthesetused60.2%. 10. KAIST,ResearchElectronicsInformationSpeech.withmultitaskconversionEmotionalvoiceusinglearningTextToandInstitute,Daejeon, Tae SooSejikShinkookSungjaeHoKim,Cho,Choi,ParkandYoungLee Pre processing: algorithmdetectionVoiceactivity Post Processing : Attention,decoder male dataset.(mKETTS)speechTextEmotionalKoreanto explicitfocusdifferspronunciationasofimprovementtoInfutureneedfocusonTTSbyVCalsoonlossto hasusinglearningmultitaskusingachievedaccuracyhigherInthiswork,isalsothisit

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page887 RepublicofKorea keep down andbetweendifferencetheTTSVC. WER.minimized 11. DatasetAnalysisSentimentTweetSent:CubeMahaAMarathibased RavirajandKshirsagar,GayatriLikhitkarManaliMandhaneKulkarni,AtharvaMeet,,Joshi ULMFiTMaxPoolBiLSTM+GlobalCNN,AlgorithmsUsed:,,BERT accuracy2accuracy3class84.13,class92.93 haSentL3CubeMaDatasetAnalysisSentimentMarathi The dataset retaindoesnot any region,posting,timeprofile,thesuchthecontextoftweetsastweetingofetc. 1. It is the performedTheyAnalysis.SentimentfordatasetavailablepubliclymajorfirstMarathihave 2 andclass3 overfittingreducessignificantlyembeddingspre2.futureforbenchmarkprovidetoclassificationsentimentclassastudiesTheuseoftrained 12. RecognitionEmotionSpontaneous for ProcessingandCommunicationConferenceInternationalWordsMarathiSpokenonSignal M.RanaDeepakP.GaikwadBharatratnaKambleVaibhavV.,, extractionsDynamiccepstrumanalysis,freAnalysis,Discriminateanalysis,ComponentPrincipalLinearMelquencyscale,feature 83.33% databaseEmotionBerlin conclusionincorrectsysteminputuseddatabaseinferior2.Ifsamephasetrainingenrollmentduringprovidedsameexactlyarespeechsamplevectors1.Thefeaturesignalnevertheasthosetheorfortheuser.theisasantothethen situationsthefromisthisthatdatabase2.Theaccuracy.gotextractionfeatureonsystem(ERFMSW)workproposedThisofbasedmoreisusedincollectedreallife

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page888 maybedrawn 13. Lokbhasha:Vachantar A Speech to Text Conversion for Marathi ChechareArchana.V. NetworksArtificial(MFCC),CoefficientCepstralFrequencyMelNeural(ANN) 87.76% As in this recognitionpaper is in Marathi it has mode.runtimecomparedasmodeinofThemodes.RuntimeinperformedbeenOfflineandaccuracythesystemofflineismoreto 1.In this the text will is aboutinformationtheandextractusedrecognitionspeaker2.Automaticused.canfilefilesavedtoaandthatbefurtheristorecognize speaker’s speech 14. Multi theClassificationinSentimentClassAnalysisTwitter:WhatifisnotAnswer OhtsukTomoakiBouazizi,Mondher Pre processing: ExtractionFeatures Post processing: Naïve Bayes Classifier , Random forest classifier , DichotomiserIterative 3 classifier 77.4% Dataset is made of APIusingcollectedtweetsTwitter Need to address the case of tweets belongingsentimentswith to polaritiesdifferent and try to find possible ways to identify sentiments.these Forest).lyRandomanalysis(sentimentofthebesthaveofversioncurrentUsingtheSENTAwefoundfittingincontextmulticlassmain 15. ofDiscourseRuleProcessingNaturalLanguagebasedBasedAnalysisMarathiText MahendarCKhandale,KalpanaNamrata Pre processing: Analysis,Discourse Segmentation: POStagger Post processing: ResolutionAnaphora 85.43% tostandardbookBalbharatiMarathifrom1st4thclass This GrammerMarathibasedanaphoradiscourseresolvestructurerulethehighlightedpapertoissuesandbasedtotheon properly.resolvedsentences440from515Totalsentenceswhichdiscourseare

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page889 16. Novel Technique for Script Translation using NLP: EvaluationPerformance ShindeSharmilaChaudhari,Patil,DarshanaS.B. Pre processing: E system,whichconversionMCross is based on query approachtranslation Post processing: translationHybrid 91% WordNetMarathi futureusingtranslationlanguagemultisystembestimplementCanhybridforNLPin personsseveralchallengesolvecollectionncrossassistsystemtranslationasedtechnologyautomatedusesupportsItfullytheofanbtoindomaidatatotheof 17. ASpell ConversiontoSolutionLearningIntegratedcheckerMachineBasedforSpeechText Md.ArafIslam,Md.AdnaulHasan,Md.ToufiqeHasan,MahmudulH.MHasan Pre-processing: TensorFlow Segmentation: DeepSpeech Post processing: PeterNerving’s algorithmsuggestioncorrectspelling corrector:spellapplyingAftercorrector:spellapplyingBefore3054%4567% Dataset:&"Bengali.ai"Dataset:BengaliEnglish"MozillaCommonvoice" Can be developingfocusedon a larger and fasterbetterprocessmachinethatlanguagethedatasetmoreeffectiveforBengalisothecandataand 18. Kannada Speech to Text Conversion CMUUsingSphinx DeepaAnoopK.G,K.M,ShivakumarAravindT.V,Gupta Pre-processing: CMUSphinx Extraction:Feature Nfilt parameter (withvalue52), Post processing: Speech)Model(SingleDependentContextUser ntIndependeContextModel:DependentContext95%Model:80% corporaSpeech for createdKannada by HyderabadIIIT wordfourwithsentencesKannadaaccuracyholdsThesesystemforonlytotenlength. words.KannadaspokenalmorphologicandallrecognizingofextensibilityinvestigatesThissystemlettersvariantsof

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page890 19. SystemSpeech(Marathi)DevanagariCognitiveTextto aChandraprabhManjareShakil,ShaikhShadabAnil Pre processing: CombinerMapper& for textinput Extraction:Feature systemRecognition(OCR)OpticalCharacter Post processing: Text to Speech (TTS)Synthesizer creationDatabase can bedoneinMS excel. itself.TTSbaseusefulandengineeredreversecantheherewhichthebeInfuturecanfocusedonsignalswegetasoutputbethuscanbeastheforaengine phrases.sentencesoutputsshowderiveWecanorgraphforor 20. SystemsStorytellingExpressiveClassificationHindiSentencefor SwatiandYukiYuktiAgrawal,ShivliKirtani,GirdharAgrawal Pre processing: RNN LSTM Extraction:Feature Word2Vec Post processing: CNN SVM accuracy:Overall88.1% textsstorieslanguageHindifromlike Birbal””“Panchatantra&“Akbar intosubclassestakingmodesdiscoursevariousincorporatingimprovingbeInfuturecanfocusedonbyotherandtheiraccount. systeCNNexistingthanexperiencebetterprovidesCNNusingClassificationHinditoComparingpresentStoryRNNusertheSVMm 21. andSentimentChineseProductforSentimentAnalysisECommerceReviewsinBasedonLexiconDeepLearning SHERRATTR.SIMONJINYINGLIYANG,LI,WANG, Pre processing: Sentimentlexicon Extraction:Feature modelCNNmodel&GRU Post processing: SLCABGmodel vector:edunweightvector:wordweighted93.5%word92.8% collectedbookThedataofreviewsfrom “Dangdang” technologycrawlerusingweb ofclassificationfinenesssentimentstudyfocusedCanbetothetext. models.analysissentimentthanperformanceclassificationbettermodelSLCABGfoundberesults,experimentaltheByanalyzingitcanthatthehasother

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page891 22. MethodsLearningUsingSentimentUTSA:UrduTextAnalysisDeep ABBASSYEDMAJID,ABDULNAQVI,UZMAANDALI Pre processing: Data Cleaning & Tokenization Extraction:Feature Word ModelsEmbedding Post processing: DL LSTM)CNN,BiLSTMModels(LSTM,ATT,andC 77.9% Urdu blogs and humsubDunyaExpress,aswebsitesnewssuchBBC,DW,and bedatasetandincreasedalonginformationcontextualbetterthatembeddingswithDLbeInfuturecanfocusedonmodelsmoreprovidewithbalancedwillexplored taskclassificationonembeddingofimportancethehighlightsalsoimproved.performancemodelsmodel,embeddingthethatobservedIthasbeenbyusingSamarDLIt 23. SystemTranslationHybridMarathiEnglishWorkAResearchonto PatilDr.SuhasBewoor,MrunalSalunkhe,Pramod ng:Pre-processi TranslationStatistical Processing:Post TranslationHybrid :PrecisionAverage87.5% BhattacharyaPushpakproducedfromdownloadedonsystemstatisticalbuiltcorpusIITBby researchFurther can be made in netlingualofdevelopmentwordnetofapplicationwithtranslationbetterMarathiortranswordforall system.Ruletocomparisoncorrectedsystemstatisticaloutputbettercanapproach,weInHybridachieveresultofisinbased 24. LANGUAGEOFANALYSISSENTIMENTMARATHI NunesJasonRotiwar,SurabhieemaDeshmukh,NilSujataPatil, Pre processi ng: lemmatizationandSegregation Post Processing: PolarityAttribute 60translator:YandexWith70% SentiwordNetEnglish Further real time update language.Marathianalysissentimentthedirectionresearchcanofdictionarybefutureinfieldofof standard.setneutralnegativeasthetextpolaritycumulativethetoCanbeusedcalculateoftheandranksentencepositive,oronascale

[3] A. N..P and P. S.S., "Deep Learning Based Language Identification System From Speech," 2019 International Conference on Intelligent Computing and Control Systems (ICCS), 2019, pp. 1094 1097, doi: 10.1109/ICCS45141.2019.9065370.

[10] T. H.Kim,S.Cho,S.Choi,S.ParkandS. Y.Lee,"EmotionalVoiceConversionUsingMultitaskLearningwithText To Speech," ICASSP 2020 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),2020,pp.7774 7778,doi:10.1109/ICASSP40776.2020.9053255.

[14] M.BouaziziandT.Ohtsuki,"Multi ClassSentimentAnalysisinTwitter:WhatifClassificationisNottheAnswer,"in IEEEAccess,vol.6,pp.64486 64502,2018,doi:10.1109/ACCESS.2018.2876674.

REFERENCES

[2] D.BhartiandP.Kukana,"AHybridMachineLearningModelforEmotionRecognitionFromSpeechSignals,"2020 International Conference on Smart Electronics and Communication (ICOSEC), 2020, pp. 491 496, doi: 10.1109/ICOSEC49089.2020.9215376.

[5] S.Pundlik,P.Dasare,P.Kasbekar,A.Gawade,G.GaikwadandP.Pundlik,"Multiclassclassificationandclassbased sentiment analysis for Hindi language," 2016 International Conference on Advances in Computing, CommunicationsandInformatics(ICACCI),2016,pp.512 518,doi:10.1109/ICACCI.2016.7732097.

[11] Atharva Kulkarni,Meet Mandhane,Manali Likhitkar,Gayatri Kshirsagar,Raviraj Joshi, “L3CubeMahaSent: A MarathiTweet basedSentimentAnalysisDataset,”arXiv:2103.11408v2[cs.CL]22Apr2021

[15] K.B.KhandaleandC.N.Mahender,"NaturalLanguageProcessingbasedRuleBasedDiscourseAnalysisofMarathi Text," 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020, pp. 356 362,doi:10.1109/ICESC48915.2020.9155653.

[1] A. Alibasic and T. Popovic, "Applying natural language processing to analyze customer satisfaction," 2021 25th InternationalConferenceonInformationTechnology(IT),2021,pp.1 4,doi:10.1109/IT51528.2021.9390111.

[7] R. Dhumal Deshmukh and A. Kiwelekar, "Deep Learning Techniques for Part of Speech Tagging by Natural Language Processing," 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA),2020,pp.76 81,doi:10.1109/ICIMIA48430.2020.9074941.

[8] L.Khan,A.Amjad,N.Ashraf,H. T.ChangandA.Gelbukh,"UrduSentimentAnalysisWithDeepLearningMethods," inIEEEAccess,vol.9,pp.97803 97812,2021,doi:10.1109/ACCESS.2021.3093078.

Sentiment Analysis has been popular and has led to building of better products & services, understanding user’s Sentimentandopinion,andfortakingdatadrivenbusinessdecisions.Withrapidlyincreasingtechnology,peopletend to rely more on reviews, and opinion of other people towards certain product or service The rise in user generated data for Marathi language across various domains like news, culture, arts, sports etc has opened the data to be explored and mined effectively.Thelack ofdatasetsand propermodel isoneof the biggestchallenges whiledealing with sentiment analysis for Marathi language. In this paper we surveyed different techniques used for analysing speechandtextdataofvariouslanguageswhilecomparingitwithMarathilanguage.Wealsostudied itsusefulnessin analysingalargenumberofhumangeneratedtexttodrivevaluableinsights.

[16] D. Patil, S. B. Chaudhari and S. Shinde, "Novel Technique for Script Translation using NLP: Performance Evaluation,"2021InternationalConferenceonEmergingSmartComputingandInformatics(ESCI),2021,pp.728 732,doi:10.1109/ESCI50559.2021.9396969.

[4] S. Supriya and S. M. Handore, "Speech recognition using HTK toolkit for Marathi language," 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), 2017, pp. 1591 1597,doi:10.1109/ICPCSI.2017.8391979.

[17] H.M.M. Hasan,M.A.Islam,M.T.Hasan,M.A.Hasan,S.I.RummanandM.N.Shakib,"ASpell checkerIntegrated

[12] V.V.Kamble,B.P.GaikwadandD.M.Rana,"SpontaneousemotionrecognitionforMarathiSpokenWords,"2014 International Conference on Communication and Signal Processing, 2014, pp. 1984 1990, doi: 10.1109/ICCSP.2014.6950191.

[6] S. Sawant and M. Deshpande, "Isolated Spoken Marathi Words Recognition Using HMM," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 2018, pp. 1 4, doi: 10.1109/ICCUBEA.2018.8697457.

[13] Archana.V.Chechare,“Vachantar Lokbhasha:ASpeechtoTextConversionforMarathi,”inInternationalJournal ofAdvancedResearchinComputerandCommunicationEngineering,Vol.5,Issue7,July2016

[9] M. Bouazizi and T. Ohtsuki, "A Pattern Based Approach for Multi Class Sentiment Analysis in Twitter," in IEEE Access,vol.5,pp.20617 20639,2017,doi:10.1109/ACCESS.2017.2740982.

BIOGRAPHIES

[20] S. Agrawal, Y. Kirtani, Y. Girdhar and S. Aggarwal, "Hindi Sentence Classification for Expressive Storytelling Systems," 2018 IEEE Symposium Series on Computational Intelligence (SSCI), 2018, pp. 2019 2025, doi: 10.1109/SSCI.2018.8628858.

[22] U. Naqvi, A. Majid and S. A. Abbas, "UTSA: Urdu Text Sentiment Analysis Using Deep Learning Methods," in IEEE Access,vol.9,pp.114085 114094,2021,doi:10.1109/ACCESS.2021.3104308.

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056 Volume: 08 Issue: 12 | Dec 2021 www.irjet.net p ISSN: 2395 0072 © 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page893 MachineLearningBasedSolutionforSpeechtoTextConversion,"2020ThirdInternational ConferenceonSmart SystemsandInventiveTechnology(ICSSIT),2020,pp.1124 1130,doi:10.1109/ICSSIT48917.2020.9214205. [18] 10.1109/INVENTIVE.2016.7830119.2016K.M.Shivakumar,K.G.Aravind,T.V.AnoopandD.Gupta,"KannadaspeechtotextconversionusingCMUSphinx,"InternationalConferenceonInventiveComputationTechnologies(ICICT),2016,pp.16,doi: [19] S.S.ShakilandM.C.Anil,"CognitiveDevanagari(Marathi)Text to SpeechSystem,"2015InternationalConference onComputingCommunicationControlandAutomation,2015,pp.758 762,doi:10.1109/ICCUBEA.2015.151.

[21] L.Yang, Y.Li, J. WangandR.S.Sherratt,"SentimentAnalysisforE CommerceProductReviewsinChinese Based on Sentiment Lexicon and Deep Learning," in IEEE Access, vol. 8, pp. 23522 23530, 2020, doi: 10.1109/ACCESS.2020.2969854.

[24] Sujata Deshmukh,Nileema Patil, Surabhi Rotiwar, Jason Nunes, "SENTIMENT ANALYSIS OF MARATHI LANGUAGE,"inInternational Journal ofResearch PublicationsinEngineeringandTechnology,VOLUME3,ISSUE 6,Jun. 2017

[23] Pramod Salunkhe, Mrunal Bewoor, Dr.Suhas Patil. "A Research Work on English to Marathi Hybrid Translation System,"inInternationalJournalofComputerScienceandInformationTechnologies,Vol.6(3),2015,2557 2560

Prof. Vina M. Lomte Project Guide and Head of Computer Engineering Department at RMD Sinhgad School of Engineering, SPPU, Pune. Mr. Pratik Jadhav Project Team Lead and B.E. Student at Department of Computer Engineering, RMD Sinhgad School of Engineering, SPPU,Pune. Mr. Onkar Kalshetti Project Research Fellow and B.E. Student at Department of Computer Engineering, RMD Sinhgad School of Engineering, SPPU,Pune. Ms. Sonali Deshmukh Project Research Fellow and B.E. Student at Department of Computer Engineering, RMD Sinhgad School of Engineering, SPPU,Pune. Mr. Arjun Jadhav Project Research Fellow and B.E. Student at Department of Computer Engineering, RMD SPPU,SinhgadSchoolofEngineering,Pune.