
International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056
Volume: 12 Issue: 06 | Jun 2025 www.irjet.net p-ISSN:2395-0072

Prakhar Singhal¹, Pawan Kumar², Ms. Aarushi Thusu³
¹Noida Institute of Engineering and Technology, Greater Noida, UP, India
²Noida Institute of Engineering and Technology, Greater Noida, UP, India
³Assistant Professor, Department of Computer Science and Engineering, Noida Institute of Engineering and Technology, Greater Noida, UP, India
Abstract -
With the exponential rise in social media engagement, understanding public sentiment from online content has become critical for industries, governments, and researchers. This study explores the application of artificial intelligence techniques for sentiment analysis of Twitter data. Using the Sentiment140 dataset, we compare traditional machine learning models such as Logistic Regression and Support Vector Machines with deep learning approaches like LSTM. Additionally, we discuss the integration potential of transformer-based models like RoBERTa. The paper outlines preprocessing techniques, feature extraction strategies (TF-IDF and GloVe), and evaluates models based on accuracy, precision, recall, and F1-score. Our results show that deep learning models outperform classical methods, with significant improvements noted in LSTM's handling of sequence data.
Key Words: Sentiment Analysis, NLP, Twitter, LSTM, Machine Learning, Deep Learning, GloVe, TF-IDF, RoBERTa.
1. INTRODUCTION
Social media platforms, particularly Twitter, have become rich sources of opinionated data. Extracting insights from such data allows stakeholders to make informed decisions in marketing, politics, crisis management, and more. Sentiment analysis, a subfield of Natural Language Processing (NLP), focuses on classifying text as positive, negative, or neutral. This study applies AI and ML models to evaluate sentiment in a large-scale tweet dataset.
Twitter's brevity and real-time nature make it both a challenging and an attractive medium for sentiment analysis. Unlike longer texts, tweets often contain abbreviations, slang, emoticons, and hashtags, which make them difficult for traditional rule-based systems to handle. With the development of AI and neural language models, it has become feasible to understand and classify these informal texts accurately.
This paper aims to design and evaluate several models, ranging from classical machine learning to deep learning, on the Sentiment140 dataset. We highlight their comparative strengths, propose potential improvements, and discuss the social and ethical implications of automated sentiment detection.
2. LITERATURE REVIEW
The evolution of sentiment analysis techniques reflects a progression from traditional machine learning to deep learning and transformer-based models. Pang and Lee (2008) used Naive Bayes and SVM for movie review sentiment classification, highlighting the importance of feature engineering. Go et al. (2009) proposed distant supervision using emoticons for labeling tweets. Pak and Paroubek (2010) focused on a two-stage approach to classifying tweets as subjective or objective.
Kouloumpis et al. (2011) emphasized the role of informal elements such as hashtags and emoticons in improving accuracy. Deep learning models such as the CNNs of Severyn and Moschitti (2015), and transformers such as BERT (Devlin et al., 2019), RoBERTa (Liu et al., 2019), and XLNet (Yang et al., 2019), marked a leap in contextual understanding and classification performance.
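Because Sentiment140 was itself labeled using the distant supervision idea of Go et al. (2009), a minimal sketch of emoticon-based labeling may help make that idea concrete. The emoticon lists below are hypothetical illustrations rather than the exact sets Go et al. used, and the function name `distant_label` is our own.

```python
# Hypothetical emoticon sets for illustration; not the exact lists of Go et al.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def distant_label(tweet):
    """Label a tweet from its emoticons (distant supervision).

    Returns (text_without_emoticons, label) with label 1 = positive and
    0 = negative, or None when the tweet has no emoticon or mixed ones.
    """
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:  # neither or both polarities: too ambiguous
        return None
    cleaned = tweet
    # Remove the emoticons so a classifier cannot trivially learn the labels.
    for e in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        cleaned = cleaned.replace(e, " ")
    return " ".join(cleaned.split()), 1 if has_pos else 0


print(distant_label("I love this :)"))    # ('I love this', 1)
print(distant_label("no emoticon here"))  # None
```

Stripping the emoticons after labeling matters: if they stayed in the text, a model could reach high training accuracy by memorizing the very tokens that produced the labels.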

3. METHODOLOGY
Dataset: The Sentiment140 dataset contains 1.6 million labeled tweets with polarity values (0 for negative and 4 for positive). It is treated as a binary classification problem.
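A minimal sketch of loading the data and collapsing the polarity values to binary labels, assuming the column order of the public Sentiment140 training CSV (polarity, id, date, query flag, user, text, with no header row). The 0/1 target encoding and the function name `load_sentiment140` are our assumptions; the study does not say how labels were re-encoded. In practice the file is commonly opened with `encoding="latin-1"`.

```python
import csv
import io

# Column order of the public Sentiment140 training CSV, which has no header.
FIELDS = ["target", "ids", "date", "flag", "user", "text"]


def load_sentiment140(fileobj):
    """Yield (text, label) pairs, mapping polarity 0 -> 0 and 4 -> 1."""
    for row in csv.DictReader(fileobj, fieldnames=FIELDS):
        polarity = int(row["target"])
        if polarity == 2:
            continue  # neutral tweets occur only in the separate test file
        if polarity not in (0, 4):
            raise ValueError(f"unexpected polarity value: {polarity}")
        yield row["text"], 1 if polarity == 4 else 0


# Two made-up rows in the same layout (ids, dates, and users are invented).
demo = io.StringIO(
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user_a","is upset today"\n'
    '"4","2","Mon Apr 06 22:22:45 PDT 2009","NO_QUERY","user_b","love this"\n'
)
print(list(load_sentiment140(demo)))  # [('is upset today', 0), ('love this', 1)]
```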
Preprocessing: This step involves removing usernames and URLs, converting text to lowercase, expanding contractions, removing special characters, tokenization, lemmatization, and stopword removal. Tools such as NLTK and spaCy were used.
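The preprocessing steps can be sketched with the standard library alone. This is an illustration, not the study's pipeline: the contraction and stopword tables are small hypothetical stand-ins for the NLTK/spaCy resources the study used, and lemmatization is omitted. Note that `not` is deliberately kept out of the stopword set, since dropping negations can flip a tweet's sentiment.

```python
import re

# Hypothetical, deliberately tiny tables standing in for NLTK/spaCy resources.
CONTRACTIONS = [  # order matters: specific forms before the generic "n't"
    ("can't", "cannot"), ("won't", "will not"), ("n't", " not"),
    ("'re", " are"), ("'m", " am"), ("'ll", " will"), ("'ve", " have"),
]
STOPWORDS = {"a", "an", "the", "is", "are", "am", "i", "it", "to", "and", "of", "this"}


def preprocess(tweet):
    """Return a list of cleaned tokens (lemmatization omitted in this sketch)."""
    text = tweet.lower()
    text = re.sub(r"@\w+", " ", text)                   # usernames
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    for short, full in CONTRACTIONS:                    # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)               # special chars, digits
    tokens = text.split()                               # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]    # stopword removal


print(preprocess("@bob I can't believe this http://t.co/xyz is SO good!!!"))
# ['cannot', 'believe', 'so', 'good']
```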
Feature Engineering: TF-IDF and GloVe embeddings (100-dimensional vectors) were used to convert text to numeric representations.
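The two feature types can be sketched as follows. The TF-IDF part assumes the smoothed IDF and L2 normalization that scikit-learn's `TfidfVectorizer` applies by default; the study does not state its exact TF-IDF variant. The GloVe part assumes the standard GloVe text format (a word followed by its vector components on each line) and builds an embedding matrix of the kind typically used to initialize an LSTM's embedding layer, with zero vectors for out-of-vocabulary words. All function names are hypothetical.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed IDF over tokenized docs: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector: raw counts times IDF, then L2-normalized.

    Terms not seen when fitting the IDF are ignored.
    """
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def parse_glove(lines, dim=100):
    """Read GloVe's text format: a word followed by `dim` floats per line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) == dim + 1:  # skip malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embedding_matrix(vocab, vectors, dim=100):
    """Row i is the GloVe vector of vocab[i]; unknown words get zeros."""
    return [vectors.get(word, [0.0] * dim) for word in vocab]
```

For example, fitting on the toy corpus `[["good", "movie"], ["bad", "movie"]]` gives `movie` an IDF of exactly 1.0 (it appears in every document), so in `tfidf(["good", "movie"], idf)` the rarer word `good` receives the larger weight.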
Model Architectures:
- Classical: Logistic Regression, Naive Bayes, SVM, Random Forest
- Deep Learning: LSTM (captures sequence and context)
- Transformer (conceptual): RoBERTa (not implemented due to hardware limits)
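As an illustration of the simplest classical baseline listed above, the sketch below trains logistic regression by batch gradient descent on sparse `{feature: value}` dictionaries, such as TF-IDF vectors. It is a from-scratch stand-in written for clarity; the study does not say which library or solver it used, and the learning rate and epoch count here are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    """Linear score b + w.x for a sparse feature dict x."""
    return b + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_logreg(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss; X holds {feature: value} dicts."""
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, target in zip(X, y):
            err = sigmoid(score(w, b, x)) - target  # d(log-loss)/d(score)
            grad_b += err
            for f, v in x.items():
                grad_w[f] = grad_w.get(f, 0.0) + err * v
        b -= lr * grad_b / n
        for f, g in grad_w.items():
            w[f] = w.get(f, 0.0) - lr * g / n
    return w, b


def predict(w, b, x):
    """Class 1 (positive) when P(positive | x) >= 0.5."""
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0
```

Because each prediction is a weighted sum of independent features, this model ignores word order entirely, which is the limitation the LSTM is meant to address.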
Evaluation: An 80:20 train-test split was used, and performance was measured using accuracy, precision, recall, F1-score, and the confusion matrix.
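The evaluation protocol can be sketched as a seeded 80:20 shuffle split plus the binary metrics named above, treating label 1 (positive) as the positive class. The seed value and the function names are illustrative assumptions.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy with a fixed seed and split it (80:20 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_ratio)
    return items[: len(items) - n_test], items[len(items) - n_test:]


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is positive."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, and the 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],
    }
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch. For scale, a 20% test share of 1.6 million tweets is 320,000 tweets.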
4. RESULTS
Three models (Logistic Regression, Naive Bayes, and SVM) were evaluated on 320,000 tweets. Logistic Regression and SVM achieved the highest accuracy (~77.2%), while Naive Bayes performed slightly worse.
LSTM outperformed the classical models owing to its ability to model sequence and context. Across the classification metrics, Logistic Regression and SVM performed consistently, while LSTM showed better generalization.
5. DISCUSSION
LSTM handles noisy, informal tweets effectively because it models sequential data. Classical models, though fast, lack contextual understanding. GloVe embeddings helped the LSTM capture word relationships, improving accuracy.
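To make the sequential-modeling argument concrete, here is a minimal pure-Python forward pass of a standard LSTM cell (input, forget, and output gates plus a candidate cell state). It assumes the common gate formulation without peephole connections; the study does not specify its LSTM variant, layer sizes, or framework, and the parameter layout (`params` mapping each gate name to `(W, U, b)`) is our own. Because the cell state `c` is carried from one word vector to the next, feeding the same words in a different order generally yields a different final hidden state, a distinction a bag-of-words TF-IDF representation cannot make.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b with matrices stored as lists of rows."""
    return [
        sum(wk * xk for wk, xk in zip(w_row, x))
        + sum(uk * hk for uk, hk in zip(u_row, h))
        + bj
        for w_row, u_row, bj in zip(W, U, b)
    ]


def lstm_step(params, x, h, c):
    """One LSTM time step; params maps each gate name to its (W, U, b)."""
    i = [_sigmoid(z) for z in _affine(*params["i"], x, h)]   # input gate
    f = [_sigmoid(z) for z in _affine(*params["f"], x, h)]   # forget gate
    o = [_sigmoid(z) for z in _affine(*params["o"], x, h)]   # output gate
    g = [math.tanh(z) for z in _affine(*params["g"], x, h)]  # candidate state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm_forward(params, sequence, hidden_size):
    """Run the cell over a sequence of word vectors and return the final h."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h
```

In a full classifier, the final hidden state would feed a dense layer with a sigmoid output. Frameworks such as PyTorch and Keras provide optimized LSTM layers, so this sketch is for exposition only.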
Although RoBERTa was not implemented, its contextual embeddings and deep transformer layers are expected to let it outperform the LSTM given sufficient computational resources, especially on tweets involving sarcasm or ambiguity.
6. ETHICAL CONSIDERATIONS
Sentiment models can perpetuate biases present in their training data, potentially reinforcing stereotypes or discrimination. Ethical AI use demands transparency, fairness, and privacy. When analyzing social media, it is critical to anonymize user data and ensure responsible model deployment.
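One concrete way to pseudonymize users before storing tweets is a keyed hash of each handle. This is a sketch under stated assumptions (HMAC-SHA256, a 12-hex-character token, lowercased handles, hypothetical function names); the study does not describe its anonymization procedure. Pseudonymization alone is not full anonymization: the tweet text itself can still identify a person, so it complements rather than replaces the other safeguards mentioned above.

```python
import hashlib
import hmac
import re


def pseudonymize(handle, secret_key):
    """Map a handle to a stable token using HMAC-SHA256.

    Without the key, nobody can recompute tokens (e.g. by hashing a list of
    known handles). Handles are lowercased so @Alice and @alice match.
    """
    digest = hmac.new(secret_key, handle.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:12]


def anonymize_mentions(text, secret_key):
    """Replace every @mention in the text with its pseudonymous token."""
    return re.sub(r"@(\w+)", lambda m: "@" + pseudonymize(m.group(1), secret_key), text)
```

The key (a `bytes` value) should be stored separately from the dataset; rotating or discarding it later makes the tokens unlinkable to the original handles.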
7. APPLICATIONS
- Business Intelligence: Monitoring brand reputation
- Politics: Analyzing voter sentiment
- Public Safety: Identifying early crisis signals
- Healthcare: Detecting mental health issues via posts
Sentiment models can support real-time dashboards and decision-making in these sectors.
8. CONCLUSION
This study explored AI-driven sentiment analysis using classical and deep learning models. Preprocessing, vectorization, and model selection were critical for performance. LSTM achieved superior accuracy due to its sequence-handling capabilities.

Future work will implement transformer models such as RoBERTa for enhanced performance and investigate explainability and fairness in sentiment classification.
REFERENCES
[1] Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval.
[2] Go, A., Bhayani, R., & Huang, L. (2009). Twitter sentiment classification using distant supervision.
[3] Pak, A., & Paroubek, P. (2010). Twitter as a corpus for sentiment analysis and opinion mining.
[4] Kouloumpis, E., Wilson, T., & Moore, J. (2011). Twitter sentiment analysis.
[5] Severyn, A., & Moschitti, A. (2015). Twitter sentiment analysis with deep convolutional neural networks.
[6] Devlin, J., et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding.
[7] Liu, Y., et al. (2019). RoBERTa: A robustly optimized BERT pretraining approach.
[8] Yang, Z., et al. (2019). XLNet: Generalized autoregressive pretraining for language understanding.