
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
Daulat Tikhole1, Hittanshi Tikle2, Mehek Uikey3 , Prof. M. A. Chimanna4
1-3 UG Students, Department of Electronics & Computer Engineering, Pune Institute Of Computer Technology, SPPU, Pune, Maharashtra, India.
4 Professor at Pune Institute Of Computer Technology, SPPU, Pune, Maharashtra, India. ***
Abstract –
The exponential growth of digital content, analyzing sentiments expressed in text has become essential for businesses and researchers. Sentiment analysis, a subfield of Natural Language Processing (NLP), is widely used to determine whetherapieceoftextconveysapositive,negative, or neutral sentiment. This research focuses on sentiment classification of IMDB movie reviews using machine learning. A structured pipeline is developed, covering data preprocessing,exploratory dataanalysis,featureengineering, modelselection,evaluation,anddeployment.Variousmachine learning models, including logistic regression, decision trees, randomforests,andAdaBoost,areanalyzedforperformance. Logistic regression with TF-IDF vectorization is identified as the most efficient and interpretable model. To enhance model transparency, explainability techniques such as SHAP and LIME are used to interpret feature importance. Finally, the trained model is deployedasa RESTAPI usingFlask,ensuring practical usability. This study provides an effective sentiment classification approach that can be applied to customer feedback analysis, social media monitoring, and automated opinion mining systems.
Keywords: NLP,SHap,Lime, DataPreprocessing,Feature Engineering, Vectorization, TFIDF, Hyperparameter Tunning
1.INTRODUCTION
The digital era has transformed the way opinions and reviews are shared, leading to an explosion of usergenerated textual data. Online platforms, including movie reviewsites,e-commercestores,andsocialmedianetworks, serve as hubs for users to express their sentiments. Extractingvaluableinsightsfromthesetextualdatasources is crucial for businesses to understand public perception, monitor brand reputation, and improve customer experiences.
Sentiment analysis, often referred to as opinion mining, enables automated identification of emotions within text. Traditional sentiment classification methods relied on manually created dictionaries of positive and negative words, but such approaches struggled with linguistic nuances,context,andnegations.Machinelearning models
have significantly improved sentiment prediction by leveragingstatisticalpatternsintextualdata.However,some models,particularlydeeplearning-basedapproaches,suffer from interpretability challenges, making them difficult to deployinreal-worlddecision-makingsystems.
In this study, sentiment analysis is performed on IMDB movie reviews using machine learning. A structured methodology is adopted, involving data cleaning, feature engineering,modeltraining,evaluation,andexplainability techniques.LogisticregressionwithTF-IDFvectorizationis selected as the primary model due to its balance between accuracy, interpretability, and computational efficiency. ExplainabilitymethodssuchasSHAPandLIMEareusedto uncovertheinfluenceofindividualfeatures,ensuringmodel transparency.Finally,thetrainedmodelisdeployedusing Flask as a REST API, demonstrating its potential for realworldapplications.
Sentimentanalysishasevolvedsignificantlyovertheyears. Early approaches were based on lexicon-based methods, which relied on predefined word lists where words were assignedsentimentscores.Whilethesemethodsweresimple and interpretable, they struggled with handling negations, sarcasm,andcontext-dependentmeanings.
Withtheriseofmachinelearning,modelssuchasNaïve Bayes,DecisionTrees,andSupportVectorMachinesprovided significantimprovementsinsentimentclassification
S.S.Aasiya,S.S.Ahmed,S.S.Akram,S.S.Aslam,andS.S. Amjadetal.,intheirstudy "Sentiment Analysis using Logistic RegressionApproachonE-commerce," analyzehowsentiment analysisenhancesuserexperienceone-commerceplatforms likeAmazon.Theyemploylogisticregressioncombinedwith Python’s NLTK module to classify Amazon reviews into positiveandnegativesentiments.Theirworkhighlightsthe importance of preprocessing and feature selection in improving sentiment classification accuracy, proving the effectivenessofNLPtechniquesincommercialapplications.
X.W.QingWuandW.YunXuetal.,intheirresearchpaper "Sentiment Analysis of Yelp’s Ratings Based on Text Reviews," investigate Yelp customer reviews to predict sentiment
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
ratings. The authors compare different machine learning models, including Naïve Bayes, Perceptron, and Multiclass SVM,evaluatingtheireffectivenessbasedonprecisionand recall metrics. Their findings highlight the significance of feature selection techniques and expose the limitations of traditionalsentimentclassificationapproaches.
J. S. Shivaprasad T K et al., in "Sentiment Analysis of ProductReviews:AReview," exploresentimentanalysisusing naturallanguageprocessing(NLP)toextractopinionsfrom online product reviews. Their study compares various machine learning techniques, demonstrating that Support Vector Machines (SVM) outperform Naïve Bayes and MaximumEntropyintermsofaccuracy.Theyalsohighlight theroleoffeatureextractionanddatasetqualityinimproving classificationperformance.
A. Indriana Hidayah et al., in "Sentiment Analysis on Product Review using Support Vector Machine (SVM)," focus onclassifyingWindowsPhoneuserreviewsusingSVM.Their research investigates the impact of different tokenization methods, such as unigram, bigram, trigram, and n-grams, alongwithstemmingtechniques.Theirstudyrevealsthatngrammodeling,combinedwiththeIterated-LovinStemmer and an optimal C value of 1.0, enhances classification performance,reinforcingtherobustnessofSVMinsentiment analysis.
T. F.S.N. J.C.A. O. D. J. P.R.George B.Aliman etal., in "SentimentAnalysisusingLogisticRegression," emphasizethe role of sentiment analysis in business intelligence. They demonstratehowlogisticregressioncanaccuratelyclassify customerreviews,providingvaluableinsightsforcompanies seeking to enhance user experience and product quality. Theirstudyconfirmslogisticregressionasastrongbaseline model for sentiment classification, particularly when combinedwitheffectivetextpreprocessingtechniques.
S.S.R.Cheerekaetal.,in "SentimentAnalysisusingLogistic Regression," conductacomparativestudyofmachinelearning models for classifying tweets. Their research finds that logisticregressionachievesthehighestaccuracyof81%in detectingmentalhealthcrisistweets.Thestudyunderscores the importance of sentiment analysis in social media monitoring,particularlyforidentifyingcriticalmentalhealth concernsinreal-time.
M.F.A.H.A.M.N.A.HassanRazaetal.,in "Scientific Text SentimentAnalysisusingMachineLearning," introduceanovel approach to analyzing sentiment in scientific citation sentences. Their system, developed using six machine learning algorithms and trained on 8,736 annotated sentences, significantly improves sentiment classification accuracy. They highlight the role of feature engineering techniques such as lemmatization and n-grams, which contributetoanaccuracyimprovementof9%.
M.T.MdT.H.KhanTusaretal.,in "AComparativeStudyof SentimentAnalysisUsingNLPandDifferentMLTechniqueson USAirlineTwitterData," explorehowNLPandMLtechniques canhelpbusinessesunderstandcustomersentiment.Their studycomparesBag-of-WordsandTF-IDFvectorizationwith variousmachinelearningclassifiers.Theirresultsshowthat SVM and logistic regression, when paired with the Bag-ofWords approach, achieve the highest accuracy of 77%, reinforcingtheeffectivenessofthesemodelsforlarge-scale sentimentclassification.
A.Y.Chandraetal.,in "Sentiment Analysis using ML and Deep Learning," investigatetheuseofmachinelearningand deep learning in sentiment classification. Their research evaluates the performance of different models, demonstratingthatcombiningBag-of-WordsandTF-IDFwith SVM and logistic regression yields the best results. Their findingscontributetotheongoingadvancementofsentiment analysis methodologies, particularly in customer feedback analysis.
Datapreprocessingplaysacrucialroleinsentimentanalysis, ensuring that textual data is clean and structured for machine learning models. Several key steps were implementedtorefinetheIMDBmoviereviewsdatasetand enhanceclassificationaccuracy.
Initially, stopwords fromtheNLTKcorpuswereremoved, as they contribute little to sentiment determination. Additionally,domain-specificstopwords suchascommon movie-related terms that do not carry sentiment were identifiedandfilteredouttopreventunnecessarynoisein thedata.
Next, special characters, punctuation, and URLs were eliminated using regular expressions (regex). These elementsdonotcontributetosentimentandcanintroduce inconsistencies in the text. By removing unnecessary symbols,thedatasetbecomesmoreuniformandsuitablefor furtherprocessing.
To normalize textual data, contractions were expanded, converting shortened forms like "isn't" into "is not" and "didn’t" into "did not." This step helps improve model interpretabilitybypreservingtheoriginalmeaningofwords. Additionally, lemmatization wasappliedusingtheWordNet Lemmatizer, reducing words to their root forms. This ensuresthatwordssuchas"running"and"ran"aretreated as "run," reducing redundancy and enhancing feature consistency.
For class labeling,reviewswerecategorizedbasedontheir ratings. Reviews with ratings greater than or equal to 7 werelabeledas positive (1),whilethoserated 4 or below
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
wereclassifiedas negative (0).Neutralreviews,whichhad ratings of 5 and 6, were removed from the dataset to maintainclearsentimentpolarityandpreventambiguityin classification.
By implementing these preprocessing techniques, the dataset was effectively structured to improve sentiment analysis performance, ensuring accurate classification of moviereviews
Effective representation of textual data is essential for accurate sentiment classification. In this study, Term Frequency-Inverse Document Frequency (TF-IDF) is employedtoconverttextualinputintoanumericalformat. TF-IDF assigns greater importance to words that appear frequently within a single review but are rare across the entire dataset. This method helps highlight meaningful words that influence sentiment, preventing commonly occurring words from dominating the analysis. During experimentation, TF-IDF outperformed Count Vectorization, demonstrating better sensitivity to significantbutinfrequentterms.
To capture deeper contextual relationships, n-gram analysis is applied. This technique extends feature extractionbeyondindividual words(unigrams)toinclude bigrams and trigrams, preserving meaningful word sequences. While unigrams such as "worst," "great," and "boring" indicatesentimentatabasiclevel,bigramanalysis helpsretaincrucialcontextualstructures,suchas "notgood" and "very bad." Trigrams further refine sentiment interpretation by considering phrases like "waste of time" and "must watch movie." By incorporating n-grams, the model effectively understands sentiment-bearing expressions that would otherwise be lost in single-word tokenization.
Feature selection is another critical step in optimizing sentimentclassification.The chi-square test isappliedto identifystatisticallysignificantwordsthatcontributetothe sentimentpolarityofreviews.Thismethodensuresthatonly the most relevant and impactful features are retained for model training, improving computational efficiency and enhancingclassifieraccuracy.Byremovinglessinformative terms,themodelachievesbettergeneralization,leadingto morereliablesentimentpredictions.
By integrating these advanced feature engineering techniques, the sentiment analysis model gains a robust understanding of textual nuances, improving its ability to classifyreviewsaccurately.
To determine the most effective sentiment classification model,multiplemachinelearningalgorithmsareevaluated. The models tested include logistic regression, decision trees, random forests, and AdaBoost, all trained on the preprocessed dataset. Among these, logistic regression emerges as the best-performing model, striking an optimal balance between accuracy, efficiency, and interpretability.
Duringevaluation, logistic regression provestobehighly effective,deliveringstrongclassificationperformancewhile minimizing the risk of overfitting Decision trees, while capable of capturing complex patterns, exhibit significant overfitting,whichismitigatedthroughpruningtechniques. Random forests providebettergeneralizationbutintroduce higher computational costs, making them less suitable for real-timeapplications. AdaBoost,thoughrobust,demands extensive training time due to its iterative boosting mechanism.
To further refine the selected model, hyperparameter tuning is performed, particularly focusing on the regularization parameter in logistic regression. This optimization prevents overfitting and ensures the model generalizes well to unseen data. After extensive testing, logistic regressionwithTF-IDFvectorization isconfirmed
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
asthemosteffectiveapproach,offeringhighaccuracywhile maintainingcomputational efficiency.Giventhetrade-offs betweenperformanceandcomplexity,itisselectedasthe finalmodelfordeployment
Flowchart-3.3.1:Workflow
Exploratory data analysis is conducted to gain deeper insights into the dataset and identify key factors affecting sentimentclassification. Sentiment distribution analysis reveals that after removing neutral reviews, the dataset remains balanced, preventing bias in the model’s predictions.
Word frequency analysis highlightscommonwordsinboth positiveandnegativereviews.Positivesentimentsareoften expressed using words like "amazing," "brilliant," and "excellent," whereas negative reviews frequently contain termssuchas "boring," "awful," and"disappointing." This analysis helps identify distinguishing features that contributetosentimentclassification.
AnothercriticalaspectofEDAis negation handling,which addressesphrasesthatcouldmisleadthemodel.Expressions suchas "not bad" and "wasn't great" changethesentiment inawaythatsimpleword-basedmodelsmightmisinterpret. Properpreprocessingtechniquesareimplementedtoensure thatthemodelaccuratelycapturestheintendedsentimentin suchcases.
Furthermore, bigram and trigram analysis isconductedto understand how word sequences impact sentiment classification. Positive reviews often feature phrases like "highly recommend" and "must-watch movie," while negative sentiments are frequently expressed through phrasessuchas "waste of time" and "not worth it." These insights reinforce the significance of n-gram features in
improving model accuracy, ensuring that contextual meaningispreservedduringclassification.
Fig-4.1:Averagewordlengthinreviews
Fig-4.2:Numberofcharactersinreview
Fig-4.3:Numberofwords
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072 © 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
Fig-4.4:MostCommon5-GramsinPositiveReviews
Fig-4.5:MostCommon5-GramsinNegativeReviews
5. Evaluation on Test and Train dataset for Model Selection
Table-5.1: Logistic Regression
Score(TrainingDataset) 0.973524411448867
Score(TrainingDataset) 0.9180280610800495
Table-5.2: Decision Tree Classifier
Table-5.3: Random Forest Classifier
PrecisionScore(TrainingDataset) 0.9999642857142857 AUCScore(TrainingDataset) 0.99999978884388 F1Score(TrainingDataset) 0.9999642857190992
Score(onTest) 0.8561944444444445 AUCScore(onTest) 0.9293159042813259 F1Score(onTest) 0.8561591421711087
Time
Table-5.4: Ada Boost Classifier
PrecisionScore(TrainingDataset) 0.8501428571428571
AUCScore(TrainingDataset) 0.929964566708043
F1Score(TrainingDataset) 0.850142981287583
PrecisionScore(onTest) 0.837855555555556
AUCScore(onTest) 0.917226358082615
F1Score(onTest) 0.8378571408323738
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
The logistic regression model, trained on the optimized featureset,achieveshighaccuracyandanF1-scoreof0.89. Precision and recall metrics further validate the model’s ability to effectively distinguish between positive and negativereviews.
Hyperparameter tuning plays a critical role in enhancing modelperformance.Regularizationstrengthisfine-tunedto prevent overfitting, and the learning rate is adjusted to ensure stable convergence during training. The model demonstrates robustness across multiple test samples, makingitareliabletoolforsentimentclassification.
Toimprovemodeltransparencyandinterpretability,SHAP (SHapley Additive exPlanations) and LIME (Local InterpretableModel-AgnosticExplanations)areemployed.
SHAPvaluesprovideinsightsintotheinfluenceofindividual wordsonsentimentpredictions.Forexample,wordssuchas "brilliant,""fantastic,"and"masterpiece"havehighpositive SHAPvalues,indicatingtheirstrongcontributiontopositive sentiment classification. Conversely, terms like "disappointing," "dull," and "boring" have negative SHAP values,signifyingtheirassociationwithnegativesentiment.
LIMEoperatesbyperturbinginputdataandobservingthe impact on model predictions. It highlights key words and phrases that drive classification decisions, ensuring that model predictions align with human intuition. By visually presenting explanations, LIME aids in verifying that the modeldoesnotrelyonspuriouscorrelations.
These explainability techniques enhance user trust in the modelbyensuringthatitsdecisionsarebasedonmeaningful linguisticpatternsratherthanarbitrarywordoccurrences.
To enable real-world usability, the trained sentiment analysismodelisdeployedasa REST API using Flask.This deploymentfacilitatesreal-timesentimentclassificationby allowinguserstosubmittextinputsthroughanAPIendpoint andreceiveinstantsentimentpredictions.
Thedeploymentprocessinvolves integrating the trained model, preprocessing functions, and feature extraction pipeline into a structured Flask application. When a user submitsareview,theAPIappliesnecessarypreprocessing, convertsthetextintonumericalfeatures,andfeedsitinto the logistic regression model.Thepredictedsentimentis then returned as an output, which can be seamlessly integrated into various applications, including customer feedback systems, online review platforms, and social media analysis tools.
For ease of use, a simple user interface is designed, allowing users to input reviews and receive immediate sentiment classification results. The model is initially deployed on a local server for usability testing and performance validation. To ensure scalability and accessibility, future enhancements will include cloud deploymentandcontainerizationusing Docker
Furthermore, model retraining mechanisms are implemented to adapt to evolving linguistic trends and maintainclassificationaccuracyovertime.Thisensuresthat the sentiment analysis system remains relevant as new phrases, slang, and expressions emerge in user-generated content
The sentiment classification model, based on logistic regressionwithTF-IDFvectorization,demonstratedstrong performanceinanalyzingIMDBmoviereviews.Themodel achieved an accuracy of 89%, with an F1-score of 0.89, indicating its ability to correctly classify positive and negativesentiments.Performancemetricssuchasprecision and recall confirm that the model effectively captures sentiment-related patterns while maintaining a low misclassificationrate.
Further improvements were achieved through hyperparametertuning,whichoptimizedtheregularization strength,reducingoverfittingandenhancinggeneralization acrossvariousreviews.TheapplicationofSHAPandLIME providedinterpretabilitybyhighlightingthemostinfluential words. Terms like “outstanding,” “brilliant,” and “mustwatch” contributed positively, whereas words such as “disappointing,”“waste,”and“boring”hadastrongnegative impactonpredictions.
Forreal-worldusability,thetrainedmodelwasdeployedas a Flask-based REST API, enabling real-time sentiment
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
classification.TheAPIwastestedonnew,unseenreviews, where it maintained high reliability and consistency in sentimentpredictions.
Fig-8.1:Confusionmatrix
This study successfully implemented a sentiment analysis system tailored for IMDB movie reviews using logistic regression as the core classification model. The research demonstrates that logistic regression, paired with TF-IDF vectorization, achieves an optimal balance between accuracy, computational efficiency, and interpretability. Compared to more complex classifiers such as Random ForestandAdaBoost,thelogisticregressionmodelprovedto bebotheffectiveandresource-efficient.
Exploratory Data Analysis (EDA) played a crucial role in uncoveringpatternsinsentiment-ladenwordsandphrases, showing that n-gram features significantly improve classification accuracy. The inclusion of explainability techniquessuchasSHAPandLIMEensuredthatpredictions were transparent, allowing users to understand the reasoningbehindsentimentclassifications.
Tomakethemodel accessible,itwasdeployedasa FlaskbasedRESTAPI,enablingseamlessintegrationintovarious platforms such as review monitoring systems, customer feedbackanalysistools,andsocialmediasentimenttracking applications. Future work can explore deep learning approaches for more nuanced sentiment detection and extendthesystemtosupportmultilingualsentimentanalysis forbroaderapplicability.
[1] S.S.A.S.S.A.S.S.A.S.S.Asiya,“SentimentAnalysis using Logistic Regression Approach on Ecommerce,”vol.6,pp.17-22,2022.
[2] X. W. Q. W. Yun Xu, "Sentiment Analysis of Yelp‘s Ratings Based on text reviews," no. Stanford University,pp.1-5,2015.
[3] J. S. Shivaprasad T K, " Sentiment Analysis of ProductReviews:AReview,"no.IEEE,pp.298-303, 2017.
[4] A. Indriana Hidayah, "Sentiment Analysis on Product Review using Support Vector Machine (SVM),"no.IEEE,pp.1-4,2020.
[5] T. F. S. N. J. C. A. O. D. J. P. R. George B. Aliman, " Sentiment Analysis using Logistic Regression," Journal of Computational Innovations and EngineeringApplications, no.SalleUniversity,pp.16,2022.
[6] S. S.R.Cheerka, "Sentiment Analysis using Logistic Regression," International Journal of Engineering Reseach and Applications, vol.11,pp.1-6,2021.
[7] M.F.A.H.A.M.N.A.HassanRaza1,"ScientificText Sentiment Analysis using Machine," International Journal of Advanced Computer Science and Applications, vol.10,no.no.12,pp.1-9,2019.
[8] M.T.MdT.H.KhanTusar,"AComparativeStudyof Sentiment Analysis Using NLP and differnt ML techniques on US Airline Twitter data," vol. 2, no. CityUniversity,Bangladesh,pp.1-4,2021.
[9] A. Y.Chandra, "Sentiment Analysis using ml and deeplearning," IEEE, pp.1-4,2020.
[10] M. A. T. Chowdhury, "Sentiment Analysis: A Comprehensive Review of Machine Learning Approaches," Journal of Artificial Intelligence Research,vol.15,no.3,pp.45-56,2022.
[11] J.LiuandH.Zhang,"DeepLearningforSentiment Analysis: A Survey," IEEE Transactions on Neural NetworksandLearningSystems,vol.32,no.4,pp. 1-15,2021.
[12] P.Patel,"ComparativeStudyofSentimentAnalysis UsingLSTMandCNN,"InternationalJournalofData ScienceandAnalytics,vol.8,no.2,pp.23-30,2021.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 12 Issue: 03 | Mar 2025 www.irjet.net p-ISSN: 2395-0072
[13] R.GuptaandS.Verma,"TwitterSentimentAnalysis Using Hybrid Machine Learning Techniques," InternationalJournalofComputerApplications,vol. 182,no.6,pp.14-20,2020.
[14] D. Kaur and P. Singh, "Sentiment Analysis on ProductReviewsUsingTransformer-BasedModels," SpringerAdvancesinDataScience,vol.12,pp.89102,2023.
[15] Y. Kim, "Convolutional Neural Networks for Sentence Classification," Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751, 2014
2025, IRJET | Impact Factor value: 8.315 |