Sentiment Analysis based Stock Forecast Application

Shivani R1 , Devaraju B M2 , Dr.Girijamma H A3

1Post Graduate Student, Department of Computer Science and Engineering, RNSIT, Karnataka, India

2Assistant Professor, Department of Computer Science and Engineering, RNSIT, Karnataka, India

3Professor, Department of Computer Science and Engineering, RNSIT, Karnataka, India

Abstract - An abstract summarizes, in one paragraph (usually), the major aspects of the entire paper in the following prescribed sequence. In the contemporary era, as data continues to grow in volume and significance for businesses, manual data analysis is no longer feasible in the fast-paced world. Therefore, the adoption of artificial intelligence and data mining techniques has become imperative. Among various factors affecting stock price fluctuations, a crucial determinant is the gains or losses incurred by a company. As news is a primary source of information for most traders, it plays a pivotal role in forecastingchangesinthestockmarket.Thisstudyfocuseson sentiment classification and its influence on stock market prices.Sentimentanalysisbasedstockpredictionisacuttingedge approach that leverages natural language processing and machine learning techniques to predict stock price movements using sentiment data from various sources. This studyexplorestheuseofsentimentanalysisonfinancialnews, social media, and other textual data to gauge market sentiment, which can serve as a leading indicator for stock pricetrends Thereareseveralclassifiersthatcouldbeusedin sentimentanalysisforstockprediction.Theclassifiersusedin the proposed system include, Decision Tree, Logistic Regression, Naïve Bayes and LSTM.

Key Words: Machine Learning, Deep Learning, Sentiment analysis, LSTM, Naive Bayes, Sentiment, score, Logistic Regression, Decision Tree

1.INTRODUCTION

Stockpredictionisanintriguingfieldthataimstoforecast thefuturemovementsofstockpricesinfinancialmarkets. However,predictingstock pricesaccuratelyischallenging due to the intricacy and volatility of financial markets. Recent advancements in data science and artificial intelligence have provided new tools and approaches to improvepredictionaccuracy.Thesetechniquesinvolvethe extraction and analysis of vast amounts of financial data, includinghistoricalstockprices,companynews,economic indicators, and social media sentiment. Stock prediction usingsentimentanalysisoffinancialnewshasemergedasan innovativeapproachtogaininsightsintomarkettrends.By harnessingnaturallanguageprocessing(NLP)techniques, this method involves analyzing textual data, suchas news articles and social media posts, to gauge the sentiment associatedwithspecificstocksorcompanies.Theunderlying

premise is that the sentiment expressed in financial news can impact investor behavior, thereby influencing stock prices. Positive sentiment may drive increased buying activity, leading to price appreciation, while negative sentiment could prompt selling and subsequent price decline. Sentiment analysis algorithms employ machine learningandNLPmodelstoclassifythesentimentoftextual dataaccurately.Thesemodelsassessthesentimentpolarity (positive,negative,orneutral)andthestrengthofsentiment expressed.Thesentimentanalysisresultscanbecombined withotherfundamentalandtechnicalindicatorstogenerate predictive models that capture market sentiment as an additionalinput.

1.1 Problem Statement

In the current era, there is a challenge of accurately predictingstockpricesinthefinancialmarket.Theproblem liesinthecomplexandvolatilenatureofthemarket,where stock prices are influenced by various factors, including economic indicators, company performance, and investor sentiment.Thegoalistodevelopareliablepredictionmodel thatcanforecastfuturestock priceswitha highdegree of accuracy based on financial news. This model should leveragehistoricalstockdata,markettrends,andrelevant indicatorstomakeinformedpredictionsandassistinvestors in making strategic investment decisions. The aim is to overcome the inherent uncertainties and challenges associatedwithstockprediction,ultimatelyenablingusers tooptimizetheirportfoliomanagementandmaximizetheir investmentreturns.

2. LITERATURE SURVEY

This study utilizes natural language processing techniques and recurrent neural networks (RNN), specificallylongshort-termmemory(LSTM)networksInthe field of stock forecasting. The objective is to predict stock prices using textual data. The study leverages data from Finwiz,afinancialinformationplatform,focusingonstocks ofcompaniessuchasAdidasAG,ContinentalAG,Deutsche Lufthansa AG, Henkel AG & Co., and Siemens AG. By employing RNN with LSTM, the model aims to capture temporal dependencies and patterns in the textual data, allowing for accurate stock price predictions. The study demonstratesanaccuracyrateof80%forAdidasAG,70% forContinentalAGandDeutscheLufthansaAG,and65%for

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page956

***

HenkelAG&CO.andSiemensAG.Thesefindingshighlight thepotentialofnaturallanguageprocessingandrecurrent networks in forecasting stock prices based on textual information.Theapplicationofnaturallanguageprocessing and RNN in stock forecasting presents promising opportunities for investors and financial analysts. By integratingtextualdataanalysiswithtraditionalquantitative methods, a more comprehensive understanding of stock marketdynamicscanbeachieved.Thisstudypavestheway for further advancements in utilizing natural language processing and recurrent networks to enhance stock forecasting accuracy and assist in making informed investmentdecisions[1].

Thisstudyfocusesonthesentimentanalysisoffinancial news utilizing the VADER (Valence Aware Dictionary and Sentiment Reasoner) sentiment analysis tool in NLTK (NaturalLanguageToolkit).Theobjectiveistoanalyzethe sentimentexpressedinfinancialnewsdataobtainedfrom Finwiz. The study reports an accuracy rate of 70% for sentiment analysis using this approach. The VADER sentimentanalysistool,coupledwithNLTK,allowsforthe classification of financial news into positive, negative, or neutral sentiment categories. These findings demonstrate thepotentialofsentimentanalysistechniquesinextracting valuableinsightsfromfinancialnewsdata.Byunderstanding sentimenttrendsinfinancialnews,investorsandfinancial analysts can gain a deeper understanding of market dynamicsandmakemoreinformeddecisions[2].

Inthisstudy,thefocusisonpredictingstockpricesusing sentimentalanalysisthroughTwitterdata.Theresearchers employed various machine learning algorithms, including Random Forest, LSTM-RNN, CNN, and MLP. The data was obtainedusingtheTwitterAPI.Theaccuracyresultsshowed promisingoutcomes,withMLPachievinganaccuracyof0.82 and0.84,CNNwithtwolayersachievinganaccuracyof0.83, RNN with two layers achieving an accuracy of 0.77, and Wavelet CNN achieves an accuracy of 0.85. These results underscore the potential of sentiment analysis on Twitter datainpredictingstockpricesandhighlighttheefficacyof different machine learning algorithms for this task. This researchcontributesvaluableinsightsintotherealmofstock market prediction and paves the way for further advancementsinsentimentanalysisandmachinelearning techniquesinthisdomain[3].

Inthisstudy,sentimentanalysisisemployedtoexplore the connection between subjective expression in financial reports and company performance. The objective is to analyze how the sentiment conveyed in these reports can potentially impact the success of a company. Regression analysis is utilized as a statistical method to examine the relationship between subjective expression and performance.ThestudyincorporatesdatafromFortune,a renowned business magazine, and Xueqiu, a prominent financial platform. With an accuracy rate of 78.3%, the sentimentanalysismodeldemonstratesitseffectivenessin

capturingthesubjectivenatureoffinancialreportsandits potentialimplicationsforcompanyoutcomes.Thisresearch contributes valuable insights into the interplay between sentiment analysis, financial reports, and company performance, providing a foundation for further investigationsinthisfield[4].

Thisstudyfocusesonconductingsentimentanalysisatthe entitylevelinfinancialtexts,utilizingapre-trainedlanguage model to enhance the granularity of financial information. Theeffectivenessofsentimentanalysisforfinancialtextsis demonstratedusingbidirectionalencoderrepresentationsof transformers,whichleverageapre-trainedlanguagemodel. The results of these experiments reveal that a minimum accuracyof93%canbeachieved[5].

3. PROPOSED SYSTEM

Intheproposedsystemtwodatasetsareusedinorderto trainthesentimentanalysisbasedstock prediction model with precision. The first one is obtained from Kaggle i.e fnews_data and the other is a real-time dataset that is obtained from Finviz which is a stock screening, stock research,andstockmarketfinancialvisualizationsoftware andyfinance,Pythonlibrarythatprovidesasimpleinterface to access financial data from Yahoo Finance. The dataset consistsoftwomodules.

A)Preprocessingofthedata

Thepreprocessingofdatainvolvesseveralcrucialsteps to ensure data quality and prepare it for further analysis. These steps include handling missing values through imputation or removal, addressing outliers and noise, standardizing or normalizing numerical features to a common scale, encoding categorical variables, and performingfeatureselectionordimensionalityreductionto reduce complexity and improve model performance. Additionally, data preprocessing may involve handling duplicates,checkingfordataintegrity,andpartitioningdata intotrainingandtestingsets.Thesepreprocessingstepsare essential for enhancing the accuracy and reliability of subsequentdataanalysisandmachinelearningtasks.

B)Extractionandselectionoffeatures

Featureextractionandselectionarecrucialstepsindata preprocessing that aim to identify and retain the most relevantandinformativefeaturesforsubsequentanalysis. Feature extraction involves transforming raw data into a reducedsetofrepresentativefeaturesusingtechniquessuch as principal component analysis or feature engineering methods.Thisprocessreducesdimensionalityandcaptures theunderlyingpatternsinthedata.Featureselection,onthe otherhand,focusesonidentifyingandselectingthesubsetof featuresthatcontributethemosttothepredictivepowerof amodel,usingmethodslikecorrelationanalysis,recursive feature elimination (RFE), or information gain. These

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page957

techniques aid in improving model efficiency, reducing overfitting,andenhancingtheinterpretabilityofresults.

4. SYSTEM ARCHITECTURE

Thedatasetisfirstloadedintothemodel.Thefnews_data datasetobtainedfromkaggleisusedfortheexperiment.The fnews_datawaschosenbecauseithassignificantamountof redundantentriesinboththedatasets.Thedatasetconsists offinancialnewsfrommultiplecompanies.Ithasover80794 records,ofwhichthemodelwastrainedwith64635records andtestedwith16158records.

Logistic regression

Step1:Gatherthedatasetwithinputfeaturesandlabels.

Step2:Dividethedatasetintoatrainingsetandatestset.

Step3:Setinitialweightsandbiasforthelogisticregression model.

Step 4: Use the sigmoid function to compute predicted probabilities.

Step5:Determinethecostbetweenpredictedprobabilities andactuallabels.

Step6:Adjustweightsandbiastominimizethecostusing gradientdescent.

Step7:Iteratethroughthetrainingsettoupdateparameters andreducethecost.

Step8:Usethetestsettoassessthemodel'sperformance.

Step9:Optimizemodelsettingsforbetterresults.

Step 10: Make predictions on new data using the trained model.

Decision Tree

Fig – 1: SystemArchitectureforStockForecast Application

Onceafterthedatasetisloadedintothemodelthenext step is data pre-processing where the model is been processedtocheckifthereareanyoutliers.Outliersarethe pointsthataredifferentfromtheotherdatapresentinthe dataset.Iftheoutliersarepresenttheyshouldbeeliminated. Thenextstepinstockpredictionisfeatureselection.There are 41 features present in the finviz dataset, of which 5 featuresarechosen.Oncethefeaturesarechosen,thenext stepistocategorizethefinancialnewsaspositive,negative or neutral using machine learning and deep learning algorithmsandcheckforitsaccuracy,precision,recallandFmeasure. The algorithm that has the highest accuracy is chosenandisappliedontherealtimefinvizdatasettoobtain the sentiment score. The sentiment score obtained is combinedwithyfinancedatasettopredictthestockprices usingLSTM.

5. METHODOLOGY

The algorithms used includes Naive Bayes, Logistic regressionanddecisiontreetoobtainthesentimentscore fortherepositorydatasetfromkaggle.Thestepsinvolvedin thesealgorithmsarestated.

Step1: Collectthedatasetcontainingfeatures(attributes) andcorrespondingtargetlabels.

Step2:Dividethedatasetintoatrainingsetandatestset.

Step3:Startbuildingthedecisiontreebyselectingthebest attributetosplitthedata.

Step 4: Create nodes representing decisions based on the selectedattributes.

Step5:Splitthedataintosubsetsateachnodebasedonthe chosenattribute.

Step6:Repeatthesplittingprocessforeachsubsettogrow thedecisiontree.

Step 7: Set criteria for stopping the tree growth, such as reaching a maximum depth or a minimum number of samplespernode.

Step8:Assign targetlabelstothe leafnodesbased on the majorityclassineachleaf.

Step9:Performpruningtoreducethesizeofthetreeand preventoverfitting.

Step10:Usethetestsettoevaluatetheperformanceofthe decisiontree.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page958

Step 11: Fine-tune parameters like maximum depth or splittingcriteriatooptimizethedecisiontree'sperformance.

Step12:Applythetraineddecisiontreetomakepredictions onnewdata.

Naïve bayes

Step 1: Gather the dataset containing features and correspondingclasslabels.

Step2:Dividethedatasetintoatrainingsetandatestset.

Step3:ChoosetheappropriateNaiveBayesvariantbasedon thedatatype:

Step 4: Calculate the prior probability of each class in the trainingset.

Step5: Estimatethelikelihoodofeachfeatureforeachclass:

Step 6: Assume independence among features given the class.

Step 7: Apply Bayes' theorem to compute the posterior probabilityofeachclassgiventheinputfeatures.

Step8:Forclassification,choosetheclasswiththehighest posteriorprobabilityasthepredictedclass.

Step9:Usethetrainingsettocalculatetheprobabilitiesand parametersrequiredforclassification.

Step 10: Assess the Naive Bayes classifier's performance usingthetestset.

Step 11: Apply Laplace smoothing to handle zero probabilitiesandimprovegeneralization.

Step 12: Fine-tune any relevant hyperparameters, suchas smoothing factor or feature selection, for better performance.

AsseenintheTable-1above,NaïveBayesgivesAccuracy of 73.23%, Recall of 73.23%, F-score of 73.03%, and Precision of 75.93%. Decision Tree gives Accuracy of 61.08%,Recallof61.08%,F-scoreof61.10%andPrecision of61.64%.LogisticRegressiongivesAccuracyof66.46% , Recall of 66.46%, F-score of 66.25%, and Precision of 67.18%

Upon comparing the performances of the three algorithms, Naïve Bayes achieved the highest accuracy, recall,F-score,andprecisionamongthethree.DecisionTree exhibitedmoderateperformance,whileLogisticRegression fellinbetweenNaïveBayesandDecisionTreeintermsofthe evaluationmetrics

ThepiechartasshowninChart-1visuallyrepresentsthe distribution of classes in FNews_dataset on using Naïve Bayes. Each slice of the pie represents a class, and the percentageinsideeachsliceindicatestheproportionofthat classinthedataset.Thedifferentclassesinthedatasetbased on sentiment analysis includes positive, negative and neutral.

TheNaïveBayesalgorithmhasthehighestaccuracythus itischosenandisappliedontherealtimefinvizdatasetto obtainthesentimentscore.Thesentimentscoreobtainedis combinedwithyfinancedatasettopredictthestockprices usingLSTM.

6. CONCLUSION AND FUTURE WORK

Sentimentanalysis-basedstockpredictionisastate-ofthe-artmethodthatutilizesnaturallanguageprocessingand machine learning algorithms to forecast stock price movementsbasedonsentimentdata.Thesentimentanalysis modelsaretrainedondatasetsobtainedfromtwoprimary sources:yfinance,whichprovideshistoricalstockpricedata andreal-timemarketinformationfromYahooFinance,and Finviz, a financial visualization platform offering market insightsandvisualizationsforinformeddecision-making.By integratingthesedatasetsandalgorithms,investorscangain

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page959

73% 75%. 61% 61% 66% 67%

Table -1: AccuracytableforKaggledataset Chart -1:SentimentdistributioninFnews_dataset

Algorithm Accuracy NaïveBayes 73% 73% DecisionTree 61% 61% Step 13: Use the trained Naive Bayes classifier to make predictionsonnewdata. LogisticRegression 66% 66% Recall F-Score Precision

valuable insights into market sentiment, correlate it with stock price data, and make more informed investment decisions. The visualization capabilities of Finviz further enhance the understanding of market trends, technical indicators, and fundamental metrics, providing a comprehensive approach to sentiment-driven stock prediction. In future work, the exploration of different datasets from various financial markets and economic conditions can provide valuable insights into the model's adaptabilityandgeneralizationcapabilities.Investigatinga wide range of algorithms, including support vector machines,deep reinforcementlearning,orhybrid models, will enable a comprehensive evaluation of their strengths and weaknesses for sentiment analysis based stock prediction. An opportunity to assess the model's sustainability and performance in long-term investment scenarios can be presented by extending the prediction horizontolongertimeranges,suchasweekly,monthly,or yearly intervals. This analysis could uncover potential challengesandtrendsthatmightnotbeevidentinshorter timeframes.

REFERENCES

[1]DileepKumar,“StockForecastingUsingNaturalLanguage and Recurrent Network”, International Conference on Emerging Technologies in Computer Engineering: MachineLearningandInternetofThings,February2020, 07-08.

[2]Arul Agarwal,”Sentiment Analysis of Financial News”, InternationalConferenceonComputationalIntelligence andCommunicationNetworks,June2021.

[3]Niveditha N Reddy, Naresh E, Vijaya Kumar B P, “Predicting Stock Price using Sentimental analysis through Twitter data”, Institute of Electrical and ElectronicsEngineers,September2020.

[4]NiZhong,JunBaoRen,“Usingsentimentanalysistostudy the relationship between subjective expression in financial reports and company performance”, InternationalConferenceonFrontiersinPsychology,July 2022.

[5]ZhihongHuang,ZhijianFang,“AnEntity-LevelSentiment Analysis of Financial Text Based on Pre-Trained LanguageModel”,IEEE18thInternationalConferenceon IndustrialInformatics,2020.

[6] Rubi Gupta, Min Chen, “Sentiment Analysis for Stock Price Prediction”, IEEE Conference on Multimedia InformationProcessingandRetrieval,2020.

[7]EthanSeals,StevenR.Price,“PreliminaryInvestigationin the use of Sentiment Analysis in Prediction of Stock Forecasting using Machine Learning”, IEEE Southeast Conference,2020.

[8]HeWang,ZhiqiangGuo,“FinancialForecastingbasedon LSTM and Text Emotional Features”, IEEE 8th Joint International Information Technology and Artificial IntelligenceConference,2019.

[9]Saloni Mohan, Sahitya Mullapudi, Sudheer Sammeta, Parag Vijayvergia, David C. Anastasiu, “Stock Price Prediction Using News Sentiment Analysis”, IEEE Fifth InternationalConferenceonBigDataComputingService andApplications,2019.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 07 | July 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page960

Turn static files into dynamic content formats.

Create a flipbook