Facebook User Prediction in India Using Time Series ARIMA Forecasting



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 04 | Apr 2025 www.irjet.net p-ISSN: 2395-0072


1* Research Scholar, Department of Statistics, University of Rajasthan, Jaipur, Rajasthan, India (302004) (ORCID: https://orcid.org/0009-0003-8426-247X)

2 Assistant Professor, Department of Statistics, University of Rajasthan, Jaipur, Rajasthan, India (302004) (ORCID: https://orcid.org/0009-0007-9433-1220)

Abstract

This study uses statistical and econometric techniques to analyze the trend, stationarity, and autocorrelation in time series data on Facebook user growth in India from April 2009 to September 2024. The ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) are analyzed to identify autocorrelation and seasonal patterns, and stationarity is evaluated using the Augmented Dickey-Fuller (ADF) test. ARIMA(2,1,2) was determined to be the best model for prediction. Residual diagnostics, namely the Ljung-Box and Box-Pierce tests, reveal no appreciable autocorrelation in the residuals, which supports the suitability of the model. We use this model to forecast Facebook user growth for the following two years, and we check the model's reliability by analyzing its residuals.

Key words: Time series modeling, ARIMA, BIC (Bayesian Information Criterion), Augmented Dickey-Fuller, AIC (Akaike Information Criterion), Ljung-Box and Box-Pierce tests

Introduction

Facebook introduced a new era of social connectivity to a fast-expanding online audience when it entered the Indian market in 2006. At first, the platform appealed mainly to tech-savvy metropolitan consumers. As smartphone penetration rose and internet connectivity became more affordable, it quickly gained popularity. Its easy-to-use interface and its capacity to connect people with friends and family gave Facebook particular appeal in India, where people place great emphasis on social ties. Facebook further increased its popularity by tailoring its platform to India's diverse demographics, with an emphasis on local content and regional languages.

Over time, Facebook has moved beyond social networking to become a major factor in India's digital economy. It gave businesses new tools, allowing even small and medium-sized organizations to reach larger markets. By investing in Jio Platforms in 2020, Facebook strengthened its connections with the Indian market and increased its influence over the country's digital environment. With millions of active users, Facebook still shapes social interaction, business, and even political debate in India today, connecting people across the vast country.

Table 1 presents monthly Facebook user figures for India over roughly 16 years, from April 2009 to September 2024. The data were taken from "Statcounter Global Stats" as a secondary source.

Table 1: Facebook Users in India (in millions)


ARIMA Model (Box-Jenkins)

A time series is a collection of observations recorded over time. ARIMA models are a family of models that use only the historical data of a single variable to produce forecasts. They can characterize both stationary and non-stationary time series, handling the latter by differencing. Unlike causal forecasting models, ARIMA requires no explanatory variables. It does, however, assume that the series can be made stationary by differencing. The Box-Jenkins process uses the following steps to build ARIMA models:

(1) Model identification,
(2) Parameter selection and estimation,
(3) Diagnostic checking (sometimes called model validation), and
(4) Model implementation.

To identify the model, we must determine the orders p, d, and q of the autoregressive (AR), differencing, and moving average (MA) components. In essence, the questions are: Is the data stationary or non-stationary? What order of differencing (d) makes the time series stationary?

Statistical Analysis

For this study, several statistical and time series packages, including "tseries" and "forecast", are used in addition to other conventional programs. The open-source statistical program "RStudio" (version 3.0.1) is also used.

The dataset in Table 1 is used to create a forecasting model. Graph 1 below shows the line graph of Indian Facebook users.

Graph 1: Facebook Users (In Millions) in India from 2009 to 2024

Graph 2: ACF in time series


Graph 1 indicates a rising trend, and Graph 2 shows high autocorrelation in the time series data. The Facebook user series is therefore not stationary. To apply the ARIMA model, we first convert this series into a stationary one. We use the minimum order of differencing (d = 1) and test for a unit root.

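First-order difference (d = 1): $y_t = \Delta X_t = X_t - X_{t-1}$, where $X_t$ is the number of Facebook users in month $t$.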
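The differencing step can be sketched in plain Python as follows. This is a minimal illustration that assumes the monthly user counts are held in an ordinary list. The function names `difference` and `undifference` are hypothetical; they are not functions of the R packages used in this study. The numbers are made up and are not the Table 1 data.

```python
def difference(x):
    """First-order difference (d = 1): y_t = X_t - X_{t-1}."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def undifference(last_level, diffs):
    """Rebuild levels from differences by cumulative summation,
    starting from the last observed level X_{t-1}."""
    levels = []
    current = last_level
    for d in diffs:
        current += d
        levels.append(current)
    return levels


# Made-up illustrative values, not the Table 1 data.
series = [10.0, 12.0, 15.0, 19.0, 24.0]
dx = difference(series)
restored = undifference(series[0], dx)
print(dx)        # [2.0, 3.0, 4.0, 5.0]
print(restored)  # [12.0, 15.0, 19.0, 24.0]
```

Undifferencing is the step that turns forecasts made on the stationary (d = 1) scale back into user counts.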

Graph 3: Line plot of first-order differences in Facebook user data (d=1)

Graph 3 above suggests that the differenced time series has a stationary mean and variance. Before continuing, however, we use the Augmented Dickey-Fuller (ADF) test to check whether the differenced series is stationary (the unit root problem).

ADF Test:

We test the following null and alternative hypotheses at the 5% level of significance (α = 0.05):

H0: The time series is non-stationary (it has a unit root).

H1: The time series is stationary.

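ADF Test Equation (with constant and linear trend):

$$\Delta y_t = \beta_0 + \beta_1 t + \gamma y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta y_{t-i} + e_t$$

Here $y_t$ is the series being tested (in this case the first-differenced series), $k$ is the lag order, and $H_0$ corresponds to $\gamma = 0$. The Dickey-Fuller statistic is $\hat{\gamma}/\mathrm{SE}(\hat{\gamma})$.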

The ADF test result:

Dickey-Fuller = -4.1546, Lag order = 5, p-value = 0.01

Here, the p-value (0.01) is less than α = 0.05, so H0 is rejected. We conclude that the first-differenced series is stationary.
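The Dickey-Fuller statistic can be sketched in plain Python as the t-ratio of $\gamma$ in an ordinary least squares fit of the ADF regression with a constant, a linear trend and $k$ lagged differences. This sketch assumes that specification and is not the code of the R package used here. It returns only the statistic, because p-values come from Dickey-Fuller tables. With a constant and trend, the large-sample 5% critical value is about -3.41. The helper names `invert` and `adf_statistic` are hypothetical, and the demo uses simulated white noise, not the Facebook data.

```python
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def adf_statistic(y, k=5):
    """t-ratio of gamma in
    dy_t = b0 + b1*t + gamma*y_{t-1} + sum_{i=1..k} delta_i*dy_{t-i} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    rows, target = [], []
    for t in range(k, len(dy)):
        # dy[t] = y[t+1] - y[t], so the lagged level for this equation is y[t]
        rows.append([1.0, float(t), y[t]] + [dy[t - i] for i in range(1, k + 1)])
        target.append(dy[t])
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, target)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [v - sum(b * x for b, x in zip(beta, r)) for r, v in zip(rows, target)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    return beta[2] / (sigma2 * inv[2][2]) ** 0.5  # gamma_hat / SE(gamma_hat)


random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(400)]  # simulated stationary series
print(adf_statistic(noise, k=5))  # should fall well below -3.41 for white noise
```

A statistic below the critical value, like the reported -4.1546, leads to rejecting the unit-root null.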

With the help of this test, we can continue developing the ARIMA model by determining appropriate values for p in the AR part and q in the MA part. To do that, we must examine the correlogram and partial correlogram of the stationary (d = 1) time series.

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page1840


Correlogram and Partial Correlogram

Table 2 below shows the ACF and PACF coefficients for lags 1 through 20 of the first-order differenced series.

Table 2: Values of ACF and PACF for Lags 1–20

Graph 4

Graph 4 shows the autocorrelation function for lags 1 to 20 in the d = 1 time series of Indian Facebook users. The ACF at lag 10 (ACF = 0.237579) barely exceeds the significance bounds. We can presume that this lag-10 autocorrelation is spurious and the result of pure chance.

Graph 5

Graph 5 shows the PACF for lags 1 to 20 in the d = 1 time series of Facebook users in India. Here, too, there are two outliers, at lags 9 and 11. All other partial autocorrelations from lag 2 to 20 fall inside the significance limits, so these outliers could simply be random occurrences.
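For readers who want to reproduce Table 2 outside R, the plain-Python sketch below computes three things: the sample ACF, the PACF (via the Durbin-Levinson recursion), and the approximate 95% significance bounds $\pm 1.96/\sqrt{n}$ drawn on correlogram plots. The function names are hypothetical. The `last_significant_lag` helper is only a rough aid for reading off cut-off lags. It does not replace the judgment applied in the text, such as discounting isolated spikes like those at lags 9, 10 and 11.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-corrected, divisor n, as in R's acf)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
        for k in range(1, max_lag + 1)
    ]


def pacf(r):
    """Partial autocorrelations from r = [r_1, r_2, ...] by the Durbin-Levinson recursion."""
    out, phi_prev = [], []
    for k in range(1, len(r) + 1):
        if k == 1:
            phi_kk = r[0]
            phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi_prev[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi
    return out


def last_significant_lag(values, n):
    """Largest lag whose coefficient lies outside +/- 1.96/sqrt(n); 0 if none."""
    bound = 1.96 / math.sqrt(n)
    lags = [k for k, v in enumerate(values, start=1) if abs(v) > bound]
    return max(lags, default=0)


# Theoretical ACF of an AR(1) process with phi = 0.5: its PACF cuts off after lag 1.
r_ar1 = [0.5 ** k for k in range(1, 6)]
print([round(v, 6) for v in pacf(r_ar1)])  # [0.5, 0.0, 0.0, 0.0, 0.0]
```

An AR(p) process shows a PACF that cuts off after lag p, and an MA(q) process shows an ACF that cuts off after lag q. These are the patterns used to shortlist candidate models.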


After omitting the outliers, the PACF and the ACF tail off to zero after lag 2 and lag 3, respectively. On this basis, we suggest the following possible ARMA (autoregressive moving average) models for the d = 1 time series of Facebook users in India:

1. An autoregressive model of order p = 2, i.e., ARMA(2,0), since the partial autocorrelation cuts off after lag 2.

2. A moving average model of order q = 3, i.e., ARMA(0,3), since the autocorrelation cuts off after lag 3.

3. A mixed ARMA(p,q) model with p and q greater than zero, since both the autocorrelation and the partial autocorrelation tail off.

As a result, we are limited to the following three tentative ARIMA(p,d,q) models:

Table 3

Table 3 above summarizes the results of each fitted ARIMA model for our time series of Facebook users. We select ARIMA(p = 2, d = 1, q = 2), the model with the lowest AIC and BIC values, as the best predictive model for estimating future values of our time series.
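The AIC, AICc and BIC comparison behind Table 3 can be illustrated with a short sketch. The log-likelihoods below are hypothetical placeholders, not the values in Table 3. The sketch also rests on two assumptions: the innovation variance is counted as a parameter, and the sample size is the 185 differenced observations of a 186-month series.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, small-sample corrected AICc, and BIC from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


# Hypothetical log-likelihoods for illustration only; NOT the Table 3 values.
# n_params = number of ARMA coefficients + 1 for the innovation variance (assumption).
candidates = {
    "ARIMA(2,1,0)": (-500.0, 3),
    "ARIMA(0,1,3)": (-498.0, 4),
    "ARIMA(2,1,2)": (-490.0, 5),
}
n_obs = 185  # assumed length of the differenced series
scores = {m: information_criteria(ll, k, n_obs) for m, (ll, k) in candidates.items()}
best_by_aic = min(scores, key=lambda m: scores[m]["AIC"])
best_by_bic = min(scores, key=lambda m: scores[m]["BIC"])
print(best_by_aic, best_by_bic)  # ARIMA(2,1,2) ARIMA(2,1,2)
```

BIC penalizes each extra parameter by $\ln n$ rather than 2. A richer model such as ARIMA(2,1,2) must therefore improve the log-likelihood enough to win under both criteria.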

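Consequently, we fit an ARMA(2,2) model to the first-differenced (d = 1) series. This is equivalent to an ARIMA(2,1,2) model for the original series, with p = 2 autoregressive terms and q = 2 moving average terms. The model can therefore be written as:

$$(y_t - \mu) = \phi_1 (y_{t-1} - \mu) + \phi_2 (y_{t-2} - \mu) + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}, \qquad y_t = X_t - X_{t-1}$$

In this equation, $y_t$ is the stationary (differenced) series under examination, and $\mu$ is its mean. $X_t$ is the number of Facebook users in month $t$. $\varepsilon_t$ is an error term with mean zero and constant variance. $\phi_1, \phi_2, \theta_1, \theta_2$ are the parameters to be estimated, and the fitted models are summarized in Table 3. For a stationary differenced series, $\mu$ should be 0 or very close to 0. If it is not, it is retained in the equation above, which we use to forecast future values.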
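The recursion below makes the forecasting use of the ARIMA(2,1,2) equation concrete. It sets future errors to their expected value of zero and builds forecasts of $y_t$ recursively. It then adds these forecasts back onto the last observed level $X_t$. This minimal sketch assumes the coefficients, the mean and the last two residuals have already been estimated from the fitted model. The function name and the example numbers are hypothetical, not the estimates from this study.

```python
def forecast_arima212(x, resid, phi, theta, mu, horizon):
    """Point forecasts of X for ARIMA(2,1,2), where y_t = X_t - X_{t-1} and
    (y_t - mu) = phi1 (y_{t-1} - mu) + phi2 (y_{t-2} - mu)
                 + e_t + theta1 e_{t-1} + theta2 e_{t-2}."""
    y = [x[t] - x[t - 1] for t in range(1, len(x))]  # differenced series
    e = list(resid[-2:])                             # last two in-sample residuals of y
    level = x[-1]                                    # last observed X
    forecasts = []
    for _ in range(horizon):
        y_next = (mu + phi[0] * (y[-1] - mu) + phi[1] * (y[-2] - mu)
                  + theta[0] * e[-1] + theta[1] * e[-2])
        y.append(y_next)
        e.append(0.0)        # future errors are replaced by their mean, zero
        level += y_next      # undo the differencing
        forecasts.append(level)
    return forecasts


# Hypothetical inputs for illustration only.
history = [100.0, 102.0, 105.0, 107.0, 110.0]
residuals = [0.5, -0.3]
print(forecast_arima212(history, residuals, phi=(0.4, 0.1), theta=(0.2, -0.1), mu=2.0, horizon=3))
```

The MA terms only affect the first two forecast steps, because all later errors are set to zero. After that, the forecasts of $y_t$ decay toward $\mu$.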

Forecasting:

Table 4: Two-Year Forecast of Facebook Users in India


Table 4 displays the forecasts of future values from our ARIMA(2,1,2) model for a two-year period up to September 2026. Graphs 6 and 7 below illustrate the two-year prediction of Facebook users.
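Point forecasts such as those in Table 4 are commonly accompanied by prediction intervals. Assume the errors are normal and uncorrelated with variance $\sigma^2$. The $h$-step forecast error variance is then $\sigma^2 \sum_{j=0}^{h-1} \psi_j^2$, where the $\psi_j$ are the weights of the model's infinite moving average form. The sketch below computes these weights for ARIMA(2,1,2) by expanding $(1-\phi_1 B-\phi_2 B^2)(1-B)$. The function names and example parameters are hypothetical.

```python
import math


def psi_weights_arima212(phi, theta, horizon):
    """psi_0..psi_{horizon-1} for
    (1 - phi1 B - phi2 B^2)(1 - B) X_t = (1 + theta1 B + theta2 B^2) e_t."""
    # (1 - phi1 B - phi2 B^2)(1 - B) = 1 - a1 B - a2 B^2 - a3 B^3
    a = [1.0 + phi[0], phi[1] - phi[0], -phi[1]]
    psi = [1.0]
    for j in range(1, horizon):
        value = theta[j - 1] if j <= 2 else 0.0   # MA coefficient theta_j
        for i in range(1, min(j, 3) + 1):
            value += a[i - 1] * psi[j - i]
        psi.append(value)
    return psi


def interval_half_widths(phi, theta, sigma2, horizon, z=1.96):
    """Half-widths z * sqrt(sigma2 * sum_{j<h} psi_j^2) for h = 1..horizon (normal errors assumed)."""
    psi = psi_weights_arima212(phi, theta, horizon)
    widths, cumulative = [], 0.0
    for h in range(horizon):
        cumulative += psi[h] ** 2
        widths.append(z * math.sqrt(sigma2 * cumulative))
    return widths


# Hypothetical parameters: 95% limits are point forecast +/- these half-widths.
print(interval_half_widths(phi=(0.4, 0.1), theta=(0.2, -0.1), sigma2=1.5, horizon=4))
```

The differencing factor $(1-B)$ keeps the $\psi_j$ from dying out. The intervals therefore widen steadily over the two-year horizon.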


We then investigate three questions: (1) Are the forecast errors of our ARIMA(2,1,2) model normally distributed with mean zero and constant variance, $N(0, \sigma^2)$? (2) Are there correlations among the forecast errors? (3) Are the residuals simply white noise?

Plotting the errors (standardized residuals) allows us to examine the distribution of the forecast errors. Graphs 8(a) through 8(d) below show residual plots and a histogram of the prediction errors of the fitted ARIMA(2,1,2) model:

Graph 8(a): Residuals of ARIMA(2,1,2)
Graph 8(b): Standardized residuals of fitted ARIMA(2,1,2)
Graph 8(c): Histogram of residuals (forecast errors), ARIMA(2,1,2)
Graph 8(d): Normal Q-Q plot of residuals (forecast errors), ARIMA(2,1,2)


The line plots and Q-Q plot of the fitted model's standardized residuals (Graphs 8(a) through 8(d)) suggest that the mean and variance of the standardized errors are roughly constant over time. However, there is greater variation at the beginning and end of the series. The histogram indicates that the errors are approximately normally distributed with mean zero, and the Q-Q plot also supports normality.
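The Q-Q plot in Graph 8(d) pairs sorted standardized residuals with standard normal quantiles. The plain-Python sketch below computes those coordinates. It also computes their correlation as a rough one-number summary of how straight the plot is; values near 1 are consistent with normality. The function names are hypothetical, and the example residuals are made up rather than taken from this model.

```python
from statistics import NormalDist, fmean, pstdev


def qq_points(residuals):
    """(theoretical quantile, sorted standardized residual) pairs for a normal Q-Q plot."""
    m, s = fmean(residuals), pstdev(residuals)
    z = sorted((r - m) / s for r in residuals)
    n = len(z)
    # Plotting positions (i - 0.5)/n, which match R's ppoints() when n > 10
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, z))


def qq_correlation(points):
    """Pearson correlation of the Q-Q coordinates; close to 1 when the points lie on a line."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


example = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 0.2, 1.5, -0.8, 0.0, 0.3, -0.1]
pts = qq_points(example)
print(round(qq_correlation(pts), 3))
```

Departures at the ends of the plot correspond to heavier or lighter tails than the normal distribution. This matches the greater variation noted at the start and end of the series.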

We now plot the autocorrelation and partial autocorrelation correlograms (ACF and PACF) of the prediction errors, shown in Graphs 9 and 10, to further investigate any relationships between successive forecast errors.

Graph 9: Estimated ACF of residuals (forecast errors), ARIMA(2,1,2)

Box-Ljung and Box-Pierce Test Statistics:

The null and alternative hypotheses:

H0: The autocorrelations of the residuals are zero.

H1: The autocorrelations of the residuals are not all zero.

Graph 10: Estimated PACF of residuals, ARIMA(2,1,2)

The test statistics are shown in Table 5 below. The Ljung-Box p-values for the fitted model are shown in Graph 11.

The p-values of both test statistics are greater than the level of significance (α = 0.05). Thus, we fail to reject H0.

Table 5: Statistics for the Box-Ljung and Box-Pierce Tests


Graph 11: p-values for the fitted ARIMA (2,1,2) using Ljung-Box

The statistics and high p-values of the two tests above mean that we fail to reject the null hypothesis that all residual autocorrelations at lags 1 through 30 are zero. We conclude that the forecast errors of our fitted model show little to no evidence of non-zero autocorrelation at lags 1 to 30.
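The two portmanteau statistics in Table 5 are $Q_{BP} = n\sum_{k=1}^{h} r_k^2$ (Box-Pierce) and $Q_{LB} = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$ (Ljung-Box). Here $r_k$ is the lag-$k$ autocorrelation of the $n$ residuals. Each statistic is compared with a chi-square distribution whose degrees of freedom are $h$ minus the number of fitted ARMA coefficients. The plain-Python sketch below follows those formulas. It subtracts $p+q = 4$ through a `fitdf` argument named after the one in R's `Box.test`. That subtraction is an assumption, because the study does not state which degrees of freedom were used. The chi-square tail comes from a simple series expansion rather than a statistics library, and the residuals in the demo are simulated.

```python
import math
import random


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag (mean-corrected, divisor n)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]


def chi2_sf(q, df):
    """P(chi2_df > q) via the series for the regularized lower incomplete gamma function."""
    if q <= 0:
        return 1.0
    a, x = df / 2.0, q / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 10000:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def portmanteau(resid, lag, fitdf=0):
    """Box-Pierce and Ljung-Box statistics with chi-square p-values on lag - fitdf df."""
    n = len(resid)
    r = acf(resid, lag)
    q_bp = n * sum(rk ** 2 for rk in r)
    q_lb = n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))
    df = lag - fitdf
    return {"Box-Pierce": (q_bp, chi2_sf(q_bp, df)),
            "Ljung-Box": (q_lb, chi2_sf(q_lb, df))}


random.seed(0)
fake_resid = [random.gauss(0.0, 1.0) for _ in range(185)]  # simulated, not the model's residuals
print(portmanteau(fake_resid, lag=30, fitdf=4))
```

Ljung-Box weights each squared autocorrelation by $(n+2)/(n-k)$. This makes it closer to its chi-square reference distribution than Box-Pierce in moderate samples, which is why the two can give slightly different p-values.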

Conclusion:

The ARIMA model can be used with this data set because the differenced series satisfied the stationarity condition, with an ADF p-value below 0.05. Based on AIC, BIC, and AICc values, ARIMA(2,1,2) was determined to be the optimal forecasting model. The residuals are mostly scattered randomly around zero with no discernible pattern, which implies a reasonable fit. The Q-Q plot and histogram suggest that the residuals are roughly normal. The Ljung-Box and Box-Pierce p-values are generally greater than 0.05 at various lags, indicating that the residuals no longer contain significant autocorrelation. This provides further evidence that the ARIMA(2,1,2) model fits the data well.

Of the models considered, ARIMA(2,1,2) appears to suit this time series of Facebook user growth best. It captures the patterns in the data without leaving significant autocorrelation in the residuals, so it can reasonably be used to forecast future Facebook user growth. The residual analysis indicates that the model assumptions are met, which suggests robust model performance.

