An Exploration of the Cauchit Link for modeling Binary Responses - Utsyo Chakraborty


St. Xavier's College (Autonomous), Kolkata
Department of Statistics

An Exploration of the Cauchit Link for Modeling Binary Responses

Name: Utsyo Chakraborty
Year: 3
Semester: VI
Roll No: 434
Session: 2019-2022
Supervisor: Ms. Madhura Dasgupta

DECLARATION

I affirm that I have identified all my sources and that no part of my dissertation paper uses unacknowledged materials.

(Utsyo Chakraborty)

Contents

Section  Topic
1.1      Generalized Linear Models - the outset
1.2      Choices of Link functions for modeling binary data
2.1      The Cauchit Link Function - a valid choice?
2.2      Similarities and differences with other link functions
2.3      Applicability of the Cauchit Link - a discussion
3.1      Parameter Estimation of the Cauchit Model
3.2      Interpretation of Parameters of the Model
3.3      Measures of Goodness of Fit of the Model
4        Data Driven Study to implement the Cauchit link
4.1      Fitting a Cauchit Model
4.2      Comparing the model with the logit and probit models
5        Conclusion
6        References


1.1: Generalized Linear Models - the outset

Any linear model can be expressed as:

$$y = \mu + \varepsilon$$

Here, $y$ denotes the observation vector of the response variable, $\mu$ denotes the systematic component of the linear model and $\varepsilon$ denotes the random component of the model.

1. The systematic component of the model, $\mu$, is essentially in the form of a linear predictor $\eta$ of the covariates $x_1, x_2, \ldots, x_p$, i.e.,

$$\eta = X\beta$$

where $X_{n \times p} = (x_1, x_2, \ldots, x_p) = ((x_{ij}))$ $[i = 1(1)n,\ j = 1(1)p]$ is known as the design matrix and $\beta$ is the vector of parameters (which we often need to estimate). The systematic component is purely non-stochastic. Thus, the distribution of the responses depends solely on the distribution of the random component of the model.

2. In the case of the simple classical linear model, we consider two assumptions on the random component $\varepsilon$, which are stated as follows:

• The errors are uncorrelated with zero mean and constant variance, i.e., $E(\varepsilon) = 0$ and $\mathrm{Disp}(\varepsilon) = \sigma^2 I_n$. This is known as the "minimal assumption" of errors.

• An optional assumption in which the error terms jointly follow an $n$-variate normal distribution whose mean vector is the null vector and whose dispersion matrix is a diagonal matrix of dimension $n \times n$ with diagonal elements $\sigma^2$ and all other elements 0, i.e., $\varepsilon \sim N_n(0, \sigma^2 I_n)$. This is known as the "maximal assumption" of errors.

3. A link function, which links the random and systematic components of the model:

$$\mu = \eta$$

However, the aforementioned assumptions may be violated.

• The maximal assumption of errors may not hold, i.e., the errors are non-normally distributed.

• Moreover, the assumption of homoscedasticity (common error variance) may be violated.

In the case of count or binary data, the problem with using a classical linear model is that of unmatched regression. The response in these cases assumes discrete or categorical values, whereas the linear predictor may take any real value, so the two sides of the model do not share the same range. In such cases, we need to link the systematic component of the model with the expected value of the response variable through a monotonic and differentiable link function $g(\mu)$.

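An example illustrating the use of a link function is as follows:

Let $y_i \sim \mathrm{Bin}(n_i, p_i)$ $[\forall\, i = 1(1)n]$, with $y_i$ being the $i$-th value of the response variable $y$. We note that $y_i$ is discrete in nature. Suppose we want to model the binomial proportions $p_i$ on the basis of some continuous covariates $x_1, x_2, \ldots, x_p$.

We define our response as $y_i^* = y_i / n_i$, so that $E(y_i^*) = p_i$.

If we use a classical linear model, then $g(\mu_i) = \mu_i = \eta_i$. Thus,

$$E(y_i^*) = p_i = \eta_i = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$

We note that $0 < p_i < 1$ but $\beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi} \in \mathbb{R}$. Thus, we can easily get a predicted value of $p_i$ which lies outside the interval $(0, 1)$; the regression is thus unmatched.

On the other hand, if we use $g(\mu) = \log\frac{\mu}{1-\mu}$, we see:

$$g(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = \log\frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_{1i} + \ldots + \beta_p x_{pi}$$

In this case the regression is matched, as:

$$0 < p_i < 1 \;\Rightarrow\; 0 < \frac{p_i}{1-p_i} < \infty \;\Rightarrow\; -\infty < \log\frac{p_i}{1-p_i} < \infty \;\Rightarrow\; \log\frac{p_i}{1-p_i} \in \mathbb{R}$$

Thus, a link function such as $g(\mu) = \log\frac{\mu}{1-\mu}$ is needed in this case. We note that the choice of $g(\mu) = \log\frac{\mu}{1-\mu}$ is not unique; there exist infinitely many monotonic and differentiable functions which can be used to perform regression on binary data.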
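The matched versus unmatched distinction above can be seen numerically. The following is a minimal Python sketch (the dissertation itself works in R); the coefficients and covariate values are hypothetical and chosen only for illustration, not taken from any data in this paper.

```python
import math

def identity_mean(eta):
    # Classical linear model: the fitted mean is the linear predictor itself.
    return eta

def inverse_logit(eta):
    # Inverse of g(mu) = log(mu / (1 - mu)); always lands in (0, 1).
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and covariate values (assumptions for illustration).
beta0, beta1 = -0.5, 0.8
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]

etas = [beta0 + beta1 * x for x in xs]
identity_fit = [identity_mean(e) for e in etas]
logit_fit = [inverse_logit(e) for e in etas]

for x, p_lin, p_log in zip(xs, identity_fit, logit_fit):
    print(f"x={x:+.1f}  identity: {p_lin:+.3f}  logit: {p_log:.3f}")
```

Under the identity link some fitted "probabilities" fall outside (0, 1), while the values mapped through the inverse logit never do.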

1.2: Choices of Link functions for modeling binary data

If the response variable of the model can assume only two values, then it is known as a binary response. For convenience, the two values may be denoted by 1 and 0, or in other words, success and failure. Practical examples of situations in which the response variable is binary could be the outcome of a game of chess (in which a victory is denoted by 1, and a tie/loss is denoted by 0), the outcome of a medical trial (in which the recovery of a patient is denoted by 1 and 0 otherwise), etc. In such cases, we usually consider that the response variables are distributed as Bernoulli random variables with the probability of success (response = 1) being $p$ $[y \sim \mathrm{Ber}(p)]$.

There are several choices of link functions which could be used for modeling binary data. The two most commonly used ones are:

1. The logit link, where $g(\mu) = \log\frac{\mu}{1-\mu}$; $0 < \mu < 1$

2. The probit link, where $g(\mu) = \Phi^{-1}(\mu)$; $0 < \mu < 1$, where $\Phi^{-1}(.)$ denotes the inverse of the cumulative distribution function of the $N(0,1)$ distribution.

The logit link is the more popular of the two. This is because it is the canonical link of the model (the link function is the same as the canonical parameter in the expression of the one-parameter exponential family).

The complementary log-log function is also used in certain scenarios. It is given by:

$$g(\mu) = \log(-\log(1-\mu));\quad 0 < \mu < 1$$

We note that all the above functions are monotonic and differentiable in their given domains. Thus, they are appropriate choices of link functions.

2.1: The Cauchit Link Function - a valid choice?

Besides the aforementioned choices of link functions, another choice which we will explore here in this dissertation is the Cauchit link function.

The Cauchit link is given as:

$$g(\mu) = \tan\left(\pi\left(\mu - \tfrac{1}{2}\right)\right);\quad 0 < \mu < 1$$

It is interesting to note that $g^{-1}(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}x$ is the cumulative distribution function of the Cauchy distribution with location and scale parameters being 0 and 1 respectively.

We can verify through a plot of $g(.)$ that it is a monotonic and differentiable function.
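As a numerical companion to the plot, the sketch below (Python, standard library only; the function names are illustrative and not from the dissertation) evaluates the four link functions on a grid and checks that each is increasing there, and that the Cauchy(0, 1) CDF undoes the Cauchit link. A grid check is only a sanity check, not a proof of monotonicity.

```python
import math
from statistics import NormalDist

def logit(mu):
    return math.log(mu / (1.0 - mu))

def probit(mu):
    # Inverse CDF of the standard normal distribution.
    return NormalDist().inv_cdf(mu)

def cloglog(mu):
    return math.log(-math.log(1.0 - mu))

def cauchit(mu):
    return math.tan(math.pi * (mu - 0.5))

def cauchy_cdf(x):
    # CDF of the Cauchy(0, 1) distribution, i.e. the inverse Cauchit link.
    return 0.5 + math.atan(x) / math.pi

grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99

def is_increasing(link):
    values = [link(mu) for mu in grid]
    return all(a < b for a, b in zip(values, values[1:]))

monotone = {f.__name__: is_increasing(f) for f in (logit, probit, cloglog, cauchit)}
round_trip_error = max(abs(cauchy_cdf(cauchit(mu)) - mu) for mu in grid)

print(monotone)
print(round_trip_error)
```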

The above plot of $g(\mu)$ against $\mu$ (where $0 < \mu < 1$, the domain of $g(.)$) reveals that the function increases monotonically as we increase $\mu$ from 0 to 1. There are also no sharp points, i.e., no sudden changes in slope are visible, which highlights the aspect of differentiability. Analytically, $g'(\mu) = \pi \sec^2\left(\pi\left(\mu - \tfrac{1}{2}\right)\right) > 0$ on $(0, 1)$, which confirms both properties. Thus, we can claim that $g(.)$ is an appropriate choice of link function.

2.2: Similarities and differences with other link functions

The Cauchit link function exhibits some interesting structural similarities with the logit and probit link functions. A graphical plot of the three link functions can bring the differences to light efficiently.


Similarities:

1. The three link functions are inverse functions of the cumulative distribution functions of certain real-valued probability distributions.

• Logit link - Inverse of the CDF of the Logistic distribution with location and scale parameters being 0 and 1 respectively: $g^{-1}(x) = \frac{e^x}{1 + e^x}$

• Probit link - Inverse of the CDF of the Normal distribution with location and scale parameters being 0 and 1 respectively: $g^{-1}(x) = \Phi(x)$

• Cauchit link - Inverse of the CDF of the Cauchy distribution with location and scale parameters being 0 and 1 respectively: $g^{-1}(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}x$

2. The other similarity is in the shape of these graphs. They are all S-shaped and monotonically increasing, with exactly one point of inflection at $\mu = 0.5$. In fact, the inverse link functions listed above are sigmoid functions (a sigmoid being a bounded, differentiable, real function that is defined for all real input values and has a non-negative derivative at each point and exactly one inflection point).

3. The link functions are symmetric about 0.5, in the sense that $g(0.5 + a) = -g(0.5 - a)\ \forall\, a \in (0, 0.5)$. The symmetry is also evident from the three graphs above.

Difference:

The key difference among the three links is in the tail regions of the above graphs. Superimposing the three above graphs into one, we get:

A plot of $g(\mu)$ against $\mu$ shows us that the curves corresponding to the Cauchit and Logit links almost superimpose on each other when $\mu \in (0.1, 0.7)$ approximately. However, when $\mu \in (0, 0.1) \cup (0.7, 1)$, the cauchit link tends towards infinity in either direction at a much faster rate than either the logit or probit links.

This can be explained by the fact that the Cauchy distribution has thicker tails than the logistic or normal distribution. A graphical comparison among the density plots of the three distributions makes the distinction noticeable.
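Since the figures are not reproduced here, the tail behaviour can also be tabulated. The following Python sketch (standard library only; function names are illustrative) evaluates the three links at a few values of $\mu$ approaching 1 and checks the symmetry about 0.5 for the Cauchit link.

```python
import math
from statistics import NormalDist

def logit(mu):
    return math.log(mu / (1.0 - mu))

def probit(mu):
    return NormalDist().inv_cdf(mu)

def cauchit(mu):
    return math.tan(math.pi * (mu - 0.5))

# Values of mu moving from the centre towards the upper tail.
mus = [0.5, 0.7, 0.9, 0.99, 0.999]
table = [(mu, logit(mu), probit(mu), cauchit(mu)) for mu in mus]

print(f"{'mu':>6} {'logit':>9} {'probit':>9} {'cauchit':>10}")
for mu, lg, pb, ch in table:
    print(f"{mu:>6} {lg:>9.3f} {pb:>9.3f} {ch:>10.3f}")

# Symmetry about 0.5: g(0.5 + a) = -g(0.5 - a).
symmetry_gap = abs(cauchit(0.8) + cauchit(0.2))
```

Near the centre the three links are of comparable size, while close to 1 the Cauchit value is larger than the logit and probit values by one to two orders of magnitude, which is the thick-tail effect described above.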

2.3: Applicability of the Cauchit link - a discussion

There is no hard and fast rule for when to apply a particular link function. A number of researchers have proposed different circumstances in which to use the Cauchit link.

• J.M. Norusis (in IBM SPSS statistics 19.0 advanced statistical procedures companion, 2012) has proposed that the cauchit link can be used favourably in comparison to the logit and probit links when the response has many "extreme values".

• Roger Koenker and Jungmo Yoon (in Parametric links for binary choice models: A Fisherian-Bayesian colloquy, 2009) have said:

  The "cauchit" model is attractive when observed responses exhibit a few surprising values, observations for which the linear predictor is large in absolute value indicating the outcome is almost certain, and yet the linear predictor is wrong. These binary "outliers" may be the result of a variety of easily imagined circumstances including data recording errors, but whatever their source both probit and logit are rather intolerant of them, while cauchit is more forgiving.

• In the book Ignored Racism: White Animus Toward Latinos, the authors Mark Ramirez and David Peterson use a Cauchit model to examine the relationship between Latina/o racism-ethnicism (LRE) and the vote choice of voters in the 2014 and 2016 elections for the US House of Representatives. The binary response of their study was vote choice (Latino/a or Non-Latino/a) and the nine covariates used were sex, age, education, economic optimism, median household income, partisanship, ideology, immigration policy preferences and Latino/a stereotypes. The reason they cited for choosing a cauchit model was: "Given that the dependent variable is heavily skewed with more respondents voting for Non-Latino/a candidates than Latino/a candidates, we estimate respondent vote choice using a Cauchit model".
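The remark by Koenker and Yoon about "surprising" observations can be made concrete with a small Python sketch. It computes the negative log-likelihood contribution of observing a failure ($y = 0$) when the linear predictor strongly predicts a success. The value $\eta = 5$ is a hypothetical choice, and the comparison holds the linear predictor fixed across links, which ignores the fact that fitted coefficients differ in scale from one link to another; it is meant only as an illustration.

```python
import math
from statistics import NormalDist

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def logistic_cdf(x):
    return 1.0 / (1.0 + math.exp(-x))

def normal_cdf(x):
    return NormalDist().cdf(x)

def surprise_penalty(cdf, eta):
    # -log Pr[y = 0] when Pr[y = 1] = cdf(eta).
    # All three distributions are symmetric about 0, so 1 - cdf(eta) = cdf(-eta),
    # which avoids subtracting two nearly equal numbers.
    return -math.log(cdf(-eta))

eta = 5.0  # hypothetical linear predictor: success looks almost certain
penalties = {
    "cauchit": surprise_penalty(cauchy_cdf, eta),
    "logit": surprise_penalty(logistic_cdf, eta),
    "probit": surprise_penalty(normal_cdf, eta),
}

for name, value in penalties.items():
    print(f"{name:8s} {value:.3f}")
```

The Cauchit penalty is the smallest and the probit penalty the largest, so a single such outlier pulls a probit or logit fit much harder than a Cauchit fit.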

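3.1: Parameter Estimation of a Cauchit Model

In this section, we will derive the expressions for the parameter estimates of the model and the standard errors of these estimates.

Let $y$ be our required binary response variable, where the $y_i$'s are independently distributed such that:

$$y_i = \begin{cases} 1 & \text{if success} \\ 0 & \text{if failure} \end{cases} \qquad \forall\, i = 1(1)n$$

Let $x_1, x_2, \ldots, x_p$ be the $p$ independent covariates which we plan on using in the model.

Let us define $\eta_i$, the linear predictor based on the $i$-th observation of the $p$ covariates, as $\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$.

Now, we model the probability of success of the $i$-th value of the response given the covariate information as:

$$\Pr[y_i = 1 \mid x_{1i}, x_{2i}, \ldots, x_{pi}] = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(\eta_i)$$

Let the above probability be $p_i$. Thus,

$$p_i = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(\eta_i) \qquad \ldots (1)$$

Since we have defined the probability of success of the response, we can now claim that $y_i \sim \mathrm{Ber}(p_i)$ independently $\forall\, i = 1(1)n$.

We know,

$$\mu_i = E(y_i) = p_i = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(\eta_i)$$
$$\Longrightarrow \left(\mu_i - \tfrac{1}{2}\right)\pi = \tan^{-1}(\eta_i)$$
$$\Longrightarrow \eta_i = \tan\left(\left(\mu_i - \tfrac{1}{2}\right)\pi\right) = g(\mu_i) = g(p_i) \qquad \ldots (2)$$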

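where $g(\mu) = \tan\left(\left(\mu - \tfrac{1}{2}\right)\pi\right)$, $0 < \mu < 1$, is the Cauchit link function. We note that $g(.)$ is the inverse of the CDF of a Cauchy(0, 1) distribution.

Remark: $g(\mu)$ can be regarded as an appropriate link function as it is monotonic and differentiable in the interval $0 < \mu < 1$.

Objective: To find the Maximum Likelihood Estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ and the standard errors of these estimates.

Since $y_i \sim \mathrm{Ber}(p_i)$ independently $\forall\, i = 1(1)n$, the log-likelihood can be written as:

$$l = \sum_{i=1}^{n}\left\{ y_i \log\frac{p_i}{1-p_i} + \log(1-p_i) \right\} = \sum_{i=1}^{n} y_i \log(p_i) + \sum_{i=1}^{n} (1-y_i)\log(1-p_i) \qquad \ldots (3)$$

We now have to maximize $l$ with respect to $\beta_0, \beta_1, \ldots, \beta_p$, i.e., by solving the $(p+1)$ score equations $U(\beta_j) = \frac{\partial l}{\partial \beta_j} = 0\ \forall\, j = 0(1)p$.

$l$ is a function of $p_i$ (as evident from (3)). Further, $p_i = \mu_i$ (as $y_i \sim \mathrm{Ber}(p_i)$), $\eta_i = g(\mu_i)$ (from the definition of the link function) and $\eta_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$ (the linear predictor). Thus, computing $\frac{\partial l}{\partial \beta_j}$ is equivalent to computing

$$\sum_{i=1}^{n} \frac{\partial l}{\partial p_i} \cdot \frac{\partial p_i}{\partial \mu_i} \cdot \frac{\partial \mu_i}{\partial \eta_i} \cdot \frac{\partial \eta_i}{\partial \beta_j}$$

(by the use of the chain rule of derivatives).

Now,

$$\frac{\partial l}{\partial p_i} = \frac{\partial}{\partial p_i}\left\{ \sum_{i=1}^{n} y_i \log(p_i) + \sum_{i=1}^{n} (1-y_i)\log(1-p_i) \right\} = \frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} = \frac{y_i - p_i}{p_i(1-p_i)}$$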

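$$\frac{\partial p_i}{\partial \mu_i} = 1 \quad (\text{as } p_i = \mu_i)$$

We know,

$$\eta_i = g(\mu_i) \Rightarrow \mu_i = g^{-1}(\eta_i) = F(\eta_i) \Rightarrow \frac{\partial \mu_i}{\partial \eta_i} = f(\eta_i)$$

where $f(x) = \frac{1}{\pi(1+x^2)}$, $x \in \mathbb{R}$, is the probability density function of a Cauchy(0, 1) distribution and $F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x)$, $x \in \mathbb{R}$, is the cumulative distribution function of a Cauchy(0, 1) distribution.

And,

$$\frac{\partial \eta_i}{\partial \beta_j} = \frac{\partial}{\partial \beta_j}\left(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}\right) = x_{ji} \quad (\text{for } j \neq 0), \qquad \frac{\partial \eta_i}{\partial \beta_0} = 1$$

Thus,

$$\frac{\partial l}{\partial \beta_j} = \sum_{i=1}^{n} \frac{(y_i - p_i)}{p_i(1-p_i)}\, f(\eta_i)\, x_{ji} = 0, \quad j = 1(1)p$$
$$\frac{\partial l}{\partial \beta_0} = \sum_{i=1}^{n} \frac{(y_i - p_i)}{p_i(1-p_i)}\, f(\eta_i) = 0 \qquad \ldots (*)$$

(*) gives us $(p+1)$ score equations which are difficult to solve analytically (as they are non-linear). They can be solved numerically (for example, by the Newton-Raphson method) to get the Maximum Likelihood Estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$.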
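A numerical solution of the score equations (*) can be sketched as follows. This Python sketch uses Fisher scoring, a Newton-type iteration that replaces the observed Hessian with the expected (Fisher) information, rather than plain Newton-Raphson; this substitution is an assumption of the sketch, not the dissertation's stated procedure. The data are a hypothetical toy set (one binary covariate, 10 observations per group with 3 and 7 successes), chosen because the MLE is then known in closed form, which allows the result to be checked. All function names are illustrative.

```python
import math

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def solve(a, b):
    # Solve the small linear system a x = b by Gaussian elimination
    # with partial pivoting.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def score_and_information(beta, rows, ys):
    # Score vector from (*) and the expected information matrix.
    k = len(beta)
    score = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, y in zip(rows, ys):
        eta = sum(b * v for b, v in zip(beta, x))
        p, f = cauchy_cdf(eta), cauchy_pdf(eta)
        w = f / (p * (1.0 - p))
        for a in range(k):
            score[a] += (y - p) * w * x[a]
            for b in range(k):
                info[a][b] += f * w * x[a] * x[b]
    return score, info

def fit_cauchit(rows, ys, iterations=50, tol=1e-10):
    beta = [0.0] * len(rows[0])
    for _ in range(iterations):
        score, info = score_and_information(beta, rows, ys)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def standard_errors(beta, rows, ys):
    # Square roots of the diagonal of the inverse information matrix.
    _, info = score_and_information(beta, rows, ys)
    k = len(beta)
    return [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(k)])[j])
            for j in range(k)]

# Hypothetical toy data: each row is (1, x) with a leading 1 for the intercept.
rows = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
ys = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3

beta_hat = fit_cauchit(rows, ys)
se_hat = standard_errors(beta_hat, rows, ys)
print(beta_hat, se_hat)
```

For this toy design the fitted probabilities must equal the group proportions 0.3 and 0.7, so the intercept should equal $g(0.3)$ and the sum of the two coefficients should equal $g(0.7)$. On real data the Cauchit likelihood need not be concave, so a production fit would normally add safeguards such as step halving.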

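We can now obtain the standard errors of our estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ by constructing the Fisher Information Matrix. The $(p+1) \times (p+1)$ matrix is given by:

$$I = \begin{pmatrix} I_{11} & \cdots & I_{1,p+1} \\ \vdots & \ddots & \vdots \\ I_{p+1,1} & \cdots & I_{p+1,p+1} \end{pmatrix}$$

where,

$$I_{jk} = -E\left[\frac{\partial^2 l}{\partial \beta_{j-1}\,\partial \beta_{k-1}}\right] = \sum_{i=1}^{n} \frac{f^2(\eta_i)}{p_i(1-p_i)}\, x_{(j-1)i}\, x_{(k-1)i}; \quad j, k = 1(1)p+1$$

with $x_{0i} = 1\ \forall\, i = 1(1)n$.

On inverting the Fisher Information Matrix (by using numerical methods), we get:

$$I^{-1} = \begin{pmatrix} I^{11} & \cdots & I^{1,p+1} \\ \vdots & \ddots & \vdots \\ I^{p+1,1} & \cdots & I^{p+1,p+1} \end{pmatrix}$$

The standard errors of the estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p$ are now given by:

$$\mathrm{s.e.}(\hat\beta_0) = \sqrt{I^{11}}, \quad \mathrm{s.e.}(\hat\beta_1) = \sqrt{I^{22}}, \quad \ldots, \quad \mathrm{s.e.}(\hat\beta_p) = \sqrt{I^{p+1,p+1}}$$

3.2: Interpretation of parameters of the model

Let us define $p_i$ as the probability of getting a positive response given the information on the covariates of the model. Thus, $p_i = \Pr[y_i = 1 \mid x_1, x_2, \ldots, x_p]$.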

We have stated before that $p_i = F(\eta_i) = F\left(\beta_0 + \sum_{j=1}^{8} \beta_j x_{ji}\right)$, where $F(.)$ is the CDF of the Cauchy(0, 1) distribution (here $p = 8$, the number of covariates used in the study of Section 4).

Rearranging the above expression, we get $\beta_0 + \sum_{j=1}^{8} \beta_j x_{ji} = F^{-1}(p_i)$. (We note that though $p_i$ is non-linear in the parameters, $F^{-1}(p_i)$ is linear.)

It is easy to see that $F^{-1}(.)$ is the inverse cumulative distribution function of the Cauchy(0, 1) distribution.

Let us define $F^{-1}(.)$ as the Cauchit index. Now, we can interpret $\beta_j$ $[j = 1(1)8]$ as the amount by which the cauchit index of a positive response changes for a one unit increase in the covariate $x_j$, keeping the other covariates fixed.

3.3: Measures of Goodness of fit of the model

The measure we can use to assess the goodness of fit of a model is that of deviance. If we are given $n$ observations, then we can fit a model containing up to $n$ parameters. Using this as a basis, we can construct two extreme cases:

1. The Null model, $y_i = \mu + \varepsilon_i$; $i = 1(1)n$. Here there is only one parameter, with each value of the response having a common $\mu$. This is the simplest possible model which can be fit, in which the entire variation of the response is accounted for by the random component.

2. The Full model, $y_i = \mu_i + \varepsilon_i$; $i = 1(1)n$. Here there are a total of $n$ parameters, one per observation. The full model thus assigns the entire variation in the response to the systematic component, leaving nothing to be explained by the random component. Thus, it is the most uninformative model as it merely replicates the entire data in the systematic part.


Although these two models have their pitfalls, the full model is conventionally assumed to be the baseline with which any model can be compared. This forms the basis of our discrepancy measure, deviance.

Under the proposed model, let the fitted log-likelihood be given by $l(\hat\mu)$, where $\hat\mu$ is the MLE of $\mu$ under the model.

Under the full model, the log-likelihood is given by $l(\hat\mu_{\text{full}}) = l(y)$ (as the model merely replicates the entire data in the systematic part, as mentioned above).

The deviance $D$ is thus defined as twice the difference between the log-likelihoods of the full model and the proposed model:

$$D = 2\{l(y) - l(\hat\mu)\}$$

If the response is binary in nature, then the analytical form of the deviance comes out to be:

$$D = 2\sum_{i=1}^{n}\left\{ y_i \log\frac{y_i}{\hat\mu_i} + (1-y_i)\log\frac{1-y_i}{1-\hat\mu_i} \right\}$$

with the convention $0 \log 0 = 0$. Since each $y_i$ is either 0 or 1, the full-model log-likelihood $l(y)$ is 0 here, so $D = -2\,l(\hat\mu)$.

A model having lower deviance has a smaller discrepancy from the full model, and is thus the better model.

Another measure which could be used to compare models on the basis of their goodness of fit is Akaike's Information Criterion (AIC).

Suppose that we have a model of some data. Let $k$ be the number of estimated parameters in the model. Let $\hat{L}$ be the maximum value of the likelihood function for the model. Then the AIC value of the model is given by:

$$\mathrm{AIC} = 2k - 2\log\hat{L}$$

The model having the lower value of AIC compared to the others is the better model.

4: Data Driven study to implement the Cauchit link

4.1: Fitting a Cauchit model

To illustrate the utility of the Cauchit link, we fit it to some secondary data and observe its results.

Source of data:

In this study, the secondary data has been collected from the website of "The World Happiness Report". A publication of the United Nations Sustainable Development Solutions Network, this survey aims to rank countries on the basis of the happiness level of their immigrant citizens. The data, on a primary basis, has been collected by the Gallup World Poll, an annual survey conducted by the American analytics and advisory company, Gallup, Inc. This endeavour began in 2005.

Gallup World Poll (GWP) conducted surveys in over 160 countries with a sample of 1000 people (or sometimes 2000 for large countries) from the adult population semiannually, annually, and biennially. This survey includes almost 100 questions, with responses collected either by telephone/mail/form (generally in the developed countries), taking almost 30 minutes, or by direct interview (generally in developing countries), taking almost 1 hour.

For my study, I have used the data collected by the organization in 2014.

Description of Data:

The study variable of this survey is "Level of Happiness" as indicated by a happiness score (also known as the life ladder; variable name hap), in which participants have to rate their happiness level on a continuous scale from 0 to 10.

Information on eight plausible predictors has also been obtained. Their descriptions are as represented in the table below (as reported on the World Happiness Report website):

Covariate (variable name): Description

Social Support (social): It is the national average of the binary responses to the question "If you were in trouble, do you have relatives or friends you can count on to help you, whenever you need them, or not?"

Healthy Life Expectancy (health): This data is based on information collected from the World Health Organization, the World Development Indicators, and statistics published in journal articles, taken as non-health adjusted life expectancy.

Freedom to make life choices (freedom): It is the national average of responses to the question "Are you satisfied or dissatisfied with your freedom to choose what you do with your life?"

Generosity (generosity): It is the residual obtained by regressing the national average of responses to the question "Have you donated money to a charity in the past month?" on the GDP per capita of the country.

Corruption Perception (corruption): The measure is the national average of the survey responses to two questions: "Is corruption widespread throughout the government or not?" and "Is corruption widespread within businesses or not?"

Positive affect (p_affect): It is defined as the average of three positive affect measures: happiness, laugh and enjoyment.

Negative affect (n_affect): It is defined as the average of three negative affect measures: worry, sadness and anger.

Confidence in national government (confidence): The measure is the national average of the survey responses to the question: "Do you have confidence in the national government?"

Note: The words in parentheses are the variable names used during the model fitting in R.

Information on the response and covariates was collected from 146 countries over the world, in the year 2014. However, visual inspection of the data revealed that 19 countries did not have complete information for all the covariates. The data has thus been cleaned by removing these observations and utilizing the complete information of the 127 remaining countries.

It is to be noted that the response variable is not binary in nature. In order to convert these responses into a binary setup, we redefine the response using the following transformation:

$$y_i^* = \begin{cases} 1 & \text{if } y_i \geq 6 \\ 0 & \text{otherwise} \end{cases}$$

where $y_i$ is the $i$-th value of the response, i.e., the happiness score of the $i$-th country.

This procedure will thus convert all the continuous response observations into 0s and 1s, which is the binary setup.

Remark: The World Happiness Report has suggested that any country having a happiness score greater than or equal to 6 is regarded as "happy" (6 is said to be a reflection of two contradictory forces which have been chosen as covariates - positive affect and negative affect). Thus, the number 6 has been chosen as the threshold value for the transformation.

Now, we can model $\Pr[y_i^* = 1] = \Pr[y_i \geq 6]$ on the eight different covariates using the Cauchit link function. The model to be fitted is given by:

$$\Pr[y_i^* = 1 \mid x_i] = F(\eta_i) = F\left(\beta_0 + \sum_{j=1}^{8} \beta_j x_{ji}\right)$$

where $x_{ji}$ $(j = 1(1)8)$ is the covariate information and $F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}x$ is the CDF of a Cauchy(0, 1) distribution. The procedure for model fitting has been described in Section 3.1.
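The thresholding step described above amounts to a one-line transformation. The sketch below is in Python for illustration (the dissertation's own fitting is done in R); the country labels and scores are hypothetical placeholders, not values from the World Happiness Report data.

```python
# Hypothetical happiness scores (placeholders, not taken from the actual data).
scores = {"A": 7.4, "B": 5.9, "C": 6.0, "D": 4.2}

THRESHOLD = 6.0  # the cut-off used in the text

def to_binary(score, threshold=THRESHOLD):
    # 1 if the country counts as "happy" (score >= threshold), else 0.
    return 1 if score >= threshold else 0

happy = {country: to_binary(score) for country, score in scores.items()}
print(happy)
```

Note that a score exactly equal to 6 is coded as 1, in line with the "greater than or equal to" rule.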


We have used the glm function in R with the binomial family to fit this model. The command used is:

    glm(hap ~ social + health + freedom + generosity + corruption + p_affect + n_affect + confidence, family = binomial(link = "cauchit"))

After fitting, we have to test which of the covariates are significant at a prescribed level of significance. To test the significance of the covariates, we frame the following 8 testing problems:

$$H_{0j}: \beta_j = 0 \quad \text{vs.} \quad H_{1j}: \text{not } H_{0j}, \text{ i.e., } \beta_j \neq 0; \qquad j = 1(1)8$$

The test statistic under $H_{0j}$ is given by:

$$t_j = \frac{\hat\beta_j}{\mathrm{s.e.}(\hat\beta_j)} \sim t_{n-9}$$

At level of significance $\alpha$, we reject $H_{0j}$ (i.e., reject the hypothesis that the covariate is insignificant in predicting the response) if $|\text{observed } t_j| > t_{n-9,\,\alpha/2}$. Here, $t_{n-9,\,\alpha/2}$ is the upper $100(\alpha/2)\%$ point (i.e., the $100(1 - \alpha/2)$-th percentile) of the $t$ distribution with $(n - 9)$ degrees of freedom.

Results obtained:

At $\alpha = 0.05$, the upper $100(\alpha/2)\%$ point of the $t$ distribution with $(n - 9) = 118$ degrees of freedom is approximately equal to 1.98. Thus, we claim that a covariate is significant in predicting the response if the absolute value of the $t$ statistic for that covariate exceeds 1.98.

The output obtained on the coefficients is as follows:

               Estimate    Std. Error   t-statistic   Decision
(Intercept)   -147.2867      70.1332       -2.100        R
social          84.6594      41.4620        2.042        R
health           0.8136       0.3989        2.040        R
freedom         -5.8807       5.9833       -0.983        A
generosity      -5.7643       4.5665       -1.262        A
corruption       4.2733       6.9028        0.619        A
p_affect        25.4387      13.4078        1.897        A
n_affect       -11.4916      21.1087       -0.544        A
confidence       6.7443       6.7898        0.993        A

(Key: R - Reject, A - Accept)

We see that the absolute value of the t-statistic exceeds 1.98 for:

• intercept
• social (Social Support)
• health (Healthy life expectancy)

Thus, social support and healthy life expectancy are significant in predicting the happiness state of the country. Moreover, the greater the Social Support and Healthy Life Expectancy, the happier the country is expected to be.
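The decisions in the table can be reproduced directly from the reported estimates and standard errors. The Python sketch below recomputes each t-statistic as estimate divided by standard error and applies the 1.98 cut-off; the numbers are copied from the table above, and the variable names are illustrative.

```python
# (estimate, standard error) pairs as reported in the Cauchit coefficient table.
cauchit_fit = {
    "(Intercept)": (-147.2867, 70.1332),
    "social": (84.6594, 41.4620),
    "health": (0.8136, 0.3989),
    "freedom": (-5.8807, 5.9833),
    "generosity": (-5.7643, 4.5665),
    "corruption": (4.2733, 6.9028),
    "p_affect": (25.4387, 13.4078),
    "n_affect": (-11.4916, 21.1087),
    "confidence": (6.7443, 6.7898),
}

CRITICAL = 1.98  # approximate upper 2.5% point of t with 118 degrees of freedom

t_stats = {name: est / se for name, (est, se) in cauchit_fit.items()}
significant = [name for name, t in t_stats.items() if abs(t) > CRITICAL]

for name, t in t_stats.items():
    decision = "R" if abs(t) > CRITICAL else "A"
    print(f"{name:12s} {t:7.3f}  {decision}")
```

The recomputed statistics agree with the reported column to three decimals, and only the intercept, social and health terms cross the cut-off.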

• The residual deviance of this model came out to be 56.895 (on 118 degrees of freedom).
• The AIC measure for this model came out to be 74.895.

4.2: Comparing the model with the logit and probit models

We may now be interested in fitting a logit and a probit model to the above data and comparing their results with those of the fitted cauchit model.

Logit model - Results:

At $\alpha = 0.05$, the upper $100(\alpha/2)\%$ point of the $t$ distribution with $(n - 9)$ degrees of freedom is approximately equal to 1.98. Thus, we claim that a covariate is significant in predicting the response if the absolute value of the $t$ statistic for that covariate exceeds 1.98.

The output obtained on the coefficients is as follows:

               Estimate    Std. Error   t-statistic   Decision
(Intercept)   -53.22675     14.00284       -3.801        R
social         25.60577      8.30723        3.082        R
health          0.32171      0.11593        2.775        R
freedom        -1.54500      3.46328       -0.446        A
generosity     -1.94381      2.16451       -0.898        A
corruption      0.01855      2.86305        0.006        A
p_affect       12.01384      4.82362        2.491        R
n_affect       -0.34353      6.54171       -0.053        A
confidence      2.83628      2.63490        1.076        A

(Key: R - Reject, A - Accept)

We see that the absolute value of the t-statistic exceeds 1.98 for:

• intercept
• social (Social Support)
• health (Healthy life expectancy)
• p_affect (Positive affect)

Thus, social support, healthy life expectancy and positive affect are significant in predicting the happiness state of the country. Moreover, the greater the Social Support, Healthy Life Expectancy and Positive Affect, the happier the country is expected to be.

• The residual deviance of this model came out to be 57.313 (on 118 degrees of freedom).
• The AIC measure for this model came out to be 75.313.

Probit model - Results:

At $\alpha = 0.05$, the upper $100(\alpha/2)\%$ point of the $t$ distribution with $(n - 9)$ degrees of freedom is approximately equal to 1.98. Thus, we claim that a covariate is significant in predicting the response if the absolute value of the $t$ statistic for that covariate exceeds 1.98.

The output obtained on the coefficients is as follows:

               Estimate    Std. Error   t-statistic   Decision
(Intercept)   -29.9947       7.4158        -4.045        R
social         14.3232       4.4062         3.251        R
health          0.1803       0.0621         2.904        R
freedom        -0.7653       2.0151        -0.380        A
generosity     -1.0032       1.2203        -0.822        A
corruption     -0.1573       1.5825        -0.099        A
p_affect        6.8112       2.6613         2.559        R
n_affect        0.4155       3.6013         0.115        A
confidence      1.6123       1.4709         1.096        A

(Key: R - Reject, A - Accept)

We see that the absolute value of the t-statistic exceeds 1.98 for:

• intercept
• social (Social Support)
• health (Healthy life expectancy)
• p_affect (Positive affect)

Thus, social support, healthy life expectancy and positive affect are significant in predicting the happiness state of the country. Moreover, the greater the Social Support, Healthy Life Expectancy and Positive Affect, the happier the country is expected to be.

• The residual deviance of this model came out to be 57.001 (on 118 degrees of freedom).
• The AIC measure for this model came out to be 75.001.

A comparison of the residual deviances reveals:

Model               Logit     Probit    Cauchit
Residual Deviance   57.313    57.001    56.895

We know that the model having a smaller value of residual deviance is comparatively the better model. From the above table, it is evident that the Cauchit model has the least deviance and thus has an edge over the standard Logit and Probit models.

A comparison of the AIC values reveals:

Model   Logit     Probit    Cauchit
AIC     75.313    75.001    74.895
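The two tables are linked by a simple identity. For ungrouped binary responses the full-model log-likelihood is 0, so the residual deviance equals $-2\log\hat{L}$ and the AIC equals the deviance plus $2k$. The Python sketch below checks this against the reported figures, taking $k = 9$ (intercept plus eight covariates, consistent with 127 - 118 residual degrees of freedom); the variable names are illustrative.

```python
k = 9  # estimated parameters: intercept + eight covariates

# Residual deviances and AIC values as reported in the text.
deviance = {"logit": 57.313, "probit": 57.001, "cauchit": 56.895}
reported_aic = {"logit": 75.313, "probit": 75.001, "cauchit": 74.895}

# For binary data, deviance = -2 log L, hence AIC = deviance + 2k.
implied_aic = {model: d + 2 * k for model, d in deviance.items()}

for model in deviance:
    print(f"{model:8s} implied {implied_aic[model]:.3f}  reported {reported_aic[model]:.3f}")

best_model = min(reported_aic, key=reported_aic.get)
print(best_model)
```

Because all three models have the same number of parameters, ranking by AIC and ranking by deviance necessarily agree here.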

We know that the model having a smaller value of AIC is comparatively the better model. From the above table, it is evident that the Cauchit model has the least AIC (lower than the Probit model by 0.106 and the Logit model by 0.418) and thus has an edge over the standard Logit and Probit models.

Another interesting point to note is the difference in the number of significant predictors. Both the Logit and Probit models have three significant predictors: Social Support, Healthy Life Expectancy and Positive Affect, while the Cauchit model does away with Positive Affect as a significant predictor at the 5% level of significance. We can thus claim that the Cauchit model is a simpler model compared to the Logit and Probit ones, as it can efficiently explain the variation in the response using fewer significant predictors.

5. Conclusion

Most analysts opt to use the logit and probit links while modeling a binary response on a number of covariates. While these do have their interpretative advantages and provide efficient fits in most scenarios, there might be other models which perform better than them in specific situations. A simple data driven study, as discussed in this paper, has shown that the Cauchit link function provides a better fit compared to the traditionally established logit and probit models when assessed through the metrics of deviance and AIC. Thus, analysts must devote more time and effort to choosing which link function suits their purpose of fitting a model to a binary response. In doing this they can not only provide better predictions and inferences, but can also make their research logistically more efficient.

6. References

• Generalized Linear Models, by McCullagh and Nelder.
• Categorical Data Analysis, by Alan Agresti.
• An Exploration of Link Functions Used in Ordinal Regression, by Thomas Smith, David Walker and Cornelius McKenna.
• Model selection and Akaike's Information Criterion (AIC): The general theory and its analytical extensions, by Hamparsum Bozdogan.
• Selection of link function in binary regression: A case-study with world happiness report on immigration, by Ardhendu Banerjee, Subrata Chakraborty and Aniket Biswas.
• Classification using different Link function than Logit, Probit, by Soumalya Nandi.
• Generalized Linear Mixed Models for Randomized Responses, by Jean-Paul Fox, Duco Veen and Konrad Klotzke.
• IBM SPSS statistics 19.0 advanced statistical procedures companion, 2012, by J.M. Norusis.
• Parametric Links for Binary Choice Models: A Fisherian-Bayesian Colloquy, by Roger Koenker and Jungmo Yoon.
• Ignored Racism: White Animus Toward Latinos, by Mark Ramirez and David Peterson.
