
Life Expectancy Building a predicting model by cross-countries analysis

CLIENT: GLOBAL HEALTH ORGANIZATION (GHO)
DRI Intern: Trudy Pham
April 27, 2012



Table of Contents

I Executive Summary
II Statement of purpose
III The Dependent Variable
IV Bivariate Analysis
   - GDP per capita
   - Economic Freedom index
   - Female schooling enrollment
   - Access to clean water resources
   - Children DPT Immunization
   - Urbanization
   - CO2 emissions per capita
V Summary of Bivariate Analysis
VI Building Multiple Regression Models
VII Conclusion and Recommendations
Sources
Appendix A: Glossary of Statistical Terms
Appendix B: Hypothesis Testing
Appendix C: Regression Models
Appendix D: Tables of variables



Executive Summary

This report answers the primary question posed by our client, the Global Health Organization (GHO): “How can the health status of the world’s population be improved?” Because Life Expectancy (LE) is a direct measure of health status worldwide, this paper focuses on identifying the factors that influence LE. On that basis, we construct a model that best describes, and thus predicts, LE for countries around the world. From the pool of candidate factors, we conclude that economic and social development are the most likely to affect LE. The rationale for choosing proxies from these categories is the theory that people’s lives are most closely tied to changes and improvements in social and economic welfare. These variables are, in order of presentation:

1. GDP per capita, 2002
2. Economic Freedom Index, 2002
3. Female schooling enrollment, 2003
4. Access to clean water resources, 2000
5. Children DPT Immunization, 2000
6. Urbanization population, 2002
7. CO2 emission per capita, 2001

A useful way to understand why LE has increased over time is to discover why LE varies across countries at a specific point in time; this research therefore addresses the issue with respect to the year 2003. Since the main purpose is to build a model that effectively reflects changes in LE as a result of changes in the independent variables, we perform univariate and bivariate analyses for each variable to confirm our perceptions about their potential effects on LE. The univariate analysis provides a preliminary look at the statistics of each independent variable, while the bivariate analysis reflects the individual influence of each factor on LE. In the process, each variable is introduced with an a priori assumption about how the world works, and those assumptions are tested. Finally, we combine the significant variables from the prior analysis into a comprehensive model to address our research question.

In performing the univariate analysis of our dependent variable LE, it is essential to be aware of the extreme cases of some African countries that suffer severely from HIV/AIDS. Cross-sectional analysis shows that, when comparing two countries that receive similar levels of government support, people living in the country with lower HIV prevalence, or one that is relatively HIV-free, tend to live longer than those living in the country where a high percentage of the population is affected by the disease. HIV thus distorts the effects of our research variables and complicates the model-building process. In addition, since high HIV prevalence is not a universal issue, we exclude the 23 countries with high HIV prevalence from this research in order to give a clearer view of our policy implications.
The results we obtained for relatively HIV-free countries suggest that although LE at first seems to depend heavily on GDP, further analysis involving multiple independent variables greatly reduces the explanatory power of GDP over LE. That is, holding GDP constant, it is possible to improve LE through changes in the remaining variables: Female schooling enrollment, Clean Water, Immunization, and CO2 emission. Thus, the policy implication is that LE can reasonably be improved by making progress in countries’ conditions regarding education, healthcare, and pollution.

Nevertheless, our report suffers from certain drawbacks, such as the unavailability of data, which limits the proxies that could be used to assess the topic. Moreover, given the diverse geographical, biological, and cultural characteristics of people and countries around the globe, it is challenging to put them on the same footing to measure the effect of particular macroeconomic factors on their levels of LE. Thus, the results should be interpreted with caution. The report itself presents only the output of complex statistical processes; the details of these processes can be found in the appendices at the end of the report.


Statement of purpose

We conduct this research for the year 2003 to fulfill our commitment to our client, the Global Health Organization (GHO), to determine how to improve health status in the world. From the univariate analysis of the dependent variable LE, we form the hypothesis that variation in life expectancy across countries is not merely random but is in fact influenced by various factors. This prompts us to determine these factors and their degree of influence on LE using more advanced statistical tools. We identify the factors we suspect of influencing life expectancy, describe them and their relationship to the dependent variable (LE), and employ them as independent variables during model building. The independent variables are chosen according to our a priori knowledge and experience, and only those that pass the test for significance are retained in our last step, the building of the multiple regression model. We anticipate that the best model will bring us closest to the true relationship between LE and related factors, and that it can be used as a predictive model to help improve health status in the world. The comprehensive statistical analysis that follows answers the client’s question to the best of our knowledge and capabilities. By examining in depth the reasons for variation in life expectancy among countries, we ultimately aim to provide GHO with appropriate policy recommendations.



The dependent variable: Life Expectancy (LE)

Life expectancy at a given age is the number of years of life remaining to an average person at that specified age, assuming that current age-specific mortality rates remain constant during the person’s lifetime. Throughout this research, life expectancy of a country is measured by life expectancy of the average person at birth.

                               Both    Male    Female
Mean                           66.6    64.3    69.1
Median                         70.7    67.5    73.7
Standard deviation             12.1    11.4    13.0
Coefficient of variation (%)   18.2    17.8    18.8
Minimum                        34.8    33.9    34.8
Maximum                        81.9    79.1    85.3
# of countries                 201     201     201

Table 1: Life Expectancy by Genders, 2003 [1]

The LE mean of 66.6 indicates a relatively high quality of life. The distributions are skewed left in all three cases (the median exceeds the mean), meaning it is more common for countries to have LE above the average than below it. A substantial standard deviation of 12 years and a wide range of life spans (from 34.8 to 81.9 years) are consistent with this observation. In addition, countries at the high end of the LE scale gather into a large cluster rather than spreading out like those at the low end. Much like diminishing returns, it is easier and faster to improve the standard of life when a country has a low LE; once a certain level of LE is reached, progress slows and LE remains fairly stable over time.
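As a quick check on Table 1, the coefficient of variation is the standard deviation expressed as a percentage of the mean. The Python sketch below recomputes it from the rounded means and standard deviations reported in the table; the helper name `coefficient_of_variation` is hypothetical, and because the inputs are already rounded, differences of about 0.1 percentage point from the published values are to be expected.

```python
def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * std_dev / mean

# Rounded (mean, standard deviation) pairs taken from Table 1.
table_1 = {"Both": (66.6, 12.1), "Male": (64.3, 11.4), "Female": (69.1, 13.0)}

for group, (mean, std_dev) in table_1.items():
    print(f"{group}: CV = {coefficient_of_variation(mean, std_dev):.1f}%")
```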

[Histograms of frequency against # of years for males (Mean=64.3, Med=67.5), females (Mean=69.1, Med=73.7), and both sexes (Mean=66.6, Med=70.7)]

Figure 1: World's Life Expectancy by Genders, 2003


Overall, females have a higher life expectancy than males. This is mainly explained by biological differences and by the healthier lifestyles that females often lead. The few exceptions appear mostly at the lower end of the line y=x, among countries with very low LE. Kenya, Zimbabwe, and Lesotho are the three largest outliers because of the extreme prevalence of HIV. In those countries, which are mostly under-developed, there may also be more gender discrimination against females.

[Scatterplot of female LE (vertical axis) against male LE (horizontal axis) with the reference line Y=X; Kenya, Zimbabwe, and Lesotho are labeled]

Figure 2: World’s Life Expectancy of Male vs Female, 2003

[1] Data source: http://www.who.int/topics/life_expectancy/en/



Sources of variations of the dependent variable

The skewness of the data set, the existence of strong outliers on the left side of the distribution, and the high coefficient of variation (approximately 18% for both genders) suggest that the variability of LE across countries is not merely random. Theoretically, every country is a unique entity comprising different social, economic, and political systems. It is inevitable that there are factors that drive countries apart in terms of LE. As stated by Riley (2001) in his book “Rising Life Expectancy: A Global History”: “from the beginning, different regions and different countries advanced life expectancy by using their own tactics or combinations of tactics.” According to Riley, the contributors to the health transition (the long-run upward trend in LE resulting from the reduction in mortality rates) include, but are not limited to, social advances, economic development, biological enhancement, and bioethical and political concerns.

Politics, while it undoubtedly plays a crucial role in determining quality of life, is excluded from our analysis for two main reasons: 1. political factors are generally very difficult to quantify accurately, or even to categorize; 2. politics often exerts its influence indirectly through social and economic channels. Thus, restricting the analysis to socioeconomic factors suffices for our research purpose.

The level of Gross Domestic Product (GDP) per capita first comes to mind as one of the most comprehensive measures of the wealth and well-being of a nation. As people get richer, they can not only meet their survival needs but also afford a better standard of living. Their expected years of life, hence, rise over time. Thus, we include GDP per capita as one of the independent variables determining LE in the model we build in this research.
Furthermore, we assume that as individuals have more freedom, greater opportunity, and more individual choices, they lead more active and happier lives and thus live longer. Economic freedom, therefore, is used to assess the impact that the flexibility of the economic environment has on LE.

In addition to the two proxies for economic development, the major categories of variables that influence life expectancy are education, healthcare, urbanization, technology, inequality, and epidemic risks. Given the availability of data for the year of concern, 2003, we look for the potential effects on LE of proxies in the first three categories. It is worth keeping in mind that some categories might require more than one proxy for adequate representation. In other cases a proxy might represent more than one category, possibly making it difficult to estimate the effect of each variable independently. One typical example is GDP per capita, which reflects education, urbanization, and healthcare at a minimum.

As the quality of healthcare increases, ceteris paribus, so should average life expectancy. Although this category is partly reflected in the level of GDP per capita, education, and perhaps urbanization, we choose two proxies for healthcare: Access to clean water resources and Immunization DPT. They seem to have important independent effects on LE and were in fact part of a historic movement in global healthcare in the past century (Riley 2001). Also, we suspect that more educated mothers have more knowledge of how to ensure a healthy life for their children from birth, and thus have a positive impact on the length of life of the next generation. For this reason, we use Female schooling enrollment rate as a proxy for the level of female education.
Last but not least, urbanization seems to have a positive impact on LE because, other things equal, as urbanization increases so does access to life-extending resources such as medicine and education. Urbanization, however, also has negative effects such as pollution and the spread of infectious diseases. Thus, the effect of urbanization depends on the net impact of these positive and negative effects. Here we use Urbanization rate and CO2 emission per capita to examine this point of view.



An exceedingly important determinant of LE for many countries in Africa is the prevalence of HIV. In these countries the percentage of the adult population with HIV is several times to a hundred times higher than what is normally observed in the rest of the world. This can be seen in the extreme case of South Africa where, although living conditions are relatively high compared to other countries in Africa, the country has maintained a very low LE of 45 years for several decades because of the prevalence of HIV (16%) among its adult population. For the purpose of this research, we set aside the adverse effect of HIV by excluding from the dataset the 23 countries with an HIV prevalence rate of more than 3%. We chose 3% because, on examining the original data, only when countries pass this level do the other potential variables severely lose their explanatory power over LE. By doing so, we confine our number of countries to 178. Below are the descriptive statistics of LE for the remaining countries:

                               Both    Male    Female
Mean                           69.4    66.8    72.1
Median                         71.6    68.6    74.6
Standard deviation             9.7     9.3     10.3
Coefficient of variation (%)   14.0    13.9    14.3
Minimum                        38.1    37.0    39.3
Maximum                        81.9    79.1    85.3
# of countries                 178     178     178

Table 2: Descriptive statistics for LE of countries without HIV prevalence, 2003

This shows that by excluding those strong outliers with HIV prevalence, we are able to reduce the variation in our data, as indicated by the smaller standard deviation (from 12.1 to 9.7) and the smaller coefficient of variation (from 18% to 14%). The mean LE is now higher for both sexes. We use this data set for the remainder of this research.

Our next step is to construct a model that uses the variables described above to explain the variation in LE across countries. Our model will be of the general form:

$$\text{Life expectancy} = \alpha + \beta\,(\text{GDP}) + \sum_{i} \beta_i X_i$$

in which $\alpha$ is the intercept, $\beta$ is the slope of GDP (that is, the change in LE given a change in GDP, keeping everything else constant), $X_i$ are the other variables that potentially influence LE, and $\beta_i$ are their corresponding slopes.

It should be noted that the effects of these variables are lagged: the effect of a change in any of these variables in a given year can only be observed several years later. Thus, to predict LE in 2003, we should use variables with data from several previous years. However, because of the scarcity of data for some variables, we build the model with data for each independent variable taken from the most recent available prior year, and as a result, any conclusion from the model should be treated with caution.
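To make the general form concrete, the following is a minimal sketch of how such a model could be fitted by ordinary least squares. It uses Python with NumPy on small made-up numbers, not the report's data; the function name `fit_ols` and the example values are hypothetical, and the report's own estimates come from its statistical software.

```python
import numpy as np

def fit_ols(gdp, others, le):
    """Fit LE = alpha + beta*GDP + sum_i beta_i*X_i by ordinary least squares.

    gdp:    1-D array of the GDP variable, one value per country
    others: 2-D array, one column per additional variable X_i
    le:     1-D array of life expectancy
    Returns (alpha, beta, betas_other).
    """
    gdp = np.asarray(gdp, dtype=float)
    others = np.asarray(others, dtype=float)
    le = np.asarray(le, dtype=float)
    # Design matrix: a column of ones for the intercept, then GDP, then the X_i.
    design = np.column_stack([np.ones_like(gdp), gdp, others])
    coeffs, *_ = np.linalg.lstsq(design, le, rcond=None)
    return coeffs[0], coeffs[1], coeffs[2:]

# Made-up illustration: LE is generated exactly from known coefficients
# (alpha=40, beta=5, beta_1=0.2), so the fit should recover them.
gdp = np.array([3.0, 3.5, 4.0, 4.5, 3.2, 4.8])
water = np.array([50.0, 90.0, 60.0, 95.0, 80.0, 70.0])
le = 40.0 + 5.0 * gdp + 0.2 * water
alpha, beta, betas_other = fit_ols(gdp, water.reshape(-1, 1), le)
print(round(alpha, 3), round(beta, 3), np.round(betas_other, 3))
```

With real data the fit would not be exact, and the residuals, t-tests, and R-Square would need to be examined as described in the bivariate analysis.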

Page 7


Univariate and Bivariate Analysis

In this section, we perform the following tasks [2]:

- Univariate analysis of each independent variable
- Bivariate analysis of the relevant variable and Life expectancy
  - Hypothesis test of the variable’s impact on LE
  - Simple regression of the variable against LE
  - Analysis of outliers, missing observations and conclusion

Sources and years of data, as well as notes on the number of countries with available and missing data, are covered in detail in the analyses that follow.

Variable                                                Year   Abbreviation
Life Expectancy                                         2003   LE
GDP per capita based on purchasing power parity (PPP)   2002   GDP.pc
Economic freedom index                                  2002   E.Free
Female schooling enrollment rate                        2003   F.Sch.Enr
Access to clean water resources                         2000   H2O
Immunization DPT                                        2001   Immu.
Urbanization rate                                       2002   Urban
CO2 emissions per capita                                2001   CO2

Table 3: List of variables, by Orders of Analysis

[2] For reference to the data set, see Appendix D



Univariate and Bivariate Analysis: GDP per capita by PPP (GDP.pc)

GDP per capita by PPP is the value of all final goods and services produced within a nation in a given year divided by the midyear population of a given country, then converted to international dollars using PPP rates. This basically measures how much money would be needed to purchase the same goods and services in two different countries. An international dollar has the same purchasing power over GDP as the U.S. dollar has in the United States.

The data for GDP.pc was taken from the World Bank for the year 2002.

Variable                           GDP.pc
Mean                               11,062
Median                             5,840
Standard Deviation                 12,620
Coefficient of variation (%)       114
Minimum                            507
Maximum                            62,591
Total # of countries               178
# of countries with missing data   22

Table 4: Descriptive statistics of GDP.pc [3], 2002

[Histogram of the percent of countries by GDP per capita (by PPP) in international dollars (US dollar), 2002, Mean=11,062; United Arab Emirates, Brunei, Luxembourg, and Qatar are labeled in the upper tail]

Figure 3: Distribution of GDP.pc, 2002

Looking at the range of GDP per capita across countries, we see an astonishing inequality between the richest and the poorest countries of the world. Approximately 70% of the countries gather near the low end of the scale, and only 30% exceed the mean value and spread out towards the high end. Some very strong outliers (4-7 standard deviations above the mean) include small countries such as Luxembourg, Qatar and United Arab Emirates, which have very small populations. Consequently, even a modest total GDP can translate into a very high GDP per capita. These countries have also maintained relatively high LE over the years.

While the coefficient of variation of LE was only 14%, that of GDP.pc was 114%, showing that there is far more dispersion in GDP per capita than in LE. Thus, the variation in LE cannot be explained solely by variation in GDP.pc; other factors must be at work as well. Among the 22 countries with missing data on GDP, only Somalia has a low LE (47.34 years), while the others enjoy a high LE of roughly 65 years or more. Because the bivariate analysis only considers countries with data available for both variables, the LE statistics will be underestimated compared to the real situation. Nevertheless, since the number of missing countries is relatively small, this should not pose a serious problem for our statistical model.

That said, we expect higher income to have a positive impact on the length of life for several reasons. As income increases, disposable income also increases, providing people with more resources for better shelter, food, and medical care. Using countrywide data in our model offers an advantage that individual data does not: a wealthy person living in a poor country is unlikely to have the same access to quality food and medicine as a wealthy person living in a wealthy country. Since income is highly correlated with many other categories that affect life expectancy (as the correlation matrix later in this research will show), it is held constant to provide an unbiased estimate of the marginal effects of the remaining variables. Hence, for the slope $\beta$ of GDP.pc in a simple regression equation against LE, we expect $\beta$ to be positive, and our test hypotheses are:

$H_0: \beta \le 0$ (LE decreases or remains constant as GDP.pc increases)
against
$H_a: \beta > 0$ (LE increases as GDP.pc increases)

We proceed to examine the relationship between LE and GDP.pc with the aid of statistical software.

[3] Data source: http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries.



[Two scatterplots of LE against GDP PPP 02, each with reference lines at the means (GDP.pc = 11,062 and LE = 69.4); Brunei, Luxembourg, Qatar, and United Arab Emirates are labeled. The linear fit is LE = 63.72 + 0.000450 GDP PPP 02, R-Square=33.7%.]

Figure 4: Life expectancy vs GDP per capita, 2003: A linear model

Figure 5: Life expectancy vs GDP per capita, 2003: A quadratic model

Observe that to the left of the line marking the mean of GDP per capita, there is remarkable variation in LE. One can theorize that GDP per capita has a substantial impact on LE only up to a certain level: an increase in GDP per capita in very poor countries can have a huge impact on living conditions, and thus directly influences LE. Yet once countries pass a certain level of economic well-being, each additional dollar of GDP per capita no longer has the same power to boost life expectancy. This reflects the familiar concept of diminishing marginal returns.

Although the scatterplots above confirm our intuition that LE generally tends to increase together with GDP per capita, the linear relationship between LE and our hypothesized predictor GDP per capita is weak, since the R-Square value is only 34%. A quadratic model depicts the relationship between LE and GDP.pc more closely. This is not very appealing to us because we want to build a linear model to approximate LE. It is therefore more useful to plot LE against the log base 10 of GDP per capita, which measures proportional rather than absolute differences in income. As shown in Figure 6, the regression line, LE = 14.37 + 14.47 Log.GDP.pc with an R-Square of 62%, expresses a relatively strong correlation between a country’s LE and the log of its GDP per capita. Nonetheless, an R-Square well below 100% still indicates that GDP per capita alone cannot completely explain LE.

* The countries in the ellipse on the right have relatively high GDP per capita but very low LE; they include Equatorial Guinea and Angola, to name a few. They are most likely countries in Africa or South America where natural disasters, epidemic disease and apartheid are the most common causes of death, even though the economy is doing well.

[Scatterplot of LE against Log.GDP.pc with fitted line LE = 14.37 + 14.47 Log.GDP.pc, R-Square=62%]

Figure 6: Life expectancy vs log of GDP.pc
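To illustrate what the two reported fits imply, the short Python sketch below evaluates both at a few GDP per capita levels. The function names are hypothetical; the coefficients are the ones reported for the linear fit (LE = 63.72 + 0.000450 GDP PPP 02) and for the fit on the log base 10 of GDP per capita (LE = 14.37 + 14.47 Log.GDP.pc). Under the log fit, each tenfold increase in GDP per capita corresponds to about 14.47 more years of predicted LE, whereas under the linear fit each extra dollar adds the same amount regardless of the starting level.

```python
import math

def le_linear(gdp_pc: float) -> float:
    """Predicted LE from the reported linear fit on GDP per capita."""
    return 63.72 + 0.000450 * gdp_pc

def le_log(gdp_pc: float) -> float:
    """Predicted LE from the reported fit on log base 10 of GDP per capita."""
    return 14.37 + 14.47 * math.log10(gdp_pc)

for gdp_pc in (1_000, 10_000, 50_000):
    print(f"GDP.pc={gdp_pc:>6}: linear={le_linear(gdp_pc):.1f}, log={le_log(gdp_pc):.1f}")
```

The log form builds the diminishing effect of income into a model that is still linear in its coefficients, which is why it suits the linear regression framework used in this report.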

For the purpose of this research, hereafter we use the log base 10 of GDP per capita, instead of GDP per capita itself, as the independent variable to explore the potential impact of economic advancement on LE. Hence, for the slope $\beta$ of log.GDP.pc in a simple regression equation against LE, we still expect $\beta$ to be positive; only the meaning of the slope differs from before:

$H_0: \beta \le 0$ (LE decreases or remains constant as log.GDP.pc increases)
against
$H_a: \beta > 0$ (LE increases as log.GDP.pc increases)

The simple regression model confirms our intuition by showing a positive slope for log.GDP.pc of 14.5 and a t-test value of 15.95. These results allow us to reject the null hypothesis and conclude that GDP plays a significant role in explaining LE.
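The reported t-test value can be cross-checked from the R-Square alone, because in a simple regression with one predictor and an intercept the t-statistic of the slope satisfies $t^2 = R^2 (n-2) / (1 - R^2)$. The Python sketch below applies this identity. The function name is hypothetical, and the sample size of 156 is an assumption (178 countries minus the 22 with missing GDP data), as is the use of the rounded R-Square of 62%.

```python
import math

def slope_t_from_r2(r_squared: float, n: int) -> float:
    """Absolute t-statistic of the slope in a one-predictor regression.

    Uses the identity t^2 = R^2 * (n - 2) / (1 - R^2), which holds for
    simple linear regression with an intercept.
    """
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))

# Assumed inputs: R-Square = 62% (rounded) and n = 178 - 22 = 156 countries.
t_value = slope_t_from_r2(0.62, 156)
print(f"t is approximately {t_value:.2f}")
```

Under these assumptions the result is close to the reported 15.95, and the small gap is consistent with rounding of the R-Square. With roughly 150 degrees of freedom, the one-sided 5% critical value is about 1.65, so the null hypothesis is rejected by a wide margin.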



Univariate and Bivariate Analysis: Economic Freedom Index (E.Free)

The Index of Economic Freedom (E.Free) is a composite of economic freedom measured from 10 different viewpoints: Trade freedom, Business freedom, Government spending, Fiscal freedom, Monetary freedom, Financial freedom, Investment freedom, Property rights, Labor freedom, and Freedom from corruption. Taken together, these 10 economic freedoms provide a comprehensive, albeit imperfect, picture of economic freedom, both in individual countries and in the global economy as a whole.

Variable                           E.Free
Mean                               59.56
Median                             59.95
Standard Deviation                 12.58
Coefficient of variation (%)       21.12
Minimum                            8.9
Maximum                            89.4
Total # of countries               178
# of countries with missing data   44

Table 5: Descriptive statistics for E.Free [4], 2002

The distribution of E.Free is generally normal and centers around 60%, except for the two strong outliers from the far left, which are North Korea and Iraq.

[Histogram of # of countries by Economic Freedom index value; North Korea and Iraq are labeled at the low end, Hongkong and Singapore at the high end]

Figure 7: Distribution of E.Free across countries, 2002

A large number of countries maintain a level of economic freedom between 60% and 70%, within which the corresponding LE can be anywhere from the lowest to the highest value. The data become less dispersed as countries improve their E.Free beyond 70%, where all of them maintain a very high level of LE. Hence, we expect E.Free to have a clear impact on LE once it passes a certain level. Exceptions to this pattern are North Korea and Iraq, which have extremely low levels of E.Free but still enjoy a relatively high LE. They are good examples showing that E.Free is not the only factor that determines LE.

With an R-Square value of 18.6%, E.Free is neither a very poor nor an excellent predictor of LE.

[Scatterplot of LE against E.Free (percentage) with fitted line LE = 50.86 + 0.3163 E.free, R-Square=18.6%; North Korea and Iraq are labeled]

Figure 8: Life expectancy vs E.Free

We expect that more freedom in economic activities and individual choices makes a country better off, and hence that higher E.Free has a positive impact on the length of life. Hence, for the slope $\beta$ of E.Free in a simple regression equation against LE, we expect $\beta$ to be positive, and our test hypotheses are:

$H_0: \beta \le 0$ (LE decreases or remains constant as E.Free increases)
against
$H_a: \beta > 0$ (LE increases as E.Free increases)

The simple regression model is in accordance with our hypothesis; a positive slope for E.Free of 0.32 and a t-test value of 4.49 allow us to reject the null hypothesis and conclude that E.Free is statistically significant.

[4] Data source: http://www.heritage.org/index/explore.aspx?view=by-region-country-year



Univariate and Bivariate Analysis: Female schooling enrollment rate (F.Sch.Enr)

Female schooling enrollment rate (F.Sch.Enr) is measured by the total female enrollment in a specific level of education, regardless of age, expressed as a percentage of the eligible official school-age population corresponding to the same level of education in a given school year. For the tertiary level, the population used is that of the five-year age group following on from secondary school leaving.

* Due to the unavailability of data for previous years, data for F.Sch.Enr is taken from 2003. Also, because data is missing for a substantial number of countries, we provide new descriptive statistics for LE of the countries with available data on F.Sch.Enr.

Variable                           F.Sch.Enr   LE
Mean                               73.93       69.8
Median                             76.05       71.6
Standard Deviation                 22.23       9.4
Coefficient of variation (%)       30.07       13.4
Minimum                            17          42.0
Maximum                            120.77      81.9
Total # of countries               178         178
# of countries with missing data   63          63

Table 6: Descriptive statistics of F.Sch.Enr 2003 and corresponding LE [5]

The school enrollment rate appears somewhat more concentrated around the center (the coefficient of variation is only 30%), with some strong outliers. For the regression, we remove the outliers that are more than 3 standard deviations away from the mean.
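A rule of this kind can be sketched in a few lines of Python. The helper name `drop_outliers` and the sample values are hypothetical, and the sketch uses the sample standard deviation; the report does not state which variant its software applied.

```python
import statistics

def drop_outliers(values, max_sd=3.0):
    """Keep only values within max_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [v for v in values if abs(v - mean) <= max_sd * sd]

# Made-up enrollment rates: twenty typical values plus one extreme value.
rates = [70.0, 72.0, 74.0, 76.0, 78.0] * 4 + [400.0]
kept = drop_outliers(rates)
print(len(rates), "values before,", len(kept), "after")
```

Because an extreme value inflates both the mean and the standard deviation, a rule like this only catches very pronounced outliers, so it is worth inspecting the histogram as well.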

[Histogram of # of countries by gross enrollment ratio, titled Female Schooling enrollment rate]

Figure 9: Distribution of F.Sch.Enr across countries, 2003

* Countries in the red ellipse are those with an enrollment rate exceeding 100%. This seems counterintuitive at first, but the definition explains it: F.Sch.Enr is calculated by dividing the number of students enrolled in a given level of education, regardless of age, by the population of the age group that officially corresponds to that level of education, so a ratio larger than 100% is entirely possible. These countries are all developed countries with high LE (Finland, Norway, New Zealand, Sweden, etc.).
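A small worked example in Python shows how the ratio can exceed 100%. The figures are invented for illustration and the function name is hypothetical: if 1,150 female students of any age are enrolled at a level whose official age group contains 1,000 females, the gross enrollment ratio is 115%.

```python
def gross_enrollment_ratio(enrolled_any_age: int, official_age_population: int) -> float:
    """Enrollment regardless of age, as a percentage of the official-age population."""
    return 100.0 * enrolled_any_age / official_age_population

# Invented numbers: 1,000 girls in the official age group, 950 of them enrolled,
# plus 200 over-age or under-age students enrolled at the same level.
ratio = gross_enrollment_ratio(950 + 200, 1_000)
print(f"{ratio:.1f}%")
```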

[Scatterplot of LE against percentage of enrollment with fitted line LE = 43.98 + 0.3487 F.Sch.Enr, R-Square=68.7%]

Figure 10: Life expectancy vs F.Sch.Enr

Well-educated women have more knowledge with which to provide better care for their children and are less likely to fail to meet their children’s needs, and thus they establish a foundation for a healthier life for their children later on. We expect more female education to have a positive impact on the length of life. Hence, for the slope $\beta$ of F.Sch.Enr in a simple regression equation against LE, we expect $\beta$ to be positive, and our hypotheses are:

$H_0: \beta \le 0$ (LE decreases or remains constant as F.Sch.Enr increases)
against
$H_a: \beta > 0$ (LE increases as F.Sch.Enr increases)

The positive slope for F.Sch.Enr of 0.349 (with a t-test value of 15.75) allows us to reject the null hypothesis and conclude that education positively affects LE, as predicted.

5 Data source: http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0


Univariate and Bivariate Analysis: Access to Improved Water resource (H2O)

Access to an improved water source (H2O) refers to the percentage of the population with reasonable access to an adequate amount of water from an improved source, such as a household connection, public standpipe, borehole, protected well or spring, and rainwater collection. Unimproved sources include vendors, tanker trucks, and unprotected wells and springs. Reasonable access is defined as the availability of at least 20 liters a person a day from a source within one kilometer of the dwelling.

* The world's database on this variable is updated every 5 years starting in 1990. Here the data is taken from 2000, the closest year prior to our year of interest, 2003.

(Histogram titled "Access to Water resource"; x-axis: percentage of population; y-axis: # of countries)

Variable                            H2O
Mean                                85.14
Median                              92
Standard Deviation                  18.59
Coefficient of Variation (%)        21.84
Minimum                             21
Maximum                             100
Total # of countries                178
# of countries with missing data    19

Table 7: Descriptive Statistics on H2O, 2000 (see footnote 6)

With only 19 countries missing data, this variable reflects the wide variation between nations in access to improved water resources, which ranges from as low as 21% to the ideal condition in which 100% of citizens have access to those resources. This is likewise visible in the large standard deviation of the data set. A median (92%) well above the mean (85%) indicates that the majority of countries in the world have more than 85% of their population with access to improved water resources. However, a remarkable gap remains between them and the rest of the world.

Figure 11: Distribution of H2O across countries, 2000

Figure 12: Life expectancy vs H2O. Fitted line: LE = 31.88 + 0.4322 H2O; R-Square = 65.9%. (x-axis: % of population; y-axis: LE)

Another interesting observation is that this model does a much better job of estimating LE when H2O is high. The smaller dispersion of the data where H2O is equal to or higher than 80% shows that the estimated LE is closer to the observed LE, and in many cases they are almost the same. In contrast, data to the left of the 80% line in Figure 12 vary a lot more at any given level of H2O. Countries to the right of the 80% line clearly have relatively higher LE than those to the left. This pattern allows us to form a theory that in low-LE countries, improving H2O can be remarkably powerful in raising quality of life, and thus LE.

H2O predicts LE quite well compared to other variables, as indicated by a high R-Square of 66%. Since access to an improved water resource can prevent a number of waterborne diseases, we expect this variable to have a positive impact on life expectancy. Hence we expect the slope $\beta$ of H2O in a simple regression equation against LE to be positive, and our test hypothesis is: $H_0: \beta \le 0$ (LE decreases or remains constant as H2O increases) against $H_a: \beta > 0$ (LE increases as H2O increases). The regression analysis again supports our theory. The positive slope of 0.432 and a t-test value of 17.41 allow us to reject the null hypothesis and conclude that H2O is statistically significant.

6 Data source: http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS


Univariate and Bivariate Analysis: Immunization DPT (Immu)

Child immunization DPT (Immu) measures the percentage of children ages 12-23 months who received vaccinations before 12 months or at any time before the survey. A child is considered adequately immunized against diphtheria, pertussis (or whooping cough), and tetanus (DPT) after receiving three doses of vaccine.

* Data for Immunization is taken for the year 2001 so as to minimize the number of missing countries and to allow a period of time for the lag effect to take place by 2003.

Variable                            Immu
Mean                                84.63
Median                              92
Standard Deviation                  17.55
Coefficient of Variation (%)        20.74
Minimum                             26
Maximum                             99
Total # of countries                178
# of countries with missing data    18

Table 8: Descriptive Statistics on Immu, 2001 (see footnote 7)

The distribution of Immu across countries parallels that of H2O. A majority of countries clusters around 92-95%, while some exceptional outliers lie to the left, thus inflating the variation.

Figure 13: Distribution of Immu across countries, 2001 (histogram titled "DPT Immunization"; x-axis: percentage of children; y-axis: # of countries)

With only a small number of countries missing data on Immunization, most of which maintain a high level of LE, our regression model can still serve as a means to partly predict LE for the majority of the world. It is clear from Figure 14 that when the Immu rate is less than 80%, the variation in LE among countries with similar Immu is considerable; thus the variable does not account for much of the deviation of LE at lower levels of immunization. Countries with an almost perfect immunization rate all appear to have high LE.

Figure 14: Life expectancy vs Immu. Fitted line: LE = 31.69 + 0.4361 Immu; R-Square = 60.7%. (x-axis: percentage of children; y-axis: LE)

The higher the child immunization rate, the less prone to disease the population will be; hence the lower the death rate and the longer the LE. Hence we expect the slope $\beta$ of Immu in a simple regression equation against LE to be positive, and our test hypothesis is: $H_0: \beta \le 0$ (LE decreases or remains constant as Immu increases) against $H_a: \beta > 0$ (LE increases as Immu increases). We reject the null hypothesis in favor of the alternative hypothesis, with a positive slope for Immu of 0.44 and a t-test value of 15.63.

7 Data source: http://data.worldbank.org/indicator/SH.IMM.IDPT


Univariate and Bivariate Analysis: Urbanization rate (Urban)

Urban population rate (Urban) measures the percentage of a population living in urban areas as defined by national statistical offices. It is calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects.

* Data for Urban is from 2002.

Variable                            Urban
Mean                                58.25
Median                              60.08
Standard Deviation                  23.91
Coefficient of Variation (%)        41.06
Minimum                             11.36
Maximum                             100
Total # of countries                178
# of countries with missing data    1

Table 9: Descriptive statistics of Urban, 2002 (see footnote 8)


In contrast to GDP, Urban is distributed quite uniformly across its range, although there is still a huge gap between the least and the most urbanized countries. The data also provide a good overview of the world on this matter, since the variable hardly suffers from the problem of missing data.

Figure 15: Distribution of Urban across countries, 2002 (histogram titled "Urbanization Rate"; x-axis: % of total population; y-axis: # of countries)

One strong outlier is Djibouti, an underdeveloped country with a very basic quality of life. Although there has been some urban development in Djibouti, it does not seem to have made up for other low indicators, leaving the country among the low-LE group.

Countries in the blue circle in Figure 16 are mostly in Africa and have both a low urbanized population and low LE. They also suffer from HIV, though not as severely as the countries that were excluded at the beginning of the research.

Given an R-Square of 39.1%, we expect that Urban can be an important factor in explaining LE.

Figure 16: Life expectancy vs Urban. Fitted line: LE = 54.60 + 0.2543 Urban; R-Square = 39.1%. Djibouti is labeled on the plot. (x-axis: Urban; y-axis: LE)

As said, we suspect that a high urban share of the population means a higher quality of life. Hence we expect the slope $\beta$ of Urban in a simple regression equation against LE to be positive, and our test hypothesis is: $H_0: \beta \le 0$ (LE decreases or remains constant as Urban increases) against $H_a: \beta > 0$ (LE increases as Urban increases). With a very high t-value of 10.6, we can easily reject the null hypothesis, and the value of the slope is statistically significant. However, the regression explains only part of the picture, with an R-squared value of only 39%. The conclusion is that even though we can generally expect LE to increase as the percentage of urban population increases, this expectation is very uncertain.

8 Data source: http://data.worldbank.org/indicator/SP.URB.TOTL


Univariate and Bivariate Analysis: CO2 emissions per capita (CO2)

Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring. CO2 emissions per capita (CO2), thus, is measured by the total emissions divided by total population. *The data was acquired for the year of 2001.

Variable                            CO2
Mean                                4.753
Median                              2.326
Standard Deviation                  6.335
Coefficient of Variation (%)        133.27
Minimum                             0.02
Maximum                             49.5
Total # of countries                178
# of countries with missing data    10

Table 10: Descriptive statistics of CO2, 2001 (see footnote 9)

The distribution of CO2 is extremely skewed to the right (the mean of 4.753 is about twice the median of 2.326), analogous to that of GDP. There are several strong outliers, which are either highly developed countries such as Australia or Canada, or small countries with a petroleum industry such as Qatar, United Arab Emirates, Kuwait, and Trinidad and Tobago.


Following the analysis on GDP, here we take the log base 10 of CO2 to reflect the growth rate in CO2 emissions across countries. Although the R-Square confirms a fairly high explanatory power, the model surprisingly predicts a positive relationship between the CO2 growth rate and LE, meaning that an increase in the level of CO2 is associated with an increase in LE.
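To make the log base 10 scale concrete, the sketch below evaluates the fitted line reported with Figure 18 (LE = 65.05 + 10.92 logCO2). The function name and the example emission levels are illustrative assumptions; only the two coefficients come from the report.

```python
import math

INTERCEPT = 65.05  # constant of the fitted line reported with Figure 18
SLOPE = 10.92      # slope on log10(CO2) reported with Figure 18


def predicted_le(co2_per_capita):
    """Predicted LE for a CO2 level in metric tons per capita."""
    return INTERCEPT + SLOPE * math.log10(co2_per_capita)


le_at_1 = predicted_le(1.0)    # log10(1) = 0, so this equals the intercept
le_at_10 = predicted_le(10.0)  # a tenfold increase adds exactly one slope

# Under a log10 specification, doubling CO2 shifts the prediction by
# SLOPE * log10(2), whatever the starting level.
gain_from_doubling = SLOPE * math.log10(2.0)
```

Under this specification a tenfold increase in emissions is associated with about 10.9 more years of predicted LE and a doubling with about 3.3 years. These are associations along the fitted line, not causal effects.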

Figure 17: Distribution of CO2 across countries, 2001 (histogram titled "CO2 level"; x-axis: metric tons per capita; y-axis: # of countries; Qatar is labeled on the chart)

We expect the slope $\beta$ of logCO2 in a simple regression against LE to be negative, and our test hypothesis is: $H_0: \beta \ge 0$ (LE rises or remains constant as logCO2 increases) against $H_a: \beta < 0$ (LE decreases as logCO2 increases).

With a positive slope of 10.92 and a t-test value of 15.25, we fail to reject our null hypothesis.

Figure 18: Life expectancy vs Growth rate of CO2. Fitted line: LE = 65.05 + 10.92 logCO2; R-Square = 58%. Equatorial Guinea is labeled on the plot. (x-axis: growth rate of CO2 emission; y-axis: LE)

Here, one needs to be mindful that "correlation does not always imply causation". There exists no theory explaining why breathing in more CO2 would lengthen one's life. The positive slope is very likely due to the strong positive correlation between CO2 level and economic development. In the next section, we examine this by including both GDP per capita and CO2 in the model. The expected result is that logCO2 will be negatively related to LE in the final best model.

9 Data source: http://data.worldbank.org/indicator/EN.ATM.CO2E.


Summary of Bivariate Analysis of Independent variables

Based on our analysis thus far, it appears that the variables in our analysis, especially GDP per capita, H2O, and Female schooling enrollment, work very well to predict LE. Each of these represents an aspect of society that potentially affects LE. Below is a summary of the bivariate analysis that we have done so far:

Model  Variable     Constant  Slope   (T test)  R-Square (adj)  F-Test  N
1      Log.GDP.pc   14.4      14.5    (15.95)   62%             254     156
2      E.Free       50.9      0.316   (5.49)    18%             30.2    134
3      F.Sch.Enr    44        0.349   (15.75)   68.4%           248     115
4      H2O          31.9      0.432   (17.41)   65.7%           303     159
5      Immu         31.7      0.436   (15.63)   60.5%           244     160
6      Urban        54.6      0.254   (10.61)   38.8%           112     177
7      Log CO2      65.1      10.9*   (15.25)   58.1%           232     168

* This variable does not pass the hypothesis test for simple regression

Table 11: Summary of simple regression with LE

Variable                           Year  Abbreviation
Life Expectancy                    2003  LE
GDP per capita growth rate         2002  Log.GDP.pc
Economic freedom index             2002  E.Free
Female schooling enrollment rate   2003  F.Sch.Enr
Access to clean water resources    2000  H2O
Immunization DPT                   2001  Immu.
Urbanization rate                  2002  Urban
CO2 emission growth rate           2001  LogCO2

Table 12: Summary of multivariate regression variables


The multiple regression analysis

Although we have constructed several simple regressions that work to predict LE, none of them serves the purpose completely; this is shown by an R-Square below 1 in all of our simple regressions. The reason could be that other factors influencing LE exist that we have not considered, that the observed variation is indeed largely due to randomness, or that our statistical analysis up to now has been ineffective. It is conceivable that severe data unavailability is one of our hindrances, and by looking only at the bivariate analysis, we deliberately ignore all other forces in the real world that can affect LE. Since our world is a dynamic entity, with numerous interactions among its diverse components, it is not feasible to explain the change in a macro-level variable such as LE as the result of one other factor alone.

Therefore, we turn to multiple regression, where, by exploring the interactions between different variables, we construct a model to best predict our dependent variable, LE. We finally suggest some policy implications based on our complete model.

In a tangled world where everything is related to everything else, our independent variables are expected to be related to each other to some degree. We inspect this matter to ensure the validity of the model we are going to build:

            LogGDP.pc  Immu   E.Free  F.Sch.Enr  Urban  LogCO2
Immu        0.586
E.Free      0.528      0.284
F.Sch.Enr   0.759      0.717  0.346
Urban       0.738      0.446  0.393   0.679
LogCO2      0.896      0.663  0.335   0.508      0.664
H2O         0.730      0.745  0.413   0.747      0.597  0.477

Table 13: Correlation between independent variables

As shown in the table above, correlations between LogGDP.pc and the other variables are high (all above 0.5), especially with LogCO2 and F.Sch.Enr. This is expected since they are very closely related through the economic channel. There is also a high correlation between H2O and Immu. This is not because they are cause and effect, but because they are different aspects of social care: countries that invest to improve their health infrastructure will likely be willing to have their young citizens immunized.

In addition, Urban is found to be highly correlated with F.Sch.Enr and H2O for several reasons. First, urban concentration requires a higher standard of living and improvements in hygiene. Second, an urban lifestyle is much more open than a rural lifestyle: it exposes people to more opportunities to pursue education, and also frees women from household tasks and traditional roles. Women can then spend more time on their education, raising the Female school enrollment rate compared to places where most of the population is concentrated in rural areas. Lastly, F.Sch.Enr and Immu are highly correlated because, as mentioned before, as mothers' education increases, so does their knowledge of how to give their children a healthier life.

Analyzing the ceteris paribus effect of any of our independent variables on LE thus becomes a major difficulty, since each of them is tightly correlated with the others; we are facing the problem of multicollinearity. As the arguments above show, this problem will still exist no matter what other proxies for healthcare, education or social infrastructure we choose. The problem is that multicollinearity increases the standard errors of the estimated slopes, decreases the t-test values, and thus makes it harder to reject the null hypothesis that a certain variable has no impact, or an unexpected impact, on the dependent variable.
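One standard way to quantify how much a correlation inflates standard errors is the variance inflation factor (VIF). The VIF is not part of the report's own analysis; it is brought in here only as an illustration, using the simplest case of a model with exactly two regressors, where the VIF depends only on their pairwise correlation r. The function name is hypothetical; the value r = 0.896 is the LogGDP.pc and LogCO2 correlation from Table 13.

```python
def variance_inflation_factor(r):
    """VIF for a model with exactly two regressors whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)


r_gdp_co2 = 0.896  # correlation between LogGDP.pc and LogCO2 (Table 13)

vif = variance_inflation_factor(r_gdp_co2)

# The standard error of each slope grows with the square root of the VIF,
# so the t-value shrinks by the same factor.
se_multiplier = vif ** 0.5
```

In this two-regressor case the slope variances are roughly five times larger, and the standard errors more than twice as large, as they would be with uncorrelated regressors. With more than two regressors the VIF uses the R-Square from regressing one regressor on all the others, so this figure is only indicative for the report's larger models.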
Therefore, if we can show that, in spite of the problem of multicollinearity, all variables in our final model are statistically significant, then multicollinearity in fact helps strengthen the validity of our model rather than weaken it. However, the values of the slopes (the marginal effects) of the variables are not entirely accurate. This may cause some difficulty when drafting policies. In other words, the value of our model lies in the fact that it reflects the change in LE that results from a change in any of our variables. We now revisit our model:

$$\text{Life expectancy} = \alpha + \beta_1 \log(\text{GDP per capita}) + \sum_{i=2}^{k} \beta_i X_i$$

in which $\alpha$ is the intercept, $\beta_1$ is the slope of the growth rate of GDP per capita (that is, the change in LE given a change in Log.GDP.pc, keeping everything else constant), $X_i$ are the other variables that potentially influence LE, and $\beta_i$ are their corresponding slopes.

We thus construct the hypothesis test. The null hypothesis is that, in our final model, the slopes of all independent variables, including that of Log.GDP.pc, are either zero or have signs in the unexpected directions, meaning that they do not explain the variation in LE. The alternative hypothesis is that these slopes are indeed statistically significant. Our critical value will be 1.658 (5% significance level).

The expected signs for the coefficients of the independent variables are listed below:

Variable      Coefficient Direction
Log.GDP.pc    +
E.Free        +
F.Sch.Enr     +
Immu          +
H2O           +
Urban         +
CO2           -

Mindful of this, we now perform multiple regression analysis. A concrete result for this task is provided below:

Multiple regression models (slopes with T test values in parentheses)

Model  Constant  Log.GDP.pc    E.Free        F.Sch.Enr     H2O           Immu          Urban           Log CO2        R-Square (adj)  F-incremental  F-Test  N
1      14.4      14.5 (15.95)                                                                                         62.00%                         254     156
2      15.3      13.9 (12.94)  0.028 (0.61)                                                                           64.90%          11.56          122     132
3      25.1      7.95 (6.43)                 0.192 (6.28)                                                             78.10%          81.4           197     111
4      25.7      4.75 (3.46)                 0.16 (5.18)   0.163 (4.35)                                               81.80%          21.4           155     104
5      19.7      5.39 (4.14)                 0.11 (3.51)   0.097 (2.49)  0.147 (3.77)                                 83.90%          14.1           135     104
6      19.2      5.63 (3.95)                 0.11 (3.51)   0.099 (2.51)  0.146 (3.71)  -0.011 (-0.41)                 83.8%           0.64           107     104
7      8.38      7.62 (4.42)                 0.113 (3.6)   0.112 (2.85)  0.177 (4.27)                  -2.78 (-1.94)  84.3%           3.95           112     104
8      10.8      8.40 (4.94)                 0.123 (3.92)                0.212 (5.32)                  -1.92 (-1.34)  82.9%           11.16          132     109


Our final model is model 7, which includes Log.GDP.pc, F.Sch.Enr, H2O, Immu, and LogCO2. The slopes of Log.GDP.pc, F.Sch.Enr, H2O, and Immu are significant at 99%, and the slope of LogCO2 is significant at 95%. The incremental F-test value is significant at 95% and thus shows that including LogCO2 in the model has improved its explanatory power. Given the critical value of 1.658, we are able to reject the null hypothesis and conclude that all of the variables in the final model do have an impact on LE. Consequently, our model can be used as a statistical reference for policy decisions aimed at improving LE worldwide.

A very interesting observation is that the slope of LogCO2 has reversed its sign from positive to negative, and therefore meets our expectation that the growth rate of CO2 emissions is a negative factor with regard to LE. The switch in sign also confirms that we have been able to separate out the effects of other factors on LogCO2 and thus reveal the true relationship between LE and LogCO2. Also, comparing our best model with the simple model that has only Log.GDP.pc, we see that the constant and the slope of Log.GDP.pc have decreased considerably (from 14.4 to 8.38 and from 14.5 to 7.62, respectively). This means that we have also been able to separate out some of the effects of variables other than GDP, making our model more applicable to our purpose of improving LE.

Finally, an adjusted R-Square of 84.3% is quite promising, because we are able to explain close to 85% of the variation in LE using this model alone. This, of course, is only appropriate in the context where we exclude countries with an extreme prevalence of HIV.


The conclusion

In our research, we find that life expectancy is influenced by many factors, ranging from education to healthcare to environmental concerns. Among them, GDP per capita growth rate is more of a general variable and cannot be easily controlled, while other variables such as Immunization, Clean water or Female education can be seen as low-cost approaches to improving LE, especially in developing countries where these social aspects are not yet fully developed. While there is no doubt about the important effect of GDP per capita on LE, our model shows that, even when holding Log.GDP.pc constant, improvement in LE is still possible through expanded government expenditure on things that are normally overlooked: essential healthcare services and education for the common people. This is a good lesson for developing countries in particular and for the world as a whole, because the solution to the general development problem has been an intractable goal.


Sources

All online data were acquired from April 14th to April 28th, 2012.

Riley J.C. (2001). Rising Life Expectancy: A Global History. Cambridge University Press. USA.
United Nations (2005). World Population Monitoring 2003.
Access to improved water resources: http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS
CO2 emissions per capita: http://data.worldbank.org/indicator/EN.ATM.CO2E.
Economic Freedom Index: http://www.heritage.org/index/explore.aspx?view=by-region-country-year.
Female schooling enrollment rate: http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0.
GDP per capita based on purchasing power parity (PPP): http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries.
HIV prevalence rate in adults from 15-45 (%): http://apps.who.int/ghodata/?vid=34000#
Immunization DPT: http://data.worldbank.org/indicator/SH.IMM.IDPT
Life Expectancy: http://www.census.gov/ipc/www/idbsprd.html.
Urbanization rate: http://data.worldbank.org/indicator/SP.URB.TOTL
World's Life Expectancy, 2003: http://www.census.gov/ipc/www/idbsprd.html


Appendix A: Glossary of technical terms

Mean – the arithmetic mean of observations in the data set. It is the sum of all observations divided by the number of observations.

$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

StDev – the Standard Deviation; it shows how much variation or 'dispersion' there is from the 'average' in the dataset. It is the square root of the sample variance $s^2$:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

CoefVar – The coefficient of variation: it is a normalized measure of dispersion of a probability distribution. The coefficient of variation is defined as the ratio of the standard deviation to the mean. Its value does not depend on the unit of the random variable.
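As a quick check of this definition, the sketch below recomputes the coefficient of variation (in percent) from the means and standard deviations reported in Tables 7 to 10 and compares it with the tabulated value. The function and dictionary names are illustrative; the numbers are the ones printed in the tables.

```python
def coef_var(std_dev, mean):
    """Coefficient of variation in percent: standard deviation over mean."""
    return 100.0 * std_dev / mean


# (mean, standard deviation, coefficient of variation as printed in the tables)
reported = {
    "H2O": (85.14, 18.59, 21.84),    # Table 7
    "Immu": (84.63, 17.55, 20.74),   # Table 8
    "Urban": (58.25, 23.91, 41.06),  # Table 9
    "CO2": (4.753, 6.335, 133.27),   # Table 10
}

recomputed = {
    name: round(coef_var(sd, mean), 2)
    for name, (mean, sd, _) in reported.items()
}
```

The recomputed values agree with the tables to within 0.01 to 0.02 percentage points, which is what one would expect if the tables were produced from unrounded means and standard deviations.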

Minimum – the smallest value of available observations.

Median – Value in the middle when data are placed in ascending order.

Maximum – the greatest value of available observations.

Range – the difference between the maximum and the minimum.

Correlation coefficient 'r': A measure of linear association between two variables that takes on values between -1 and 1. Values near +1 indicate a strong positive linear relationship; values near -1 indicate a strong negative linear relationship; and values near 0 indicate the lack of a linear relationship.

$$r = \frac{s_{xy}}{s_x s_y}$$

where $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$ is the covariance of x and y, and $s_x s_y$ is the product of the standard deviations of x and y. The value of the correlation coefficient is not affected by the units of measurement for x and y.

Multicollinearity: A statistical term used to refer to the correlation among the independent variables in a multiple regression model, which ranges between -1 and 1. A value of 0 indicates no linear relationship between the two variables; values near -1 indicate strong negative relationships, and values near 1 indicate strong positive relationship. Although Multicollinearity does not affect the way we perform our regression analysis or interpret the output, it distorts the results of t tests on the individual parameters.



Appendix B: Hypothesis testing

A statistical hypothesis is an assumption about a population parameter, which may or may not be true. There are two types of statistical hypotheses.

Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.

Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some non-random cause.

Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process is called hypothesis testing, consisting of four steps: state the hypotheses, formulate an analysis plan, analyze sample data and interpret results.

A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level, often denoted by α (alpha). In this memo, we assume that α = 5%.

The t-score formula for rejection zone testing in testing slopes of regression equations is:

$$t = \frac{\hat{\beta}}{s_{\hat{\beta}}}$$

where $\hat{\beta}$ is the estimated slope and $s_{\hat{\beta}}$ is its estimated standard deviation.

If the test statistic t is above a pre-specified critical value, then we reject the null hypothesis. If not, then we fail to reject it. The critical value is determined by the level of significance, which by convention often takes the value of 10%, 5%, 2.5%, or 1%.
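The decision rule can be sketched with the t-values reported in Table 11. Applying the critical value of 1.658 (the one stated in the multiple regression section for the 5% level) to the simple regressions is an assumption made here for illustration, and the function and variable names are hypothetical.

```python
CRITICAL_T = 1.658  # 5% critical value stated in the multiple regression section

# t-values of the simple-regression slopes expected to be positive (Table 11).
t_values = {
    "Log.GDP.pc": 15.95,
    "E.Free": 5.49,
    "F.Sch.Enr": 15.75,
    "H2O": 17.41,
    "Immu": 15.63,
    "Urban": 10.61,
}


def reject_null_upper_tail(t_stat, critical=CRITICAL_T):
    """Upper-tail test: reject H0 (slope <= 0) when t exceeds the critical value."""
    return t_stat > critical


decisions = {name: reject_null_upper_tail(t) for name, t in t_values.items()}

# LogCO2 was expected to have a negative slope, so its test is lower-tail:
# reject H0 (slope >= 0) only when t falls below -CRITICAL_T.
log_co2_t = 15.25
reject_log_co2 = log_co2_t < -CRITICAL_T
```

Every upper-tail test rejects the null hypothesis. The LogCO2 test does not, because its t-value is large but on the wrong side of zero for the stated alternative, which matches the asterisk in Table 11.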



Appendix C: Regression techniques: Least Square Method

Purpose: To find a straight line that best represents the linear correlation in a dataset between 2 given variables: the dependent variable Y on the vertical axis and the independent variable X on the horizontal axis.

The Method: According to this method, the best line is the one that minimizes the sum of the squares of the differences between the observed values and the estimated values of the dependent variable Y for all data points in the data set. Using mathematical symbols, the value to be minimized is:

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where
n = the total number of observations
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation

Let the estimated regression equation be of the form $\hat{y} = \hat{\alpha} + \hat{\beta} x$; then $\hat{\beta}$ and $\hat{\alpha}$ can be calculated using the following formulas:

$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$

where $\bar{x}$ and $\bar{y}$ are the means of X and Y, respectively.

Evaluation: Goodness of fit

The goodness of fit R2, or the Coefficient of Determination, measures how much of the variation in $y_i$ has been explained by $\hat{y} = \hat{\alpha} + \hat{\beta} x$. R2 ranges from 0 to 1, with 1 meaning that all points are on the regression line.

On the graph, the Total, or Total Sum of Squares (TSS) presents the total variation within the data; the Explained, or the Regression Sum of Squares (RSS) shows the portion of variation explained by the regression; and last but not least, the Unexplained, or the Error Sum of Squares (ESS) presents the unexplained portion. In this sense, what we are trying to do when using the Least squares Method is simply to minimize the ESS.
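The decomposition can be checked numerically. The sketch below fits a least-squares line with the usual closed-form formulas for the slope and intercept and then computes TSS, RSS and ESS as named in this appendix. The function names and the five data points are made up for illustration and are not taken from the report's data.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def sums_of_squares(xs, ys, alpha, beta):
    """Return (TSS, RSS, ESS) using this appendix's naming."""
    y_bar = sum(ys) / len(ys)
    fitted = [alpha + beta * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)              # total variation
    rss = sum((f - y_bar) ** 2 for f in fitted)          # explained by the line
    ess = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained (errors)
    return tss, rss, ess


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

alpha, beta = fit_line(xs, ys)
tss, rss, ess = sums_of_squares(xs, ys, alpha, beta)
r_squared = rss / tss
```

For these points the fitted line is y = 2.2 + 0.6x, TSS = 6 splits into RSS = 3.6 and ESS = 2.4, and the goodness of fit is 0.6. Be aware that many textbooks swap the abbreviations (RSS for the residual sum and ESS for the explained sum); the code follows the naming used here.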

By definition of the Goodness of fit, we then have:

$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$


Appendix C: Regression techniques: The error term in the regression model

Assumptions regarding the error term u: The tests of significance in regression analysis are based on the following assumptions about the error term u:
1. u is a normally distributed random variable, i.e. $u \sim N(0, \sigma_u^2)$ for all values of x. Because in our regression model y is a linear function of u, y is also normally distributed.
2. The standard deviation of u is constant within a model, i.e. $\sigma_u$ is the same for all values of x. Since in the formal regression model $\alpha$ and $\beta$ are population parameters, the standard deviation of y about the regression line will be exactly equal to that of the error term u, and is the same for all values of x.
3. The expected value of u is EV(u)=0.

Meaning and estimation of the standard deviation of u, $s_u$: First, recall that the standard deviation, the square root of the variance, is a measure of the variability of a random variable about its mean. A standard deviation of u different from 0 confirms the existence of variability in u, and thus represents the variance of the y values about the regression line. From the previous Primer, we know that the ESS, Error Sum of Squares, given by the formula $\sum (y_i - \hat{y}_i)^2$, is a measure of the variability of the actual observations about the estimated regression line. Statistical calculations show that the variance of y can be estimated by dividing this ESS by its associated degrees of freedom (n-k), where k is the number of population parameters being estimated:

$$s_y^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - k}$$

The standard deviation of y follows by

$$s_y = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k}}$$

Using the implication of the second assumption regarding the error term u, we can be confident that the standard deviation of u is

$$s_u = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k}}$$

Properties of the sampling distribution of $\hat{\beta}$ and the estimation interval around $\hat{\beta}$

When using $\hat{\beta}$ as an estimator for $\beta$, it is essential to note that, once again due to the existence of the error term and its normal distribution, $\hat{\beta}$ is not a fixed value but a sample statistic with its own sampling distribution. The properties of the sampling distribution of $\hat{\beta}$ follow:
1. The expected value of $\hat{\beta}$ is $\beta$.
2. The estimated standard deviation of $\hat{\beta}$ is $s_{\hat{\beta}} = \frac{s_u}{\sqrt{\sum (x_i - \bar{x})^2}}$.
3. $\hat{\beta}$ is normally distributed.

Keeping those properties in mind, we can now develop an estimation interval around $\hat{\beta}$ using the t-test method. With a $(1 - \alpha)$ level of confidence, say 95%, our interval will be of the form:

$$\left(\hat{\beta} - t_{\alpha/2}\, s_{\hat{\beta}} \;;\; \hat{\beta} + t_{\alpha/2}\, s_{\hat{\beta}}\right)$$

where $\hat{\beta}$ is the point estimator; $t_{\alpha/2}\, s_{\hat{\beta}}$ is the margin of error; and $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with (n-k) degrees of freedom.


The nature of the hypothesis test $H_0: \beta = 0$:

From the model y = α+βx+u, since α, β are two constants, we can infer that

EV(y) = EV(α+βx+u) = EV(α)+EV(βx)+EV(u) = α+βx+0 = α+βx

If β=0, then EV(y) = α regardless of the value of x. Thus the expected y value does not depend on the value of x, and we conclude that y and x are not linearly correlated. A value of β different from 0, on the other hand, indicates that some relationship exists between x and y. In particular, a positive value of β shows that x and y are positively correlated, and vice versa. Therefore, the purpose of the hypothesis test

$$H_0: \beta \ge 0 \qquad H_a: \beta < 0$$

is to test if x and y are negatively correlated (i.e. if we can reject the null hypothesis and accept the alternative one). Using the test statistic $t = \hat{\beta} / s_{\hat{\beta}}$, we reject the null hypothesis when $t < -t_{\alpha}$, where $t_{\alpha}$ is the critical value corresponding to the willingness to make a type I mistake.

Constructing a confidence interval for $\beta$: Our estimate of the slope $\hat{\beta}$ is a point estimate. From this point estimate, we can construct a confidence interval around it in which the true slope $\beta$ has a high chance of lying. This is possible and necessary because $\hat{\beta}$ is also not a fixed value but has a probability distribution, again due to the existence of the error term. Thus, there is an associated variance $s_{\hat{\beta}}^2$. This is analogous to constructing a confidence interval for the expected mean value $\bar{V}$ of a random variable V in a single variable dataset.

First, we calculate the standard deviation of $\hat{\beta}$. The formula is:

$$s_{\hat{\beta}} = \frac{s_u}{\sqrt{\sum (x_i - \bar{x})^2}}$$

Then, we can proceed to construct the confidence interval around the estimated value of the slope as follows:

$$\left(\hat{\beta} - t_{\alpha/2}\, s_{\hat{\beta}} \;;\; \hat{\beta} + t_{\alpha/2}\, s_{\hat{\beta}}\right)$$

Our level of significance is chosen to be α = 5%. Note here that we do not use the standardized normal benchmark but the studentized benchmark $t_{\alpha/2}$ instead, with (n-k) degrees of freedom, where n is the number of observations and k is the number of population parameters.


Confidence interval and prediction interval for Y: From regression analysis, there are in essence two ways that we can make inferences about the value of Y, given the value of X. We can ask two different questions: 1. For a given value of X, what would the value of a single Y be? 2. For a given value of X, what would the average value of Y likely be? The answer to the first question is a prediction interval, and the answer to the second question is a confidence interval. We now examine these two in detail.

Prediction interval

Given a hypothetical data point (x_p, y_p) for which we know x_p, a prediction interval is the interval in which the corresponding y_p would most likely lie. Once more, there is variance in the value of y_p, due to the error term. Analogous to the other statistical intervals, the prediction interval is:

$$\left(\hat{y}_p - t_{\alpha/2}\, s_{y_p},\ \hat{y}_p + t_{\alpha/2}\, s_{y_p}\right)$$

where $\hat{y}_p$ is taken from the regression equation: we just plug x_p into it, so $\hat{y}_p = \hat{\alpha} + \hat{\beta} x_p$. The estimate of the standard deviation of the distribution of y_p is

$$s_{y_p} = s_u \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$$

Confidence interval

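Given a set of hypothetical data points (x_p, y_p1), (x_p, y_p2), (x_p, y_p3), … with the same known x-coordinate x_p, a confidence interval is the interval in which $\bar{y}_p$, the mean value of all the corresponding y_p, would most likely lie. This confidence interval is calculated as

$$\left(\hat{\bar{y}}_p - t_{\alpha/2}\, s_{\bar{y}_p},\ \hat{\bar{y}}_p + t_{\alpha/2}\, s_{\bar{y}_p}\right)$$

But $\hat{\bar{y}}_p = \hat{\alpha} + \hat{\beta} x_p = \hat{y}_p$, so in fact this interval is just

$$\left(\hat{y}_p - t_{\alpha/2}\, s_{\bar{y}_p},\ \hat{y}_p + t_{\alpha/2}\, s_{\bar{y}_p}\right)$$

and the only difference is the standard deviation in question, $s_{\bar{y}_p}$. It is calculated as

$$s_{\bar{y}_p} = s_u \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$$

which is always smaller than the standard deviation for the prediction interval, regardless of the value of x_p (see the graph below).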
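To make the comparison between the two intervals concrete, the following Python sketch computes both standard deviations side by side. It assumes the usual textbook expressions for simple linear regression (the mean-response standard deviation uses 1/n plus the distance term under the square root, and the prediction version adds 1 to it); the function name `interval_std_errors`, the x values and the value of s_u are hypothetical.

```python
import math

def interval_std_errors(x, s_u, x_p):
    """Standard deviations used for the confidence and prediction intervals at x_p.

    x   : observed x values used to fit the regression
    s_u : estimated standard deviation of the error term
    x_p : the x value at which we want to infer y
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    distance_term = 1.0 / n + (x_p - x_bar) ** 2 / sxx
    se_mean = s_u * math.sqrt(distance_term)        # mean of y at x_p (confidence interval)
    se_pred = s_u * math.sqrt(1.0 + distance_term)  # single y at x_p (prediction interval)
    return se_mean, se_pred

# Hypothetical inputs for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s_u = 0.5

# Evaluate at the sample mean of x, inside the data range, and outside it.
results = {x_p: interval_std_errors(x, s_u, x_p) for x_p in (5.5, 8.0, 15.0)}
```

Both standard deviations are smallest at the sample mean of x and grow as x_p moves away from it, while the prediction value stays above the confidence value at every x_p.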



Appendix C: Regression techniques: Advanced tools in model assessment

R-squared adjusted: Adjusted R-squared is a simple measure used to account for the fact that R-squared never decreases as we increase the number of independent variables. The equation used to calculate adjusted R² is:

$$R^2_{Adj.} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2 / (n - m - 1)}{\sum_i (y_i - \bar{y})^2 / (n - 1)}$$

where n is the number of observations, and m is the number of independent variables. The value (n − (m + 1)) is the number of degrees of freedom for ESS (the error sum of squares, $\sum_i (y_i - \hat{y}_i)^2$), and (n − 1) is the number of degrees of freedom for TSS (the total sum of squares, $\sum_i (y_i - \bar{y})^2$).

The Global F-test: The global F-test measures the extent to which the set of independent variables, taken together, influences the dependent variable. It tells us whether that impact is small or large. We set up a null hypothesis saying that the coefficients of all of our independent variables are equal to each other and equal to 0. That is,

$H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$

Against:

$H_a$: at least one $\beta_j \neq 0$

To test this, we use the F-statistic with formula:

$$F = \frac{(TSS - ESS)/m}{ESS/(n - (m + 1))}$$

where the F-statistic has m and (n − (m + 1)) degrees of freedom. If the statistic is larger than the critical value, we reject the null hypothesis and say that our model has some explanatory power.
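As a quick numerical illustration of adjusted R-squared and the global F-statistic, the sketch below evaluates both from the error and total sums of squares. The function names and the input numbers (ESS = 20, TSS = 100, n = 25, m = 4) are hypothetical and do not come from our regression output; the F-statistic is written in terms of ESS and TSS only, which assumes the explained variation equals TSS − ESS.

```python
def adjusted_r_squared(ess, tss, n, m):
    """Adjusted R-squared from the error (ess) and total (tss) sums of squares."""
    return 1.0 - (ess / (n - m - 1)) / (tss / (n - 1))

def global_f_statistic(ess, tss, n, m):
    """Global F-statistic with m and n - (m + 1) degrees of freedom."""
    explained = tss - ess  # variation explained by the m independent variables
    return (explained / m) / (ess / (n - (m + 1)))

# Hypothetical values for illustration only.
ess, tss, n, m = 20.0, 100.0, 25, 4

r2 = 1.0 - ess / tss                          # plain R-squared: 0.8
r2_adj = adjusted_r_squared(ess, tss, n, m)   # about 0.76, lower than plain R-squared
f_stat = global_f_statistic(ess, tss, n, m)   # 20.0
```

The F value would then be compared with the critical value from an F-distribution table with m and n − (m + 1) degrees of freedom.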

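Incremental F Test: This test allowed us to determine the added descriptive value of each variable as we built our model. Specifically, it compares the explanatory power of the model without the added variable (the restricted model) to the explanatory power of the model with the new variable (the unrestricted model). If the value from the F test was below our critical value*, we rejected the new variable on the basis that it did not have any additional explanatory power, and built the model without the variable.

$$F = \frac{(ESS_R - ESS_U)/q}{ESS_U/(n - (m + 1))}$$

where $ESS_R$ and $ESS_U$ are the error sums of squares of the restricted and unrestricted models, q is the number of variables added, and m is the number of independent variables in the unrestricted model.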

*Critical values are determined with an F-distribution table.
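The incremental F test can be illustrated with a short Python sketch. It assumes the common textbook form of the statistic, based on the error sums of squares of the restricted and unrestricted models; the function name, argument names and numbers below are hypothetical and are not results from our models.

```python
def incremental_f_statistic(ess_restricted, ess_unrestricted, q, n, m):
    """Incremental F-statistic for adding q variables to a model.

    ess_restricted   : error sum of squares without the new variable(s)
    ess_unrestricted : error sum of squares with the new variable(s)
    q                : number of variables added
    n                : number of observations
    m                : number of independent variables in the unrestricted model
    """
    gain_per_variable = (ess_restricted - ess_unrestricted) / q
    error_variance = ess_unrestricted / (n - (m + 1))
    return gain_per_variable / error_variance

# Hypothetical example: adding one variable lowers the error sum of squares from 30 to 20.
f_inc = incremental_f_statistic(ess_restricted=30.0, ess_unrestricted=20.0,
                                q=1, n=25, m=4)   # 10.0
```

The new variable is kept only if this value exceeds the critical value with q and n − (m + 1) degrees of freedom.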


Appendix D: Summary table of variables

Data on Life Expectancy in 2003 by country and relevant proxies by category

Note: Countries flagged for HIV prevalence (color-coded in the source table) are excluded from our analysis as a result. All data were acquired and updated in April 2012. * indicates missing data.

Column groups: Economic proxies (GDP per capita, GDP per capita growth rate, Economic Freedom Index); Education (female schooling enrollment rate); Healthcare (DPT immunization rate, clean water rate); Urbanization (urban rate). Bracketed numbers refer to the numbered data sources listed after the table. Reference years range from 2000 to 2003, depending on the variable.

| Country | Life expectancy, both (2003) [1] | Life expectancy, male (2003) | Life expectancy, female (2003) | GDP per capita (US$, 2002) [2] | GDP per capita growth rate (%) | Economic Freedom Index [3] | Female schooling enrollment rate (%) [4] | DPT immunization rate (%) [5] | Clean water rate (%, 2000) [6] | Urban rate (%) [7] | CO2 emission per capita growth rate (%) [8] | HIV rate (%, 2000) [9] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 42.0 | 41.8 | 42.2 | 524 | 2.7 | * | 29.0 | 33 | 21 | 21.9 | -1.62 | * |
| Albania | 76.9 | 74.1 | 79.9 | 4,844 | 3.7 | 56.8 | 66.9 | 97 | 97 | 42.9 | 0.02 | * |
| Algeria | 72.5 | 71.0 | 74.0 | 5,844 | 3.8 | 61 | * | 89 | 89 | 61.2 | 0.43 | 0.1 |
| Angola | 38.1 | 37.0 | 39.3 | 2,680 | 3.4 | * | * | 42 | 41 | 51.0 | -0.17 | 1.9 |
| Antigua and Barbuda | 71.3 | 69.0 | 73.8 | 15,206 | 4.2 | * | * | 97 | 91 | 31.5 | 0.64 | 0.4 |
| Argentina | 75.5 | 71.7 | 79.4 | 7,912 | 3.9 | 65.7 | 94.5 | 83 | 96 | 90.6 | 0.58 | * |
| Armenia | 70.9 | 67.5 | 75.0 | 2,635 | 3.4 | 68 | 72.6 | 94 | 93 | 64.7 | 0.06 | 0.1 |
| Aruba | 78.8 | 75.5 | 82.3 | * | * | * | 83.3 | * | 100 | 46.7 | 1.38 | * |
| Australia | 80.1 | 77.3 | 83.1 | 28,907 | 4.5 | 77.3 | 116.7 | 92 | 100 | 87.6 | 1.22 | 0.1 |
| Austria | 78.8 | 76.0 | 81.8 | 30,458 | 4.5 | 67.4 | 87.1 | 84 | 100 | 66.1 | 0.90 | 0.1 |
| Azerbaijan | 63.2 | 59.0 | 67.6 | 2,746 | 3.4 | 53.3 | 70.4 | 75 | 74 | 51.3 | 0.56 | 0.1 |
| Bahamas, The | 65.7 | 62.3 | 69.2 | 24,619 | 4.4 | 74.4 | * | 99 | 96 | 82.4 | 0.78 | 3.2 |
| Bahrain | 73.7 | 71.3 | 76.2 | 23,919 | 4.4 | 75.6 | * | 99 | * | 88.4 | 1.34 | * |
| Bangladesh | 61.3 | 61.5 | 61.2 | 948 | 3.0 | 51.9 | * | 85 | 79 | 24.4 | -0.61 | 0.1 |
| Barbados | 72.2 | 70.3 | 74.2 | 15,259 | 4.2 | 73.6 | * | 84 | 100 | 37.1 | 0.66 | 0.4 |
| Belarus | 68.4 | 62.5 | 74.6 | 5,935 | 3.8 | 39 | 90.9 | 99 | 100 | 70.8 | 0.72 | 0.1 |
| Belgium | 78.3 | 75.1 | 81.6 | 30,046 | 4.5 | 67.6 | 118.9 | 95 | 100 | 97.2 | 1.05 | 0.2 |
| Belize | 68.6 | 66.5 | 70.7 | 5,380 | 3.7 | 65.6 | 73.4 | 96 | 89 | 48.8 | 0.44 | 2.2 |
| Benin | 52.0 | 50.9 | 53.2 | 1,241 | 3.1 | 57.3 | 42.2 | 76 | 66 | 39.0 | -0.59 | 1.4 |
| Bermuda | 77.4 | 75.4 | 79.5 | * | * | * | * | * | * | 100.0 | 0.90 | * |
| Bhutan | 53.6 | 53.9 | 53.3 | 2,735 | 3.4 | * | * | 88 | 91 | 27.6 | -0.15 | 0.1 |
| Bolivia | 64.8 | 62.2 | 67.5 | 3,290 | 3.5 | 65.1 | * | 71 | 79 | 62.8 | 0.02 | 0.2 |
| Bosnia and Herzegovina | 79.1 | 75.5 | 82.9 | 4,968 | 3.7 | 37.4 | * | 91 | 97 | 44.2 | 0.74 | * |
| Botswana | 34.8 | 34.2 | 35.3 | 9,627 | 4.0 | 66.2 | 72.2 | 97 | 94 | 54.8 | 0.39 | 26 |
| Brazil | 71.1 | 67.2 | 75.3 | 7,372 | 3.9 | 61.5 | 88.1 | 98 | 93 | 82.4 | 0.28 | * |
| Brunei Darussalam | 74.3 | 71.9 | 76.8 | 45,630 | 4.7 | * | 81.2 | 97 | * | 72.1 | 1.27 | * |
| Bulgaria | 71.5 | 67.9 | 75.3 | 7,579 | 3.9 | 57.1 | 75.7 | 94 | 100 | 69.4 | 0.77 | 0.1 |
| Burkina Faso | 47.6 | 46.2 | 49.0 | 842 | 2.9 | 58.8 | 19.3 | 71 | 60 | 17.3 | -1.10 | 2.3 |
| Burundi | 49.2 | 48.7 | 49.8 | 332 | 2.5 | * | 31.1 | 81 | 72 | 8.8 | -1.48 | 5.2 |
| Cambodia | 58.3 | 56.3 | 60.4 | 1,065 | 3.0 | 60.7 | 54.0 | 60 | 46 | 18.0 | -0.68 | 1.3 |
| Cameroon | 50.6 | 50.3 | 50.9 | 1,773 | 3.2 | 52.8 | 48.4 | 64 | 64 | 51.7 | -0.67 | 5.5 |
| Canada | 79.8 | 76.4 | 83.4 | 29,903 | 4.5 | 74.6 | * | 92 | 100 | 79.7 | 1.23 | 0.2 |
| Cape Verde | 69.8 | 66.5 | 73.2 | 2,203 | 3.3 | 57.6 | * | 90 | 83 | 55.0 | -0.33 | * |
| Cayman Islands | 79.7 | 77.1 | 82.3 | * | * | * | * | * | 93 | 100.0 | 1.03 | * |
| Central African Republic | 43.4 | 43.1 | 43.7 | 680 | 2.8 | 59.8 | * | 40 | 63 | 37.8 | -1.19 | 9.4 |
| Chad | 46.6 | 45.0 | 48.4 | 780 | 2.9 | 49.2 | 27.7 | 26 | 45 | 24.2 | -1.69 | 3 |
| Chile | 76.2 | 72.9 | 79.6 | 9,918 | 4.0 | 77.8 | 78.8 | 97 | 94 | 86.6 | 0.54 | 0.2 |
| China | 71.6 | 70.1 | 73.3 | 2,863 | 3.5 | 52.8 | 66.1 | 86 | 80 | 37.6 | 0.44 | * |
| Colombia | 71.1 | 67.3 | 75.1 | 6,152 | 3.8 | 64.2 | * | 80 | 91 | 72.7 | 0.14 | 0.9 |
| Comoros | 61.2 | 58.9 | 63.5 | 987 | 3.0 | * | 52.7 | 70 | 92 | 28.0 | -0.82 | 0.1 |
| Congo, Dem. Rep. | 51.2 | 50.2 | 52.2 | 230 | 2.4 | * | 50.1 | 30 | 44 | 30.7 | -1.51 | 3.9 |
| Congo, Rep. | 50.4 | 49.0 | 51.8 | 2,975 | 3.5 | 45.3 | * | 31 | 70 | 59.1 | -0.57 | * |
| Costa Rica | 76.4 | 73.9 | 79.1 | 7,483 | 3.9 | 67.5 | * | 91 | 95 | 60.1 | 0.16 | 0.1 |
| Cote d'Ivoire | 48.2 | 45.7 | 50.8 | 1,586 | 3.2 | 57.3 | * | 66 | 78 | 44.8 | -0.34 | 6.9 |
| Croatia | 73.8 | 69.6 | 78.3 | 12,572 | 4.1 | 51.1 | 75.4 | 94 | 99 | 56.0 | 0.67 | 0.1 |
| Cuba | 76.8 | 74.6 | 79.2 | * | * | 32.4 | 79.5 | 99 | 90 | 75.6 | 0.36 | 0.1 |
| Cyprus | 77.3 | 74.9 | 79.7 | 21,375 | 4.3 | 73 | 78.3 | 97 | 100 | 68.9 | 0.85 | * |
| Czech Republic | 75.5 | 72.3 | 79.0 | 16,866 | 4.2 | 66.5 | 80.7 | 98 | 100 | 73.8 | 1.09 | 0.1 |
| Denmark | 77.3 | 75.0 | 79.6 | 30,766 | 4.5 | 71.1 | 104.5 | 97 | 100 | 85.4 | 0.96 | 0.1 |
| Djibouti | 43.1 | 41.8 | 44.5 | 1,620 | 3.2 | 57.8 | 18.5 | 53 | 84 | 84.4 | -0.29 | 2.9 |
| Dominica | 74.1 | 71.2 | 77.2 | 7,665 | 3.9 | * | * | 99 | 95 | 71.8 | 0.21 | * |
| Dominican Republic | 71.0 | 69.5 | 72.6 | 5,566 | 3.7 | 58.6 | 79.7 | 72 | 87 | 64.2 | 0.37 | 1 |
| Ecuador | 75.8 | 73.0 | 78.8 | 5,307 | 3.7 | 53.1 | * | 90 | 86 | 61.6 | 0.27 | 0.5 |
| Egypt, Arab Rep. | 70.4 | 67.9 | 73.0 | 3,898 | 3.6 | 54.1 | * | 99 | 96 | 42.6 | 0.26 | 0.1 |
| El Salvador | 70.6 | 67.0 | 74.4 | 4,919 | 3.7 | 73 | 69.9 | 92 | 82 | 59.0 | 0.00 | 0.7 |
| Equatorial Guinea | 50.4 | 48.4 | 52.5 | 14,518 | 4.2 | 46.4 | * | 33 | 43 | 38.8 | 0.76 | 1.5 |
| Eritrea | 57.5 | 56.1 | 59.0 | 612 | 2.8 | * | 28.1 | 86 | 54 | 18.4 | -0.78 | 1.2 |
| Estonia | 71.0 | 65.3 | 77.1 | 11,989 | 4.1 | 77.6 | 98.1 | 94 | 98 | 69.4 | 1.07 | 0.5 |
| Ethiopia | 48.5 | 47.3 | 49.7 | 507 | 2.7 | 49.8 | 29.0 | 63 | 28 | 15.4 | -1.19 | * |
| Faeroe Islands | 78.9 | 75.4 | 82.4 | * | * | * | * | * | * | 37.7 | 1.15 | * |
| Fiji | 68.9 | 66.4 | 71.4 | 3,747 | 3.6 | 53.9 | 76.7 | 91 | * | 49.3 | 0.14 | 0.1 |
| Finland | 78.1 | 74.7 | 81.8 | 27,531 | 4.4 | 73.6 | 111.7 | 98 | 100 | 61.6 | 1.04 | 0.1 |
| France | 79.3 | 75.6 | 83.1 | 27,658 | 4.4 | 58 | 93.1 | 97 | 100 | 76.2 | 0.80 | 0.3 |
| French Polynesia | 75.5 | 73.1 | 77.9 | * | * | * | * | * | 100 | 52.1 | 0.48 | * |
| Gabon | 56.1 | 54.5 | 57.8 | 11,902 | 4.1 | 58 | * | 45 | 85 | 81.5 | 0.15 | 5.2 |
| Gambia, The | 53.1 | 51.3 | 55.0 | 1,029 | 3.0 | 57.7 | * | 87 | 84 | 51.0 | -0.67 | 0.5 |
| Georgia | 75.4 | 72.1 | 79.2 | 2,584 | 3.4 | 56.7 | 73.9 | 87 | 89 | 52.6 | -0.07 | 0.1 |
| Germany | 78.4 | 75.5 | 81.6 | 27,437 | 4.4 | 70.4 | * | 93 | 100 | 73.2 | 1.02 | 0.1 |
| Ghana | 57.7 | 57.0 | 58.5 | 1,018 | 3.0 | 57.2 | 45.7 | 87 | 71 | 45.5 | -0.45 | 2.3 |
| Gibraltar | 79.4 | 76.5 | 82.4 | * | * | * | * | * | * | 100.0 | 1.10 | * |
| Greece | 78.8 | 76.3 | 81.4 | 21,402 | 4.3 | 59.1 | 92.4 | 91 | 99 | 60.0 | 0.93 | 0.1 |
| Greenland | 69.0 | 65.4 | 72.7 | * | * | * | * | * | * | 82.1 | 0.98 | * |
| Grenada | 64.5 | 62.7 | 66.3 | 8,194 | 3.9 | * | * | 96 | 94 | 30.8 | 0.34 | * |
| Guam | 77.8 | 74.8 | 81.0 | * | * | * | * | * | 100 | 93.1 | * | * |
| Guatemala | 68.4 | 66.8 | 70.2 | 3,692 | 3.6 | 62.3 | 60.3 | 77 | 89 | 45.9 | -0.03 | 0.5 |
| Guinea | 49.1 | 47.9 | 50.4 | 846 | 2.9 | 52.9 | 32.7 | 50 | 62 | 31.8 | -0.81 | 1.7 |
| Guinea-Bissau | 46.3 | 44.4 | 48.4 | 945 | 3.0 | 42.3 | * | 53 | 55 | 29.7 | -0.81 | 1.8 |
| Guyana | 64.7 | 62.1 | 67.3 | 2,357 | 3.4 | 54.3 | * | 85 | 89 | 28.4 | 0.31 | 1.5 |
| Haiti | 52.3 | 51.0 | 53.7 | 1,000 | 3.0 | 47.9 | * | 51 | 55 | 38.4 | -0.75 | 2.8 |
| Honduras | 69.3 | 67.6 | 70.9 | 2,722 | 3.4 | 58.7 | * | 96 | 80 | 45.2 | -0.05 | 1.3 |
| Hong Kong SAR, China | 81.3 | 78.6 | 84.2 | 27,753 | 4.4 | 89.4 | 72.4 | * | * | 100.0 | 0.75 | * |
| Hungary | 72.1 | 68.0 | 76.5 | 14,669 | 4.2 | 64.5 | 91.5 | 99 | 99 | 65.3 | 0.75 | 0.1 |
| Iceland | 80.2 | 78.2 | 82.2 | 31,036 | 4.5 | 73.1 | 101.6 | 92 | 100 | 92.2 | 0.87 | 0.2 |
| India | 63.6 | 62.9 | 64.4 | 1,724 | 3.2 | 51.2 | 52.2 | 60 | 81 | 28.1 | 0.07 | 0.4 |
| Indonesia | 68.9 | 66.5 | 71.5 | 2,550 | 3.4 | 54.8 | 63.4 | 71 | 77 | 44.4 | 0.13 | 0.1 |
| Iran, Islamic Rep. | 69.4 | 68.0 | 70.7 | 7,489 | 3.9 | 36.4 | 65.3 | 96 | 93 | 65.3 | 0.73 | 0.1 |
| Iraq | 67.8 | 66.7 | 69.0 | 3,480 | 3.5 | 15.6 | * | 76 | 80 | 67.4 | 0.53 | * |
| Ireland | 77.2 | 74.5 | 80.0 | 33,052 | 4.5 | 80.5 | 97.4 | 84 | 100 | 59.7 | 1.05 | 0.2 |
| Israel | 79.0 | 77.0 | 81.2 | 23,535 | 4.4 | 66.9 | 94.0 | 95 | 100 | 91.5 | 1.01 | 0.1 |
| Italy | 79.4 | 76.5 | 82.5 | 26,804 | 4.4 | 63.6 | 89.5 | 93 | 100 | 67.4 | 0.89 | 0.3 |
| Jamaica | 73.4 | 71.8 | 75.2 | 6,093 | 3.8 | 61.7 | 72.8 | 98 | 93 | 52.2 | 0.61 | 1.9 |
| Japan | 80.9 | 77.6 | 84.4 | 26,814 | 4.4 | 66.7 | 84.1 | 95 | 100 | 65.5 | 0.98 | 0.1 |
| Jordan | 77.9 | 75.4 | 80.5 | 3,507 | 3.5 | 66.2 | 80.7 | 99 | 96 | 78.3 | 0.51 | * |
| Kazakhstan | 65.6 | 60.2 | 71.3 | 6,216 | 3.8 | 52.4 | 89.1 | 95 | 96 | 56.6 | 1.00 | 0.1 |
| Kenya | 46.6 | 47.4 | 45.7 | 1,171 | 3.1 | 58.2 | 56.4 | 76 | 52 | 20.1 | -0.53 | 9 |
| Kiribati | 60.9 | 58.0 | 64.0 | 2,153 | 3.3 | * | 79.4 | 85 | 62 | 43.2 | -0.52 | * |
| Korea, Dem. Rep. | 70.8 | 68.1 | 73.6 | * | * | 69.5 | * | 62 | 100 | 60.8 | 0.54 | * |
| Korea, Rep. | 76.5 | 73.0 | 80.2 | 19,656 | 4.3 | 8.9 | 88.8 | 97 | 93 | 80.1 | 0.97 | 0.1 |
| Kuwait | 76.7 | 75.7 | 77.6 | 34,376 | 4.5 | 65.4 | 90.4 | 99 | 99 | 98.2 | 1.44 | * |
| Kyrgyz Republic | 67.5 | 63.5 | 71.7 | 1,434 | 3.2 | 51.7 | 80.7 | 99 | 82 | 35.6 | -0.12 | 0.1 |
| Lao PDR | 54.3 | 52.3 | 56.3 | 1,343 | 3.1 | 36.8 | 52.5 | 52 | 48 | 24.2 | -0.65 | 0.1 |
| Latvia | 70.7 | 66.0 | 75.6 | 9,870 | 4.0 | 65 | 95.0 | 97 | 99 | 68.1 | 0.46 | 0.3 |
| Lebanon | 72.1 | 69.6 | 74.6 | 8,262 | 3.9 | 57.1 | * | 80 | 100 | 86.2 | 0.63 | 0.2 |
| Lesotho | 35.3 | 35.8 | 34.8 | 1,035 | 3.0 | 48.9 | 62.4 | 78 | 74 | 21.3 | * | 24.5 |
| Liberia | 37.5 | 35.2 | 39.9 | 446 | 2.6 | * | * | 42 | 65 | 55.8 | -0.77 | 3.3 |
| Libya | 76.1 | 73.9 | 78.3 | 10,573 | 4.0 | 35.4 | 94.0 | 94 | 54 | 76.6 | 0.98 | * |
| Liechtenstein | 79.3 | 75.6 | 82.9 | * | * | * | 65.0 | * | * | 14.9 | * | * |
| Lithuania | 73.0 | 67.5 | 78.7 | 10,567 | 4.0 | 66.1 | 97.3 | 95 | * | 66.8 | 0.57 | 0.1 |
| Luxembourg | 78.4 | 75.2 | 81.9 | 57,550 | 4.8 | 79.4 | 76.9 | 98 | 100 | 83.4 | 1.29 | 0.2 |
| Macao SAR, China | 81.9 | 79.1 | 84.9 | 23,603 | 4.4 | * | 84.6 | * | * | 100.0 | 0.58 | * |
| Macedonia, FYR | 73.2 | 70.8 | 75.8 | 6,037 | 3.8 | 58 | * | 91 | 100 | 63.9 | 0.77 | * |
| Madagascar | 56.1 | 53.8 | 58.5 | 725 | 2.9 | 56.8 | * | 59 | 37 | 27.7 | -0.56 | 0.2 |
| Malawi | 41.1 | 41.2 | 40.9 | 564 | 2.8 | 56.9 | 66.5 | 90 | 63 | 16.0 | -1.10 | 14.2 |
| Malaysia | 71.7 | 69.0 | 74.5 | 9,515 | 4.0 | 60.1 | 74.7 | 95 | 97 | 64.2 | 1.16 | 0.4 |
| Maldives | 63.3 | 62.1 | 64.6 | 4,229 | 3.6 | * | 76.1 | 98 | 91 | 30.2 | 0.72 | 0.1 |
| Mali | 48.0 | 46.0 | 50.0 | 768 | 2.9 | 61.1 | 28.8 | 49 | 44 | 28.9 | -0.93 | 1.7 |
| Malta | 78.5 | 76.3 | 80.8 | 19,045 | 4.3 | 62.2 | 76.0 | 95 | 100 | 92.9 | 1.20 | 0.1 |
| Marshall Islands | 69.4 | 67.5 | 71.4 | * | * | * | 70.9 | 48 | 95 | 69.0 | 0.59 | * |
| Mauritania | 51.9 | 49.8 | 54.1 | 1,402 | 3.1 | 52.5 | 42.6 | 72 | 40 | 40.2 | 0.07 | 0.6 |
| Mauritius | 71.8 | 67.8 | 75.9 | 8,662 | 3.9 | 67.7 | 69.7 | 92 | 99 | 42.5 | 0.79 | 0.3 |
| Mayotte | 60.6 | 58.5 | 62.8 | * | * | * | * | * | * | * | * | * |
| Mexico | 74.7 | 71.9 | 77.6 | 9,318 | 4.0 | 63 | 75.9 | 97 | 90 | 75.3 | 0.59 | 0.3 |
| Micronesia, Fed. Sts. | 69.1 | 67.4 | 71.0 | 2,843 | 3.5 | * | * | 75 | 92 | 22.3 | -0.29 | * |
| Moldova | 64.9 | 60.6 | 69.4 | 1,754 | 3.2 | 57.4 | 73.3 | 97 | 92 | 43.8 | 0.01 | 0.4 |
| Monaco | 79.3 | 75.4 | 83.4 | * | * | * | * | 99 | 100 | 100.0 | * | * |
| Mongolia | 63.8 | 61.6 | 66.1 | 2,151 | 3.3 | 56.7 | 80.7 | 95 | 66 | 56.6 | 0.51 | 0.1 |
| Morocco | 70.0 | 67.8 | 72.4 | 2,913 | 3.5 | 59 | 52.7 | 96 | 78 | 54.0 | 0.11 | 0.1 |
| Mozambique | 41.5 | 40.8 | 42.2 | 538 | 2.7 | 57.7 | * | 73 | 42 | 32.2 | -1.07 | 8.6 |
| Namibia | 46.0 | 46.0 | 46.1 | 4,212 | 3.6 | 65.1 | 73.6 | 78 | 81 | 33.5 | 0.03 | 15.3 |
| Nepal | 59.0 | 59.4 | 58.6 | 833 | 2.9 | 52.3 | * | 72 | 83 | 14.4 | -0.86 | 0.5 |
| Netherlands | 78.6 | 76.1 | 81.2 | 31,940 | 4.5 | 75.1 | 97.5 | 97 | 100 | 78.2 | 1.02 | 0.2 |
| New Caledonia | 73.5 | 70.6 | 76.6 | * | * | * | * | * | * | 62.6 | 0.99 | * |
| New Zealand | 78.3 | 75.3 | 81.4 | 22,899 | 4.4 | 80.7 | 108.6 | 90 | 100 | 85.9 | 0.95 | 0.1 |
| Nicaragua | 69.7 | 67.7 | 71.8 | 1,986 | 3.3 | 61.1 | 71.0 | 87 | 80 | 55.2 | -0.13 | 0.1 |
| Niger | 43.0 | 43.1 | 43.0 | 566 | 2.8 | 48.2 | 17.0 | 36 | 42 | 16.2 | -1.17 | 1 |
| Nigeria | 46.2 | 45.7 | 46.8 | 1,350 | 3.1 | 50.9 | 48.0 | 27 | 53 | 44.0 | -0.18 | 3.9 |
| Northern Mariana Islands | 75.5 | 72.9 | 78.2 | * | * | * | * | * | 98 | 90.4 | * | * |
| Norway | 79.1 | 76.5 | 81.9 | 37,060 | 4.6 | 67.4 | 104.1 | 91 | 100 | 76.6 | 0.96 | 0.1 |
| Oman | 72.6 | 70.4 | 74.9 | 18,966 | 4.3 | 64 | 67.0 | 99 | 83 | 71.6 | 0.95 | 0.1 |
| Pakistan | 62.2 | 61.3 | 63.1 | 1,719 | 3.2 | 55.8 | 31.0 | 65 | 88 | 33.9 | -0.13 | 0.1 |
| Palau | 69.5 | 66.4 | 72.8 | 11,073 | 4.0 | * | * | 98 | 83 | 72.8 | 0.98 | * |
| Panama | 75.3 | 72.6 | 78.0 | 7,419 | 3.9 | 68.5 | * | 99 | 90 | 67.8 | 0.37 | 1.4 |
| Papua New Guinea | 64.2 | 62.1 | 66.4 | 1,703 | 3.2 | * | * | 55 | 39 | 13.0 | -0.23 | 0.4 |
| Paraguay | 74.4 | 71.9 | 77.0 | 3,424 | 3.5 | 59.6 | 73.2 | 89 | 74 | 56.6 | -0.15 | 0.3 |
| Peru | 68.9 | 67.2 | 70.7 | 5,229 | 3.7 | 64.8 | 81.0 | 90 | 79 | 70.9 | 0.02 | 0.5 |
| Philippines | 69.3 | 66.4 | 72.3 | 2,541 | 3.4 | 60.7 | 80.9 | 79 | 88 | 60.2 | -0.01 | 0.1 |
| Poland | 74.6 | 70.7 | 78.8 | 11,563 | 4.1 | 65 | 93.5 | 98 | 100 | 61.6 | 0.90 | 0.1 |
| Portugal | 77.2 | 73.9 | 80.7 | 19,088 | 4.3 | 65.4 | 95.1 | 97 | 99 | 55.7 | 0.79 | 0.4 |
| Puerto Rico | 78.1 | 74.1 | 82.2 | * | * | * | * | * | * | 95.8 | * | * |
| Qatar | 73.1 | 70.7 | 75.8 | 62,591 | 4.8 | 61.9 | 83.6 | 93 | 100 | 95.1 | 1.69 | 0.1 |
| Romania | 70.9 | 67.4 | 74.6 | 7,013 | 3.8 | 48.7 | 75.9 | 99 | * | 53.6 | 0.64 | 0.1 |
| Russian Federation | 66.4 | 59.9 | 73.3 | 8,029 | 3.9 | 48.7 | 90.8 | 96 | 95 | 73.2 | 1.02 | 0.3 |
| Rwanda | 46.3 | 45.3 | 47.4 | 683 | 2.8 | 50.4 | 50.1 | 77 | 67 | 15.3 | -1.09 | 3.8 |
| Samoa | 70.1 | 67.4 | 73.0 | 3,117 | 3.5 | * | 80.1 | 93 | 89 | 22.1 | -0.09 | * |
| San Marino | 81.4 | 77.9 | 85.3 | * | * | * | * | 96 | * | 93.7 | * | * |
| Sao Tome and Principe | 66.3 | 64.8 | 67.8 | 1,151 | 3.1 | * | 62.4 | 92 | 79 | 55.3 | -0.19 | * |
| Saudi Arabia | 75.0 | 73.1 | 77.0 | 17,602 | 4.2 | 65.3 | * | 97 | * | 80.3 | 1.16 | * |
| Senegal | 58.2 | 56.7 | 59.8 | 1,403 | 3.1 | 58.6 | 35.5 | 52 | 65 | 41.0 | -0.35 | 0.6 |
| Serbia | 74.1 | 71.6 | 76.7 | 6,587 | 3.8 | 46.6 | * | 93 | 99 | 51.3 | * | 0.1 |
| Seychelles | 71.3 | 65.8 | 76.9 | 16,112 | 4.2 | * | 83.4 | 96 | * | 51.8 | 0.90 | * |
| Sierra Leone | 39.5 | 37.4 | 41.7 | 541 | 2.7 | * | * | 38 | 55 | 36.0 | -0.70 | 0.9 |
| Singapore | 81.4 | 78.9 | 84.2 | 34,800 | 4.5 | 87.4 | * | 96 | 100 | 100.0 | 1.01 | 0.1 |
| Slovak Republic | 73.9 | 69.9 | 78.1 | 12,965 | 4.1 | 59.8 | 74.3 | 99 | 100 | 56.3 | 0.86 | 0.1 |
| Slovenia | 75.7 | 71.9 | 79.7 | 19,769 | 4.3 | 57.8 | 99.2 | 92 | 100 | 50.3 | 0.88 | 0.1 |
| Solomon Islands | 72.1 | 69.6 | 74.7 | 1,763 | 3.2 | * | 46.7 | 78 | 70 | 16.2 | -0.39 | * |
| Somalia | 47.3 | 45.7 | 49.1 | * | * | * | * | 33 | 23 | 34.0 | -1.18 | 0.2 |
| South Africa | 45.3 | 44.6 | 46.0 | 7,244 | 3.9 | 64 | * | 71 | 86 | 57.9 | 0.91 | 16.1 |
| Spain | 79.2 | 75.9 | 82.8 | 24,067 | 4.4 | 68.8 | 95.7 | 96 | 100 | 76.5 | 0.86 | 0.4 |
| Sri Lanka | 72.6 | 70.1 | 75.3 | 2,829 | 3.5 | 64 | * | 98 | 80 | 15.5 | -0.27 | 0.1 |
| Sudan | 57.7 | 56.6 | 58.9 | 1,374 | 3.1 | * | 31.6 | 66 | 61 | 38.0 | -0.74 | 0.3 |
| Suriname | 69.2 | 66.8 | 71.8 | 4,873 | 3.7 | 48 | * | 68 | 91 | 72.8 | 0.68 | 1 |
| Swaziland | 35.3 | 33.9 | 36.8 | 3,972 | 3.6 | 60.9 | 55.6 | 89 | 55 | 23.6 | 0.03 | 22.3 |
| Sweden | 80.2 | 78.1 | 82.5 | 29,281 | 4.5 | 70.8 | 120.8 | 99 | 100 | 84.1 | 0.76 | 0.1 |
| Switzerland | 80.2 | 77.4 | 83.2 | 33,658 | 4.5 | 79.3 | 83.2 | 93 | 100 | 73.3 | 0.77 | 0.3 |
| Syrian Arab Republic | 69.4 | 68.2 | 70.7 | 3,633 | 3.6 | 36.3 | * | 82 | 87 | 52.2 | 0.57 | * |
| Tajikistan | 64.4 | 61.4 | 67.5 | 1,054 | 3.0 | 47.3 | 64.9 | 85 | 60 | 26.5 | -0.43 | 0.1 |
| Tanzania | 44.6 | 43.9 | 45.3 | 860 | 2.9 | 58.3 | * | 87 | 54 | 23.1 | -1.05 | 7.3 |
| Thailand | 71.4 | 69.0 | 73.9 | 5,323 | 3.7 | 69.1 | 69.0 | 96 | 96 | 31.6 | 0.53 | 1.8 |
| Togo | 56.4 | 54.3 | 58.5 | 791 | 2.9 | 45.2 | 51.9 | 50 | 55 | 37.9 | -0.63 | 3.6 |
| Tonga | 68.9 | 66.4 | 71.4 | 3,784 | 3.6 | * | * | 94 | 100 | 23.5 | 0.15 | * |
| Trinidad and Tobago | 66.8 | 65.5 | 68.2 | 14,295 | 4.2 | 70.1 | 64.6 | 91 | 91 | 11.4 | 1.29 | 1.2 |
| Tunisia | 74.4 | 72.8 | 76.2 | 5,836 | 3.8 | 60.2 | 75.7 | 98 | 90 | 64.2 | 0.33 | 0.1 |
| Turkey | 71.8 | 69.4 | 74.3 | 8,741 | 3.9 | 54.2 | 65.6 | 88 | 93 | 65.7 | 0.48 | 0.1 |
| Turkmenistan | 61.2 | 57.7 | 64.8 | 2,919 | 3.5 | 43.2 | * | 95 | 83 | 46.4 | 0.94 | * |
| Turks and Caicos Islands | 74.0 | 71.8 | 76.3 | * | * | * | * | * | 100 | 86.6 | -0.15 | * |
| Tuvalu | 67.3 | 65.2 | 69.6 | * | * | * | * | 96 | 94 | 46.8 | * | * |
| Uganda | 49.2 | 48.7 | 49.7 | 765 | 2.9 | 61 | 70.4 | 55 | 57 | 12.3 | -1.18 | 7.3 |
| Ukraine | 67.9 | 62.3 | 73.7 | 3,991 | 3.6 | 48.2 | 88.8 | 99 | 97 | 67.4 | 0.82 | 0.9 |
| United Arab Emirates | 74.8 | 72.3 | 77.4 | 61,603 | 4.8 | 73.6 | 76.5 | 94 | 100 | 77.8 | 1.51 | * |
| United Kingdom | 78.2 | 75.7 | 80.7 | 28,886 | 4.5 | 78.5 | 93.3 | 91 | 100 | 89.5 | 0.97 | 0.1 |
| United States | 77.1 | 74.4 | 80.1 | 36,797 | 4.6 | 78.4 | 97.9 | 94 | 99 | 79.8 | 1.28 | 0.5 |
| Uruguay | 75.7 | 72.5 | 79.0 | 7,835 | 3.9 | 68.7 | * | 94 | 98 | 91.6 | 0.19 | 0.3 |
| Uzbekistan | 64.0 | 60.5 | 67.6 | 1,589 | 3.2 | 38.5 | 73.5 | 99 | 89 | 37.1 | 0.69 | 0.1 |
| Vanuatu | 61.7 | 60.3 | 63.2 | 3,068 | 3.5 | * | 61.9 | 70 | 72 | 22.4 | -0.35 | * |
| Venezuela, RB | 73.8 | 70.8 | 77.1 | 8,004 | 3.9 | 54.7 | 76.0 | 70 | 92 | 90.7 | 0.84 | * |
| Vietnam | 70.2 | 67.4 | 73.1 | 1,644 | 3.2 | 45.6 | 60.7 | 96 | 79 | 25.1 | -0.11 | 0.2 |
| Virgin Islands (U.S.) | 78.6 | 74.7 | 82.7 | * | * | * | * | * | * | 93.2 | * | * |
| West Bank and Gaza | 72.7 | 71.0 | 74.5 | 2,104 | 3.3 | * | * | * | 93 | 71.5 | -0.36 | * |
| Yemen, Rep. | 61.0 | 59.2 | 62.9 | 1,993 | 3.3 | 48.6 | 53.3 | 76 | 65 | 27.3 | -0.05 | * |
| Zambia | 39.0 | 38.8 | 39.3 | 980 | 3.0 | 59.6 | * | 85 | 54 | 34.9 | -0.74 | 14.4 |
| Zimbabwe | 39.0 | 40.0 | 38.1 | * | * | 36.7 | * | 76 | 80 | 34.6 | 0.00 | 24.8 |

Data sources:
1. http://www.census.gov/ipc/www/idbsprd.html
2. http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries
3. http://www.heritage.org/index/explore.aspx?view=by-region-country-year
4. http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0
5. http://data.worldbank.org/indicator/SH.IMM.IDPT
6. http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS
7. http://data.worldbank.org/indicator/SP.URB.TOTL
8. http://data.worldbank.org/indicator/EN.ATM.CO2E
9. http://apps.who.int/ghodata/?vid=34000#



Life Expectancy - Building a predicting model by cross-country analyses