Life Expectancy: Building a Predictive Model by Cross-Country Analysis
CLIENT: GLOBAL HEALTH ORGANIZATION (GHO) DRI Intern: Trudy Pham April 27, 2012
Table of Contents
I Executive Summary
II Statement of purpose
III The Dependent Variable
IV Bivariate Analysis
   • GDP per capita
   • Economic Freedom index
   • Female schooling enrollment
   • Access to clean water resources
   • Children DPT Immunization
   • Urbanization
   • CO2 emissions per capita
V Summary of Bivariate Analysis
VI Building Multiple Regression Models
VII Conclusion and Recommendations
Sources
Appendix A: Glossary of Statistical Terms
Appendix B: Hypothesis Testing
Appendix C: Regression Models
Appendix C: Tables of variables
Executive Summary
This report answers the primary question posed by our client, the Global Health Organization (GHO): “How can the health status of the world’s population be improved?” Because Life Expectancy (LE) is a direct measure of health status worldwide, this paper searches for the factors that influence LE and, on that basis, constructs a model that best describes, and thus predicts, LE for countries around the world. From a broad pool of candidate factors, we focus on economic and social development as the most likely influences on LE. The rationale for choosing proxies from these categories is the theory that people’s lives are most closely tied to changes and improvements in social and economic welfare. The variables are, in order of presentation:
1. GDP per capita, 2002
2. Economic Freedom Index, 2002
3. Female schooling enrollment, 2003
4. Access to clean water resources, 2000
5. Children DPT Immunization, 2000
6. Urbanization population, 2002
7. CO2 emission per capita, 2001
A useful way to understand why LE has increased over time is to discover why LE varies across countries at a specific point in time; this research therefore addresses the question for the year 2003. Since the main purpose is to build a model that effectively reflects changes in LE resulting from changes in the independent variables, we perform univariate and bivariate analyses for each variable to check our expectations about its potential effect on LE. The univariate analysis gives a preliminary view of the statistics of each independent variable, while the bivariate analysis shows the individual influence of each factor on LE. In the process, each variable is introduced with an a priori assumption about how the world works, and those assumptions are tested. Finally, we combine the significant variables from the prior analysis into a comprehensive model that addresses our research question.

In the univariate analysis of our dependent variable LE, it is essential to be aware of the extreme cases of some African countries that suffer severely from HIV/AIDS. Cross-sectional analysis shows that, comparing two countries that receive similar levels of government support, people in the country with lower HIV prevalence, or one that is relatively HIV-free, tend to live longer than those in the country where a high percentage of the population carries the disease. HIV thus distorts the effects of the variables in our research and complicates the model-building process. In addition, since high HIV prevalence is not a universal issue, we exclude 23 countries with high HIV prevalence from the research in order to give a clearer view of the policy implications.
The results for relatively HIV-free countries suggest that although LE at first seems to depend heavily on GDP, further analysis involving multiple independent variables greatly reduces the explanatory power of GDP over LE. That is, holding GDP constant, it is possible to improve LE through changes in the remaining variables: Female schooling enrollment, Clean Water, Immunization, and CO2 emission. The policy implication is that, to improve LE, it is reasonable to pursue improvements in countries' conditions regarding education, healthcare, and pollution. Nevertheless, our report has certain drawbacks, such as the unavailability of data, which limits the proxies that could be used to assess the topic. Moreover, given the diverse geographical, biological, and cultural characteristics of people and countries around the globe, it is challenging to put them on the same footing when measuring the effect of particular macroeconomic factors on their levels of LE. The results should therefore be interpreted with caution. The report itself presents only the output of the statistical processes; the details of these processes can be found in the appendices at the end of the research.
Statement of purpose
We conduct this research for the year 2003 to fulfill our commitment to our client, the Global Health Organization (GHO), to show how health status in the world can be improved. From a univariate analysis of the dependent variable LE, we form the hypothesis that life expectancy across countries is not merely random but is influenced by various factors. This prompts us to determine these factors and their degree of influence on LE using more advanced statistical tools. We identify the factors we suspect of influencing life expectancy, describe them and their relationship to the dependent variable (LE), and employ them as the independent variables during model building. The independent variables are chosen according to our a priori knowledge and experience, and only those that pass the test for significance are retained in the last step, the building of the multiple regression model. We anticipate that the best model will bring us closest to the true relationship between LE and the related factors, and that it can be used as a predictive model to help improve health status in the world. The statistical analysis that follows answers the client's question to the best of our knowledge and capabilities. By examining in depth the reasons for the variation in life expectancy among countries, we ultimately aim to provide GHO with appropriate recommendations about policy implementation.
The dependent variable: Life Expectancy (LE) [1]
Life expectancy at a given age is the number of years of life remaining to an average person at that specified age, assuming that current age-specific mortality rates remain constant during the person's lifetime. Throughout this research, the life expectancy of a country is measured by the life expectancy of the average person at birth.

                                Both     Male     Female
Mean                            66.6     64.3     69.1
Median                          70.7     67.5     73.7
Standard deviation              12.1     11.4     13.0
Coefficient of variation (%)    18.2     17.8     18.8
Minimum                         34.8     33.9     34.8
Maximum                         81.9     79.1     85.3
# of countries                  201      201      201
Table 1: Life Expectancy by Genders, 2003
The LE mean of 66.6 years indicates a relatively high quality of life. In all three cases the median exceeds the mean, so the distributions are skewed left: most countries have LE above the average, while a smaller group of countries with very low LE forms a long left tail. The substantial standard deviation of about 12 years and the wide range of life spans (from 34.8 to 81.9 years) are consistent with this observation. In addition, countries at the high end of the LE scale gather into one large cluster, whereas those at the low end are spread out. This resembles diminishing returns: it is easier and faster to raise LE when a country starts from a low level, but once a certain level of LE is reached, gains slow down and LE remains fairly stable over time.
[Figure: three histograms of life expectancy in 2003 (x-axis: # of years, 37.5 to 82.5; y-axis: frequency). Male: mean = 64.3, median = 67.5. Female: mean = 69.1, median = 73.7. Both sexes: mean = 66.6, median = 70.7.]
Figure 1: World's Life Expectancy by Genders, 2003
Overall, females have a higher life expectancy than males. This is mainly explained by biological differences and by the healthier lifestyles that females often lead. The few exceptions lie mostly near the lower end of the line y = x, among countries with very low LE. Kenya, Zimbabwe, and Lesotho are the three largest outliers because of the extreme prevalence of HIV. In those countries, mostly underdeveloped, there might also be more gender discrimination against females.

[Figure: scatterplot of female LE (y-axis) against male LE (x-axis), both from 30 to 90 years, with the reference line Y = X; labelled outliers: Kenya, Zimbabwe, Lesotho.]
Figure 2: World's Life Expectancy of Male vs Female, 2003
[1] Data source: http://www.who.int/topics/life_expectancy/en/
Sources of variations of the dependent variable
The skewness of the data set, the existence of strong outliers on the left side of the distribution, and the high coefficient of variation (approximately 18% for both genders) show that the variability of LE across countries is not merely random. Theoretically, every country is a unique entity comprising different social, economic, and political systems. It is inevitable that some factors drive countries apart in terms of LE. As stated by Riley (2001) in his book “Rising Life Expectancy: A Global History”: “from the beginning, different regions and different countries advanced life expectancy by using their own tactics or combinations of tactics.” According to Riley, the contributors to the health transition (the long-run upward trend in LE that results from falling mortality rates) include, but are not limited to, social advance, economic development, biological structure enhancement, and bioethical and political concerns.

Politics, while it undoubtedly plays a crucial role in determining quality of life, is excluded from our analysis for two main reasons: 1. political factors are generally very difficult to quantify accurately, or even to categorize; 2. politics often exerts its influence indirectly through social and economic channels. Restricting the analysis to socioeconomic factors therefore suffices for our research purpose.

The level of Gross Domestic Product (GDP) per capita is one of the most comprehensive measures of the wealth and wellbeing of a nation. As people get richer, they can not only meet their survival needs but also afford a better standard of living, so their expected number of years of life rises over time. We therefore include GDP per capita as one of the independent variables in the model we build in this research.
Furthermore, we assume that as individuals have more freedom, greater opportunity, and more individual choice, they lead more active and happier lives and thus live longer. Economic freedom is therefore used to assess the impact that the flexibility of the economic environment has on LE. In addition to these two proxies for economic development, the major categories of variables that influence life expectancy are education, healthcare, urbanization, technology, inequality, and epidemic risks. Because of limited data availability for the year of concern, 2003, we examine the potential effects on LE of proxies in the first three categories only. It is worth keeping in mind that some categories might require more than one proxy for adequate representation. In other cases a proxy might represent more than one category, which can make it difficult to estimate the effect of each variable independently. One typical example is GDP per capita, which reflects education, urbanization, and healthcare at a minimum.

As the quality of healthcare increases, ceteris paribus, so should average life expectancy. Although this category is partly reflected in GDP per capita, education, and perhaps urbanization, we choose two proxies for healthcare: Access to clean water resources and DPT Immunization. They seem to have important independent effects on LE and were in fact part of a historical movement in global healthcare over the past century (Riley 2001). We also suspect that more educated mothers know more about ensuring a healthy life for their children from birth, and thus have a positive impact on the length of life of the new generation. For this reason, we use the Female schooling enrollment rate as a proxy for the level of female education.
Last but not least, urbanization seems to have a positive impact on LE because, other things equal, greater urbanization brings greater access to life-extending resources such as medicine and education. Urbanization, however, also has negative effects such as pollution and the spread of infectious diseases. Its effect on LE therefore depends on the net impact of these positive and negative effects. We use the Urbanization rate and CO2 emission per capita to examine this point.
An exceedingly important determinant of LE for many countries in Africa is the prevalence of HIV. In these countries the percentage of the adult population with HIV is several times to a hundred times higher than what is normally observed in the rest of the world. This can be seen in the extreme case of South Africa where, although living conditions are relatively good compared with other countries in Africa, the country has maintained a very low LE of 45 years for several decades because of the prevalence of HIV (16%) among its adult population. For the purpose of this research, we set aside the adverse effect of HIV by excluding from the dataset the 23 countries with an HIV prevalence rate of more than 3%. The reason for choosing 3% is that, in the original data, only when countries pass this level do the other potential variables severely lose their explanatory power over LE. This reduces the number of countries to 178. Below are the descriptive statistics for LE in the remaining countries:

                                Both     Male     Female
Mean                            69.4     66.8     72.1
Median                          71.6     68.6     74.6
Standard deviation              9.7      9.3      10.3
Coefficient of variation (%)    14.0     13.9     14.3
Minimum                         38.1     37.0     39.3
Maximum                         81.9     79.1     85.3
# of countries                  178      178      178
Table 2: Descriptive statistics for LE of countries without HIV prevalence, 2003
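The coefficient of variation reported in Tables 1 and 2 is the standard deviation expressed as a percentage of the mean. As a quick check, the following minimal Python sketch recomputes it from the rounded figures for both sexes in the two tables; the function name is illustrative and not part of the original analysis.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return 100.0 * std_dev / mean


# Rounded values for both sexes, taken from Tables 1 and 2.
cv_all_countries = coefficient_of_variation(12.1, 66.6)  # 201 countries
cv_low_hiv = coefficient_of_variation(9.7, 69.4)         # 178 countries

print(f"All countries:            {cv_all_countries:.1f}%")
print(f"HIV prevalence <= 3%:     {cv_low_hiv:.1f}%")
```

Because the inputs are rounded to one decimal place, the results should agree with the tabulated 18.2% and 14.0% only up to rounding.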
Excluding the strong outliers with high HIV prevalence reduces the variation in our data, as indicated by the fall in the standard deviation from 12.1 to 9.7 and in the coefficient of variation from 18% to 14%. The mean LE is also higher for both sexes. We use this data set for the remainder of the research. Our next step is to construct a model that uses the variables described above to explain the variation in LE across countries. Our model will be of the general form:

$$\text{Life expectancy} = \alpha + \beta_1(\text{GDP}) + \sum_{i \ge 2} \beta_i X_i$$

in which $\alpha$ is the intercept, $\beta_1$ is the slope of GDP (that is, the change in LE given a change in GDP, keeping everything else constant), $X_i$ are the other variables that potentially influence LE, and $\beta_i$ are their corresponding slopes. It should be noted that the effects of these variables are lagged: the effect of a change in any of these variables in a given year can only be observed several years later. Thus, to predict LE in 2003, we should use data for the independent variables from several years earlier. However, because data for some variables are scarce, we build the model with data for each independent variable taken from the nearest available previous year; as a result, any conclusion from the model should be interpreted with caution.
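To make the general form concrete, here is a minimal sketch of how an intercept and several slopes can be estimated by ordinary least squares using only the Python standard library. The data are made up and follow an exact linear rule, the two predictors are chosen arbitrarily, and the helper names (`solve_linear_system`, `fit_ols`) are hypothetical; the report itself relies on statistical software rather than this code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, slope_1, ..., slope_k]."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations (X'X) b = X'y


# Made-up data: (log10 GDP per capita, % with clean water) for five countries.
predictors = [(3.0, 40.0), (3.5, 60.0), (4.0, 70.0), (4.5, 95.0), (3.2, 80.0)]
# LE generated from an exact linear rule so the fit can be checked by eye.
life_exp = [10.0 + 8.0 * g + 0.3 * w for g, w in predictors]

alpha, beta_gdp, beta_water = fit_ols(predictors, life_exp)
print(round(alpha, 3), round(beta_gdp, 3), round(beta_water, 3))
```

Solving the normal equations directly is adequate for a handful of predictors; with many correlated predictors, a dedicated statistics package would be the more robust choice.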
Univariate and Bivariate Analysis
In this section, we perform the following tasks [2]:
• Univariate analysis of each independent variable
• Bivariate analysis of the relevant variable and Life expectancy
   - Hypothesis test of the variable’s impact on LE
   - Simple regression of the variable against LE
   - Analysis of outliers, missing observations and conclusion

Sources and years of data, as well as notes on the number of countries available and missing, are covered in detail in the analyses that follow.
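The bivariate steps listed above (a simple regression followed by a one-sided test on the slope) can be sketched as follows. This is a minimal illustration on hypothetical data, not the report's data set; the function name and the sample values are assumptions, and the critical value is the standard 5% one-sided value of Student's t with 6 degrees of freedom.

```python
import math


def simple_regression(x, y):
    """Fit y = a + b*x by least squares; return (a, b, t statistic of b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope / std_err


# Hypothetical predictor values and life expectancies, for illustration only.
x = [20, 35, 50, 60, 75, 80, 90, 100]
y = [48, 55, 57, 63, 66, 72, 71, 78]

a, b, t = simple_regression(x, y)
# One-sided test of H0: slope <= 0 against H1: slope > 0.
# 1.943 is the 5% one-sided critical value of Student's t with 6 df.
reject_h0 = t > 1.943
print(f"LE = {a:.2f} + {b:.4f} x, t = {t:.2f}, reject H0: {reject_h0}")
```

With the large samples used in this report (over 100 countries per regression), the corresponding one-sided critical value is close to 1.65.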
Variable                                                       Year    Abbreviation
Life Expectancy                                                2003    LE
GDP per capita based on purchasing power parity (PPP)          2002    GDP.pc
Economic freedom index                                         2002    E.Free
Female schooling enrollment rate                               2003    F.Sch.Enr
Access to clean water resources                                2000    H2O
Immunization DPT                                               2001    Immu.
Urbanization rate                                              2002    Urban
CO2 emissions per capita                                       2001    CO2
Table 3: List of variables, by Orders of Analysis
[2] For reference to the data set, see Appendix D
Univariate and Bivariate Analysis: GDP per capita by PPP (GDP.pc)
GDP per capita by PPP is the value of all final goods and services produced within a nation in a given year divided by the midyear population of a given country, then converted to international dollars using PPP rates. This basically measures how much money would be needed to purchase the same goods and services in two different countries. An international dollar has the same purchasing power over GDP as the U.S. dollar has in the United States.
The data for GDP.pc were taken from the World Bank for the year 2002.

Variable                            GDP.pc
Mean                                11,062
Median                              5,840
Standard Deviation                  12,620
Coefficient of Variance             114
Minimum                             507
Maximum                             62,591
Total # of countries                178
# of countries with missing data    22
Table 4: Descriptive statistics of GDP.pc [3], 2002

[Figure: histogram titled "Distribution of GDP per capita (by PPP) of countries in 2002" (x-axis: international dollars, 0 to 60,000; y-axis: percent of countries; mean = 11,062); labelled outliers: United Arab Emirates, Brunei, Luxembourg, Qatar.]
Figure 3: Distribution of GDP.pc, 2002
The range of GDP per capita across countries reveals an astonishing inequality between the richest and the poorest of the world. Approximately 70% of the countries gather near the low end of the scale, and only 30% exceed the mean value and spread out towards the high end. The strongest outliers (the maximum of 62,591 lies about 4 standard deviations above the mean) include countries with very small populations such as Luxembourg, Qatar, and the United Arab Emirates. In such countries, even a low total GDP can translate into a very high GDP per capita. These countries have also maintained relatively high LE over the years.
While the coefficient of variation of LE was only 14%, that of GDP was 114%, showing that there is far more dispersion in GDP per capita than in LE. The variation in LE therefore cannot be explained by variation in GDP.pc alone; other factors must be at work as well. Among the 22 countries with missing data on GDP, only Somalia has a low LE of 47.34 years, while the others enjoy a high LE of roughly 65 years or more. Because the bivariate analysis only considers countries with data available for both variables, the LE statistics in that sample will understate the real situation. Nevertheless, since the number of missing countries is relatively small, this should not cause a serious problem for our statistical model.

That said, we expect higher income to have a positive impact on the length of life for several reasons. As income increases, disposable income also increases, providing people with more resources for better shelter, food, and medical care. Using countrywide data in our model offers an advantage that individual data do not have: a wealthy person living in a poor country is unlikely to have the same access to quality food and medicine as a wealthy person living in a wealthy country. Since income is highly correlated with many other categories that affect life expectancy (as the correlation matrix later in the research shows), it is held constant in the multiple regression to isolate the marginal effects of the remaining variables.

Hence, for the slope $\beta$ of GDP.pc in a simple regression equation against LE, we expect $\beta$ to be positive, and our test hypotheses are $H_0: \beta \le 0$ (LE decreases or remains constant as GDP.pc increases) against $H_1: \beta > 0$ (LE increases as GDP.pc increases). We proceed to examine the relationship between LE and GDP.pc with the aid of statistical software.

[3] Data source: http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries.
[Figure: scatterplot of LE (y-axis, 40 to 90) against GDP PPP 02 (x-axis, 0 to 70,000) with the fitted line LE = 63.72 + 0.000450 GDP PPP 02, RSquare = 33.7%; reference lines at mean GDP.pc = 11,062 and mean LE = 69.4; labelled outliers: Brunei, Luxembourg, Qatar, United Arab Emirates.]
Figure 4: Life expectancy vs GDP per capita, 2003: A linear model

[Figure: the same scatterplot with a quadratic fit; reference lines at mean GDP.pc = 11,062 and mean LE = 69.4.]
Figure 5: Life expectancy vs GDP per capita, 2003: A quadratic model
Observe that to the left of the line marking the mean of GDP per capita, there is remarkable variation in LE. One can theorize that GDP per capita has a substantial impact on LE only up to a certain level: an increase in GDP per capita in very poor countries can greatly improve living conditions and thus directly raises LE, yet once countries pass a certain level of economic wellbeing, each additional dollar of GDP per capita no longer has the same power to raise life expectancy. This is the familiar concept of diminishing marginal returns. Although the scatterplots above confirm our intuition that LE generally tends to increase together with GDP per capita, the linear relationship between LE and our hypothesized predictor is weak, with an RSquare of only 34%. A quadratic model depicts the relationship between LE and GDP.pc more closely, but this is not appealing because we want to build a linear model to approximate LE. It is therefore more useful to plot LE against the log base 10 of GDP per capita, so that equal steps along the horizontal axis correspond to equal proportional changes in GDP per capita. As Figure 6 shows, the regression line expresses a relatively strong relationship between a country's LE and the log of its GDP per capita. Nonetheless, an RSquare of 62%, well below 100%, indicates that GDP per capita alone cannot completely explain LE.

[Figure: scatterplot of LE (y-axis, 40 to 90) against Log.GDP.pc (x-axis, 3.0 to 5.0) with the fitted line LE = 14.37 + 14.47 Log.GDP.pc, RSquare = 62%.]
Figure 6: Life expectancy vs log base 10 of GDP.pc

* The ellipse on the right of the figure marks countries with high GDP per capita but very low LE, including Equatorial Guinea and Angola, to name a few. They are most likely countries in Africa or South America where natural disasters, epidemic disease, and apartheid are among the most common causes of death, even though the economy is doing well.
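To illustrate what the fitted line in Figure 6 implies, the short Python sketch below evaluates LE = 14.37 + 14.47 Log.GDP.pc at a few income levels. Only the two coefficients come from the report; the function name and the chosen income levels are illustrative assumptions.

```python
import math

# Fitted coefficients reported in Figure 6.
INTERCEPT = 14.37
SLOPE = 14.47


def predict_le(gdp_per_capita):
    """Predicted life expectancy (years) from GDP per capita in PPP dollars."""
    return INTERCEPT + SLOPE * math.log10(gdp_per_capita)


for gdp in (1_000, 10_000, 11_062):
    print(f"GDP.pc = {gdp:>6,}: predicted LE = {predict_le(gdp):.1f} years")

# Because the predictor is log10(GDP.pc), multiplying income by 10 always adds
# SLOPE years, whereas doubling it adds SLOPE * log10(2) years.
gain_from_doubling = SLOPE * math.log10(2)
print(f"Gain from doubling GDP.pc: {gain_from_doubling:.2f} years")
```

The fixed gain per proportional change is how the log transform captures diminishing returns: moving from 1,000 to 2,000 dollars is predicted to add as many years as moving from 10,000 to 20,000.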
For the remainder of this research, we use the log base 10 of GDP per capita, instead of GDP per capita itself, as the independent variable for exploring the potential impact of economic advancement on LE. Hence, for the slope $\beta$ of log.GDP.pc in a simple regression equation against LE, we still expect $\beta$ to be positive; only the meaning of the slope differs from before, since it now measures the change in LE associated with a tenfold increase in GDP per capita. Our test hypotheses are $H_0: \beta \le 0$ (LE decreases or remains constant as log.GDP.pc increases) against $H_1: \beta > 0$ (LE increases as log.GDP.pc increases). The simple regression model confirms our intuition, showing a positive slope for log.GDP.pc of 14.5 and a t-test value of 15.95. These results allow us to reject the null hypothesis and conclude that GDP per capita plays a significant role in explaining LE.
Univariate and Bivariate Analysis: Economic Freedom Index (E.Free)
The Index of Economic Freedom (E.Free) is a composite of economic freedom measured from 10 different viewpoints: Trade freedom, Business freedom, Government spending, Fiscal freedom, Monetary freedom, Financial freedom, Investment freedom, Property rights, Labor freedom, and Freedom from corruption. Taken together, these 10 economic freedoms provide a comprehensive, albeit imperfect, picture of economic freedom, both in individual countries and in the global economy as a whole.
Variable                            E.Free
Mean                                59.56
Median                              59.95
Standard Deviation                  12.58
Coefficient of Variance             21.12
Minimum                             8.9
Maximum                             89.4
Total # of countries                178
# of countries with missing data    44
Table 5: Descriptive statistics for E.Free [4], 2002

The distribution of E.Free is roughly normal and centers around 60%, except for two strong outliers on the far left, North Korea and Iraq.

[Figure: histogram titled "Economic Freedom index" (x-axis: Index, 0 to 90; y-axis: # of countries); labelled outliers: North Korea and Iraq at the low end, Hongkong and Singapore at the high end.]
Figure 7: Distribution of E.Free across countries, 2002

A large number of countries maintain a level of economic freedom between 60% and 70%, where the corresponding LE can be anywhere from the lowest to the highest value. The data are less dispersed once E.Free rises beyond 70%, where all countries maintain a very high level of LE. Hence, we expect E.Free to have a clear impact on LE once it passes a certain level. Exceptions to this pattern are North Korea and Iraq, which have extremely low E.Free but still enjoy a high LE. They are good examples showing that E.Free is not the only factor that determines LE. With an RSquare of about 18%, E.Free is neither a very poor nor an excellent predictor of LE.

[Figure: scatterplot of LE (y-axis, 40 to 80) against E.Free (x-axis: Percentage, 0 to 90) with the fitted line LE = 50.86 + 0.3163 E.free, RSquare = 18.6%; labelled outliers: North Korea, Iraq.]
Figure 8: Life expectancy vs E.Free
We expect that more freedom in economic activities and individual choices makes a country better off, and hence that higher E.Free has a positive impact on the length of life. Hence, for the slope $\beta$ of E.Free in a simple regression equation against LE, we expect $\beta$ to be positive, and our test hypotheses are $H_0: \beta \le 0$ (LE decreases or remains constant as E.Free increases) against $H_1: \beta > 0$ (LE increases as E.Free increases). The simple regression model is in accordance with our hypothesis; a positive slope for E.Free of 0.32 and a t-test value of 4.49 allow us to reject the null hypothesis and conclude that E.Free is statistically significant.

[4] Data source: http://www.heritage.org/index/explore.aspx?view=byregioncountryyear
Univariate and Bivariate Analysis: Female schooling enrollment rate (F.Sch.Enr)
Female schooling enrollment rate (F.Sch.Enr) is measured by the total female enrollment in a specific level of education, regardless of age, expressed as a percentage of the eligible official school-age population corresponding to the same level of education in a given school year. For the tertiary level, the population used is that of the five-year age group following on from the secondary school leaving age.
* Because data for previous years are unavailable, data for F.Sch.Enr are taken from 2003. Also, because data are missing for a substantial number of countries, we provide new descriptive statistics for LE in the countries with available data on F.Sch.Enr.
Variable                            F.Sch.Enr    LE
Mean                                73.93        69.8
Median                              76.05        71.6
Standard Deviation                  22.23        9.4
Coefficient of Variance             30.07        13.4
Minimum                             17           42.0
Maximum                             120.77       81.9
Total # of countries                178          178
# of countries with missing data    63           63
Table 6: Descriptive statistic of F.Sch.Enr 2003 and corresponding LE [5]

The school enrollment rate is somewhat concentrated around the center (the coefficient of variation is only 30%), with some strong outliers. For the regression, we remove the outliers that are more than 3 standard deviations away from the mean.

[Figure: histogram titled "Female Schooling enrollment rate" (x-axis: Gross enrollment ratio, 20 to 120; y-axis: # of countries).]
Figure 9: Distribution of F.Sch.Enr across countries, 2003
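The rule of dropping observations more than 3 standard deviations from the mean can be sketched in a few lines of Python. The enrollment rates below are hypothetical, with one deliberately extreme value, and the function name is an assumption made for illustration; the report does not state which software or exact procedure was used.

```python
import statistics


def drop_outliers(values, max_sd=3.0):
    """Keep values within max_sd sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= max_sd * sd]


# Hypothetical enrollment rates (%); 400 is a deliberately extreme value.
rates = [62, 65, 68, 70, 71, 72, 73, 74, 75, 76, 77, 78, 80, 82, 85, 400]
kept = drop_outliers(rates)
print(len(rates) - len(kept), "value(s) removed")
```

Note that the mean and standard deviation are computed with the outliers included, so a single extreme value inflates the threshold; in small samples this rule can fail to flag anything at all.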
* Countries in the red ellipse are those with an enrollment rate exceeding 100%. This seems counterintuitive at first, but it follows from the definition: F.Sch.Enr is calculated by dividing the number of students enrolled in a given level of education, regardless of age, by the population of the age group that officially corresponds to that level of education, so a ratio larger than 100% is entirely possible. These countries are all developed countries with high LE (Finland, Norway, New Zealand, Sweden, etc.).
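A small worked example shows how the ratio can exceed 100%. The figures are hypothetical and the function name is illustrative; only the definition of the ratio comes from the text above.

```python
def gross_enrollment_ratio(enrolled_all_ages, official_age_population):
    """Gross enrollment ratio in percent: enrollment of any age divided by
    the population of the official age group for that level of education."""
    return 100.0 * enrolled_all_ages / official_age_population


# Hypothetical country: 100,000 girls of official secondary-school age, but
# 112,000 female students enrolled once repeaters and adult learners count.
ratio = gross_enrollment_ratio(112_000, 100_000)
print(f"{ratio:.1f}%")
```

The numerator counts students outside the official age group, while the denominator does not, which is why the ratio is not capped at 100%.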
[Figure: scatterplot of LE (y-axis, 40 to 90) against F.Sch.Enr (x-axis: Percentage of enrollment, 20 to 120) with the fitted line LE = 43.98 + 0.3487 F.Sch.Enr, RSquare = 68.7%.]
Figure 10: Life expectancy vs F.Sch.Enr
Well-educated women have the knowledge to provide better care for their children and are less likely to fail to meet their children's needs, which lays a foundation for healthier lives later on. We therefore expect more female education to have a positive impact on the length of life. Hence, for the slope $\beta$ of F.Sch.Enr in a simple regression equation against LE, we expect $\beta$ to be positive, and our hypotheses are $H_0: \beta \le 0$ (LE decreases or remains constant as F.Sch.Enr increases) against $H_1: \beta > 0$ (LE increases as F.Sch.Enr increases). The positive slope for F.Sch.Enr of 0.349 (with a t-test value of 15.75) allows us to reject the null hypothesis and conclude that education positively affects LE, as predicted.
[5] Data source: http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0
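In a simple regression, the t statistic of the slope and RSquare are linked by t = sqrt(RSquare × (n − 2) / (1 − RSquare)), which gives a quick consistency check on the reported figures. The sketch below applies it to F.Sch.Enr; the sample size n = 178 − 63 = 115 is an assumption (it ignores any outliers removed before the regression), and the function name is illustrative.

```python
import math


def t_from_r_squared(r_squared, n):
    """t statistic of the slope in a simple regression with n observations."""
    return math.sqrt(r_squared * (n - 2) / (1.0 - r_squared))


# Assumed sample size: 178 countries minus the 63 with missing data.
t_school = t_from_r_squared(0.687, 178 - 63)  # reported t-test value: 15.75
print(f"F.Sch.Enr: t = {t_school:.2f}")
```

Small differences from the reported value are to be expected because RSquare is rounded and the exact number of observations used is not stated.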
Univariate and Bivariate Analysis: Access to Improved Water resource (H2O) Access to an improved water source (H2O) refers to the percentage of the population with reasonable access to an adequate amount of water from an improved source, such as a household connection, public standpipe, borehole, protected well or spring, and rainwater collection. Unimproved sources include vendors, tanker trucks, and unprotected wells and springs. Reasonable access is defined as the availability of at least 20 liters a person a day from a source within one kilometer of the dwelling. * The worldâ€™s database on this variable is updated every 5 years starting 1990. Here the data is chosen from 2000  the closest year prior to our year of interest, 2003. Access to Water resource 60 54
[Histogram of H2O: number of countries by percentage of population with access to an improved water source]
Variable                               H2O
Mean                                   85.14
Median                                 92
Standard Deviation                     18.59
Coefficient of Variance                21.84
Minimum                                21
Maximum                                100
Total # of countries                   178
# of countries with missing data       19
Table 7: Descriptive statistics on H2O (see footnote 6), 2000
With only 19 countries missing data, this variable reflects the wide variation between nations in access to improved water resources, which ranges from as low as 21% to the ideal condition in which 100% of citizens have access. The large standard deviation in the data set shows the same thing. A median (92%) well above the mean (85%) suggests that the majority of countries have more than 85% of their population with access to improved water resources. However, a remarkable gap remains between them and the rest of the world.
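As a quick check on the descriptive statistics, the coefficient of variation can be recomputed from a mean and a standard deviation. The sketch below is illustrative only: the function name `coefficient_of_variation` is our own, and the country-level data are not reproduced here, so the second part simply reuses the rounded figures from Table 7.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return 100.0 * stdev / mean


# Recompute the coefficient of variation for H2O from the rounded
# mean and standard deviation reported in Table 7.
reported_cv = 100.0 * 18.59 / 85.14
print(round(reported_cv, 1))
```

Because the inputs are rounded, the recomputed value can differ from the tabulated 21.84 in the second decimal place.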
Figure 11: Distribution of H2O across countries, 2000
[Scatter plot of LE against % of population with access to improved water, with fitted line LE = 31.88 + 0.4322 H2O; RSquare = 65.9%]
Figure 12: Life expectancy vs H2O
Another interesting observation is that this model does a much better job of estimating LE when H2O is high. The smaller dispersion of the data where H2O is 80% or higher shows that the estimated LE is close to the observed LE, and in many cases they are almost the same. In contrast, LE for countries to the left of the 80% line in Figure 12 varies a lot more at any given level of H2O. Countries to the right of the 80% line clearly have higher LE than those to the left, and this pattern leads us to theorize that in low-LE countries, improving H2O can be remarkably powerful in raising quality of life, and thus LE.
H2O predicts LE quite well compared with other variables, as indicated by a high RSquare of 66%. Since access to an improved water source can prevent a number of waterborne diseases, we expect this variable to have a positive impact on life expectancy. Hence, we expect the slope of H2O in a simple regression against LE to be positive, and our test hypothesis is: (LE decreases or remains constant as H2O increases) against (LE increases as H2O increases). The regression analysis again supports our theory. The positive slope of 0.432 and a t-test value of 17.41 allow us to reject the null hypothesis and conclude that H2O is statistically significant.
6 Data source: http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS
Univariate and Bivariate Analysis: Immunization DPT (Immu)
Child immunization DPT (Immu) measures the percentage of children ages 12-23 months who received vaccinations before 12 months or at any time before the survey. A child is considered adequately immunized against diphtheria, pertussis (or whooping cough), and tetanus (DPT) after receiving three doses of vaccine.
* Data for Immunization is taken from 2001 so as to minimize the number of missing countries and to allow time for the lag effect to appear in 2003.
Variable                               Immu
Mean                                   84.63
Median                                 92
Standard Deviation                     17.55
Coefficient of Variance                20.74
Minimum                                26
Maximum                                99
Total # of countries                   178
# of countries with missing data       18
Table 8: Descriptive statistics on Immu (see footnote 7), 2001
The distribution of Immu across countries parallels that of H2O. A majority of countries cluster around 92-95%, while some exceptional outliers lie to the left, thus inflating the variation.
[Histogram of DPT Immunization: number of countries by percentage of children immunized]
Figure 13: Distribution of Immu across countries, 2001
[Scatter plot of LE against percentage of children immunized, with fitted line LE = 31.69 + 0.4361 Immu; RSquare = 60.7%]
Only a small number of countries are missing data on immunization, and most of them maintain a high level of LE, so our regression model can still serve as a means to partly predict LE for the majority of the world. It is clear from Figure 14 that when the Immu rate is below 80%, the variation in LE among countries with similar Immu is considerable; thus the variable does not account for much of the deviation in LE at lower levels of immunization. Countries with an almost perfect immunization rate all appear to have high LE.
Figure 14: Life expectancy vs Immu.
It is easy to see that the higher the child immunization rate, the less prone to disease the population will be; hence the lower the death rate and the longer the LE. We therefore expect the slope of Immu in a simple regression against LE to be positive, and our test hypothesis is: (LE decreases or remains constant as Immu increases) against (LE increases as Immu increases). We reject the null hypothesis in favor of the alternative hypothesis, with a positive slope for Immu of 0.44 and a t-test value of 15.63.
7 Data source: http://data.worldbank.org/indicator/SH.IMM.IDPT
Univariate and Bivariate Analysis: Urbanization rate (Urban)
Urban population rate (Urban) measures the percentage of a population living in urban areas as defined by national statistical offices. It is calculated using World Bank population estimates and urban ratios from the United Nations World Urbanization Prospects.
* Data for Urban is from 2002.
Variable                               Urban
Mean                                   58.25
Median                                 60.08
Standard Deviation                     23.91
Coefficient of Variance                41.06
Minimum                                11.36
Maximum                                100
Total # of countries                   178
# of countries with missing data       1
Table 9: Descriptive statistics of Urban (see footnote 8), 2002
[Histogram of Urbanization Rate: number of countries by urban share of total population]
In contrast to GDP, Urban is distributed quite uniformly across its range, although there is still a huge gap between the least and the most urbanized countries. The data provides a good overview of the world in this respect, since only one country is missing data.
Figure 15: Distribution of Urban. across countries, 2002
One strong outlier is Djibouti, an underdeveloped country with a very basic quality of life. Although there has been some urban development in Djibouti, it does not seem to have made up for other low indicators, leaving the country among the low-LE group.
[Scatter plot of LE against Urban, with fitted line LE = 54.60 + 0.2543 Urban; RSquare = 39.1%; Djibouti is labeled as an outlier]
Countries in the blue circle are mostly in Africa and have both a low urbanized population and low LE. They also suffer from HIV, though not as severely as the countries that were excluded at the beginning of the research.
Given an RSquare of 39.1%, we expect that Urban can be a relevant, though only partial, factor in explaining LE.
Figure 16: Life expectancy vs Urban.
As noted, we suspect that a highly urban population means a higher quality of life. Hence, we expect the slope of Urban in a simple regression against LE to be positive, and our test hypothesis is: (LE decreases or remains constant as Urban increases) against (LE increases as Urban increases). With a very high t-value of 10.6, we can easily reject the null hypothesis, and the slope is statistically significant. However, the regression is far from complete, with an RSquare of only 39%. The conclusion is that even though we can generally expect LE to increase as the percentage of urban population increases, this expectation is very uncertain.
8 Data source: http://data.worldbank.org/indicator/SP.URB.TOTL
Univariate and Bivariate Analysis: CO2 emissions per capita (CO2)
Carbon dioxide emissions are those stemming from the burning of fossil fuels and the manufacture of cement. They include carbon dioxide produced during consumption of solid, liquid, and gas fuels and gas flaring. CO2 emissions per capita (CO2) is thus measured as total emissions divided by total population.
* The data was acquired for the year 2001.
Variable                               CO2
Mean                                   4.753
Median                                 2.326
Standard Deviation                     6.335
Coefficient of Variance                133.27
Minimum                                0.02
Maximum                                49.5
Total # of countries                   178
# of countries with missing data       10
Table 10: Descriptive statistics of CO2 (see footnote 9), 2001
The distribution of CO2 is extremely skewed to the right (the mean of 4.753 is roughly twice the median of 2.326), analogous to that of GDP. There are several strong outliers, which are either highly developed countries such as Australia or Canada, or small countries with a petroleum industry such as Qatar, United Arab Emirates, Kuwait, and Trinidad and Tobago.
[Histogram of CO2 level: number of countries by metric tons per capita; Qatar is labeled as the highest outlier]
Following the analysis of GDP, here we take the log base 10 of CO2 to reflect the growth rate in CO2 emissions across countries. Although the RSquare indicates fairly high explanatory power, this model surprisingly predicts a positive relationship between the CO2 growth rate and LE, meaning that an increase in the level of CO2 is associated with an increase in LE.
Figure 17: Distribution of CO2 across countries, 2001
For the slope of logCO2 in a simple regression against LE, we expect a negative sign, and our test hypothesis is: (LE rises or remains constant as logCO2 increases) against (LE decreases as logCO2 increases). With a positive slope of 10.92 and a t-test value of 15.25, we fail to reject our null hypothesis.
[Scatter plot of LE against growth rate of CO2 emission (logCO2), with fitted line LE = 65.05 + 10.92 logCO2; RSquare = 58%; Equatorial Guinea is labeled as an outlier]
Figure 18: Life expectancy vs Growth rate of CO2
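To make the fitted line easier to read, the short sketch below evaluates it for a given level of emissions. It assumes the fitted equation LE = 65.05 + 10.92 logCO2 with a base-10 logarithm, as stated in the text; the function name is ours and is only for illustration.

```python
import math

INTERCEPT = 65.05  # intercept of the fitted line LE = 65.05 + 10.92 logCO2
SLOPE = 10.92      # slope on log10(CO2)


def predict_le_from_co2(co2_tons_per_capita):
    """Predicted life expectancy from the simple regression on log10(CO2)."""
    return INTERCEPT + SLOPE * math.log10(co2_tons_per_capita)


# A tenfold increase in emissions shifts the prediction by exactly one slope.
difference = predict_le_from_co2(10.0) - predict_le_from_co2(1.0)
print(round(difference, 2))
```

Read this way, the slope says that countries emitting ten times more CO2 per capita have, on average, a predicted LE about 10.9 years higher in this simple model. It describes an association only.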
Here, one needs to be mindful that “correlation does not always imply causation”. No theory explains why breathing in more CO2 would lengthen one’s life. The positive slope is very likely due to the strong positive correlation between CO2 level and economic development. In the next section, we examine this by including both GDP per capita and CO2 in the model. The expected result is that logCO2 will be negatively correlated with LE in the final best model.
9 Data source: http://data.worldbank.org/indicator/EN.ATM.CO2E.
Summary of Bivariate Analysis of Independent variables
Based on our analysis thus far, it appears that the variables in our analysis, especially GDP per capita, H2O, and Female schooling enrollment, work very well to predict LE. Each of these represents an aspect of society that potentially affects LE. Below is a summary of the bivariate analysis that we have done so far:
Model  Variable     Constant  Slope   (T test)  RSquare (adj)  FTest  N
1      Log.GDP.pc   14.4      14.5    (15.95)   62%            254    156
2      E.Free       50.9      0.316   (5.49)    18%            30.2   134
3      F.Sch.Enr    44        0.349   (15.75)   68.4%          248    115
4      H2O          31.9      0.432   (17.41)   65.7%          303    159
5      Immu         31.7      0.436   (15.63)   60.5%          244    160
6      Urban        54.6      0.254   (10.61)   38.8%          112    177
7      Log CO2      65.1      10.9*   (15.25)   58.1%          232    168
Table 11: Summary of simple regression with LE
Variable                               Year   Abbreviation
Life Expectancy                        2003   LE
GDP per capita growth rate             2002   Log.GDP.pc
Economic freedom index                 2002   E.Free
Female schooling enrollment rate       2003   F.Sch.Enr
Access to clean water resources        2000   H2O
Immunization DPT                       2001   Immu.
Urbanization rate                      2002   Urban
CO2 emission growth rate               2001   LogCO2
Table 12: Summary of multivariate regression variables
* This variable does not pass the hypothesis test for simple regression
The multiple regression analysis
Although we have constructed several simple regressions that help predict LE, none of them serves the purpose completely, as shown by an RSquare of less than 1 in all of our simple regressions. The reason could be that other factors influencing LE have gone unnoticed, that the observed variation is largely due to randomness, or that our statistical analysis up to now has been ineffective. Severe data unavailability is likely one of our hindrances, and by looking only at the bivariate analysis, we deliberately ignore all other forces in the real world that can affect LE. Since our world is a dynamic entity, with numerous interactions among its diverse components, it is not feasible to attribute the change in a macroeconomic variable such as LE to a single other factor. Therefore, we turn to multiple regression, where, by exploring the interactions between different variables, we construct a model to best predict our dependent variable, LE. We then suggest some policy implications based on our complete model.
Living in a tangled world where everything is related, we expect our independent variables to be related to each other to some degree. We inspect this matter to ensure the validity of the model we are going to build:
            LogGDP.pc  Immu   E.Free  F.Sch.Enr  Urban  LogCO2
Immu        0.586
E.Free      0.528      0.284
F.Sch.Enr   0.759      0.717  0.346
Urban       0.738      0.446  0.393   0.679
LogCO2      0.896      0.663  0.335   0.508      0.664
H2O         0.730      0.745  0.413   0.747      0.597  0.477
Table 13: Correlation between independent variables
As shown in the table above, correlations between LogGDP.pc and the other variables are high, especially with LogCO2 and F.Sch.Enr. This is expected since they are closely related through the economic channel. There is also a high correlation between H2O and Immu. This is not because one causes the other, but because they are different aspects of social care: countries that invest in improving their health infrastructure are likely to be willing to have their young citizens immunized. In addition, Urban is found to be highly correlated with F.Sch.Enr and H2O for several reasons. First, urban concentration requires a higher standard of living and improvements in hygiene. Second, urban lifestyle is much more open compared to rural lifestyle: it exposes people to more opportunities to pursue education, and also frees women from household tasks and traditional roles. Women can then spend more time on their education, raising the female school enrollment rate compared to places where most of the population is concentrated in rural areas. Lastly, F.Sch.Enr and Immu are highly correlated because, as mentioned before, as a mother's education increases, so does her knowledge of how to lead a healthier life for her children.
Analyzing the ceteris paribus effect of any one of our independent variables on LE thus becomes a major difficulty, since each of them is tightly correlated with the others; we are facing the problem of multicollinearity. As the arguments above show, this problem will persist no matter what other proxies for healthcare, education or social infrastructure we choose. Multicollinearity increases the standard errors of the estimated slopes, decreases the t-test values, and thus makes it harder to reject the null hypothesis that a certain variable has no impact, or an unexpected impact, on the dependent variable.
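One common way to gauge how much a correlation of this size inflates the uncertainty of a slope is the variance inflation factor (VIF). This diagnostic is not part of the original analysis; the sketch below is an illustration under the simplifying assumption of only two predictors, where VIF = 1 / (1 - r^2). With more predictors the actual VIF can only be larger, so the pairwise figure is a lower bound. The function names are ours.

```python
import math


def pairwise_vif(r):
    """Variance inflation factor implied by a single pairwise correlation r."""
    return 1.0 / (1.0 - r * r)


def standard_error_inflation(r):
    """Factor by which the slope's standard error grows (square root of the VIF)."""
    return math.sqrt(pairwise_vif(r))


# Correlation between LogGDP.pc and LogCO2 reported in Table 13.
vif_gdp_co2 = pairwise_vif(0.896)
print(round(vif_gdp_co2, 2), round(standard_error_inflation(0.896), 2))
```

Under this assumption, the correlation of 0.896 between LogGDP.pc and LogCO2 alone would inflate the variance of either slope by a factor of about five, which is why t-test values tend to shrink when both variables enter the same model.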
If we can show that, in spite of multicollinearity, all variables in our final model are statistically significant, then the presence of multicollinearity strengthens our confidence in the validity of the model rather than weakening it. However, the values of the slopes (the marginal effects) of the variables are not entirely accurate, which may cause some difficulty when drafting policies. In other words, the value of our model lies in the fact that it reflects the direction of changes in LE resulting from a change in any of our variables. We now revisit our model:
$$\text{Life expectancy} = \alpha + \beta_1\,(\text{Log.GDP.pc}) + \sum_{i \ge 2} \beta_i X_i$$
In which $\alpha$ is the intercept, $\beta_1$ is the slope of the growth rate of GDP per capita (that is, the change in LE given a change in Log.GDP.pc, keeping everything else constant), $X_i$ are the other variables that potentially influence LE, and $\beta_i$ are their corresponding slopes. We thus construct the hypothesis test, in which the null hypothesis is that in our final model the slopes of all independent variables, including that of Log.GDP.pc, are either zero or have signs in the unexpected direction, meaning that they do not explain the variation in LE. The alternative hypothesis is that these slopes are indeed statistically significant with the expected signs. Our critical value is 1.658 ($\alpha$ = 5%). The expected signs for the coefficients of the independent variables are listed below:
Variable      Expected direction of coefficient
Log.GDP.pc    +
E.Free        +
F.Sch.Enr     +
Immu          +
H2O           +
Urban         +
CO2           −
Mindful of this, we now perform multiple regression analysis. A concrete result for this task is provided below:
Multiple regression models
                 1        2        3        4        5        6        7        8
Constant         14.4     15.3     25.1     25.7     19.7     19.2     8.38     10.8
Log.GDP.pc       14.5     13.9     7.95     4.75     5.39     5.63     7.62     8.40
(T test)         (15.95)  (12.94)  (6.43)   (3.46)   (4.14)   (3.95)   (4.42)   (4.94)
E.Free           --       0.028    --       --       --       --       --       --
(T test)         --       (0.61)   --       --       --       --       --       --
F.Sch.Enr        --       --       0.192    0.16     0.11     0.11     0.113    0.123
(T test)         --       --       (6.28)   (5.18)   (3.51)   (3.51)   (3.6)    (3.92)
H2O              --       --       --       0.163    0.097    0.099    0.112    --
(T test)         --       --       --       (4.35)   (2.49)   (2.51)   (2.85)   --
Immu             --       --       --       --       0.147    0.146    0.177    0.212
(T test)         --       --       --       --       (3.77)   (3.71)   (4.27)   (5.32)
Urban            --       --       --       --       --       0.011    --       --
(T test)         --       --       --       --       --       (0.41)   --       --
Log CO2          --       --       --       --       --       --       −2.78    −1.92
(T test)         --       --       --       --       --       --       (−1.94)  (−1.34)
RSquare (adj)    62.0%    64.9%    78.1%    81.8%    83.9%    83.8%    84.3%    82.9%
Fincremental     --       11.56    81.4     21.4     14.1     0.64     3.95     11.16
FTest            254      122      197      155      135      107      112      132
N                156      132      111      104      104      104      104      109
Our final model is model 7, which includes Log.GDP.pc, F.Sch.Enr, H2O, Immu, and LogCO2. The slopes of Log.GDP.pc, F.Sch.Enr, H2O, and Immu are significant at 99%, and the slope of LogCO2 is significant at 95%. The incremental F-test value is significant at 95% and thus shows that including LogCO2 has improved the explanatory power of the model. Given the critical value of 1.658, we are able to reject the null hypothesis and conclude that all of the variables in the final model have an impact on LE. Consequently, our model can be used as a statistical reference for policy decisions aimed at improving LE worldwide.
A very interesting observation is that the slope of LogCO2 has reversed sign from positive to negative, and therefore meets our expectation that the growth rate of CO2 emission is a negative factor with regard to LE. The switch in sign also suggests that we have been able to separate out the effects of the other factors on LogCO2, revealing the underlying relationship between LE and LogCO2. Also, comparing our best model with the simple model that has only Log.GDP.pc, we see that the constant and the slope of Log.GDP.pc have fallen considerably, meaning that we have also been able to separate out some of the effects of variables other than GDP, making our model more applicable to our purpose of improving LE. Finally, an adjusted RSquare of 84.3% is quite promising, because we are able to explain close to 85% of the variation in LE using this model alone. This, of course, is only appropriate in a context where we exclude countries with extreme prevalence of HIV.
The conclusion
In our research, we find that life expectancy is influenced by many factors, ranging from education to healthcare to environmental concerns. Among them, the GDP per capita growth rate is a general variable that cannot be easily controlled, whereas variables such as immunization, clean water and female education can be seen as low-cost approaches to improving LE, especially in developing countries where these social aspects are not yet fully developed. While there is no doubt about the important effect of GDP per capita on LE, our model shows that, even when holding Log.GDP.pc constant, improvement in LE is still possible by expanding government expenditure on things that are normally overlooked: essential healthcare services and education for the common people. This is a useful lesson for developing countries in particular and for the world as a whole, because a solution to the general development problem has long been an intractable goal.
Sources
All online data were acquired from April 14th to April 28th, 2012.
Ridley J.C. (2001). Rising Life Expectancy: A Global History. Cambridge University Press. USA.
United Nations (2005). World Population Monitoring 2003.
Access to improved water resources: http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS
CO2 emissions per capita: http://data.worldbank.org/indicator/EN.ATM.CO2E.
Economic Freedom Index: http://www.heritage.org/index/explore.aspx?view=byregioncountryyear.
Female schooling enrollment rate: http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0.
GDP per capita based on purchasing power parity (PPP): http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries.
HIV prevalence rate in adults from 15-45 (%): http://apps.who.int/ghodata/?vid=34000#
Immunization DPT: http://data.worldbank.org/indicator/SH.IMM.IDPT
Life Expectancy: http://www.census.gov/ipc/www/idbsprd.html.
Urbanization rate: http://data.worldbank.org/indicator/SP.URB.TOTL
World's Life Expectancy, 2003: http://www.census.gov/ipc/www/idbsprd.html
Appendix A: Glossary of technical terms
Mean – the arithmetic mean of observations in the data set. It is the sum of all observations divided by the number of observations.
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
StDev – the Standard Deviation; it shows how much variation or 'dispersion' there is from the 'average' in the dataset. It is the square root $s$ of the sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
CoefVar – The coefficient of variation: it is a normalized measure of dispersion of a probability distribution. The coefficient of variation is defined as the ratio of the standard deviation to the mean. Its value does not depend on the unit of the random variable.
Minimum – the smallest value of available observations.
Median – Value in the middle when data are placed in ascending order.
Maximum – the greatest value of available observations.
Range – the difference between the maximum and the minimum.
Correlation coefficient ‘r’: A measure of linear association between two variables that takes on values between −1 and +1. Values near +1 indicate a strong positive linear relationship; values near −1 indicate a strong negative linear relationship; and values near 0 indicate the lack of a linear relationship.
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)}{s_x s_y}$$
Where $\sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$ is the covariance of x and y, and $s_x s_y$ is the product of the standard deviations of x and y. The value of the correlation coefficient is not affected by the units of measurement for x and y.
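The definition above can be sketched in a few lines of code. This is a minimal illustration using only the standard library; the function name `pearson_r` and the sample numbers are hypothetical and are not taken from the data set.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient: covariance divided by the product of standard deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return covariance / (s_x * s_y)


# Hypothetical data: rescaling x (a change of units) leaves r unchanged.
r_original = pearson_r([1, 2, 3], [1, 2, 2])
r_rescaled = pearson_r([100, 200, 300], [1, 2, 2])
print(round(r_original, 3), round(r_rescaled, 3))
```

The second call illustrates the point that the coefficient does not depend on the units of measurement.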
Multicollinearity: A statistical term used to refer to the correlation among the independent variables in a multiple regression model. Each pairwise correlation ranges between −1 and 1. A value of 0 indicates no linear relationship between the two variables; values near −1 indicate a strong negative relationship, and values near 1 indicate a strong positive relationship. Although multicollinearity does not affect the way we perform our regression analysis or interpret the output, it distorts the results of t tests on the individual parameters.
Appendix B: Hypothesis testing A statistical hypothesis is an assumption about a population parameter, which may or may not be true. There are two types of statistical hypotheses.
Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis that sample observations result purely from chance.
Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the hypothesis that sample observations are influenced by some nonrandom cause.
Statisticians follow a formal process to determine whether to reject a null hypothesis, based on sample data. This process is called hypothesis testing, and consists of four steps: state the hypotheses, formulate an analysis plan, analyze sample data and interpret results. A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level, often denoted by α (alpha). In this memo, we assume that α = 5%. The t-score formula for rejection zone testing of slopes of regression equations is:
$$t = \frac{\hat{\beta}}{s_{\hat{\beta}}}$$
where $\hat{\beta}$ is the estimated slope and $s_{\hat{\beta}}$ is its estimated standard deviation, both defined in Appendix C.
If the test statistic t is beyond a prespecified critical value, then we reject the null hypothesis. If not, then we fail to reject it. The critical value is determined by the level of significance, which by convention often takes the value of 10%, 5%, 2.5%, or 1%.
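The decision rule can be written down compactly. The sketch below is illustrative: the function name and the `expected_sign` argument are our own, and the default critical value of 1.658 is the one-sided 5% value used in the body of this memo.

```python
def reject_null_one_sided(t_statistic, critical_value=1.658, expected_sign=1):
    """Return True when t lies beyond the critical value in the expected direction.

    expected_sign is +1 when the alternative hypothesis is a positive slope
    and -1 when the alternative hypothesis is a negative slope.
    """
    return expected_sign * t_statistic > critical_value


# t-test values quoted in the bivariate analysis.
print(reject_null_one_sided(15.75))                    # F.Sch.Enr, positive slope expected
print(reject_null_one_sided(15.25, expected_sign=-1))  # logCO2, negative slope expected
```

The second call mirrors the simple regression on logCO2: the t value is large, but it points in the direction opposite to the alternative hypothesis, so the null hypothesis is not rejected.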
Appendix C: Regression techniques: Least Square Method
Purpose: To find a straight line that best represents the linear correlation in a dataset between 2 given variables: the dependent variable Y on the vertical axis and the independent variable X on the horizontal axis.
The Method: According to this method, the best line is the one that minimizes the sum of the squares of the differences between the observed values and the estimated values of the dependent variable Y for all data points in the data set. Using mathematical symbols, the value to be minimized is:
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where
n = the total number of observations
$y_i$ = observed value of the dependent variable for the ith observation
$\hat{y}_i$ = estimated value of the dependent variable for the ith observation
Let the estimated regression equation be of the form $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$; then $\hat{\alpha}$ and $\hat{\beta}$ can be calculated using the following formulas:
$$\hat{\beta} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x}$$
where $\bar{x}$ and $\bar{y}$ are the means of X and Y, respectively.
Evaluation: Goodness of fit
The goodness of fit R2, or the Coefficient of Determination, measures how much of the variation in $y_i$ has been explained by $\hat{y}_i = \hat{\alpha} + \hat{\beta} x_i$. R2 ranges from 0 to 1, with 1 meaning that all points are on the regression line.
On the graph, the Total, or Total Sum of Squares (TSS) presents the total variation within the data; the Explained, or the Regression Sum of Squares (RSS) shows the portion of variation explained by the regression; and last but not least, the Unexplained, or the Error Sum of Squares (ESS) presents the unexplained portion. In this sense, what we are trying to do when using the Least squares Method is simply to minimize the ESS.
By definition of the Goodness of fit, we then have:
$$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$$
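The least squares formulas and the goodness of fit can be illustrated with a short, self-contained sketch. The function name and the three data points are hypothetical and serve only to show the arithmetic; they are not drawn from the country data.

```python
def fit_simple_ols(xs, ys):
    """Return (alpha_hat, beta_hat, r_squared) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # ESS: unexplained (error) sum of squares; TSS: total sum of squares.
    ess = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - y_bar) ** 2 for y in ys)
    return alpha_hat, beta_hat, 1.0 - ess / tss


alpha_hat, beta_hat, r_squared = fit_simple_ols([1, 2, 3], [1, 2, 2])
print(round(alpha_hat, 4), round(beta_hat, 4), round(r_squared, 4))
```

For these three points the fitted line has a slope of 0.5 and explains three quarters of the variation in y.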
Appendix C: Regression techniques: The error term in the regression model
Assumptions regarding the error term u: The tests of significance in regression analysis are based on the following assumptions about the error term u:
1. u is a normally distributed random variable, i.e. $u \sim N(0, \sigma_u^2)$ for all values of x. Because in our regression model y is a linear function of u, y is also normally distributed.
2. The standard deviation of u is constant within a model, i.e. $\sigma_u$ is the same for all values of x. Since in the formal regression model $\alpha$ and $\beta$ are population parameters, the standard deviation of y about the regression line will be exactly equal to that of the error term u, and is the same for all values of x.
3. The expected value of u is EV(u) = 0.
Meaning and estimation of the standard deviation of u, $s_u$: First, recall that the standard deviation, the square root of the variance, is a measure of the variability of a random variable about its mean. A standard deviation of u different from 0 confirms the existence of variability in u, and thus represents the variance of the y values about the regression line. From the previous section, we know that the ESS, the Error Sum of Squares, given by the formula $\sum (y_i - \hat{y}_i)^2$, is a measure of the variability of the actual observations about the estimated regression line. Statistical calculations show that the variance of y can be estimated by dividing this ESS by its associated degrees of freedom (n − k), where k is the number of population parameters being estimated:
$$s^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - k}$$
The standard deviation of y follows as
$$s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k}}$$
Using the implication of the second assumption regarding the error term u, we can be confident in saying that the standard deviation of u is
$$s_u = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - k}}$$
Properties of the sampling distribution of $\hat{\beta}$ and the estimation interval around $\hat{\beta}$
When using $\hat{\beta}$ as an estimator for $\beta$, it is essential to note that, once again due to the existence of the error term and its normal distribution, $\hat{\beta}$ is not a fixed value but a sample statistic with its own sampling distribution. The properties of the sampling distribution of $\hat{\beta}$ follow:
1. The expected value of $\hat{\beta}$ is $\beta$.
2. The estimated standard deviation of $\hat{\beta}$ is $s_{\hat{\beta}} = \dfrac{s_u}{\sqrt{\sum (x_i - \bar{x})^2}}$.
3. $\hat{\beta}$ is normally distributed.
Keeping those properties in mind, we can now develop an estimation interval around $\hat{\beta}$ using the t-test method. With a $(1 - \alpha)$ level of confidence, say 95%, our interval will be of the form:
$$\left(\hat{\beta} - t_{\alpha/2}\, s_{\hat{\beta}}\,;\ \hat{\beta} + t_{\alpha/2}\, s_{\hat{\beta}}\right)$$
$\hat{\beta}$ is the point estimator; $t_{\alpha/2}\, s_{\hat{\beta}}$ is the margin of error; and $t_{\alpha/2}$ is the t value providing an area of $\alpha/2$ in the upper tail of a t distribution with (n − k) degrees of freedom.
The nature of the hypothesis test on the slope $\beta$:
From the model y = α+βx+u, since α, β are two constants, we can infer that EV(y) = EV(α+βx+u) = EV(α)+EV(βx)+EV(u) = α+βx+0 = α+βx. If β=0, then EV(y) = α regardless of the value of x. Thus the expected y value does not depend on the value of x and we conclude that y and x are not linearly correlated. A value of β different from 0, on the other hand, indicates that some relationship exists between x and y. In particular, a positive value of β shows that x and y are positively correlated, and vice versa. Therefore, the purpose of the hypothesis test
$$H_0: \beta \ge 0 \qquad H_a: \beta < 0$$
is to test whether x and y are negatively correlated (i.e. whether we can reject the null hypothesis in favor of the alternative one). Using the test statistic $t = \hat{\beta} / s_{\hat{\beta}}$, we reject the null hypothesis when $t < -t_{\alpha}$, where $t_{\alpha}$ is the critical value corresponding to $\alpha$, the willingness to make a Type I error.
Constructing a confidence interval for $\beta$: Our estimate of the slope $\hat{\beta}$ is a point estimate. From this point estimate, we can construct a confidence interval around it in which the true slope has a high chance of lying. This is possible and necessary because $\hat{\beta}$ is not a fixed value but has a probability distribution, again due to the existence of the error term. Thus, there is an associated variance $s_{\hat{\beta}}^2$. This is analogous to constructing a confidence interval for the expected mean value $\bar{V}$ of a random variable V in a single-variable dataset.
First, we calculate the standard deviation of $\hat{\beta}$. The formula is:
$$s_{\hat{\beta}} = \frac{s_u}{\sqrt{\sum (x_i - \bar{x})^2}}$$
Then, we can proceed to construct the confidence interval around the estimated value of the slope as follows:
$$\left(\hat{\beta} - t_{\alpha/2}\, s_{\hat{\beta}},\ \hat{\beta} + t_{\alpha/2}\, s_{\hat{\beta}}\right)$$
Our level of significance is chosen to be α = 5%. Note here that we do not use the standardized normal benchmark but the studentized benchmark $t_{\alpha/2}$ instead, with (n − k) degrees of freedom, where n is the number of observations and k is the number of population parameters.
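The steps above (estimate $s_u$, derive the standard deviation of the slope, then build the interval) can be sketched as follows. The function names are our own, the data points are hypothetical, and the critical t value is passed in by the caller since it would normally be read from a t table.

```python
import math


def slope_standard_error(xs, ys, k=2):
    """Return (beta_hat, s_u, s_beta) for a simple regression of ys on xs.

    k is the number of estimated population parameters (intercept and slope).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    ess = sum((y - (alpha_hat + beta_hat * x)) ** 2 for x, y in zip(xs, ys))
    s_u = math.sqrt(ess / (n - k))   # estimated standard deviation of u
    s_beta = s_u / math.sqrt(sxx)    # estimated standard deviation of the slope
    return beta_hat, s_u, s_beta


def slope_confidence_interval(beta_hat, s_beta, t_critical):
    """Interval beta_hat +/- t_critical * s_beta."""
    margin = t_critical * s_beta
    return beta_hat - margin, beta_hat + margin


beta_hat, s_u, s_beta = slope_standard_error([1, 2, 3], [1, 2, 2])
print(round(beta_hat / s_beta, 3))  # the t statistic for the slope
```

The same relationship can be read backwards: a reported slope of 0.432 with a t-test value of 17.41, as for H2O, implies a standard deviation of the slope of roughly 0.025.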
Confidence interval and prediction interval for Y: From regression analysis, there are in essence two ways to make inferences about the value of Y, given the value of X. We can ask two different questions: 1. For a given value of X, what would the value of a single Y be? 2. For a given value of X, what would the average value of Y likely be? The answer to the first question is a prediction interval, and the answer to the second question is a confidence interval. We now examine these two in detail.
Prediction interval
Given a hypothetical data point $(x_p, y_p)$ in which we know $x_p$, a prediction interval is the interval in which the corresponding $y_p$ would most likely lie. Once more, there is variance in the value of $y_p$ due to the error term. Analogous to other statistical intervals, the prediction interval is:
$$\left(\hat{y}_p - t_{\alpha/2}\, s_{y_p},\ \hat{y}_p + t_{\alpha/2}\, s_{y_p}\right)$$
$\hat{y}_p$ is taken from the regression equation: we just plug $x_p$ into the regression equation. The estimate of the standard deviation of the distribution of $y_p$ is
$$s_{y_p} = s_u \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
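Confidence interval
Given a set of hypothetical data points $(x_p, y_{p1}), (x_p, y_{p2}), (x_p, y_{p3}), \dots$ with the same known X-coordinate value $x_p$, a confidence interval is the interval in which $\bar{y}_p$, the mean value of all the corresponding $y_p$, would most likely lie. This confidence interval is calculated as
$$\left(\hat{\bar{y}}_p - t_{\alpha/2}\, s_{\hat{\bar{y}}_p},\ \hat{\bar{y}}_p + t_{\alpha/2}\, s_{\hat{\bar{y}}_p}\right)$$
But $\hat{\bar{y}}_p = \hat{y}_p$, so in fact this interval is just
$$\left(\hat{y}_p - t_{\alpha/2}\, s_{\hat{\bar{y}}_p},\ \hat{y}_p + t_{\alpha/2}\, s_{\hat{\bar{y}}_p}\right)$$
and the only difference is the standard deviation in question, $s_{\hat{\bar{y}}_p}$. It is calculated as
$$s_{\hat{\bar{y}}_p} = s_u \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$$
which is always smaller than the standard deviation for the prediction interval, regardless of the value of $x_p$ (see the graph below).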
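To make the difference between the two intervals concrete, here is a minimal Python sketch that computes both standard deviations at several values of $x_p$ for a one-variable regression. The helper names, the small dataset and the hand-entered critical t value are assumptions made for this example, not part of the report's analysis.

```python
import math

def interval_std_devs(x, y, x_p):
    """Return (y_hat_p, s_pred, s_mean) at x_p for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    sse = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s_u = math.sqrt(sse / (n - 2))          # standard error of the regression
    spread = 1 / n + (x_p - x_bar) ** 2 / sxx
    y_hat_p = a_hat + b_hat * x_p           # plug x_p into the fitted line
    s_pred = s_u * math.sqrt(1 + spread)    # for a single new observation
    s_mean = s_u * math.sqrt(spread)        # for the mean of Y at x_p
    return y_hat_p, s_pred, s_mean

def interval(center, s, t_crit):
    """Symmetric interval center +/- t_crit * s."""
    return center - t_crit * s, center + t_crit * s

# Made-up illustrative data (not from the report's dataset).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # tabulated t value, alpha = 5% (two-sided), 3 degrees of freedom

for x_p in (1, 3, 6):
    y_hat_p, s_pred, s_mean = interval_std_devs(x, y, x_p)
    print(x_p, interval(y_hat_p, s_pred, T_CRIT), interval(y_hat_p, s_mean, T_CRIT))
```

Both intervals are narrowest at $x_p = \bar{x}$ and widen as $x_p$ moves away from it, and the interval for the mean stays inside the prediction interval throughout.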
Appendix C: Regression techniques: Advanced tools in model assessment
R-squared adjusted: R-squared adjusted is a simple method used to account for the fact that R-squared always increases as we increase the number of independent variables. The equation used to calculate R² Adj. is:
$$R^2_{Adj.} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2 / (n - (m+1))}{\sum_i (y_i - \bar{y})^2 / (n - 1)}$$
where n is the number of observations and m is the number of independent variables. The value $(n-(m+1))$ is the number of degrees of freedom for ESS, and $(n-1)$ is the number of degrees of freedom for TSS.
The Global F-test: The global F-test measures the extent to which a set of independent variables influences the dependent variable. It tells us whether that impact is small or large. We set up a null hypothesis saying that the coefficients of all of our independent variables are equal to each other and equal to 0. That is,
$$H_0: b_1 = b_2 = \dots = b_m = 0$$
against
$$H_1: \text{at least one } b_j \neq 0$$
To test this, we use the F-statistic with formula:
$$F = \frac{(TSS - ESS)/m}{ESS/(n-(m+1))}$$
where the F-statistic has m and $(n-(m+1))$ degrees of freedom. If the statistic is larger than the critical value, we reject the null hypothesis and say that our model has some explanatory power.
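The two quantities in this section can be computed from the same sums of squares. The following minimal Python sketch shows one way to do so, treating ESS as the error (residual) sum of squares, as the degrees of freedom given above imply. The function name, the sample values and the fitted values are assumptions made for this example; the critical F value is entered by hand, as it would be read from an F-distribution table.

```python
def adjusted_r2_and_global_f(y, y_hat, m):
    """Return (r2, adj_r2, f_stat) for a model with m independent variables.

    y are the observed values and y_hat the fitted values of the model.
    """
    n = len(y)
    y_bar = sum(y) / n
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    df_error = n - (m + 1)
    r2 = 1 - ess / tss
    adj_r2 = 1 - (ess / df_error) / (tss / (n - 1))
    f_stat = ((tss - ess) / m) / (ess / df_error)
    return r2, adj_r2, f_stat

# Made-up illustrative data (not from the report's dataset).
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]  # fitted values of a one-variable model
r2, adj_r2, f_stat = adjusted_r2_and_global_f(y, y_hat, m=1)

F_CRIT = 10.13  # tabulated F value, alpha = 5%, (1, 3) degrees of freedom
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, F = {f_stat:.1f}")
print("reject H0" if f_stat > F_CRIT else "fail to reject H0")
```

The adjusted value is always below the unadjusted one when the fit is imperfect, because the error sum of squares is divided by fewer degrees of freedom than the total sum of squares.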
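Incremental F-test: This test allowed us to determine the added descriptive value of each variable as we built our model. Specifically, it compares the explanatory power of the model without the added variable (the restricted model) with the explanatory power of the model with the new variable (the unrestricted model). If the value from the F-test was below our critical value*, we rejected the new variable on the basis that it does not have any additional explanatory power, and built the model without the variable. The statistic is
$$F = \frac{(ESS_R - ESS_U)/q}{ESS_U/(n-(m+1))}$$
where $ESS_R$ and $ESS_U$ are the error sums of squares of the restricted and unrestricted models, q is the number of added variables, and m is the number of independent variables in the unrestricted model.
*Critical values are determined with an F-distribution table.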
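A minimal Python sketch of the incremental F-test follows, using the usual textbook form of the statistic in terms of the error sums of squares of the restricted and unrestricted models. The function name, the parameter names and all of the numbers are hypothetical and chosen only for illustration; the critical value is an approximate table value entered by hand.

```python
def incremental_f(ess_restricted, ess_unrestricted, q, n, m):
    """Incremental F statistic for adding q variables to a model.

    ess_restricted:   error sum of squares without the new variable(s)
    ess_unrestricted: error sum of squares with the new variable(s)
    n: number of observations
    m: number of independent variables in the unrestricted model
    """
    df_error = n - (m + 1)
    return ((ess_restricted - ess_unrestricted) / q) / (ess_unrestricted / df_error)

# Hypothetical numbers, for illustration only.
f_stat = incremental_f(ess_restricted=120.0, ess_unrestricted=90.0, q=1, n=50, m=3)

F_CRIT = 4.05  # approximate tabulated F value, alpha = 5%, (1, 46) degrees of freedom
keep_variable = f_stat > F_CRIT
print(f"F = {f_stat:.2f}; keep the new variable: {keep_variable}")
```

In this made-up case the added variable reduces the error sum of squares enough for the statistic to exceed the critical value, so the variable would be kept.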
Appendix D: Summary table of variables
Data on Life Expectancy in 2003 by countries and relevant proxies by categories
Note: This color indicates countries with HIV prevalence, which as a result are excluded from our analysis. All data were acquired and updated in April 2012. * indicates missing data.
Columns, by category (the numbers after the column names refer to the data sources listed after the table):
Life Expectancy 1: Both, Male, Female
Economic proxies: GDP per capita (US$) 2; GDP per capita Growth rate (%); Economic Freedom Index 3
Education: Female Schooling Enrollment rate (%) 4
Healthcare: DPT Immu. rate (%) 5; Clean Water rate (%) 6
Urbanization: Urban rate (%) 7
CO2 emission per capita Growth rate (%) 8
HIV rate (%) 9
Years shown in the header: 2003 for life expectancy, 2002 for GDP per capita (US$), 2000 for clean water and 2000 for HIV rate; the remaining year labels read 2002, 2003, 2000, 2002 and 2001.
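| Country | Life Exp. Both | Life Exp. Male | Life Exp. Female | GDP per capita (US$) | GDP per capita Growth rate (%) | Economic Freedom Index | Female Schooling Enrollment rate (%) | DPT Immu. rate (%) | Clean Water rate (%) | Urban rate (%) | CO2 emission per capita Growth rate (%) | HIV rate (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 42.0 | 41.8 | 42.2 | 524 | 2.7 | * | 29.0 | 33 | 21 | 21.9 | 1.62 | * |
| Albania | 76.9 | 74.1 | 79.9 | 4,844 | 3.7 | 56.8 | 66.9 | 97 | 97 | 42.9 | 0.02 | * |
| Algeria | 72.5 | 71.0 | 74.0 | 5,844 | 3.8 | 61 | * | 89 | 89 | 61.2 | 0.43 | 0.1 |
| Angola | 38.1 | 37.0 | 39.3 | 2,680 | 3.4 | * | * | 42 | 41 | 51.0 | 0.17 | 1.9 |
| Antigua and Barbuda | 71.3 | 69.0 | 73.8 | 15,206 | 4.2 | * | * | 97 | 91 | 31.5 | 0.64 | 0.4 |
| Argentina | 75.5 | 71.7 | 79.4 | 7,912 | 3.9 | 65.7 | 94.5 | 83 | 96 | 90.6 | 0.58 | * |
| Armenia | 70.9 | 67.5 | 75.0 | 2,635 | 3.4 | 68 | 72.6 | 94 | 93 | 64.7 | 0.06 | 0.1 |
| Aruba | 78.8 | 75.5 | 82.3 | * | * | * | 83.3 | * | 100 | 46.7 | 1.38 | * |
| Australia | 80.1 | 77.3 | 83.1 | 28,907 | 4.5 | 77.3 | 116.7 | 92 | 100 | 87.6 | 1.22 | 0.1 |
| Austria | 78.8 | 76.0 | 81.8 | 30,458 | 4.5 | 67.4 | 87.1 | 84 | 100 | 66.1 | 0.90 | 0.1 |
| Azerbaijan | 63.2 | 59.0 | 67.6 | 2,746 | 3.4 | 53.3 | 70.4 | 75 | 74 | 51.3 | 0.56 | 0.1 |
| Bahamas, The | 65.7 | 62.3 | 69.2 | 24,619 | 4.4 | 74.4 | * | 99 | 96 | 82.4 | 0.78 | 3.2 |
| Bahrain | 73.7 | 71.3 | 76.2 | 23,919 | 4.4 | 75.6 | * | 99 | * | 88.4 | 1.34 | * |
| Bangladesh | 61.3 | 61.5 | 61.2 | 948 | 3.0 | 51.9 | * | 85 | 79 | 24.4 | 0.61 | 0.1 |
| Barbados | 72.2 | 70.3 | 74.2 | 15,259 | 4.2 | 73.6 | * | 84 | 100 | 37.1 | 0.66 | 0.4 |
| Belarus | 68.4 | 62.5 | 74.6 | 5,935 | 3.8 | 39 | 90.9 | 99 | 100 | 70.8 | 0.72 | 0.1 |
| Belgium | 78.3 | 75.1 | 81.6 | 30,046 | 4.5 | 67.6 | 118.9 | 95 | 100 | 97.2 | 1.05 | 0.2 |
| Belize | 68.6 | 66.5 | 70.7 | 5,380 | 3.7 | 65.6 | 73.4 | 96 | 89 | 48.8 | 0.44 | 2.2 |
| Benin | 52.0 | 50.9 | 53.2 | 1,241 | 3.1 | 57.3 | 42.2 | 76 | 66 | 39.0 | 0.59 | 1.4 |
| Bermuda | 77.4 | 75.4 | 79.5 | * | * | * | * | * | * | 100.0 | 0.90 | * |
| Bhutan | 53.6 | 53.9 | 53.3 | 2,735 | 3.4 | * | * | 88 | 91 | 27.6 | 0.15 | 0.1 |
| Bolivia | 64.8 | 62.2 | 67.5 | 3,290 | 3.5 | 65.1 | * | 71 | 79 | 62.8 | 0.02 | 0.2 |
| Bosnia and Herzegovina | 79.1 | 75.5 | 82.9 | 4,968 | 3.7 | 37.4 | * | 91 | 97 | 44.2 | 0.74 | * |
| Botswana | 34.8 | 34.2 | 35.3 | 9,627 | 4.0 | 66.2 | 72.2 | 97 | 94 | 54.8 | 0.39 | 26 |
| Brazil | 71.1 | 67.2 | 75.3 | 7,372 | 3.9 | 61.5 | 88.1 | 98 | 93 | 82.4 | 0.28 | * |
| Brunei Darussalam | 74.3 | 71.9 | 76.8 | 45,630 | 4.7 | * | 81.2 | 97 | * | 72.1 | 1.27 | * |
| Bulgaria | 71.5 | 67.9 | 75.3 | 7,579 | 3.9 | 57.1 | 75.7 | 94 | 100 | 69.4 | 0.77 | 0.1 |
| Burkina Faso | 47.6 | 46.2 | 49.0 | 842 | 2.9 | 58.8 | 19.3 | 71 | 60 | 17.3 | 1.10 | 2.3 |
| Burundi | 49.2 | 48.7 | 49.8 | 332 | 2.5 | * | 31.1 | 81 | 72 | 8.8 | 1.48 | 5.2 |
| Cambodia | 58.3 | 56.3 | 60.4 | 1,065 | 3.0 | 60.7 | 54.0 | 60 | 46 | 18.0 | 0.68 | 1.3 |
| Cameroon | 50.6 | 50.3 | 50.9 | 1,773 | 3.2 | 52.8 | 48.4 | 64 | 64 | 51.7 | 0.67 | 5.5 |
| Canada | 79.8 | 76.4 | 83.4 | 29,903 | 4.5 | 74.6 | * | 92 | 100 | 79.7 | 1.23 | 0.2 |
| Cape Verde | 69.8 | 66.5 | 73.2 | 2,203 | 3.3 | 57.6 | * | 90 | 83 | 55.0 | 0.33 | * |
| Cayman Islands | 79.7 | 77.1 | 82.3 | * | * | * | * | * | 93 | 100.0 | 1.03 | * |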
| Central African Republic | 43.4 | 43.1 | 43.7 | 680 | 2.8 | 59.8 | * | 40 | 63 | 37.8 | 1.19 | 9.4 |
| Chad | 46.6 | 45.0 | 48.4 | 780 | 2.9 | 49.2 | 27.7 | 26 | 45 | 24.2 | 1.69 | 3 |
| Chile | 76.2 | 72.9 | 79.6 | 9,918 | 4.0 | 77.8 | 78.8 | 97 | 94 | 86.6 | 0.54 | 0.2 |
| China | 71.6 | 70.1 | 73.3 | 2,863 | 3.5 | 52.8 | 66.1 | 86 | 80 | 37.6 | 0.44 | * |
| Colombia | 71.1 | 67.3 | 75.1 | 6,152 | 3.8 | 64.2 | * | 80 | 91 | 72.7 | 0.14 | 0.9 |
| Comoros | 61.2 | 58.9 | 63.5 | 987 | 3.0 | * | 52.7 | 70 | 92 | 28.0 | 0.82 | 0.1 |
| Congo, Dem. Rep. | 51.2 | 50.2 | 52.2 | 230 | 2.4 | * | 50.1 | 30 | 44 | 30.7 | 1.51 | 3.9 |
| Congo, Rep. | 50.4 | 49.0 | 51.8 | 2,975 | 3.5 | 45.3 | * | 31 | 70 | 59.1 | 0.57 | * |
| Costa Rica | 76.4 | 73.9 | 79.1 | 7,483 | 3.9 | 67.5 | * | 91 | 95 | 60.1 | 0.16 | 0.1 |
| Cote d'Ivoire | 48.2 | 45.7 | 50.8 | 1,586 | 3.2 | 57.3 | * | 66 | 78 | 44.8 | 0.34 | 6.9 |
| Croatia | 73.8 | 69.6 | 78.3 | 12,572 | 4.1 | 51.1 | 75.4 | 94 | 99 | 56.0 | 0.67 | 0.1 |
| Cuba | 76.8 | 74.6 | 79.2 | * | * | 32.4 | 79.5 | 99 | 90 | 75.6 | 0.36 | 0.1 |
| Cyprus | 77.3 | 74.9 | 79.7 | 21,375 | 4.3 | 73 | 78.3 | 97 | 100 | 68.9 | 0.85 | * |
| Czech Republic | 75.5 | 72.3 | 79.0 | 16,866 | 4.2 | 66.5 | 80.7 | 98 | 100 | 73.8 | 1.09 | 0.1 |
| Denmark | 77.3 | 75.0 | 79.6 | 30,766 | 4.5 | 71.1 | 104.5 | 97 | 100 | 85.4 | 0.96 | 0.1 |
| Djibouti | 43.1 | 41.8 | 44.5 | 1,620 | 3.2 | 57.8 | 18.5 | 53 | 84 | 84.4 | 0.29 | 2.9 |
| Dominica | 74.1 | 71.2 | 77.2 | 7,665 | 3.9 | * | * | 99 | 95 | 71.8 | 0.21 | * |
| Dominican Republic | 71.0 | 69.5 | 72.6 | 5,566 | 3.7 | 58.6 | 79.7 | 72 | 87 | 64.2 | 0.37 | 1 |
| Ecuador | 75.8 | 73.0 | 78.8 | 5,307 | 3.7 | 53.1 | * | 90 | 86 | 61.6 | 0.27 | 0.5 |
| Egypt, Arab Rep. | 70.4 | 67.9 | 73.0 | 3,898 | 3.6 | 54.1 | * | 99 | 96 | 42.6 | 0.26 | 0.1 |
| El Salvador | 70.6 | 67.0 | 74.4 | 4,919 | 3.7 | 73 | 69.9 | 92 | 82 | 59.0 | 0.00 | 0.7 |
| Equatorial Guinea | 50.4 | 48.4 | 52.5 | 14,518 | 4.2 | 46.4 | * | 33 | 43 | 38.8 | 0.76 | 1.5 |
| Eritrea | 57.5 | 56.1 | 59.0 | 612 | 2.8 | * | 28.1 | 86 | 54 | 18.4 | 0.78 | 1.2 |
| Estonia | 71.0 | 65.3 | 77.1 | 11,989 | 4.1 | 77.6 | 98.1 | 94 | 98 | 69.4 | 1.07 | 0.5 |
| Ethiopia | 48.5 | 47.3 | 49.7 | 507 | 2.7 | 49.8 | 29.0 | 63 | 28 | 15.4 | 1.19 | * |
| Faeroe Islands | 78.9 | 75.4 | 82.4 | * | * | * | * | * | * | 37.7 | 1.15 | * |
| Fiji | 68.9 | 66.4 | 71.4 | 3,747 | 3.6 | 53.9 | 76.7 | 91 | * | 49.3 | 0.14 | 0.1 |
| Finland | 78.1 | 74.7 | 81.8 | 27,531 | 4.4 | 73.6 | 111.7 | 98 | 100 | 61.6 | 1.04 | 0.1 |
| France | 79.3 | 75.6 | 83.1 | 27,658 | 4.4 | 58 | 93.1 | 97 | 100 | 76.2 | 0.80 | 0.3 |
| French Polynesia | 75.5 | 73.1 | 77.9 | * | * | * | * | * | 100 | 52.1 | 0.48 | * |
| Gabon | 56.1 | 54.5 | 57.8 | 11,902 | 4.1 | 58 | * | 45 | 85 | 81.5 | 0.15 | 5.2 |
| Gambia, The | 53.1 | 51.3 | 55.0 | 1,029 | 3.0 | 57.7 | * | 87 | 84 | 51.0 | 0.67 | 0.5 |
| Georgia | 75.4 | 72.1 | 79.2 | 2,584 | 3.4 | 56.7 | 73.9 | 87 | 89 | 52.6 | 0.07 | 0.1 |
| Germany | 78.4 | 75.5 | 81.6 | 27,437 | 4.4 | 70.4 | * | 93 | 100 | 73.2 | 1.02 | 0.1 |
| Ghana | 57.7 | 57.0 | 58.5 | 1,018 | 3.0 | 57.2 | 45.7 | 87 | 71 | 45.5 | 0.45 | 2.3 |
| Gibraltar | 79.4 | 76.5 | 82.4 | * | * | * | * | * | * | 100.0 | 1.10 | * |
| Greece | 78.8 | 76.3 | 81.4 | 21,402 | 4.3 | 59.1 | 92.4 | 91 | 99 | 60.0 | 0.93 | 0.1 |
| Greenland | 69.0 | 65.4 | 72.7 | * | * | * | * | * | * | 82.1 | 0.98 | * |
| Grenada | 64.5 | 62.7 | 66.3 | 8,194 | 3.9 | * | * | 96 | 94 | 30.8 | 0.34 | * |
| Guam | 77.8 | 74.8 | 81.0 | * | * | * | * | * | 100 | 93.1 | * | * |
| Guatemala | 68.4 | 66.8 | 70.2 | 3,692 | 3.6 | 62.3 | 60.3 | 77 | 89 | 45.9 | 0.03 | 0.5 |
| Guinea | 49.1 | 47.9 | 50.4 | 846 | 2.9 | 52.9 | 32.7 | 50 | 62 | 31.8 | 0.81 | 1.7 |
| Guinea-Bissau | 46.3 | 44.4 | 48.4 | 945 | 3.0 | 42.3 | * | 53 | 55 | 29.7 | 0.81 | 1.8 |
| Guyana | 64.7 | 62.1 | 67.3 | 2,357 | 3.4 | 54.3 | * | 85 | 89 | 28.4 | 0.31 | 1.5 |
| Haiti | 52.3 | 51.0 | 53.7 | 1,000 | 3.0 | 47.9 | * | 51 | 55 | 38.4 | 0.75 | 2.8 |
| Honduras | 69.3 | 67.6 | 70.9 | 2,722 | 3.4 | 58.7 | * | 96 | 80 | 45.2 | 0.05 | 1.3 |
| Hong Kong SAR, China | 81.3 | 78.6 | 84.2 | 27,753 | 4.4 | 89.4 | 72.4 | * | * | 100.0 | 0.75 | * |
| Hungary | 72.1 | 68.0 | 76.5 | 14,669 | 4.2 | 64.5 | 91.5 | 99 | 99 | 65.3 | 0.75 | 0.1 |
| Iceland | 80.2 | 78.2 | 82.2 | 31,036 | 4.5 | 73.1 | 101.6 | 92 | 100 | 92.2 | 0.87 | 0.2 |
| India | 63.6 | 62.9 | 64.4 | 1,724 | 3.2 | 51.2 | 52.2 | 60 | 81 | 28.1 | 0.07 | 0.4 |
| Indonesia | 68.9 | 66.5 | 71.5 | 2,550 | 3.4 | 54.8 | 63.4 | 71 | 77 | 44.4 | 0.13 | 0.1 |
| Iran, Islamic Rep. | 69.4 | 68.0 | 70.7 | 7,489 | 3.9 | 36.4 | 65.3 | 96 | 93 | 65.3 | 0.73 | 0.1 |
| Iraq | 67.8 | 66.7 | 69.0 | 3,480 | 3.5 | 15.6 | * | 76 | 80 | 67.4 | 0.53 | * |
| Ireland | 77.2 | 74.5 | 80.0 | 33,052 | 4.5 | 80.5 | 97.4 | 84 | 100 | 59.7 | 1.05 | 0.2 |
| Israel | 79.0 | 77.0 | 81.2 | 23,535 | 4.4 | 66.9 | 94.0 | 95 | 100 | 91.5 | 1.01 | 0.1 |
| Italy | 79.4 | 76.5 | 82.5 | 26,804 | 4.4 | 63.6 | 89.5 | 93 | 100 | 67.4 | 0.89 | 0.3 |
| Jamaica | 73.4 | 71.8 | 75.2 | 6,093 | 3.8 | 61.7 | 72.8 | 98 | 93 | 52.2 | 0.61 | 1.9 |
| Japan | 80.9 | 77.6 | 84.4 | 26,814 | 4.4 | 66.7 | 84.1 | 95 | 100 | 65.5 | 0.98 | 0.1 |
| Jordan | 77.9 | 75.4 | 80.5 | 3,507 | 3.5 | 66.2 | 80.7 | 99 | 96 | 78.3 | 0.51 | * |
| Kazakhstan | 65.6 | 60.2 | 71.3 | 6,216 | 3.8 | 52.4 | 89.1 | 95 | 96 | 56.6 | 1.00 | 0.1 |
| Kenya | 46.6 | 47.4 | 45.7 | 1,171 | 3.1 | 58.2 | 56.4 | 76 | 52 | 20.1 | 0.53 | 9 |
| Kiribati | 60.9 | 58.0 | 64.0 | 2,153 | 3.3 | * | 79.4 | 85 | 62 | 43.2 | 0.52 | * |
| Korea, Dem. Rep. | 70.8 | 68.1 | 73.6 | * | * | 69.5 | * | 62 | 100 | 60.8 | 0.54 | * |
| Korea, Rep. | 76.5 | 73.0 | 80.2 | 19,656 | 4.3 | 8.9 | 88.8 | 97 | 93 | 80.1 | 0.97 | 0.1 |
| Kuwait | 76.7 | 75.7 | 77.6 | 34,376 | 4.5 | 65.4 | 90.4 | 99 | 99 | 98.2 | 1.44 | * |
| Kyrgyz Republic | 67.5 | 63.5 | 71.7 | 1,434 | 3.2 | 51.7 | 80.7 | 99 | 82 | 35.6 | 0.12 | 0.1 |
| Lao PDR | 54.3 | 52.3 | 56.3 | 1,343 | 3.1 | 36.8 | 52.5 | 52 | 48 | 24.2 | 0.65 | 0.1 |
| Latvia | 70.7 | 66.0 | 75.6 | 9,870 | 4.0 | 65 | 95.0 | 97 | 99 | 68.1 | 0.46 | 0.3 |
| Lebanon | 72.1 | 69.6 | 74.6 | 8,262 | 3.9 | 57.1 | * | 80 | 100 | 86.2 | 0.63 | 0.2 |
| Lesotho | 35.3 | 35.8 | 34.8 | 1,035 | 3.0 | 48.9 | 62.4 | 78 | 74 | 21.3 | * | 24.5 |
| Liberia | 37.5 | 35.2 | 39.9 | 446 | 2.6 | * | * | 42 | 65 | 55.8 | 0.77 | 3.3 |
| Libya | 76.1 | 73.9 | 78.3 | 10,573 | 4.0 | 35.4 | 94.0 | 94 | 54 | 76.6 | 0.98 | * |
| Liechtenstein | 79.3 | 75.6 | 82.9 | * | * | * | 65.0 | * | * | 14.9 | * | * |
| Lithuania | 73.0 | 67.5 | 78.7 | 10,567 | 4.0 | 66.1 | 97.3 | 95 | * | 66.8 | 0.57 | 0.1 |
| Luxembourg | 78.4 | 75.2 | 81.9 | 57,550 | 4.8 | 79.4 | 76.9 | 98 | 100 | 83.4 | 1.29 | 0.2 |
| Macao SAR, China | 81.9 | 79.1 | 84.9 | 23,603 | 4.4 | * | 84.6 | * | * | 100.0 | 0.58 | * |
| Macedonia, FYR | 73.2 | 70.8 | 75.8 | 6,037 | 3.8 | 58 | * | 91 | 100 | 63.9 | 0.77 | * |
| Madagascar | 56.1 | 53.8 | 58.5 | 725 | 2.9 | 56.8 | * | 59 | 37 | 27.7 | 0.56 | 0.2 |
| Malawi | 41.1 | 41.2 | 40.9 | 564 | 2.8 | 56.9 | 66.5 | 90 | 63 | 16.0 | 1.10 | 14.2 |
| Malaysia | 71.7 | 69.0 | 74.5 | 9,515 | 4.0 | 60.1 | 74.7 | 95 | 97 | 64.2 | 1.16 | 0.4 |
| Maldives | 63.3 | 62.1 | 64.6 | 4,229 | 3.6 | * | 76.1 | 98 | 91 | 30.2 | 0.72 | 0.1 |
| Mali | 48.0 | 46.0 | 50.0 | 768 | 2.9 | 61.1 | 28.8 | 49 | 44 | 28.9 | 0.93 | 1.7 |
| Malta | 78.5 | 76.3 | 80.8 | 19,045 | 4.3 | 62.2 | 76.0 | 95 | 100 | 92.9 | 1.20 | 0.1 |
| Marshall Islands | 69.4 | 67.5 | 71.4 | * | * | * | 70.9 | 48 | 95 | 69.0 | 0.59 | * |
| Mauritania | 51.9 | 49.8 | 54.1 | 1,402 | 3.1 | 52.5 | 42.6 | 72 | 40 | 40.2 | 0.07 | 0.6 |
| Mauritius | 71.8 | 67.8 | 75.9 | 8,662 | 3.9 | 67.7 | 69.7 | 92 | 99 | 42.5 | 0.79 | 0.3 |
| Mayotte | 60.6 | 58.5 | 62.8 | * | * | * | * | * | * | * | * | * |
| Mexico | 74.7 | 71.9 | 77.6 | 9,318 | 4.0 | 63 | 75.9 | 97 | 90 | 75.3 | 0.59 | 0.3 |
| Micronesia, Fed. Sts. | 69.1 | 67.4 | 71.0 | 2,843 | 3.5 | * | * | 75 | 92 | 22.3 | 0.29 | * |
| Moldova | 64.9 | 60.6 | 69.4 | 1,754 | 3.2 | 57.4 | 73.3 | 97 | 92 | 43.8 | 0.01 | 0.4 |
| Monaco | 79.3 | 75.4 | 83.4 | * | * | * | * | 99 | 100 | 100.0 | * | * |
| Mongolia | 63.8 | 61.6 | 66.1 | 2,151 | 3.3 | 56.7 | 80.7 | 95 | 66 | 56.6 | 0.51 | 0.1 |
| Morocco | 70.0 | 67.8 | 72.4 | 2,913 | 3.5 | 59 | 52.7 | 96 | 78 | 54.0 | 0.11 | 0.1 |
| Mozambique | 41.5 | 40.8 | 42.2 | 538 | 2.7 | 57.7 | * | 73 | 42 | 32.2 | 1.07 | 8.6 |
| Namibia | 46.0 | 46.0 | 46.1 | 4,212 | 3.6 | 65.1 | 73.6 | 78 | 81 | 33.5 | 0.03 | 15.3 |
| Nepal | 59.0 | 59.4 | 58.6 | 833 | 2.9 | 52.3 | * | 72 | 83 | 14.4 | 0.86 | 0.5 |
| Netherlands | 78.6 | 76.1 | 81.2 | 31,940 | 4.5 | 75.1 | 97.5 | 97 | 100 | 78.2 | 1.02 | 0.2 |
| New Caledonia | 73.5 | 70.6 | 76.6 | * | * | * | * | * | * | 62.6 | 0.99 | * |
| New Zealand | 78.3 | 75.3 | 81.4 | 22,899 | 4.4 | 80.7 | 108.6 | 90 | 100 | 85.9 | 0.95 | 0.1 |
| Nicaragua | 69.7 | 67.7 | 71.8 | 1,986 | 3.3 | 61.1 | 71.0 | 87 | 80 | 55.2 | 0.13 | 0.1 |
| Niger | 43.0 | 43.1 | 43.0 | 566 | 2.8 | 48.2 | 17.0 | 36 | 42 | 16.2 | 1.17 | 1 |
| Nigeria | 46.2 | 45.7 | 46.8 | 1,350 | 3.1 | 50.9 | 48.0 | 27 | 53 | 44.0 | 0.18 | 3.9 |
| Northern Mariana Islands | 75.5 | 72.9 | 78.2 | * | * | * | * | * | 98 | 90.4 | * | * |
| Norway | 79.1 | 76.5 | 81.9 | 37,060 | 4.6 | 67.4 | 104.1 | 91 | 100 | 76.6 | 0.96 | 0.1 |
| Oman | 72.6 | 70.4 | 74.9 | 18,966 | 4.3 | 64 | 67.0 | 99 | 83 | 71.6 | 0.95 | 0.1 |
| Pakistan | 62.2 | 61.3 | 63.1 | 1,719 | 3.2 | 55.8 | 31.0 | 65 | 88 | 33.9 | 0.13 | 0.1 |
| Palau | 69.5 | 66.4 | 72.8 | 11,073 | 4.0 | * | * | 98 | 83 | 72.8 | 0.98 | * |
| Panama | 75.3 | 72.6 | 78.0 | 7,419 | 3.9 | 68.5 | * | 99 | 90 | 67.8 | 0.37 | 1.4 |
| Papua New Guinea | 64.2 | 62.1 | 66.4 | 1,703 | 3.2 | * | * | 55 | 39 | 13.0 | 0.23 | 0.4 |
| Paraguay | 74.4 | 71.9 | 77.0 | 3,424 | 3.5 | 59.6 | 73.2 | 89 | 74 | 56.6 | 0.15 | 0.3 |
| Peru | 68.9 | 67.2 | 70.7 | 5,229 | 3.7 | 64.8 | 81.0 | 90 | 79 | 70.9 | 0.02 | 0.5 |
| Philippines | 69.3 | 66.4 | 72.3 | 2,541 | 3.4 | 60.7 | 80.9 | 79 | 88 | 60.2 | 0.01 | 0.1 |
| Poland | 74.6 | 70.7 | 78.8 | 11,563 | 4.1 | 65 | 93.5 | 98 | 100 | 61.6 | 0.90 | 0.1 |
| Portugal | 77.2 | 73.9 | 80.7 | 19,088 | 4.3 | 65.4 | 95.1 | 97 | 99 | 55.7 | 0.79 | 0.4 |
| Puerto Rico | 78.1 | 74.1 | 82.2 | * | * | * | * | * | * | 95.8 | * | * |
| Qatar | 73.1 | 70.7 | 75.8 | 62,591 | 4.8 | 61.9 | 83.6 | 93 | 100 | 95.1 | 1.69 | 0.1 |
| Romania | 70.9 | 67.4 | 74.6 | 7,013 | 3.8 | 48.7 | 75.9 | 99 | * | 53.6 | 0.64 | 0.1 |
| Russian Federation | 66.4 | 59.9 | 73.3 | 8,029 | 3.9 | 48.7 | 90.8 | 96 | 95 | 73.2 | 1.02 | 0.3 |
| Rwanda | 46.3 | 45.3 | 47.4 | 683 | 2.8 | 50.4 | 50.1 | 77 | 67 | 15.3 | 1.09 | 3.8 |
| Samoa | 70.1 | 67.4 | 73.0 | 3,117 | 3.5 | * | 80.1 | 93 | 89 | 22.1 | 0.09 | * |
| San Marino | 81.4 | 77.9 | 85.3 | * | * | * | * | 96 | * | 93.7 | * | * |
| Sao Tome and Principe | 66.3 | 64.8 | 67.8 | 1,151 | 3.1 | * | 62.4 | 92 | 79 | 55.3 | 0.19 | * |
| Saudi Arabia | 75.0 | 73.1 | 77.0 | 17,602 | 4.2 | 65.3 | * | 97 | * | 80.3 | 1.16 | * |
| Senegal | 58.2 | 56.7 | 59.8 | 1,403 | 3.1 | 58.6 | 35.5 | 52 | 65 | 41.0 | 0.35 | 0.6 |
| Serbia | 74.1 | 71.6 | 76.7 | 6,587 | 3.8 | 46.6 | * | 93 | 99 | 51.3 | * | 0.1 |
| Seychelles | 71.3 | 65.8 | 76.9 | 16,112 | 4.2 | * | 83.4 | 96 | * | 51.8 | 0.90 | * |
| Sierra Leone | 39.5 | 37.4 | 41.7 | 541 | 2.7 | * | * | 38 | 55 | 36.0 | 0.70 | 0.9 |
| Singapore | 81.4 | 78.9 | 84.2 | 34,800 | 4.5 | 87.4 | * | 96 | 100 | 100.0 | 1.01 | 0.1 |
| Slovak Republic | 73.9 | 69.9 | 78.1 | 12,965 | 4.1 | 59.8 | 74.3 | 99 | 100 | 56.3 | 0.86 | 0.1 |
| Slovenia | 75.7 | 71.9 | 79.7 | 19,769 | 4.3 | 57.8 | 99.2 | 92 | 100 | 50.3 | 0.88 | 0.1 |
| Solomon Islands | 72.1 | 69.6 | 74.7 | 1,763 | 3.2 | * | 46.7 | 78 | 70 | 16.2 | 0.39 | * |
| Somalia | 47.3 | 45.7 | 49.1 | * | * | * | * | 33 | 23 | 34.0 | 1.18 | 0.2 |
| South Africa | 45.3 | 44.6 | 46.0 | 7,244 | 3.9 | 64 | * | 71 | 86 | 57.9 | 0.91 | 16.1 |
| Spain | 79.2 | 75.9 | 82.8 | 24,067 | 4.4 | 68.8 | 95.7 | 96 | 100 | 76.5 | 0.86 | 0.4 |
| Sri Lanka | 72.6 | 70.1 | 75.3 | 2,829 | 3.5 | 64 | * | 98 | 80 | 15.5 | 0.27 | 0.1 |
| Sudan | 57.7 | 56.6 | 58.9 | 1,374 | 3.1 | * | 31.6 | 66 | 61 | 38.0 | 0.74 | 0.3 |
| Suriname | 69.2 | 66.8 | 71.8 | 4,873 | 3.7 | 48 | * | 68 | 91 | 72.8 | 0.68 | 1 |
| Swaziland | 35.3 | 33.9 | 36.8 | 3,972 | 3.6 | 60.9 | 55.6 | 89 | 55 | 23.6 | 0.03 | 22.3 |
| Sweden | 80.2 | 78.1 | 82.5 | 29,281 | 4.5 | 70.8 | 120.8 | 99 | 100 | 84.1 | 0.76 | 0.1 |
| Switzerland | 80.2 | 77.4 | 83.2 | 33,658 | 4.5 | 79.3 | 83.2 | 93 | 100 | 73.3 | 0.77 | 0.3 |
| Syrian Arab Republic | 69.4 | 68.2 | 70.7 | 3,633 | 3.6 | 36.3 | * | 82 | 87 | 52.2 | 0.57 | * |
| Tajikistan | 64.4 | 61.4 | 67.5 | 1,054 | 3.0 | 47.3 | 64.9 | 85 | 60 | 26.5 | 0.43 | 0.1 |
| Tanzania | 44.6 | 43.9 | 45.3 | 860 | 2.9 | 58.3 | * | 87 | 54 | 23.1 | 1.05 | 7.3 |
| Thailand | 71.4 | 69.0 | 73.9 | 5,323 | 3.7 | 69.1 | 69.0 | 96 | 96 | 31.6 | 0.53 | 1.8 |
| Togo | 56.4 | 54.3 | 58.5 | 791 | 2.9 | 45.2 | 51.9 | 50 | 55 | 37.9 | 0.63 | 3.6 |
| Tonga | 68.9 | 66.4 | 71.4 | 3,784 | 3.6 | * | * | 94 | 100 | 23.5 | 0.15 | * |
| Trinidad and Tobago | 66.8 | 65.5 | 68.2 | 14,295 | 4.2 | 70.1 | 64.6 | 91 | 91 | 11.4 | 1.29 | 1.2 |
| Tunisia | 74.4 | 72.8 | 76.2 | 5,836 | 3.8 | 60.2 | 75.7 | 98 | 90 | 64.2 | 0.33 | 0.1 |
| Turkey | 71.8 | 69.4 | 74.3 | 8,741 | 3.9 | 54.2 | 65.6 | 88 | 93 | 65.7 | 0.48 | 0.1 |
| Turkmenistan | 61.2 | 57.7 | 64.8 | 2,919 | 3.5 | 43.2 | * | 95 | 83 | 46.4 | 0.94 | * |
| Turks and Caicos Islands | 74.0 | 71.8 | 76.3 | * | * | * | * | * | 100 | 86.6 | 0.15 | * |
| Tuvalu | 67.3 | 65.2 | 69.6 | * | * | * | * | 96 | 94 | 46.8 | * | * |
| Uganda | 49.2 | 48.7 | 49.7 | 765 | 2.9 | 61 | 70.4 | 55 | 57 | 12.3 | 1.18 | 7.3 |
| Ukraine | 67.9 | 62.3 | 73.7 | 3,991 | 3.6 | 48.2 | 88.8 | 99 | 97 | 67.4 | 0.82 | 0.9 |
| United Arab Emirates | 74.8 | 72.3 | 77.4 | 61,603 | 4.8 | 73.6 | 76.5 | 94 | 100 | 77.8 | 1.51 | * |
| United Kingdom | 78.2 | 75.7 | 80.7 | 28,886 | 4.5 | 78.5 | 93.3 | 91 | 100 | 89.5 | 0.97 | 0.1 |
| United States | 77.1 | 74.4 | 80.1 | 36,797 | 4.6 | 78.4 | 97.9 | 94 | 99 | 79.8 | 1.28 | 0.5 |
| Uruguay | 75.7 | 72.5 | 79.0 | 7,835 | 3.9 | 68.7 | * | 94 | 98 | 91.6 | 0.19 | 0.3 |
| Uzbekistan | 64.0 | 60.5 | 67.6 | 1,589 | 3.2 | 38.5 | 73.5 | 99 | 89 | 37.1 | 0.69 | 0.1 |
| Vanuatu | 61.7 | 60.3 | 63.2 | 3,068 | 3.5 | * | 61.9 | 70 | 72 | 22.4 | 0.35 | * |
| Venezuela, RB | 73.8 | 70.8 | 77.1 | 8,004 | 3.9 | 54.7 | 76.0 | 70 | 92 | 90.7 | 0.84 | * |
| Vietnam | 70.2 | 67.4 | 73.1 | 1,644 | 3.2 | 45.6 | 60.7 | 96 | 79 | 25.1 | 0.11 | 0.2 |
| Virgin Islands (U.S.) | 78.6 | 74.7 | 82.7 | * | * | * | * | * | * | 93.2 | * | * |
| West Bank and Gaza | 72.7 | 71.0 | 74.5 | 2,104 | 3.3 | * | * | * | 93 | 71.5 | 0.36 | * |
| Yemen, Rep. | 61.0 | 59.2 | 62.9 | 1,993 | 3.3 | 48.6 | 53.3 | 76 | 65 | 27.3 | 0.05 | * |
| Zambia | 39.0 | 38.8 | 39.3 | 980 | 3.0 | 59.6 | * | 85 | 54 | 34.9 | 0.74 | 14.4 |
| Zimbabwe | 39.0 | 40.0 | 38.1 | * | * | 36.7 | * | 76 | 80 | 34.6 | 0.00 | 24.8 |
Data sources:
1 http://www.census.gov/ipc/www/idbsprd.html
2 http://data.worldbank.org/indicator/NY.GDP.PCAP.CD/countries
3 http://www.heritage.org/index/explore.aspx?view=byregioncountryyear
4 http://stats.uis.unesco.org/unesco/TableViewer/document.aspx?ReportId=136&IF_Language=eng&BR_Topic=0
5 http://data.worldbank.org/indicator/SH.IMM.IDPT
6 http://data.worldbank.org/indicator/SH.H2O.SAFE.RU.ZS
7 http://data.worldbank.org/indicator/SP.URB.TOTL
8 http://data.worldbank.org/indicator/EN.ATM.CO2E.
9 http://apps.who.int/ghodata/?vid=34000#