datamining methods and models


PRINCIPAL COMPONENTS ANALYSIS

TABLE 1.3  Eigenvalues and Proportion of Variance Explained by Component

                     Initial Eigenvalues
Component    Total    % of Variance    Cumulative %
1            3.901    48.767            48.767
2            1.910    23.881            72.648
3            1.073    13.409            86.057
4            0.825    10.311            96.368
5            0.148     1.847            98.215
6            0.082     1.020            99.235
7            0.047     0.586            99.821
8            0.014     0.179           100.000

combination of the variables accounts for more variability than that of any other conceivable linear combination. It has maximized the variance Var(Y1) = e1′ρe1. As we suspected from the matrix plot and the correlation matrix, there is evidence that total rooms, total bedrooms, population, and households vary together. Here, they all have very high (and very similar) component weights, indicating that all four variables are highly correlated with the first principal component.

Let's examine Table 1.3, which shows the eigenvalues for each component along with the percentage of the total variance explained by that component. Recall that result 3 showed us that the proportion of the total variability in Z that is explained by the ith principal component is λi/m, the ratio of the ith eigenvalue to the number of variables. Here we see that the first eigenvalue is 3.901, and since there are eight predictor variables, this first component explains 3.901/8 ≈ 48.76% of the variance, in agreement with the 48.767% shown in Table 1.3 (the small difference arises because the eigenvalue is rounded to three decimal places). So a single component accounts for nearly half of the variability in the set of eight predictor variables, meaning that this single component by itself carries about half of the information in all eight predictors. Notice also that the eigenvalues decrease in magnitude, λ1 ≥ λ2 ≥ · · · ≥ λm, which here is λ1 ≥ λ2 ≥ · · · ≥ λ8, as we noted in result 2.

The second principal component Y2 is the second-best linear combination of the variables, on the condition that it is orthogonal to the first principal component. Two vectors are orthogonal if they are at right angles to each other, that is, if their inner product is zero; principal components built from orthogonal weight vectors are uncorrelated with one another. The second component is derived from the variability that is left over once the first component has been accounted for. The third component is the third-best linear combination of the variables, on the condition that it is orthogonal to the first two components. The third component is derived from the variance remaining after the first two components have been extracted. The remaining components are defined similarly.
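The arithmetic behind Table 1.3 can be checked with a short sketch. The first part below uses only the eigenvalues printed in the table and applies result 3 (λi/m) to recover the percentage columns, so small differences in the last decimal place are to be expected. The second part uses a small 3 × 3 correlation matrix that is purely hypothetical (it is not the housing data discussed in the text) to illustrate that the eigenvalues sum to m, that the weight vectors are orthogonal, and that Var(Y1) = e1′ρe1 equals the largest eigenvalue. NumPy is assumed here only for convenience; the text does not prescribe any software.

```python
import numpy as np

# Part 1: eigenvalues copied from Table 1.3 (eight predictors, so m = 8).
eigenvalues = np.array([3.901, 1.910, 1.073, 0.825, 0.148, 0.082, 0.047, 0.014])
m = len(eigenvalues)

# Result 3: component i explains lambda_i / m of the total variance.
pct_variance = 100 * eigenvalues / m
cumulative_pct = np.cumsum(pct_variance)

print("Component   Total   % of Variance   Cumulative %")
for i, (lam, pct, cum) in enumerate(
        zip(eigenvalues, pct_variance, cumulative_pct), start=1):
    print(f"{i:<11d} {lam:5.3f}   {pct:13.3f}   {cum:12.3f}")

# Part 2: a HYPOTHETICAL correlation matrix, for illustration only.
rho = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.2],
                [0.3, 0.2, 1.0]])

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so reorder them from largest to smallest (result 2).
vals, vecs = np.linalg.eigh(rho)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

e1 = vecs[:, 0]               # weights of the first principal component
var_y1 = e1 @ rho @ e1        # Var(Y1) = e1' rho e1
gram = vecs.T @ vecs          # identity if the weight vectors are orthonormal
comp_cov = vecs.T @ rho @ vecs  # covariance matrix of the components

print("sum of eigenvalues:", vals.sum())
print("Var(Y1):", var_y1, " largest eigenvalue:", vals[0])
```

In the second part, the off-diagonal entries of `comp_cov` are zero up to floating-point error, which is the numerical counterpart of saying that each component is uncorrelated with the ones extracted before it.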

How Many Components Should We Extract?

Next, recall that one of the motivations for principal components analysis was to reduce the number of distinct explanatory elements. The question arises: How do we determine how many components to extract? For example, should we retain only

