Using Statistical Data in HR Management
By RICHARD WORKS

The Bureau of Labor Statistics (BLS) publishes timely estimates commonly used in the field of human resources. However, some users may not be aware that BLS also makes available corresponding measures of variation (error) for its data. This information is published so that users have complete information when making comparisons using BLS survey data. For example, do you know how to interpret "an estimated average wage of $20 with a relative standard error of 1.5%"?

To calculate a standard error, divide the standard deviation by the square root of the sample size. To get a relative standard error (RSE), divide the standard error by the estimate; the RSE is expressed as a percent of the estimate. An estimated average wage of $20 with a relative standard error of 1.5% therefore has a standard error of $20 x 0.015 = $0.30 and a variance of (0.30)^2 = 0.09.

So how is this used, and why should you care? Estimates computed from samples are not exact, because different samples would not necessarily yield the same calculated results. Since estimates are not exact, we must be careful when making comparison statements, and an appropriate statistical test should be conducted before a comparison is stated. The standard errors (or other similar measures of variation) must be taken into account when evaluating survey estimates. For our example, the 90% confidence interval for an estimated average wage of $20 with a standard error of $0.30 is 20 ± (1.645 x 0.30) = [19.51, 20.49]. This means that the chances are about 90 out of 100 that the true population average wage lies between $19.51 and $20.49.
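As a minimal sketch, the arithmetic above can be written in a few lines of Python. The wage, RSE, and z-value are the article's example figures, not real BLS data:

```python
# Sketch of the RSE -> standard error -> confidence interval arithmetic
# described above, using the article's example numbers.

estimate = 20.00   # estimated average wage ($)
rse = 0.015        # relative standard error (1.5%)
z90 = 1.645        # z-value for a 90% confidence interval

standard_error = estimate * rse    # RSE is SE as a fraction of the estimate
variance = standard_error ** 2     # variance is the squared standard error
margin = z90 * standard_error      # half-width of the 90% interval

ci_low = estimate - margin
ci_high = estimate + margin
print(f"SE = {standard_error:.2f}, variance = {variance:.2f}")
print(f"90% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Running this reproduces the $0.30 standard error and 0.09 variance worked out above.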
Testing Comparison Statements

Statistical statements can be evaluated at different significance levels, known as alpha levels. The 0.1 alpha level is commonly used in practice; it means there is less than a 10% probability of concluding that a significant difference exists when it really does not. The variance and the standard deviation measure the spread of values around the estimate, with the estimate as the center. The variance is the mean of the squared deviations from the mean. The standard deviation is the positive square root of the variance and has the same units as the original observations, which is one reason it is used more often than the variance. Now we will consider some of the most common tests needed before comparison statements can be made.

A difference between two estimates is one type of statement that needs testing before it can be confidently made, for example, "the average pay of x is higher than the average pay of y." A statement like this can only be made if |x – y| > z*s, where z is the z-value and s is the standard error of the
difference between x and y. The standard error of the difference between x and y is the square root of the sum of the variance of x and the variance of y. The values of z correspond to percentiles of the normal distribution; for example, z is 1.645 at the 90% confidence level and 1.96 at the 95% confidence level. This calculation ignores any correlation between x and y. By ignoring a possible correlation when we know one may exist (in most situations a positive one), we are being conservative with our test. In other words, if a statement passes the test with zero correlation, it will also pass with a positive correlation. Suppose we want to make the statement, "full-time nurses had a higher hourly wage rate ($14.11) than part-time nurses ($13.57)." Using a variance of 0.3768 for full-time nurses and 0.2091 for part-time nurses, we test the statement with the following calculation: |14.11 – 13.57| > 1.645*sqrt(0.3768 + 0.2091). Since the difference of $0.54 in the average hourly rate between full-time and part-time nurses is not greater than $1.26, the statement is not acceptable at the 90% confidence level. This suggests that, given our sample and its variation, we cannot be confident that the estimated pay rate of $14.11 is actually larger than the $13.57 pay rate in the population. Another common test compares an estimate with a constant. For example, "the average pay of x is higher (or lower) than some specified value," e.g., $10.00. This statement can only be made if |x – c| > 1.645*(standard error of x). Consider the statement "bus drivers in Cleveland earned less than $10.00 per hour." Using average hourly earnings for bus drivers of $9.74 with a standard error of $1.08, this statement can only be made if |9.74 – 10| > 1.645*1.08.
Since the difference of $0.26 for bus drivers is not greater than the product of 1.645 and the standard error (about $1.78), the statement is not acceptable; that is, the average hourly wage is not significantly different from $10.00. The final test that we'll cover in this article is the comparison of more than two estimates. For example, "the average pay x of occupation j was the highest among the k occupations in group y." For this test, we compute a confidence interval for each occupation within group y and check whether any of the intervals overlap. If they overlap, the statement is deemed unacceptable; otherwise, it is acceptable and may be made. The confidence interval for each estimate is calculated as x ± d*sqrt(variance of x), where d is the multiplicative factor provided in Table 1 below for k estimates.
Table 1. Multiplicative factors (d) given the number of estimates being compared (k)

 k     d       k     d
 3    2.13    12    2.64
 4    2.24    13    2.67
 5    2.33    14    2.69
 6    2.40    15    2.72
 7    2.45    16    2.74
 8    2.50    17    2.76
 9    2.54    18    2.77
10    2.58    19    2.79
11    2.61    20    2.81
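The first two tests can be sketched as small Python functions. The figures are the nurse and bus-driver examples from the article; the function names are my own:

```python
import math

# Sketch of the two comparison tests: two estimates against each other,
# and one estimate against a constant. z = 1.645 is the 90% level.

Z90 = 1.645

def two_estimate_test(x, y, var_x, var_y, z=Z90):
    """True if |x - y| exceeds z times the standard error of the
    difference (correlation is conservatively assumed to be zero)."""
    se_diff = math.sqrt(var_x + var_y)
    return abs(x - y) > z * se_diff

def constant_test(x, se_x, c, z=Z90):
    """True if the estimate x differs significantly from the constant c."""
    return abs(x - c) > z * se_x

# Full-time vs. part-time nurses: $14.11 vs. $13.57
print(two_estimate_test(14.11, 13.57, 0.3768, 0.2091))  # False: not significant

# Bus drivers: $9.74 vs. the $10.00 threshold, SE = $1.08
print(constant_test(9.74, 1.08, 10.00))                 # False: not significant
```

Both calls return False, matching the article's conclusion that neither statement is acceptable at the 90% level.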