STA301_LEC32


Virtual University of Pakistan Lecture No. 32 of the course on Statistics and Probability by Miss Saleha Naghmi Habibullah


IN THE LAST LECTURE, YOU LEARNT

• Sampling Distribution of $\bar{X}$
• Mean and Standard Deviation of the Sampling Distribution of $\bar{X}$
• Central Limit Theorem


TOPICS FOR TODAY

• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$


You will recall that, in the last lecture, we discussed the sampling distribution of $\bar{X}$. We discussed the mean and the standard deviation of this sampling distribution and, towards the end of the lecture, we considered the very important theorem known as the Central Limit Theorem.


Let us now consider the real-life application of this concept with the help of an example:


EXAMPLE A construction company has 310 employees who have an average annual salary of Rs.24,000. The standard deviation of annual salaries is Rs.5,000.


Suppose that the employees of this company launch a demand that the government should institute a law by which their average salary should be at least Rs. 24,500, and suppose that the government decides to check the validity of this demand by drawing a random sample of 100 employees of this company and acquiring information regarding their present salaries.


What is the probability that, in a random sample of 100 employees, the average salary will exceed Rs. 24,500 (so that the government decides that the demand of the employees of this company is unfounded and hence does not pay attention to it, although, in reality, it was justified)?


SOLUTION The sample size (n = 100) is large enough to assume that the sampling distribution of $\bar{X}$ is approximately normal with the following mean and standard deviation:


$\mu_{\bar{x}} = \mu = \text{Rs. } 24{,}000$

and standard deviation

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{5000}{\sqrt{100}} \sqrt{\frac{310 - 100}{310 - 1}} = \text{Rs. } 412.20$


NOTE: Here we have used the finite population correction factor (fpc), because the sample size n = 100 is greater than 5 percent of the population size N = 310.


Since X is approximately N(24000, 412.20), therefore X − µ x X − 24000 Z= = σx 412.20

is approximately N(0, 1).


We are required to evaluate P(X > 24,500). Atx = 24,500, we find that

24500 − 24000 z= = 1.21 412.20


[Figure: normal curve of $\bar{X}$ centred at 24000 with 24500 marked; the corresponding Z scale shows 0 and 1.21.]


Using the table of areas under the standard normal curve, we find that the area between z = 0 and z = 1.21 is 0.3869.


[Figure: the same curve with the area 0.3869 between z = 0 and z = 1.21 (that is, between 24000 and 24500) marked.]


Hence, P(X > 24,500) = P(Z > 1.21) = 0.5 – P(0 < Z < 1.21) = 0.5 – 0.3869 = 0.1131.


[Figure: the same curve with the area 0.3869 between z = 0 and z = 1.21 and the right-tail area 0.1131 beyond z = 1.21 marked.]


Hence, the chances are only 11% that, in a random sample of 100 employees from this particular construction company, the average salary will exceed Rs. 24,500. In other words, the chances are 89% that, in such a sample, the average salary will not exceed Rs. 24,500.


Hence, the chances are considerably high that the government might pay attention to the employees’ demand.
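The salary calculation can be re-checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `normal_cdf` is an illustrative choice and not part of the lecture. Because the code keeps z unrounded, its answer may differ from the table-based 0.1131 in the third or fourth decimal place.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Figures taken from the construction-company example
N, n = 310, 100           # population size and sample size
mu, sigma = 24000, 5000   # population mean and standard deviation (Rs.)

# Standard error of the sample mean with the finite population correction
fpc = math.sqrt((N - n) / (N - 1))
se_mean = sigma / math.sqrt(n) * fpc

# Standardise 24,500 and take the right-tail area
z = (24500 - mu) / se_mean
p_exceed = 1.0 - normal_cdf(z)

print(f"standard error = {se_mean:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 24500) = {p_exceed:.4f}")
```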


Next, we consider the SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION:


In this regard, the first point to be noted is that, whenever the elements of a population can be classified into two categories, technically called “success” and “failure”, we may be interested in the proportion of “successes” in the population.


If X denotes the number of successes in the population, then the proportion of successes in the population is given by

$p = \frac{X}{N}.$


Similarly, if we draw a sample of size n from the population, the proportion of successes in the sample is given by

$\hat{p} = \frac{X}{n},$

where X represents the number of successes in the sample.


It is interesting to note that X is a binomial random variable and the binomial parameter p is being called a proportion of successes here.


The sample proportion $\hat{p}$ has different values in different samples. It is obviously a random variable and has a probability distribution.


This probability distribution of the proportions of successes in all possible random samples of size n is called the sampling distribution of $\hat{p}$.


We illustrate this sampling distribution with the help of the following examples:


EXAMPLE-1 A population consists of six values 1, 3, 6, 8, 9 and 12. Draw all possible samples of size n = 3 without replacement from the population and find the proportion of even numbers in each sample.


Construct the sampling distribution of sample proportions and verify that

i) $\mu_{\hat{p}} = p$
ii) $\mathrm{Var}(\hat{p}) = \frac{pq}{n} \cdot \frac{N - n}{N - 1}$

where q = 1 - p, and $\hat{p}$ and p are the sample and population proportions respectively.


SOLUTION The number of possible samples of size n = 3 that could be selected without replacement from a population of size N = 6 is $\binom{6}{3} = 20$.


Let $\hat{p}$ represent the proportion of even numbers in the sample. Then the 20 possible samples and the proportion of even numbers are given as follows:


Sample No.   Sample Data   Sample Proportion ($\hat{p}$)
1            1, 3, 6       1/3
2            1, 3, 8       1/3
3            1, 3, 9       0
4            1, 3, 12      1/3
5            1, 6, 8       2/3
6            1, 6, 9       1/3
7            1, 6, 12      2/3
8            1, 8, 9       1/3
9            1, 8, 12      2/3
10           1, 9, 12      1/3
11           3, 6, 8       2/3
12           3, 6, 9       1/3
13           3, 6, 12      2/3
14           3, 8, 9       1/3
15           3, 8, 12      2/3
16           3, 9, 12      1/3
17           6, 8, 9       2/3
18           6, 8, 12      1
19           6, 9, 12      2/3
20           8, 9, 12      2/3


The sampling distribution of sample proportion is given below:


Sampling Distribution of $\hat{p}$:

$\hat{p}$    No. of Samples   Probability $f(\hat{p})$   $\hat{p} f(\hat{p})$   $\hat{p}^2 f(\hat{p})$
0            1                1/20                       0                      0
1/3          9                9/20                       3/20                   1/20
2/3          9                9/20                       6/20                   4/20
1            1                1/20                       1/20                   1/20
Σ            20               1                          10/20                  6/20




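Now

$\mu_{\hat{p}} = \sum \hat{p} f(\hat{p}) = \frac{10}{20} = 0.5$, and

$\sigma^2_{\hat{p}} = \sum \hat{p}^2 f(\hat{p}) - \left[ \sum \hat{p} f(\hat{p}) \right]^2 = \frac{6}{20} - \left( \frac{10}{20} \right)^2 = \frac{1}{20} = 0.05.$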


To verify the given relations, we first calculate the population proportion p. Thus:


$p = \frac{X}{N}$, where X represents the number of even numbers in the population.

In other words, $p = \frac{3}{6} = 0.5$.


Hence, we find that $\mu_{\hat{p}} = 0.5 = p$, and

$\frac{pq}{n} \cdot \frac{N - n}{N - 1} = \frac{0.25}{3} \cdot \frac{6 - 3}{6 - 1} = \frac{0.25}{5} = 0.05 = \mathrm{Var}(\hat{p}).$

Hence, the two properties of the sampling distribution of $\hat{p}$ are verified.
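The enumeration in EXAMPLE-1 can also be reproduced by brute force. The sketch below is one possible check, not part of the lecture: it lists every sample of size 3 drawn without replacement, computes the proportion of even numbers in each, and compares the mean and variance of those proportions with p and with the finite-population variance formula. Exact fractions are used so that no rounding enters; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 6, 8, 9, 12]
N, n = len(population), 3

# Proportion of even numbers in every possible sample drawn without replacement
proportions = [
    Fraction(sum(1 for v in sample if v % 2 == 0), n)
    for sample in combinations(population, n)
]

# Each sample is equally likely, so plain averages give the mean and variance
num_samples = len(proportions)
mean_p_hat = sum(proportions) / num_samples
var_p_hat = sum(ph ** 2 for ph in proportions) / num_samples - mean_p_hat ** 2

# Population proportion and the variance formula with the fpc term
p = Fraction(sum(1 for v in population if v % 2 == 0), N)
q = 1 - p
var_formula = (p * q / n) * Fraction(N - n, N - 1)

print(num_samples, mean_p_hat, var_p_hat, var_formula)  # 20 1/2 1/20 1/20
```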


The sampling distribution of $\hat{p}$ has the following important properties:


PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\hat{p}$

Property No. 1: The mean of the sampling distribution of proportions, denoted by $\mu_{\hat{p}}$, is equal to the population proportion p, that is, $\mu_{\hat{p}} = p$.


Property No. 2: The standard deviation of the sampling distribution of proportions, called the standard error of $\hat{p}$ and denoted by $\sigma_{\hat{p}}$, is given as:


a) $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, when the sampling is performed with replacement;


b) $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}}$, when sampling is done without replacement from a finite population.


(As in the case of the sampling distribution of $\bar{X}$, $\sqrt{\frac{N - n}{N - 1}}$ is known as the finite population correction factor (fpc).)


Property No. 3: SHAPE OF THE DISTRIBUTION:

The sampling distribution of $\hat{p}$ is the binomial distribution. However, for sufficiently large sample sizes, the sampling distribution of $\hat{p}$ is approximately normal.


As n → ∞, the sampling distribution of $\hat{p}$ approaches normality, with

$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$.


As a rule of thumb, the sampling distribution of $\hat{p}$ will be approximately normal whenever both np and nq are equal to or greater than 5.
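The standard-error formulas of Property No. 2 and the rule of thumb for normality can be collected into two small helpers. This is only an illustrative sketch: the function names `standard_error_of_proportion` and `normal_approximation_ok` are hypothetical, and passing `N` is taken here to mean "sampling without replacement from a finite population of size N".

```python
import math
from typing import Optional

def standard_error_of_proportion(p: float, n: int, N: Optional[int] = None) -> float:
    """Standard error of p-hat; pass N to apply the finite population correction."""
    se = math.sqrt(p * (1.0 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

def normal_approximation_ok(p: float, n: int) -> bool:
    """Rule of thumb: both np and nq should be equal to or greater than 5."""
    return n * p >= 5 and n * (1.0 - p) >= 5

# Figures of EXAMPLE-1: p = 0.5, n = 3, N = 6, sampling without replacement
se_example1 = standard_error_of_proportion(0.5, 3, N=6)
print(se_example1 ** 2)                  # variance, approximately 0.05
print(normal_approximation_ok(0.5, 3))   # np = nq = 1.5, so False
```

With n = 3 the rule of thumb fails, which is consistent with EXAMPLE-1 being handled by exact enumeration rather than by the normal approximation.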


Let us apply this concept to a real-world situation:


EXAMPLE-2 Ten percent of the 1-kilogram boxes of sugar in a large warehouse are underweight. Suppose a retailer buys a random sample of 144 of these boxes. What is the probability that at least 5 percent of the sample boxes will be underweight?


SOLUTION Here the statistic is the sample proportion ($\hat{p}$). The sample size (n = 144) is large enough to assume that the sample proportion $\hat{p}$ is approximately normally distributed.


Mean of the sampling distribution of $\hat{p}$: $\mu_{\hat{p}} = p = 0.10$

and Standard Error of $\hat{p}$: $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.10)(0.90)}{144}} = 0.025$


Therefore, the sampling distribution of $\hat{p}$ is approximately N(0.10, 0.025). And, hence:

$Z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{\hat{p} - 0.10}{0.025}$

is approximately N(0, 1).


We are required to find the probability that the proportion of underweight boxes in the sample is equal to or greater than 5%, i.e., we require

$P(\hat{p} \geq 0.05)$.


In this regard, a very important point to be noted is that, just as we use a continuity correction of ± ½ whenever we consider the normal approximation to the binomially distributed random variable X, in this situation, since $\hat{p} = \frac{X}{n}$, we need to use a continuity correction of $\pm \frac{1}{2n}$ in the case of the sampling distribution of $\hat{p}$.


Applying the continuity correction in this problem, we have:

$P(\hat{p} \geq 0.05) \Rightarrow P\left( \hat{p} \geq 0.05 - \frac{1}{(2)(144)} \right)$
$= P\left( \hat{p} \geq 0.05 - \frac{1}{288} \right)$
$= P\left( \frac{\hat{p} - 0.10}{0.025} \geq \frac{(0.05 - 1/288) - 0.10}{0.025} \right)$
$= P(Z \geq -2.14)$
$= P(-2.14 \leq Z \leq 0) + P(0 \leq Z < \infty)$
$= 0.4838 + 0.5 = 0.9838$

(using the area table of the standard normal distribution)


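[Figure: standard normal curve with the area 0.4838 between z = -2.14 and z = 0 and the area 0.5 to the right of z = 0; z = 0 corresponds to $\hat{p}$ = 0.10.]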


Hence, the probability that at least 5% of the sample boxes are underweight is as high as 98% !
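The same steps can be traced in a few lines of code. This is a sketch of the lecture's normal approximation with the 1/(2n) continuity correction, using only the standard library; `normal_cdf` is an illustrative helper name. Since z is not rounded to two decimals here, the result may differ slightly from the table value in the last decimal place.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p, n = 0.10, 144                 # population proportion and sample size
se = math.sqrt(p * (1 - p) / n)  # standard error of p-hat
correction = 1 / (2 * n)         # continuity correction 1/(2n) = 1/288

# P(p-hat >= 0.05) is approximated by the area to the right of the corrected z
z = ((0.05 - correction) - p) / se
prob = 1.0 - normal_cdf(z)

print(f"standard error = {se:.3f}")
print(f"z = {z:.2f}")
print(f"P(p-hat >= 0.05) = {prob:.4f}")
```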


The sampling distributions of $\bar{X}$ and $\hat{p}$ pertain to the situation when we are drawing all possible samples of a particular size from one particular population.


Next, we will discuss the case when we are dealing with all possible samples drawn from two populations, such that the samples from the two populations are independent. In this regard, we will consider the sampling distributions of $\bar{X}_1 - \bar{X}_2$ and $\hat{p}_1 - \hat{p}_2$.


We begin with the sampling distribution of $\bar{X}_1 - \bar{X}_2$:


SAMPLING DISTRIBUTION OF DIFFERENCES BETWEEN MEANS:

Suppose we have two distinct populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.


Let independent random samples of sizes $n_1$ and $n_2$ be selected from the respective populations, and the differences $\bar{x}_1 - \bar{x}_2$ between the means of all possible pairs of samples be computed.


Then, a probability distribution of the differences $\bar{X}_1 - \bar{X}_2$ can be obtained. Such a distribution is called the sampling distribution of the differences of sample means $\bar{X}_1 - \bar{X}_2$.


We illustrate the sampling distribution of $\bar{X}_1 - \bar{X}_2$ with the help of the following example:


EXAMPLE Draw all possible random samples of size $n_1 = 2$ with replacement from a finite population consisting of 4, 6, 8. Similarly, draw all possible random samples of size $n_2 = 2$ with replacement from another finite population consisting of 1, 2, 3.


a) Find the possible differences between the sample means of the two populations.
b) Construct the sampling distribution of $\bar{X}_1 - \bar{X}_2$ and compute its mean and variance.
c) Verify that

$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.


SOLUTION:


Whenever we are sampling with replacement from a finite population, the total number of possible samples is $N^n$ (where N is the population size, and n is the sample size).


Hence, in this example, there are $3^2 = 9$ possible samples which can be drawn with replacement from each population. These two sets of samples and their means are given below:


From Population 1                          From Population 2
Sample No.   Sample Value   $\bar{x}_1$    Sample No.   Sample Value   $\bar{x}_2$
1            4, 4           4              1            1, 1           1.0
2            4, 6           5              2            1, 2           1.5
3            4, 8           6              3            1, 3           2.0
4            6, 4           5              4            2, 1           1.5
5            6, 6           6              5            2, 2           2.0
6            6, 8           7              6            2, 3           2.5
7            8, 4           6              7            3, 1           2.0
8            8, 6           7              8            3, 2           2.5
9            8, 8           8              9            3, 3           3.0


a) Since there are 9 samples from the first population as well as 9 from the second, there are 81 possible combinations of $\bar{x}_1$ and $\bar{x}_2$. The 81 possible differences $\bar{x}_1 - \bar{x}_2$ are presented in the following table (rows are values of $\bar{x}_1$, columns are values of $\bar{x}_2$):


$\bar{x}_1$ \ $\bar{x}_2$   1.0   1.5   2.0   1.5   2.0   2.5   2.0   2.5   3.0
4                           3.0   2.5   2.0   2.5   2.0   1.5   2.0   1.5   1.0
5                           4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
5                           4.0   3.5   3.0   3.5   3.0   2.5   3.0   2.5   2.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                           6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
6                           5.0   4.5   4.0   4.5   4.0   3.5   4.0   3.5   3.0
7                           6.0   5.5   5.0   5.5   5.0   4.5   5.0   4.5   4.0
8                           7.0   6.5   6.0   6.5   6.0   5.5   6.0   5.5   5.0


b) The sampling distribution of $\bar{X}_1 - \bar{X}_2$ is as follows:


$d = \bar{x}_1 - \bar{x}_2$   f     Probability $f(d)$   $d f(d)$   $d^2 f(d)$
1.0                           1     1/81                 1/81       1.0/81
1.5                           2     2/81                 3/81       4.5/81
2.0                           5     5/81                 10/81      20.0/81
2.5                           6     6/81                 15/81      37.5/81
3.0                           10    10/81                30/81      90.0/81
3.5                           10    10/81                35/81      122.5/81
4.0                           13    13/81                52/81      208.0/81
4.5                           10    10/81                45/81      202.5/81
5.0                           10    10/81                50/81      250.0/81
5.5                           6     6/81                 33/81      181.5/81
6.0                           5     5/81                 30/81      180.0/81
6.5                           2     2/81                 13/81      84.5/81
7.0                           1     1/81                 7/81       49.0/81
Total                         81    1                    324/81     1431/81


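Thus the mean and the variance are

$\mu_{\bar{x}_1 - \bar{x}_2} = \sum (\bar{x}_1 - \bar{x}_2) f(\bar{x}_1 - \bar{x}_2) = \sum d f(d) = \frac{324}{81} = 4$, and

$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sum d^2 f(d) - \left[ \sum d f(d) \right]^2 = \frac{1431}{81} - \left( \frac{324}{81} \right)^2 = \frac{53}{3} - 16 = \frac{5}{3} = 1.67$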


c) In order to verify the properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, we first need to compute the mean and variance of each population.


The mean and variance of the first population are:

$\mu_1 = \frac{4 + 6 + 8}{3} = 6$, and

$\sigma_1^2 = \frac{(4 - 6)^2 + (6 - 6)^2 + (8 - 6)^2}{3} = \frac{8}{3}$.


The mean and variance of the second population are:

$\mu_2 = \frac{1 + 2 + 3}{3} = 2$, and

$\sigma_2^2 = \frac{(1 - 2)^2 + (2 - 2)^2 + (3 - 2)^2}{3} = \frac{2}{3}$.


Now $\mu_{\bar{x}_1 - \bar{x}_2} = 4 = 6 - 2 = \mu_1 - \mu_2$, and

$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{8}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} = \frac{4}{3} + \frac{1}{3} = \frac{5}{3} = 1.67 = \sigma^2_{\bar{x}_1 - \bar{x}_2}$

Hence, the two properties of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are satisfied.
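The 81 differences and the two verified properties can be reproduced by direct enumeration. The following is a sketch under the example's assumptions (ordered samples of size 2 drawn with replacement, all pairs of samples equally likely); the helper names `sample_means` and `mean_and_variance` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

pop1, pop2 = [4, 6, 8], [1, 2, 3]
n1 = n2 = 2

def sample_means(population, n):
    """Means of all ordered samples of size n drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(population, repeat=n)]

def mean_and_variance(values):
    """Mean and variance of equally likely values (divisor is len(values))."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

# All 9 x 9 = 81 differences between a mean from population 1 and one from population 2
diffs = [m1 - m2 for m1 in sample_means(pop1, n1) for m2 in sample_means(pop2, n2)]
mean_d, var_d = mean_and_variance(diffs)

# Population parameters, then the right-hand sides of the two properties
mu1, var1 = mean_and_variance([Fraction(x) for x in pop1])
mu2, var2 = mean_and_variance([Fraction(x) for x in pop2])

print(len(diffs), mean_d, var_d)          # 81 4 5/3
print(mu1 - mu2, var1 / n1 + var2 / n2)   # 4 5/3
```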


The sampling distribution of the differences $\bar{X}_1 - \bar{X}_2$ has the following properties:


PROPERTIES OF THE SAMPLING DISTRIBUTION OF $\bar{X}_1 - \bar{X}_2$

Property No. 1: The mean of the sampling distribution of $\bar{X}_1 - \bar{X}_2$, denoted by $\mu_{\bar{X}_1 - \bar{X}_2}$, is equal to the difference between population means, that is

$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$


Property No. 2: In case of sampling with or without replacement from two infinite populations, the standard deviation of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ (i.e. the standard error of $\bar{X}_1 - \bar{X}_2$), denoted by $\sigma_{\bar{X}_1 - \bar{X}_2}$, is given by

$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$


The above expression for the standard error of $\bar{X}_1 - \bar{X}_2$ also holds for finite populations when sampling is performed with replacement.


In case of sampling without replacement from a finite population, the formula for the standard error of $\bar{X}_1 - \bar{X}_2$ will be suitably modified.
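A short derivation shows why the two variances add even though the means are subtracted. It is a sketch that assumes the two samples are independent and that each sample mean has variance $\sigma_i^2 / n_i$ (sampling with replacement, or from an infinite population); it uses only the standard rules for the expectation and variance of a linear combination of independent random variables.

```latex
\begin{align*}
E(\bar{X}_1 - \bar{X}_2)
  &= E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2, \\
\mathrm{Var}(\bar{X}_1 - \bar{X}_2)
  &= \mathrm{Var}(\bar{X}_1) + (-1)^2 \, \mathrm{Var}(\bar{X}_2)
   = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \\
\sigma_{\bar{X}_1 - \bar{X}_2}
  &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} .
\end{align*}
```

The factor $(-1)^2 = 1$ is the reason a difference of independent sample means is more variable than either mean on its own.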


Property No. 3: Shape of the distribution:

a) If the POPULATIONS are normally distributed, the sampling distribution of $\bar{X}_1 - \bar{X}_2$, regardless of sample sizes, will be normal with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.


In other words, the variable

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

is normally distributed with zero mean and unit variance.


b) If the POPULATIONS are non-normal and if both sample sizes are large (i.e., greater than or equal to 30), then the sampling distribution of the differences between means is approximately a normal distribution by the Central Limit Theorem.


In this case too, the variable

$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

will be approximately normally distributed with mean zero and variance one.
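To see how this Z variable would be used in practice, here is a small sketch. The numbers are purely hypothetical (they do not come from the lecture), and the function names `normal_cdf` and `z_for_difference` are illustrative. It standardises an observed difference between two sample means and reads off a right-tail probability, assuming both sample sizes are at least 30 so that the normal approximation is reasonable.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_for_difference(diff, mu1, mu2, var1, var2, n1, n2):
    """Standardise a difference of sample means using the standard error formula."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (diff - (mu1 - mu2)) / se

# Hypothetical populations: means 50 and 45, variances 100 and 64,
# with independent samples of sizes 40 and 32
z = z_for_difference(diff=8, mu1=50, mu2=45, var1=100, var2=64, n1=40, n2=32)
prob = 1.0 - normal_cdf(z)   # P(difference of sample means > 8)

print(f"z = {z:.2f}, P = {prob:.4f}")
```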


IN TODAY’S LECTURE, YOU LEARNT

• Sampling Distribution of $\hat{p}$
• Sampling Distribution of $\bar{X}_1 - \bar{X}_2$


IN THE NEXT LECTURE, YOU WILL LEARN

• Sampling Distribution of $\hat{p}_1 - \hat{p}_2$
• Point Estimation
• Desirable Qualities of a Good Point Estimator
  – Unbiasedness
  – Consistency
  – Efficiency
Methods of Point Estimation:
• The Method of Moments,

