
MTH 432A: Introduction to Sampling Theory

Shalabh
shalabh@iitk.ac.in, shalabh1@yahoo.com
Department of Mathematics & Statistics
Indian Institute of Technology Kanpur, Kanpur - 208016 (India)

Syllabus: Principles of sample surveys; simple, stratified and unequal probability sampling with and without replacement; ratio, product and regression methods of estimation; systematic sampling.

Books: You can choose any one of the following books for your reference. Books at serial numbers 1 and 2 are easily available, so I will base my lectures on them. Other books are available in the library.

1. Sampling Techniques: W.G. Cochran, Wiley (low price edition available)
2. Theory and Methods of Survey Sampling: Parimal Mukhopadhyay, Prentice Hall of India
3. Theory of Sample Surveys with Applications: P.V. Sukhatme, B.V. Sukhatme, S. Sukhatme and C. Asok, IASRI, Delhi
4. Sampling Methodologies and Applications: P.S.R.S. Rao, Chapman and Hall/CRC
5. Sampling Theory and Methods: M.N. Murthy, Statistical Publishing Society, Calcutta (out of print)
6. Elements of Sampling Theory and Methods: Z. Govindarajulu, Prentice Hall

Grading Scheme: Quiz 30%, Mid Sem. 70%

Assignments: Assignment 1, Assignment 2, Assignment 3, Assignment 4, Assignment 5, Assignment 6

Lecture notes for your help (if you find any typo, please let me know): http://home.iitk.ac.in/~shalab/course432.htm

Lecture Notes 1: Introduction
Lecture Notes 2: Simple Random Sampling
Lecture Notes 3: Sampling for Proportions and Percentages
Lecture Notes 4: Stratified Sampling
Lecture Notes 5: Ratio and Product Methods of Estimation
Lecture Notes 6: Regression Method of Estimation
Lecture Notes 7: Varying Probability Sampling
Lecture Notes 8: Double Sampling (Two Phase Sampling)
Lecture Notes 9: Cluster Sampling
Lecture Notes 10: Two Stage Sampling (Subsampling)
Lecture Notes 11: Systematic Sampling
Lecture Notes 12: Sampling on Successive Occasions
Lecture Notes 13: Non Sampling Errors


Chapter 1 Introduction

Statistics is the science of data.

Data are the numerical values containing some information.

Statistical tools can be used on a data set to draw statistical inferences. These statistical inferences are in turn used for various purposes. For example, the government uses such data for policy formulation for the welfare of the people, marketing companies use data from consumer surveys to improve their products and to provide better services to the customers, etc. Such data are obtained through sample surveys. Sample surveys are conducted throughout the world by governmental as well as non-governmental agencies. For example, the "National Sample Survey Organization (NSSO)" conducts surveys in India, "Statistics Canada" conducts surveys in Canada, and agencies of the United Nations like the "World Health Organization (WHO)" and the "Food and Agricultural Organization (FAO)" conduct surveys in different countries.

Sampling theory provides the tools and techniques for data collection, keeping in mind the objectives to be fulfilled and the nature of the population.

There are two ways of obtaining the information:
1. Sample surveys
2. Complete enumeration or census

Sample surveys collect information on a fraction of the total population, whereas a census collects information on the whole population. Some surveys, e.g., economic surveys, agricultural surveys, etc., are conducted regularly. Other surveys are need-based and are conducted when the need arises, e.g., consumer satisfaction surveys at a newly opened shopping mall to gauge the satisfaction level with the amenities provided in the mall.



Sampling unit: An element or a group of elements on which the observations can be taken is called a sampling unit. The objective of the survey helps in determining the definition of sampling unit.

For example, if the objective is to determine the total income of all the persons in the household, then the sampling unit is the household. If the objective is to determine the income of any particular person in the household, then the sampling unit is the income of that particular person in the household. So the definition of the sampling unit depends on and varies with the objective of the survey. Similarly, in another example, if the objective is to study the blood sugar level, then the sampling unit is the value of the blood sugar level of a person. On the other hand, if the objective is to study the health conditions, then the sampling unit is the person on whom the readings on blood sugar level, blood pressure and other factors will be obtained. These values together will classify the person as healthy or unhealthy.

Population: The collection of all the sampling units in a given region at a particular point of time or during a particular period is called the population. For example, if the medical facilities in a hospital are to be surveyed through the patients, then the total number of patients registered in the hospital during the time period of the survey will be the population. Similarly, if the production of wheat in a district is to be studied, then all the fields cultivating wheat in that district will constitute the population. The total number of sampling units in the population is the population size, denoted generally by N. The population size can be finite or infinite (N is large).

Census: The complete count of the population is called a census. The observations on all the sampling units in the population are collected in a census. For example, in India, the census is conducted every tenth year, in which observations on all the persons staying in India are collected.

Sample: One or more sampling units are selected from the population according to some specified procedure. Such a collection of units, consisting of only a portion of the population units, is called a sample.



In the context of sample surveys, a collection of units like households, people, cities, countries etc. is called a finite population. A census is a 100% sample and it is a complete count of the population.

Representative sample: When all the salient features of the population are present in the sample, then it is called a representative sample. It goes without saying that every sample is expected to be a representative sample.

For example, if a population has 30% males and 70% females, then we also expect the sample to have nearly 30% males and 70% females.

In another example, if we take out a handful of wheat from a 100 Kg. bag of wheat, we expect the same quality of wheat in hand as inside the bag. Similarly, it is expected that a drop of blood will give the same information as all the blood in the body.

Sampling frame: The list of all the units of the population to be surveyed constitutes the sampling frame. All the sampling units in the sampling frame have identification particulars. For example, all the students in a particular university listed along with their roll numbers constitute the sampling frame. Similarly, the list of households with the name of head of family or house address constitutes the sampling frame. In another example, the residents of a city area may be listed in more than one frame - as per automobile registration as well as the listing in the telephone directory.

Ways to ensure representativeness: There are two possible ways to ensure that the selected sample is representative.

1. Random sample or probability sample: The selection of units in the sample from a population is governed by the laws of chance or probability. The probability of selection of a unit can be equal as well as unequal.



2. Non-random sample or purposive sample: The selection of units in the sample from the population is not governed by the probability laws. For example, the units are selected on the basis of the personal judgment of the surveyor. The persons volunteering to take some medical test or to drink a new type of coffee also constitute a sample selected on a non-random basis.

Another type of sampling is quota sampling. The survey in this case is continued until a predetermined number of units with the characteristic under study are picked up. For example, in order to conduct an experiment on a rare type of disease, the survey is continued until the required number of patients with the disease is obtained.

Advantages of sampling over complete enumeration:

1. Reduced cost and enlarged scope: Sampling involves the collection of data on a smaller number of units in comparison to complete enumeration, so the cost involved in the collection of information is reduced. Further, additional information can be obtained at little extra cost in comparison to conducting a separate survey. For example, when an interviewer is collecting information on health conditions, then he/she can also ask some questions on health practices. This will provide additional information on health practices, and the cost involved will be much less than that of conducting an entirely new survey on health practices.

2. Organization of work: It is easier to manage the collection of data on a smaller number of units than on all the units in a census. For example, in order to draw a representative sample from a state, it is easier to manage to draw small samples from every city than to draw the sample from the whole state at a time. This ultimately results in more accuracy in the statistical inferences, because better organization provides better data and, in turn, improved statistical inferences.



3. Greater accuracy: The persons involved in the collection of data are trained personnel. They can collect the data more accurately if they have to collect a smaller number of units rather than a large number of units.

4. Urgent information required: The data from a sample can be quickly summarized. For example, the forecasting of crop production can be done more quickly on the basis of a sample of data than by first collecting all the observations.

5. Feasibility: Conducting the experiment on a smaller number of units, particularly when the units are destroyed in the process, is more feasible. For example, in determining the life of bulbs, it is more feasible to fuse a minimum number of bulbs. Similarly, in any medical experiment, it is more feasible to use fewer animals.

Type of surveys: There are various types of surveys which are conducted on the basis of the objectives to be fulfilled.

1. Demographic surveys: These surveys are conducted to collect the demographic data, e.g., household surveys, family size, number of males in families, etc. Such surveys are useful in the policy formulation for any city, state or country for the welfare of the people.

2. Educational surveys: These surveys are conducted to collect the educational data, e.g., how many children go to school, how many persons are graduates, etc. Such surveys are conducted to examine the educational programs in schools and colleges. Generally, schools are selected first and then the students from each school constitute the sample.



3. Economic surveys: These surveys are conducted to collect the economic data, e.g., data related to export and import of goods, industrial production, consumer expenditure etc. Such data is helpful in constructing the indices indicating the growth in a particular sector of economy or even the overall economic growth of the country.

4. Employment surveys: These surveys are conducted to collect the employment related data, e.g., employment rate, labour conditions, wages, etc. in a city, state or country. Such data helps in constructing various indices to know the employment conditions among the people.

5. Health and nutrition surveys: These surveys are conducted to collect the data related to health and nutrition issues, e.g., number of visits to doctors, food given to children, nutritional value etc. Such surveys are conducted in cities, states as well as countries by the national and international organizations like UNICEF, WHO etc.

6. Agricultural surveys: These surveys are conducted to collect agriculture related data to estimate, e.g., the acreage and production of crops, livestock numbers, use of fertilizers, use of pesticides and other related topics. The government bases its planning related to the food issues for the people on such surveys.

7. Marketing surveys: These surveys are conducted to collect data related to marketing. They are conducted by major companies, manufacturers or those who provide services to consumers, etc. Such data are used for knowing the satisfaction and opinion of consumers as well as in developing sales, purchase and promotional activities, etc.

8. Election surveys: These surveys are conducted to study the outcome of an election or a poll. For example, such polls are conducted in democratic countries to have the opinions of people about any candidate who is contesting the election.



9. Public polls and surveys: These surveys are conducted to collect the public opinion on any particular issue. Such surveys are generally conducted by the news media and the agencies which conduct polls and surveys on the current topics of interest to public.

10. Campus surveys: These surveys are conducted on the students of any educational institution to study the educational programs, living facilities, dining facilities, sports activities, etc.

Principal steps in a sample survey: The broad steps to conduct any sample surveys are as follows:

1. Objective of the survey: The objective of the survey has to be clearly defined and well understood by the person planning to conduct it. The statistician is expected to be well versed with the issues to be addressed, in consultation with the person who wants the survey conducted. In complex surveys, sometimes the objective is forgotten and data are collected on issues that are far away from the objectives.

2. Population to be sampled: Based on the objectives of the survey, decide the population from which the information can be obtained. For example, population of farmers is to be sampled for an agricultural survey whereas the population of patients has to be sampled for determining the medical facilities in a hospital.

3. Data to be collected: It is important to decide which data are relevant for fulfilling the objectives of the survey and to ensure that no essential data are omitted. Sometimes too many questions are asked and some of their outcomes are never utilized. This lowers the quality of the responses and, in turn, results in lower efficiency in the statistical inferences.



4. Degree of precision required: The results of any sample survey are always subject to some uncertainty. Such uncertainty can be reduced by taking larger samples or using superior instruments, but this involves more cost and more time. So it is very important to decide on the degree of precision required in the data. This needs to be conveyed to the surveyor as well.

5. Method of measurement: The choice of measuring instrument and the method of measuring the data from the population need to be specified clearly. For example, the data may be collected through an interview, a questionnaire, a personal visit, a combination of any of these approaches, etc. The forms in which the data are to be recorded, so that they can be transferred to data-processing equipment for easy summarization, also need to be prepared accordingly.

6. The frame: The sampling frame has to be clearly specified. The population is divided into sampling units such that the units cover the whole population and every sampling unit is tagged with identification. The list of all sampling units is called the frame. The frame must cover the whole population and the units must not overlap each other in the sense that every element in the population must belong to one and only one unit. For example, the sampling unit can be an individual member in the family or the whole family.

7. Selection of sample: The size of the sample needs to be specified for the given sampling plan. This helps in determining and comparing the relative cost and time of different sampling plans. The method and plan adopted for drawing a representative sample should also be detailed.

8. The pre-test: It is advisable to try out the questionnaire and field methods on a small scale. This may reveal troubles and problems beforehand that the surveyor may otherwise face in the field in a large scale survey.



9. Organization of the field work: How to conduct the survey, how to handle administrative issues, providing proper training to surveyors, procedures, and plans for handling non-response and missing observations are some of the issues which need to be addressed for organizing the survey work in the field. The procedure for early checking of the quality of returns should be prescribed. It should be clarified how to handle the situation when the respondent is not available.

10. Summary and analysis of data: It is to be noted that, based on the objectives of the survey, a suitable statistical tool is chosen which can answer the relevant questions. In order to use the statistical tool, a valid data set is required, and this dictates the choice of responses to be obtained for the questions in the questionnaire, e.g., whether the data are to be qualitative, quantitative, nominal, ordinal, etc. After the completed questionnaires are returned, they need to be edited to amend the recording errors and delete erroneous data. The tabulating procedures, methods of estimation and tolerable amount of estimation error need to be decided before the start of the survey. Different methods of estimation may be available to answer the same query from the same data set, so the data need to be collected in a form compatible with the chosen estimation procedure.

11. Information gained for future surveys: Any completed sample survey acts as a potential guide for improved sample surveys in the future. Besides this, completed surveys also supply various types of prior information required to use various statistical tools, e.g., mean, variance, nature of variability, cost involved, etc. It is generally seen that things do not always go as planned in a complex survey. Noting such precautions and alerts helps in avoiding mistakes in the execution of future surveys.



Variability control in sample surveys: Variability control is an important issue in any statistical analysis. A general objective is to draw statistical inferences with minimum variability. There are various types of sampling schemes which are adopted in different conditions. These schemes help in controlling the variability at different stages. Such sampling schemes can be classified in the following way.

1. Before the selection of sampling units:
• Stratified sampling
• Cluster sampling
• Two stage sampling
• Double sampling, etc.

2. At the time of selection of sampling units:
• Systematic sampling
• Varying probability sampling

3. After the selection of sampling units:
• Ratio method of estimation
• Regression method of estimation

Note that the ratio and regression methods are methods of estimation and not methods of drawing samples.

Methods of data collection

There are various ways of data collection. Some of them are as follows:

1. Physical observations and measurements: The surveyor contacts the respondent personally and meets him/her. He observes the sampling unit and records the data. The surveyor can always use his prior experience to collect the data in a better way. For example, a young man reporting his age as 60 years can easily be noticed and corrected by the surveyor.



2. Personal interview: The surveyor is supplied with a well prepared questionnaire. The surveyor goes to the respondents and asks the same questions mentioned in the questionnaire. The data in the questionnaire is then filled up accordingly based on the responses from the respondents.

3. Mail enquiry: The well prepared questionnaire is sent to the respondents through postal mail, e-mail, etc. The respondents are requested to fill in the questionnaires and send them back. In case of postal mail, many times the questionnaires are accompanied by a self-addressed envelope with postage stamps to avoid any non-response due to the cost of postage.

4. Web based enquiry: The survey is conducted online through internet based web pages. There are various websites which provide such a facility. The questionnaires are prepared in the website's format and the link is sent to the respondents through email. By clicking on the link, the respondent is taken to the concerned website and the answers are given online. These answers are recorded, and the responses as well as their statistics are sent to the surveyor. The respondents need an internet connection for data collection by this procedure.

5. Registration: The respondent is required to register the data at some designated place. For example, the number of births and deaths, along with the details provided by the family members, is recorded at the city municipal office.

6. Transcription from records: The sample of data is collected from the already recorded information. For example, the details of the number of persons in different families or number of births/deaths in a city can be obtained from the city municipal office directly.

The methods in (1) to (5) provide primary data, which means collecting the data directly from the source. The method in (6) provides secondary data, which means getting the data from already-collected primary sources.



Chapter 2 Simple Random Sampling

Simple random sampling (SRS) is a method of selecting a sample of n sampling units out of a population of N sampling units such that every sampling unit has an equal chance of being chosen. The samples can be drawn in two possible ways:

• The sampling units are chosen without replacement, in the sense that the units, once chosen, are not placed back in the population.
• The sampling units are chosen with replacement, in the sense that the chosen units are placed back in the population.

1. Simple random sampling without replacement (SRSWOR): SRSWOR is a method of selecting n units out of the N units one by one such that at any stage of selection, any one of the remaining units has the same chance of being selected, i.e., 1/N.

2. Simple random sampling with replacement (SRSWR): SRSWR is a method of selecting n units out of the N units one by one such that at each stage of selection, each unit has an equal chance of being selected, i.e., 1/N.

Procedure of selection of a random sample: The selection of a random sample follows these steps:

1. Identify the N units in the population with the numbers 1 to N.
2. Choose any random number arbitrarily in the random number table and start reading numbers from there.
3. Choose the sampling unit whose serial number corresponds to the random number drawn from the table of random numbers.
4. In case of SRSWR, all the random numbers are accepted even if repeated more than once. In case of SRSWOR, if any random number is repeated, then it is ignored and more numbers are drawn.



Such a process can be implemented through programming using the discrete uniform distribution. Any number between 1 and N can be generated from this distribution, and the corresponding unit can be selected into the sample by associating an index with each sampling unit. Many statistical software packages like R, SAS, etc. have built-in functions for drawing a sample using SRSWOR or SRSWR.
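As a minimal sketch of this idea (written in Python's standard library rather than R or SAS; the population of labels 1 to 100 and the sample size 10 are assumed for illustration, not from the notes):

```python
import random

def srswor(units, n):
    # Without replacement: each of the C(N, n) subsets of size n
    # is equally likely to be the drawn sample.
    return random.sample(units, n)

def srswr(units, n):
    # With replacement: every draw picks any of the N units with
    # probability 1/N, independently of the earlier draws.
    return random.choices(units, k=n)

population = list(range(1, 101))  # units labelled 1, 2, ..., N with N = 100
print(srswor(population, 10))
print(srswr(population, 10))
```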

Notations: The following notations will be used in the further notes:

$N$ : number of sampling units in the population (population size)

$n$ : number of sampling units in the sample (sample size)

$Y$ : the characteristic under consideration

$Y_i$ : value of the characteristic for the $i$-th unit of the population

$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ : sample mean

$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ : population mean

$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$

$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(Y_i - \bar{Y})^2 = \frac{1}{N}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right)$

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)$

Probability of drawing a sample:

1. SRSWOR: If n units are selected by SRSWOR, the total number of possible samples is $\binom{N}{n}$, so the probability of selecting any one of these samples is $\frac{1}{\binom{N}{n}}$.

Note that a unit can be selected at any one of the n draws. Let $u_i$ be the $i$-th unit selected in the sample. This unit can be selected in the sample either at the first draw, the second draw, ..., or the $n$-th draw.



Let $P_j(i)$ denote the probability of selection of $u_i$ at the $j$-th draw, $j = 1, 2, \ldots, n$. Then the probability that $u_i$ is selected in the sample is

$P(i) = P_1(i) + P_2(i) + \cdots + P_n(i) = \frac{1}{N} + \frac{1}{N} + \cdots + \frac{1}{N} \ (n \text{ times}) = \frac{n}{N}.$

Now if $u_1, u_2, \ldots, u_n$ are the n units selected in the sample, then the probability of their selection is

$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n).$

Note that when the second unit is to be selected, there are (n - 1) units left to be selected in the sample from the population of (N - 1) units. Similarly, when the third unit is to be selected, there are (n - 2) units left to be selected from the population of (N - 2) units, and so on.

If $P(u_1) = \frac{n}{N}$, then $P(u_2) = \frac{n-1}{N-1}, \ldots, P(u_n) = \frac{1}{N-n+1}.$

Thus

$P(u_1, u_2, \ldots, u_n) = \frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2} \cdots \frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}.$

Alternative approach: The probability of drawing a sample in SRSWOR can alternatively be found as follows. Let $u_{i(k)}$ denote the $i$-th unit drawn at the $k$-th draw. Note that the $i$-th unit can be any unit out of the N units. Then $s_o = (u_{i(1)}, u_{i(2)}, \ldots, u_{i(n)})$ is an ordered sample in which the order of the units in which they are drawn (i.e., $u_{i(1)}$ drawn at the first draw, $u_{i(2)}$ drawn at the second draw, and so on) is also considered. The probability of selection of such an ordered sample is

$P(s_o) = P(u_{i(1)})\, P(u_{i(2)} \mid u_{i(1)})\, P(u_{i(3)} \mid u_{i(1)} u_{i(2)}) \cdots P(u_{i(n)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(n-1)}).$

Here $P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)})$ is the probability of drawing $u_{i(k)}$ at the $k$-th draw given that $u_{i(1)}, u_{i(2)}, \ldots, u_{i(k-1)}$ have already been drawn in the first (k - 1) draws.



Such a probability is obtained as

$P(u_{i(k)} \mid u_{i(1)} u_{i(2)} \cdots u_{i(k-1)}) = \frac{1}{N-k+1}.$

So

$P(s_o) = \prod_{k=1}^{n} \frac{1}{N-k+1} = \frac{(N-n)!}{N!}.$

The number of ways in which a sample of size n can be drawn is $n!$, and the probability of drawing a sample in a given order is $\frac{(N-n)!}{N!}$. So the probability of drawing a sample in which the order of the units in which they are drawn is irrelevant is

$n! \cdot \frac{(N-n)!}{N!} = \frac{1}{\binom{N}{n}}.$

2. SRSWR: When n units are selected with SRSWR, the total number of possible samples is $N^n$, and the probability of drawing a sample is $\frac{1}{N^n}$.

Alternatively, let $u_i$ be the $i$-th unit selected in the sample. This unit can be selected either at the first draw, the second draw, ..., or the $n$-th draw. At any stage, there are always N units in the population in case of SRSWR, so the probability of selection of $u_i$ at any stage is $1/N$ for all $i = 1, 2, \ldots, n$. Then the probability of selection of the n units $u_1, u_2, \ldots, u_n$ in the sample is

$P(u_1, u_2, \ldots, u_n) = P(u_1)\,P(u_2)\cdots P(u_n) = \frac{1}{N} \cdot \frac{1}{N} \cdots \frac{1}{N} = \frac{1}{N^n}.$
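These counts can be checked by brute force for a small case (the toy values N = 5, n = 3 are assumed): enumerating all ordered without-replacement draws confirms that each unordered sample carries probability $n!(N-n)!/N! = 1/\binom{N}{n}$.

```python
from itertools import permutations
from math import comb, factorial

N, n = 5, 3
units = range(1, N + 1)

# Each ordered WOR sequence has probability (N-n)!/N!; count how many
# ordered sequences collapse to the same unordered sample.
counts = {}
for seq in permutations(units, n):
    key = frozenset(seq)
    counts[key] = counts.get(key, 0) + 1

prob_ordered = factorial(N - n) / factorial(N)
for c in counts.values():
    assert c == factorial(n)  # n! orderings per unordered sample
    assert abs(c * prob_ordered - 1 / comb(N, n)) < 1e-12

print(len(counts), "samples, each with probability", 1 / comb(N, n))
```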



Probability of drawing a unit

1. SRSWOR: Let $A_\ell$ denote the event that a particular unit $u_j$ is not selected at the $\ell$-th draw. The probability of selecting, say, the $j$-th unit at the $k$-th draw is

$P(\text{selection of } u_j \text{ at the } k\text{-th draw}) = P(A_1 \cap A_2 \cap \cdots \cap A_{k-1} \cap \bar{A}_k)$

$= P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 A_2) \cdots P(A_{k-1} \mid A_1, A_2, \ldots, A_{k-2})\, P(\bar{A}_k \mid A_1, A_2, \ldots, A_{k-1})$

$= \left(1 - \frac{1}{N}\right)\left(1 - \frac{1}{N-1}\right)\left(1 - \frac{1}{N-2}\right)\cdots\left(1 - \frac{1}{N-k+2}\right)\frac{1}{N-k+1}$

$= \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-k+1}{N-k+2} \cdot \frac{1}{N-k+1} = \frac{1}{N}.$

2. SRSWR:

$P(\text{selection of } u_j \text{ at the } k\text{-th draw}) = \frac{1}{N}.$

Estimation of population mean and population variance

One of the main objectives after the selection of a sample is to know about the tendency of the data to cluster around a central value and the scatter of the data around it. Among various indicators of central tendency and dispersion, the popular choices are the arithmetic mean and the variance. So the population mean and population variability are generally measured by the arithmetic mean (or weighted arithmetic mean) and the variance, respectively. There are various popular estimators for estimating the population mean and population variance. Among them, the sample arithmetic mean and the sample variance are more popular than other estimators. One of the reasons to use these estimators is that they possess nice statistical properties. Moreover, they are also obtained through well-established statistical estimation procedures like maximum likelihood estimation, least squares estimation, the method of moments, etc. under several standard statistical distributions. One may also consider other indicators like the median, mode, geometric mean and harmonic mean for measuring the central tendency, and the mean deviation, absolute deviation, Pitman nearness, etc. for measuring the dispersion. The properties of such estimators can be studied by numerical procedures like bootstrapping.



1. Estimation of population mean

Let us consider the sample arithmetic mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ as an estimator of the population mean $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i$ and verify that $\bar{y}$ is an unbiased estimator of $\bar{Y}$ under the two cases.

SRSWOR

Let $t_i = \sum_{j=1}^{n} y_j$ denote the total of the $i$-th sample, $i = 1, 2, \ldots, \binom{N}{n}$. Then

$E(\bar{y}) = \frac{1}{n}\,E(t_i) = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} t_i = \frac{1}{n} \cdot \frac{1}{\binom{N}{n}} \sum_{i=1}^{\binom{N}{n}} \left( \sum_{j=1}^{n} y_j \right).$

When n units are sampled from N units without replacement, each unit of the population can occur with (n - 1) other units selected out of the remaining (N - 1) units of the population, and so each unit occurs in $\binom{N-1}{n-1}$ of the $\binom{N}{n}$ possible samples. So

$\sum_{i=1}^{\binom{N}{n}} \left( \sum_{j=1}^{n} y_j \right) = \binom{N-1}{n-1} \sum_{i=1}^{N} y_i.$

Now

$E(\bar{y}) = \frac{(N-1)!\; n!\,(N-n)!}{(n-1)!\,(N-n)!\; n\, N!} \sum_{i=1}^{N} y_i = \frac{1}{N} \sum_{i=1}^{N} y_i = \bar{Y}.$



Thus $\bar{y}$ is an unbiased estimator of $\bar{Y}$. Alternatively, the following approach can also be adopted to show the unbiasedness property:

$E(\bar{y}) = \frac{1}{n} \sum_{j=1}^{n} E(y_j) = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{N} Y_i\, P_j(i) = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{N} Y_i \cdot \frac{1}{N} = \frac{1}{n} \sum_{j=1}^{n} \bar{Y} = \bar{Y}$

where $P_j(i)$ denotes the probability of selection of the $i$-th unit at the $j$-th stage.

SRSWR n 1 E ( yi ) n i 1 1 n   E ( yi ) n i 1

E( y ) 

1 n  (Y1P1  ..  YN P) n i 1

1 n Y n

Y. where

Pi 

1 for all i  1, 2,..., N is the probability of selection of a unit. Thus y is an unbiased N

estimator of population mean under SRSWR also.
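Because the number of possible samples is finite, the unbiasedness can also be verified exactly by enumeration; the toy population below (N = 6, n = 2) is assumed purely for illustration.

```python
from itertools import combinations, product
from statistics import mean

population = [3, 7, 8, 11, 15, 4]   # assumed toy values, N = 6
n = 2

# SRSWOR: average the sample means over all C(N, n) equally likely samples.
wor = mean(mean(s) for s in combinations(population, n))
# SRSWR: average over all N^n equally likely ordered samples.
wr = mean(mean(s) for s in product(population, repeat=n))

print(wor, wr, mean(population))   # all three coincide at 8.0
```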



Variance of the estimate

Assume that each observation has variance $\sigma^2$. Then

$V(\bar{y}) = E(\bar{y} - \bar{Y})^2 = E\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{Y})\right]^2 = E\left[\frac{1}{n^2}\sum_{i=1}^{n}(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i}^{n}\sum_{j \ne i}^{n}(y_i - \bar{Y})(y_j - \bar{Y})\right]$

$= \frac{1}{n^2}\sum_{i=1}^{n} E(y_i - \bar{Y})^2 + \frac{1}{n^2}\sum_{i}^{n}\sum_{j \ne i}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{n\sigma^2}{n^2} + \frac{K}{n^2} = \frac{N-1}{Nn}S^2 + \frac{K}{n^2}$

where $K = \sum_{i}^{n}\sum_{j \ne i}^{n} E(y_i - \bar{Y})(y_j - \bar{Y})$ and we have used $\sigma^2 = \frac{N-1}{N}S^2$. Now we find K under the setups of SRSWR and SRSWOR.

SRSWOR

$K = \sum_{i}^{n}\sum_{j \ne i}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}).$

Consider

$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)} \sum_{k}^{N}\sum_{\ell \ne k}^{N} (y_k - \bar{Y})(y_\ell - \bar{Y}).$

Since

$\left[\sum_{k=1}^{N}(y_k - \bar{Y})\right]^2 = \sum_{k=1}^{N}(y_k - \bar{Y})^2 + \sum_{k}^{N}\sum_{\ell \ne k}^{N}(y_k - \bar{Y})(y_\ell - \bar{Y}),$

$0 = (N-1)S^2 + \sum_{k}^{N}\sum_{\ell \ne k}^{N}(y_k - \bar{Y})(y_\ell - \bar{Y}),$

we get

$E(y_i - \bar{Y})(y_j - \bar{Y}) = \frac{1}{N(N-1)}\left[-(N-1)S^2\right] = -\frac{S^2}{N}.$



Thus $K = -n(n-1)\dfrac{S^2}{N}$, and so, substituting the value of K, the variance of $\bar{y}$ under SRSWOR is

$V(\bar{y}_{WOR}) = \frac{N-1}{Nn}S^2 - \frac{1}{n^2}\, n(n-1)\frac{S^2}{N} = \frac{N-n}{Nn}S^2.$

SRSWR

$K = \sum_{i}^{n}\sum_{j \ne i}^{n} E(y_i - \bar{Y})(y_j - \bar{Y}) = \sum_{i}^{n}\sum_{j \ne i}^{n} E(y_i - \bar{Y})\, E(y_j - \bar{Y}) = 0$

because the $i$-th and $j$-th draws ($i \ne j$) are independent. Thus the variance of $\bar{y}$ under SRSWR is

$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2.$

It is to be noted that if N is infinite (large enough), then

$V(\bar{y}) = \frac{S^2}{n}$

in both the cases of SRSWOR and SRSWR. So the factor $\frac{N-n}{N}$ is responsible for changing the variance of $\bar{y}$ when the sample is drawn from a finite population in comparison to an infinite population. This is why $\frac{N-n}{N}$ is called the finite population correction (fpc). It may be noted that $\frac{N-n}{N} = 1 - \frac{n}{N}$, so $\frac{N-n}{N}$ is close to 1 if the ratio of sample size to population size, $\frac{n}{N}$, is very small or negligible. The term $\frac{n}{N}$ is called the sampling fraction. In practice, the fpc can be ignored whenever $\frac{n}{N} \le 5\%$, and for many purposes even if it is as high as 10%. Ignoring the fpc will result in overestimation of the variance of $\bar{y}$.



Efficiency of $\bar{y}$ under SRSWOR over SRSWR

$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2$

$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2 = \frac{N-n}{Nn}S^2 + \frac{n-1}{Nn}S^2 = V(\bar{y}_{WOR}) + \text{a positive quantity}.$

Thus $V(\bar{y}_{WR}) \ge V(\bar{y}_{WOR})$, and so SRSWOR is more efficient than SRSWR.
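A quick Monte Carlo check of the two variance formulas (the toy population of values 1 to 20 and the replication count are assumed for illustration):

```python
import random
from statistics import mean, pvariance

population = list(range(1, 21))            # assumed toy population, N = 20
N, n, reps = len(population), 5, 100_000
Ybar = mean(population)
S2 = pvariance(population) * N / (N - 1)   # S^2 uses the divisor N - 1

var_wor = mean((mean(random.sample(population, n)) - Ybar) ** 2
               for _ in range(reps))
var_wr = mean((mean(random.choices(population, k=n)) - Ybar) ** 2
              for _ in range(reps))

print(var_wor, (N - n) / (N * n) * S2)   # empirical vs (N-n)/(Nn) * S^2
print(var_wr, (N - 1) / (N * n) * S2)    # empirical vs (N-1)/(Nn) * S^2
```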

Estimation of variance from a sample

Since the expressions for the variances of the sample mean involve $S^2$, which is based on population values, these expressions cannot be used in real-life applications. In order to estimate the variance of $\bar{y}$ on the basis of a sample, an estimator of $S^2$ (or equivalently $\sigma^2$) is needed. Consider $s^2$ as an estimator of $S^2$ (or $\sigma^2$), and let us investigate its bias in the cases of SRSWOR and SRSWR. Consider

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left[(y_i - \bar{Y}) - (\bar{y} - \bar{Y})\right]^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n}(y_i - \bar{Y})^2 - n(\bar{y} - \bar{Y})^2\right].$

Then

$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{n}E(y_i - \bar{Y})^2 - n\,E(\bar{y} - \bar{Y})^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{n}\operatorname{Var}(y_i) - n\operatorname{Var}(\bar{y})\right] = \frac{1}{n-1}\left[n\sigma^2 - n\operatorname{Var}(\bar{y})\right].$



In case of SRSWOR,

$V(\bar{y}_{WOR}) = \frac{N-n}{Nn}S^2$

and so

$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-n}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-n}{Nn}S^2\right] = S^2.$

In case of SRSWR,

$V(\bar{y}_{WR}) = \frac{N-1}{Nn}S^2$

and so

$E(s^2) = \frac{n}{n-1}\left[\sigma^2 - \frac{N-1}{Nn}S^2\right] = \frac{n}{n-1}\left[\frac{N-1}{N}S^2 - \frac{N-1}{Nn}S^2\right] = \frac{N-1}{N}S^2 = \sigma^2.$

Hence

$E(s^2) = \begin{cases} S^2 & \text{in SRSWOR} \\ \sigma^2 & \text{in SRSWR.} \end{cases}$

An unbiased estimate of $\operatorname{Var}(\bar{y})$ is

$\widehat{\operatorname{Var}}(\bar{y}_{WOR}) = \frac{N-n}{Nn}\,s^2$ in case of SRSWOR, and

$\widehat{\operatorname{Var}}(\bar{y}_{WR}) = \frac{N-1}{Nn}\cdot\frac{N}{N-1}\,s^2 = \frac{s^2}{n}$ in case of SRSWR.



Standard errors

The standard error of $\bar{y}$ is defined as $\sqrt{\operatorname{Var}(\bar{y})}$. In order to estimate the standard error, one simple option is to consider the square root of the estimate of the variance of the sample mean:

under SRSWOR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\dfrac{N-n}{Nn}}\, s$;

under SRSWR, a possible estimator is $\hat{\sigma}(\bar{y}) = \sqrt{\dfrac{N-1}{Nn}}\, s$.

It is to be noted that this estimator does not possess the same properties as $\sqrt{\operatorname{Var}(\bar{y})}$: if $\hat{\theta}$ is an unbiased estimator of $\theta$, then $\sqrt{\hat{\theta}}$ is not necessarily an unbiased estimator of $\sqrt{\theta}$. In fact, $\hat{\sigma}(\bar{y})$ is a negatively biased estimator under SRSWOR.

The approximate expressions for the large-N case are as follows (reference: Sampling Theory of Surveys with Applications, P.V. Sukhatme, B.V. Sukhatme, S. Sukhatme and C. Asok, Iowa State University Press and Indian Society of Agricultural Statistics, 1984, India).

Consider $s$ as an estimator of $S$. Let $s^2 = S^2 + \varepsilon$ with $E(\varepsilon) = 0$, $E(\varepsilon^2) = \operatorname{Var}(s^2)$. Write

$s = (S^2 + \varepsilon)^{1/2} = S\left(1 + \frac{\varepsilon}{S^2}\right)^{1/2} = S\left(1 + \frac{\varepsilon}{2S^2} - \frac{\varepsilon^2}{8S^4} + \ldots\right),$

assuming $\varepsilon$ is small as compared to $S^2$; as n becomes large, the probability of such an event approaches one. Neglecting the powers of $\varepsilon$ higher than two and taking expectation, we have



$E(s) = S\left[1 - \frac{\operatorname{Var}(s^2)}{8S^4}\right]$

where

$\operatorname{Var}(s^2) = \frac{2S^4}{n-1}\left[1 + \frac{n-1}{2n}\left(\beta_2 - 3\right)\right]$ for large N,

$\beta_2 = \frac{\mu_4}{S^4} \text{ (coefficient of kurtosis)}, \qquad \mu_4 = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^4.$

Thus

$E(s) = S\left[1 - \frac{1}{4(n-1)} - \frac{\beta_2 - 3}{8n}\right]$

and

$\operatorname{Var}(s) = E(s^2) - \left[E(s)\right]^2 = S^2 - S^2\left[1 - \frac{\operatorname{Var}(s^2)}{8S^4}\right]^2 \approx \frac{\operatorname{Var}(s^2)}{4S^2} = \frac{S^2}{2(n-1)}\left[1 + \frac{n-1}{2n}\left(\beta_2 - 3\right)\right].$

Note that for a normal distribution, $\beta_2 = 3$, and we obtain

$\operatorname{Var}(s) = \frac{S^2}{2(n-1)}.$

Both $\operatorname{Var}(s)$ and $\operatorname{Var}(s^2)$ are inflated due to nonnormality to the same extent, by the inflation factor

$\left[1 + \frac{n-1}{2n}\left(\beta_2 - 3\right)\right],$

and this does not depend on the coefficient of skewness. This is an important result to be kept in mind while determining the sample size when it is assumed that $S^2$ is known. If the inflation factor is ignored and the population is non-normal, then the reliability of $s^2$ may be misleading.



Alternative approach: The results for the unbiasedness property and the variance of the sample mean can also be proved in an alternative way as follows.

(i) SRSWOR

With the $i$-th unit of the population, we associate a random variable $a_i$ defined as follows:

$a_i = \begin{cases} 1, & \text{if the } i\text{-th unit occurs in the sample} \\ 0, & \text{if the } i\text{-th unit does not occur in the sample} \end{cases} \qquad (i = 1, 2, \ldots, N).$

Then

$E(a_i) = 1 \times \text{Probability that the } i\text{-th unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N,$

$E(a_i^2) = 1 \times \text{Probability that the } i\text{-th unit is included in the sample} = \frac{n}{N}, \quad i = 1, 2, \ldots, N,$

$E(a_i a_j) = 1 \times \text{Probability that the } i\text{-th and } j\text{-th units are both included in the sample} = \frac{n(n-1)}{N(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$

From these results, we can obtain

$\operatorname{Var}(a_i) = E(a_i^2) - \left[E(a_i)\right]^2 = \frac{n(N-n)}{N^2}, \quad i = 1, 2, \ldots, N,$

$\operatorname{Cov}(a_i, a_j) = E(a_i a_j) - E(a_i)E(a_j) = -\frac{n(N-n)}{N^2(N-1)}, \quad i \ne j = 1, 2, \ldots, N.$

We can rewrite the sample mean as

$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i y_i.$

Then

$E(\bar{y}) = \frac{1}{n}\sum_{i=1}^{N} E(a_i)\, y_i = \bar{Y}$

and

$\operatorname{Var}(\bar{y}) = \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{N} a_i y_i\right) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\operatorname{Var}(a_i)\, y_i^2 + \sum_{i}^{N}\sum_{j \ne i}^{N}\operatorname{Cov}(a_i, a_j)\, y_i y_j\right].$



Substituting the values of $\operatorname{Var}(a_i)$ and $\operatorname{Cov}(a_i, a_j)$ in the expression of $\operatorname{Var}(\bar{y})$ and simplifying, we get

$\operatorname{Var}(\bar{y}) = \frac{N-n}{Nn}S^2.$

To show that $E(s^2) = S^2$, consider

$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right] = \frac{1}{n-1}\left[\sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2\right].$

Hence, taking expectation, we get

$E(s^2) = \frac{1}{n-1}\left[\sum_{i=1}^{N} E(a_i)\, y_i^2 - n\left(\operatorname{Var}(\bar{y}) + \bar{Y}^2\right)\right].$

Substituting the values of $E(a_i)$ and $\operatorname{Var}(\bar{y})$ in this expression and simplifying, we get $E(s^2) = S^2$.

(ii) SRSWR

Let the random variable $a_i$ associated with the $i$-th unit of the population denote the number of times the $i$-th unit occurs in the sample, $i = 1, 2, \ldots, N$. So $a_i$ assumes the values 0, 1, 2, ..., n. The joint distribution of $a_1, a_2, \ldots, a_N$ is the multinomial distribution given by

$P(a_1, a_2, \ldots, a_N) = \frac{n!}{\prod_{i=1}^{N} a_i!} \cdot \frac{1}{N^n}$

where $\sum_{i=1}^{N} a_i = n$. For this multinomial distribution, we have

$E(a_i) = \frac{n}{N}, \qquad \operatorname{Var}(a_i) = \frac{n(N-1)}{N^2}, \quad i = 1, 2, \ldots, N,$

$\operatorname{Cov}(a_i, a_j) = -\frac{n}{N^2}, \quad i \ne j = 1, 2, \ldots, N.$

We rewrite the sample mean as

$\bar{y} = \frac{1}{n}\sum_{i=1}^{N} a_i y_i.$

Hence, taking the expectation of $\bar{y}$ and substituting the value of $E(a_i) = n/N$, we obtain $E(\bar{y}) = \bar{Y}$.


Further,

$\operatorname{Var}(\bar{y}) = \frac{1}{n^2}\left[\sum_{i=1}^{N}\operatorname{Var}(a_i)\, y_i^2 + \sum_{i}^{N}\sum_{j \ne i}^{N}\operatorname{Cov}(a_i, a_j)\, y_i y_j\right].$

Substituting the values of $\operatorname{Var}(a_i) = n(N-1)/N^2$ and $\operatorname{Cov}(a_i, a_j) = -n/N^2$ and simplifying, we get

$\operatorname{Var}(\bar{y}) = \frac{N-1}{Nn}S^2.$

To prove that $E(s^2) = \frac{N-1}{N}S^2 = \sigma^2$ in SRSWR, consider

$(n-1)s^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 = \sum_{i=1}^{N} a_i y_i^2 - n\bar{y}^2,$

$(n-1)E(s^2) = \sum_{i=1}^{N} E(a_i)\, y_i^2 - n\left[\operatorname{Var}(\bar{y}) + \bar{Y}^2\right] = \frac{n}{N}\sum_{i=1}^{N} y_i^2 - n\cdot\frac{N-1}{Nn}S^2 - n\bar{Y}^2 = \frac{(n-1)(N-1)}{N}S^2,$

$E(s^2) = \frac{N-1}{N}S^2.$

Estimator of the population total:

Sometimes it is also of interest to estimate the population total, e.g., total household income, total expenditure, etc. Let

$Y_T = \sum_{i=1}^{N} Y_i = N\bar{Y}$

denote the population total, which can be estimated by

$\hat{Y}_T = N\hat{\bar{Y}} = N\bar{y}.$



Obviously,

$E\left(\hat{Y}_T\right) = N\,E(\bar{y}) = N\bar{Y},$

$\operatorname{Var}\left(\hat{Y}_T\right) = N^2\operatorname{Var}(\bar{y}) = \begin{cases} N^2\left(\dfrac{N-n}{Nn}\right)S^2 = \dfrac{N(N-n)}{n}S^2 & \text{for SRSWOR} \\[2ex] N^2\left(\dfrac{N-1}{Nn}\right)S^2 = \dfrac{N(N-1)}{n}S^2 & \text{for SRSWR,} \end{cases}$

and the estimates of the variance of $\hat{Y}_T$ are

$\widehat{\operatorname{Var}}\left(\hat{Y}_T\right) = \begin{cases} \dfrac{N(N-n)}{n}\,s^2 & \text{for SRSWOR} \\[2ex] \dfrac{N^2}{n}\,s^2 & \text{for SRSWR.} \end{cases}$

Confidence limits for the population mean

Now we construct the 100(1 - α)% confidence interval for the population mean. Assume that the population is normally distributed $N(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$. Then $\frac{\bar{y} - \bar{Y}}{\sqrt{\operatorname{Var}(\bar{y})}}$ follows $N(0,1)$ when $\sigma^2$ is known. If $\sigma^2$ is unknown and is estimated from the sample, then $\frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\operatorname{Var}}(\bar{y})}}$ follows a t-distribution with (n - 1) degrees of freedom. When $\sigma^2$ is known, then the 100(1 - α)% confidence interval is given by

$P\left(-Z_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\operatorname{Var}(\bar{y})}} \le Z_{\alpha/2}\right) = 1 - \alpha$

or

$P\left(\bar{y} - Z_{\alpha/2}\sqrt{\operatorname{Var}(\bar{y})} \le \bar{Y} \le \bar{y} + Z_{\alpha/2}\sqrt{\operatorname{Var}(\bar{y})}\right) = 1 - \alpha,$

and the confidence limits are

$\left(\bar{y} - Z_{\alpha/2}\sqrt{\operatorname{Var}(\bar{y})},\; \bar{y} + Z_{\alpha/2}\sqrt{\operatorname{Var}(\bar{y})}\right)$

where $Z_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}$% point of the $N(0,1)$ distribution. Similarly, when $\sigma^2$ is unknown, then the 100(1 - α)% confidence interval is

$P\left(-t_{\alpha/2} \le \frac{\bar{y} - \bar{Y}}{\sqrt{\widehat{\operatorname{Var}}(\bar{y})}} \le t_{\alpha/2}\right) = 1 - \alpha$

or

$P\left(\bar{y} - t_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\bar{y})} \le \bar{Y} \le \bar{y} + t_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\bar{y})}\right) = 1 - \alpha,$

and the confidence limits are

$\left(\bar{y} - t_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\bar{y})},\; \bar{y} + t_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(\bar{y})}\right)$

where $t_{\alpha/2}$ denotes the upper $\frac{\alpha}{2}$% point of the t-distribution with (n - 1) degrees of freedom.
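A short sketch of the σ²-unknown case under SRSWOR (the sample values and N below are assumed; the t quantile comes from scipy.stats, an external dependency):

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t

def ci_mean_srswor(sample, N, alpha=0.05):
    n = len(sample)
    ybar, s2 = mean(sample), variance(sample)   # s^2 with divisor n - 1
    var_hat = (N - n) / (N * n) * s2            # unbiased estimate of Var(ybar)
    half = t.ppf(1 - alpha / 2, n - 1) * sqrt(var_hat)
    return ybar - half, ybar + half

# Assumed data: n = 10 observations from a population of N = 500 units.
sample = [12, 15, 11, 18, 14, 16, 13, 17, 15, 14]
print(ci_mean_srswor(sample, N=500))
```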

Determination of sample size

The size of the sample is needed before the survey starts and goes into operation. One point to be kept in mind is that when the sample size increases, the variance of the estimators decreases, but the cost and duration of the survey increase, and vice versa. So there has to be a balance between the two aspects. The sample size can be determined on the basis of prescribed values of the standard error of the sample mean, the error of estimation, the width of the confidence interval, the coefficient of variation of the sample mean, the relative error of the sample mean, or the total cost, among several other criteria. An important constraint in determining the sample size is that information regarding the population standard deviation S should be known for these criteria. The reason and need for this will become clear when we derive the sample size in the next section. A question arises about how to have information about S beforehand. The possible solutions to this issue are to conduct a pilot survey, collect a preliminary sample of small size, estimate S and use this estimate as the known value of S. Alternatively, such information can also be collected from past data, past experience, long association of the experimenter with the experiment, prior information, etc. Now we find the sample size under different criteria, assuming that the samples have been drawn using SRSWOR. The case of SRSWR can be derived similarly.



1. Pre-specified variance

The sample size is to be determined such that the variance of $\bar{y}$ does not exceed a given value, say V. In this case, find n such that

$\operatorname{Var}(\bar{y}) \le V$

or

$\frac{N-n}{Nn}S^2 \le V$

or

$\frac{1}{n} - \frac{1}{N} \le \frac{V}{S^2}$

or

$\frac{1}{n} \le \frac{1}{N} + \frac{1}{n_e}$

or

$n \ge \frac{n_e}{1 + \frac{n_e}{N}}, \qquad \text{where } n_e = \frac{S^2}{V}.$

It may be noted here that $n_e$ can be known only when $S^2$ is known. This is the reason that compels us to assume that S is known; the same reason will also be seen in the other cases. The smallest sample size needed in this case is

$n_{smallest} = \frac{n_e}{1 + \frac{n_e}{N}}.$

If N is large, then the required n is $n \ge n_e$, and $n_{smallest} = n_e$.
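A one-function sketch of this criterion (the numbers in the example are assumed; S² must be supplied from prior knowledge, as discussed above):

```python
import math

def sample_size_for_variance(S2, V, N=None):
    # n_e = S^2 / V is the size needed for an effectively infinite population;
    # a finite N shrinks the requirement through the fpc: n_e / (1 + n_e / N).
    n_e = S2 / V
    n = n_e if N is None else n_e / (1 + n_e / N)
    return math.ceil(n)   # round up so that Var(ybar) does not exceed V

print(sample_size_for_variance(S2=400, V=4, N=2000))  # 96 with the fpc
print(sample_size_for_variance(S2=400, V=4))          # 100 when N is large
```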

2. Pre-specified estimation error

It may be possible to have some prior knowledge of the population mean $\bar{Y}$, and it may be required that the sample mean $\bar{y}$ should not differ from it by more than a specified amount of absolute estimation error $e$, which is a small quantity. Such a requirement can be satisfied by associating a probability $(1-\alpha)$ with it, and it can be expressed as

$P\left(\left|\bar{y} - \bar{Y}\right| \le e\right) = 1 - \alpha.$



Since $\bar{y}$ follows $N\left(\bar{Y}, \frac{N-n}{Nn}S^2\right)$, assuming the normal distribution for the population, we can write

$P\left(\frac{\left|\bar{y} - \bar{Y}\right|}{\sqrt{\operatorname{Var}(\bar{y})}} \le \frac{e}{\sqrt{\operatorname{Var}(\bar{y})}}\right) = 1 - \alpha,$

which implies that

$\frac{e}{\sqrt{\operatorname{Var}(\bar{y})}} = Z_{\alpha/2}$

or

$Z_{\alpha/2}^2 \operatorname{Var}(\bar{y}) = e^2$

or

$Z_{\alpha/2}^2\, \frac{N-n}{Nn}\, S^2 = e^2$

or

$n = \frac{\left(\dfrac{Z_{\alpha/2}\, S}{e}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\, S}{e}\right)^2},$

which is the required sample size. If N is large, then

$n = \left(\frac{Z_{\alpha/2}\, S}{e}\right)^2.$



3. Pre-specified width of the confidence interval

If the requirement is that the width of the confidence interval of $\bar{y}$ with confidence coefficient $(1-\alpha)$ should not exceed a prespecified amount W, then the sample size n is determined such that

$2\, Z_{\alpha/2} \sqrt{\operatorname{Var}(\bar{y})} \le W,$

assuming $\sigma^2$ is known and the population is normally distributed. This can be expressed as

$2\, Z_{\alpha/2} \sqrt{\frac{N-n}{Nn}}\, S \le W$

or

$4\, Z_{\alpha/2}^2 \left(\frac{1}{n} - \frac{1}{N}\right) S^2 \le W^2$

or

$\frac{1}{n} \le \frac{1}{N} + \frac{W^2}{4\, Z_{\alpha/2}^2 S^2}$

or

$n \ge \frac{\dfrac{4\, Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4\, Z_{\alpha/2}^2 S^2}{N W^2}}.$

The minimum sample size required is

$n_{smallest} = \frac{\dfrac{4\, Z_{\alpha/2}^2 S^2}{W^2}}{1 + \dfrac{4\, Z_{\alpha/2}^2 S^2}{N W^2}}.$

If N is large, then

$n \ge \frac{4\, Z_{\alpha/2}^2 S^2}{W^2},$

and the minimum sample size needed is

$n_{smallest} = \frac{4\, Z_{\alpha/2}^2 S^2}{W^2}.$



4. Pre-specified coefficient of variation

The coefficient of variation (CV) is defined as the ratio of the standard error (or standard deviation) and the mean. The knowledge of the coefficient of variation has played an important role in sampling theory, as this information has helped in deriving efficient estimators. If it is desired that the coefficient of variation of $\bar{y}$ should not exceed a given or pre-specified value, say $C_0$, then the required sample size n is to be determined such that

$CV(\bar{y}) \le C_0$

or

$\frac{\sqrt{\operatorname{Var}(\bar{y})}}{\bar{Y}} \le C_0$

or

$\frac{\frac{N-n}{Nn}\, S^2}{\bar{Y}^2} \le C_0^2$

or

$\frac{1}{n} - \frac{1}{N} \le \frac{C_0^2}{C^2}$

or

$n \ge \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}},$

which is the required sample size, where $C = \frac{S}{\bar{Y}}$ is the population coefficient of variation. The smallest sample size needed in this case is

$n_{smallest} = \frac{\dfrac{C^2}{C_0^2}}{1 + \dfrac{C^2}{N C_0^2}}.$

If N is large, then

$n \ge \frac{C^2}{C_0^2}$ and $n_{smallest} = \frac{C^2}{C_0^2}.$
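A sketch of this criterion (the values are assumed); setting $C_0 = R / Z_{\alpha/2}$ turns it into criterion 5 below, the pre-specified relative error:

```python
import math
from statistics import NormalDist

def sample_size_for_cv(C, C0, N=None):
    # n = (C / C0)^2, shrunk through the fpc when N is finite.
    n0 = (C / C0) ** 2
    n = n0 if N is None else n0 / (1 + n0 / N)
    return math.ceil(n)

# Assumed: population CV C = 0.8, target CV of ybar C0 = 0.05, N = 2000.
print(sample_size_for_cv(0.8, 0.05, N=2000))       # 227 with the fpc
# Criterion 5 (relative error R at confidence 1 - alpha) is the special
# case C0 = R / z_{alpha/2}; here R = 10% and alpha = 0.05.
z = NormalDist().inv_cdf(0.975)
print(sample_size_for_cv(0.8, 0.10 / z, N=2000))
```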



5. Pre-specified relative error

When $\bar{y}$ is used for estimating the population mean $\bar{Y}$, the relative estimation error is defined as $\frac{\bar{y} - \bar{Y}}{\bar{Y}}$. If it is required that this relative estimation error should not exceed a pre-specified value R with probability $(1-\alpha)$, then such a requirement can be satisfied by expressing it as

$P\left(\frac{\left|\bar{y} - \bar{Y}\right|}{\sqrt{\operatorname{Var}(\bar{y})}} \le \frac{R\bar{Y}}{\sqrt{\operatorname{Var}(\bar{y})}}\right) = 1 - \alpha.$

Assuming the population to be normally distributed, $\bar{y}$ follows $N\left(\bar{Y}, \frac{N-n}{Nn}S^2\right)$. So it can be written that

$\frac{R\bar{Y}}{\sqrt{\operatorname{Var}(\bar{y})}} = Z_{\alpha/2}$

or

$Z_{\alpha/2}^2\, \frac{N-n}{Nn}\, S^2 = R^2 \bar{Y}^2$

or

$\frac{1}{n} - \frac{1}{N} = \frac{R^2}{Z_{\alpha/2}^2\, C^2}$

or

$n = \frac{\left(\dfrac{Z_{\alpha/2}\, C}{R}\right)^2}{1 + \dfrac{1}{N}\left(\dfrac{Z_{\alpha/2}\, C}{R}\right)^2},$

where $C = \frac{S}{\bar{Y}}$ is the population coefficient of variation and should be known.

If N is large, then

$n = \left(\frac{Z_{\alpha/2}\, C}{R}\right)^2.$



6. Pre-specified cost

Let an amount of money C be designated for the sample survey to collect n observations, let $C_0$ be the overhead cost and let $C_1$ be the cost of collection of one unit in the sample. Then the total cost C can be expressed as

$C = C_0 + nC_1$

or

$n = \frac{C - C_0}{C_1},$

which is the required sample size.



Chapter 3 Sampling For Proportions and Percentages

In many situations, the characteristic under study on which the observations are collected is qualitative in nature. For example, the responses of customers in many marketing surveys are based on replies like 'yes' or 'no', 'agree' or 'disagree', etc. Sometimes the respondents are asked to arrange several options in an order like first choice, second choice, etc. Sometimes the objective of the survey is to estimate the proportion or percentage of brown-eyed persons, unemployed persons, graduates, or persons favouring a proposal, etc. In such situations, the first question is how to do the sampling, and the second is how to estimate the population parameters like the population mean, population variance, etc.

Sampling procedure: The same sampling procedures that are used for drawing a sample in the case of a quantitative characteristic can also be used for drawing a sample for a qualitative characteristic. So the sampling procedures remain the same irrespective of the nature of the characteristic under study, whether qualitative or quantitative. For example, the SRSWOR and SRSWR procedures for drawing samples remain the same for qualitative and quantitative characteristics. Similarly, other sampling schemes like stratified sampling, two stage sampling, etc. also remain the same.

Estimation of population proportion: The population proportion in the case of a qualitative characteristic can be estimated in a similar way as the population mean in the case of a quantitative characteristic.

Consider a qualitative characteristic on the basis of which the population can be divided into two mutually exclusive classes, say C and C*. For example, if C is the part of the population of persons saying 'yes' or 'agreeing' with the proposal, then C* is the part of the population of persons saying 'no' or 'disagreeing' with the proposal. Let A be the number of units in C and (N - A) the number of units in C* in a population of size N. Then the proportion of units in C is

$P = \frac{A}{N}$

and the proportion of units in C* is

$Q = \frac{N - A}{N} = 1 - P.$



An indicator variable Y can be associated with the characteristic under study; then, for i = 1, 2, ..., N,

$Y_i = \begin{cases} 1, & \text{if the } i\text{-th unit belongs to } C \\ 0, & \text{if the } i\text{-th unit belongs to } C^*. \end{cases}$

Now the population total is

$Y_{TOTAL} = \sum_{i=1}^{N} Y_i = A$

and the population mean is

$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i = \frac{A}{N} = P.$

Suppose a sample of size n is drawn from a population of size N by simple random sampling. Let a be the number of units in the sample which fall into class C and (n - a) the number of units which fall into class C*. Then the sample proportion of units in C is

$p = \frac{a}{n},$

which can be written as

$p = \frac{a}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{y}.$

Since $\sum_{i=1}^{N} Y_i^2 = A = NP$, we can write $S^2$ and $s^2$ in terms of P and Q as follows:

$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2 = \frac{1}{N-1}\left(\sum_{i=1}^{N} Y_i^2 - N\bar{Y}^2\right) = \frac{1}{N-1}\left(NP - NP^2\right) = \frac{N}{N-1}\,PQ.$

Similarly, $\sum_{i=1}^{n} y_i^2 = a = np$ and

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right) = \frac{1}{n-1}\left(np - np^2\right) = \frac{n}{n-1}\,pq.$

Note that the quantities $\bar{y}$, $\bar{Y}$, $s^2$ and $S^2$ have been expressed as functions of the sample and population proportions. Since the sample has been drawn by simple random sampling and the sample proportion is the same as the sample mean, the properties of the sample proportion in SRSWOR and SRSWR can be derived using the properties of the sample mean directly.

(i) SRSWOR

Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$, i.e., $E(\bar{y}) = \bar{Y}$ in the case of SRSWOR, we have

$E(p) = E(\bar{y}) = \bar{Y} = P,$

so p is an unbiased estimator of P. Using the expression for $\operatorname{Var}(\bar{y})$, the variance of p can be derived as

$\operatorname{Var}(p) = \operatorname{Var}(\bar{y}) = \frac{N-n}{Nn}\,S^2 = \frac{N-n}{Nn}\cdot\frac{N}{N-1}\,PQ = \frac{N-n}{N-1}\cdot\frac{PQ}{n}.$

Similarly, using the estimate of $\operatorname{Var}(\bar{y})$, the estimate of the variance of p can be derived as

$\widehat{\operatorname{Var}}(p) = \widehat{\operatorname{Var}}(\bar{y}) = \frac{N-n}{Nn}\,s^2 = \frac{N-n}{Nn}\cdot\frac{n}{n-1}\,pq = \frac{N-n}{N(n-1)}\,pq.$

(ii) SRSWR

Since the sample mean $\bar{y}$ is an unbiased estimator of the population mean $\bar{Y}$ in the case of SRSWR, the sample proportion satisfies

$E(p) = E(\bar{y}) = \bar{Y} = P,$

i.e., p is an unbiased estimator of P. Using the expression for the variance of $\bar{y}$ and its estimate in the case of SRSWR, the variance of p and its estimate can be derived as

$\operatorname{Var}(p) = \operatorname{Var}(\bar{y}) = \frac{N-1}{Nn}\,S^2 = \frac{N-1}{Nn}\cdot\frac{N}{N-1}\,PQ = \frac{PQ}{n},$

$\widehat{\operatorname{Var}}(p) = \frac{n}{n-1}\cdot\frac{pq}{n} = \frac{pq}{n-1}.$
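A small sketch tying these formulas together (the counts a, n and the population size N below are assumed for illustration):

```python
def proportion_estimates(a, n, N):
    # p = a/n estimates P; q = 1 - p.
    p = a / n
    q = 1 - p
    var_hat_wor = (N - n) / (N * (n - 1)) * p * q   # SRSWOR variance estimate
    var_hat_wr = p * q / (n - 1)                    # SRSWR variance estimate
    return p, var_hat_wor, var_hat_wr

# Assumed: a = 120 of n = 400 sampled persons agree; population size N = 10000.
p, v_wor, v_wr = proportion_estimates(120, 400, 10_000)
print(p, v_wor, v_wr)
```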

Estimation of the population total or total count

It is easy to see that an estimate of the population total A (or the total count) is

$\hat{A} = Np = \frac{Na}{n},$

its variance is

$\operatorname{Var}(\hat{A}) = N^2\operatorname{Var}(p),$

and the estimate of this variance is

$\widehat{\operatorname{Var}}(\hat{A}) = N^2\,\widehat{\operatorname{Var}}(p).$

Confidence interval estimation of P

If N and n are large, then $\frac{p - P}{\sqrt{\operatorname{Var}(p)}}$ approximately follows N(0, 1). With this approximation, we can write

$P\left(-Z_{\alpha/2} \le \frac{p - P}{\sqrt{\operatorname{Var}(p)}} \le Z_{\alpha/2}\right) = 1 - \alpha,$

and the 100(1 - α)% confidence interval of P is

$\left(p - Z_{\alpha/2}\sqrt{\operatorname{Var}(p)},\; p + Z_{\alpha/2}\sqrt{\operatorname{Var}(p)}\right).$

It may be noted that in this case a discrete random variable is being approximated by a continuous random variable, so a continuity correction $\frac{1}{2n}$ can be introduced in the confidence limits, and the limits become

$\left(p - Z_{\alpha/2}\sqrt{\operatorname{Var}(p)} - \frac{1}{2n},\; p + Z_{\alpha/2}\sqrt{\operatorname{Var}(p)} + \frac{1}{2n}\right).$

Use of the hypergeometric distribution:

When SRS is applied for the sampling of a qualitative characteristic, the methodology is to draw the units one by one, so the probability of selection of every unit remains the same at every step. If the n sampling units are instead selected together from the N units, then the probability of selection of the units is governed by the hypergeometric distribution.

Consider a situation in which the sampling units in a population are divided into two mutually exclusive classes. Let P and Q be the proportions of sampling units in the population belonging to classes '1' and '2', respectively. Then NP and NQ are the total numbers of sampling units in the population belonging to classes '1' and '2', respectively, and so NP + NQ = N. The probability that, in a sample of n units selected out of N units by SRS, $n_1$ selected units belong to class '1' and $n_2$ selected units belong to class '2' is given by the hypergeometric distribution

$P(n_1) = \frac{\binom{NP}{n_1}\binom{NQ}{n_2}}{\binom{N}{n}}.$

As N grows large, the hypergeometric distribution tends to the binomial distribution, and $P(n_1)$ is approximated by

$P(n_1) = \binom{n}{n_1} P^{n_1} (1 - P)^{n_2}.$

Inverse sampling

In general, it is understood in the SRS methodology for a qualitative characteristic that the attribute under study is not a rare attribute. If the attribute is rare, then the procedure of estimating the population proportion P by the sample proportion a/n is not suitable. Some such situations are, e.g., estimation of the frequency of a rare type of gene, the proportion of some rare type of cancer cells in a biopsy, the proportion of a rare type of blood cells affecting the red blood cells, etc. In such cases, the methodology of inverse sampling can be used.

In the methodology of inverse sampling, sampling is continued until a predetermined number of units possessing the attribute under study occur in the sample, which is useful for estimating the population proportion. The sampling units are drawn one by one with equal probability and without replacement, and sampling is discontinued as soon as the number of units in the sample possessing the characteristic equals the predetermined number.

Let $m$ denote this predetermined number of units possessing the characteristic. Sampling is continued till $m$ such units are obtained. Therefore, the sample size $n$ required to attain $m$ becomes a random variable.

Probability distribution function of n
In order to find the probability distribution function of $n$, note that the first $(n-1)$ draws must contain $(m-1)$ units possessing the characteristic out of the $NP$ such units in the population, or equivalently $(n-m)$ units not possessing the characteristic out of the $NQ$ such units, while the last ($n$th) draw must produce a unit possessing the characteristic. So the probability distribution function of $n$ can be expressed as
$$P(n) = P\big(\text{in a sample of } (n-1) \text{ units drawn from } N,\ (m-1) \text{ units possess the attribute}\big)\times P\big(\text{the unit drawn at the } n\text{th draw possesses the attribute}\big)$$
$$= \left[\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\right]\left[\frac{NP-m+1}{N-n+1}\right], \qquad n = m, m+1, \ldots, m+NQ.$$
Note that the first term (in square brackets) is derived from the hypergeometric distribution as the probability of obtaining a sample of size $(n-1)$ in which $(m-1)$ units come from the $NP$ units and $(n-m)$ units come from the $NQ$ units. The second term, $\frac{NP-m+1}{N-n+1}$, is the probability that the last draw yields a unit possessing the characteristic. Note that
$$\sum_{n=m}^{m+NQ} P(n) = 1.$$



Estimate of population proportion
Consider the expectation of $\frac{m-1}{n-1}$:
$$E\left(\frac{m-1}{n-1}\right) = \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}\,P(n) = \sum_{n=m}^{m+NQ}\frac{m-1}{n-1}\left[\frac{\binom{NP}{m-1}\binom{NQ}{n-m}}{\binom{N}{n-1}}\right]\frac{NP-m+1}{N-n+1}$$
$$= P\sum_{n=m}^{m+NQ}\left[\frac{\binom{NP-1}{m-2}\binom{NQ}{n-m}}{\binom{N-1}{n-2}}\right]\frac{NP-m+1}{N-n+1},$$
where the summand is obtained by replacing $NP$ by $NP-1$, $N$ by $N-1$, $m$ by $m-1$ and $n$ by $n-1$ in $P(n)$; the sum is therefore the total probability of the corresponding inverse-sampling distribution and equals 1. Thus
$$E\left(\frac{m-1}{n-1}\right) = P,$$
so $\hat P = \dfrac{m-1}{n-1}$ is an unbiased estimator of $P$.
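A quick Monte Carlo check of this unbiasedness is sketched below (all numbers are hypothetical): units are drawn one by one without replacement until $m$ units with the attribute are obtained, and $(m-1)/(n-1)$ is averaged over many repetitions.

import numpy as np

rng = np.random.default_rng(1)
N, P, m = 500, 0.05, 5                 # illustrative: rare attribute, P = 0.05
population = np.zeros(N, dtype=int)
population[: int(N * P)] = 1           # NP units possess the attribute

estimates = []
for _ in range(20_000):
    order = rng.permutation(population)            # SRSWOR realized as a random order
    n = np.searchsorted(np.cumsum(order), m) + 1   # draws needed to reach m successes
    estimates.append((m - 1) / (n - 1))

print(np.mean(estimates), "should be close to P =", int(N * P) / N)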

Estimate of variance of $\hat P$
Now we derive an estimate of the variance of $\hat P$. By definition,
$$Var(\hat P) = E(\hat P^2) - [E(\hat P)]^2 = E(\hat P^2) - P^2.$$
Thus
$$\widehat{Var}(\hat P) = \hat P^2 - \widehat{P^2}.$$
In order to obtain an estimate of $P^2$, consider the expectation of $\frac{(m-1)(m-2)}{(n-1)(n-2)}$:
$$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \sum_{n\ge m}\frac{(m-1)(m-2)}{(n-1)(n-2)}P(n) = \frac{P(NP-1)}{N-1}\sum_{n\ge m}\left[\frac{\binom{NP-2}{m-3}\binom{NQ}{n-m}}{\binom{N-2}{n-3}}\right]\frac{NP-m+1}{N-n+1},$$
where the term inside the sum is obtained by replacing $NP$ by $NP-2$, $N$ by $N-2$, $m$ by $m-2$ and $n$ by $n-2$ in the probability distribution function, so the sum again equals 1. This simplifies to
$$E\left[\frac{(m-1)(m-2)}{(n-1)(n-2)}\right] = \frac{NP^2}{N-1} - \frac{P}{N-1}.$$
Thus an unbiased estimate of $P^2$ is
$$\widehat{P^2} = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{\hat P}{N} = \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} + \frac{1}{N}\cdot\frac{m-1}{n-1}.$$
Finally, an estimate of the variance of $\hat P$ is
$$\widehat{Var}(\hat P) = \hat P^2 - \widehat{P^2} = \left(\frac{m-1}{n-1}\right)^2 - \frac{N-1}{N}\cdot\frac{(m-1)(m-2)}{(n-1)(n-2)} - \frac{1}{N}\cdot\frac{m-1}{n-1}$$
$$= \frac{m-1}{n-1}\left[\frac{m-1}{n-1} - \frac{1}{N}\left(\frac{(N-1)(m-2)}{n-2}+1\right)\right].$$
For large $N$, the hypergeometric distribution tends to the negative binomial distribution with probability function $\binom{n-1}{m-1}P^mQ^{n-m}$. In that case,
$$\hat P = \frac{m-1}{n-1} \qquad\text{and}\qquad \widehat{Var}(\hat P) = \frac{(m-1)(n-m)}{(n-1)^2(n-2)} = \frac{\hat P(1-\hat P)}{n-2}.$$



Estimation of proportions for more than two classes
We have assumed up to now that there are only two classes into which the population can be divided on the basis of a qualitative characteristic. There can be situations where the population is to be divided into more than two classes. For example, the taste of a coffee can be placed in four categories: very strong, strong, mild and very mild. Similarly, the damage to a crop due to a storm can be classified into categories like heavily damaged, damaged, minor damage and no damage.

These types of situations can be represented by dividing the population of size $N$ into, say, $k$ mutually exclusive classes $C_1, C_2, \ldots, C_k$. Corresponding to these classes, let
$$P_1 = \frac{C_1}{N},\ P_2 = \frac{C_2}{N},\ \ldots,\ P_k = \frac{C_k}{N}$$
be the proportions of units in the classes $C_1, C_2, \ldots, C_k$ respectively.

Let a sample of size $n$ be observed such that $c_1, c_2, \ldots, c_k$ units have been drawn from $C_1, C_2, \ldots, C_k$ respectively. Then the probability of observing $c_1, c_2, \ldots, c_k$ is
$$P(c_1, c_2, \ldots, c_k) = \frac{\binom{C_1}{c_1}\binom{C_2}{c_2}\cdots\binom{C_k}{c_k}}{\binom{N}{n}}.$$
The population proportions $P_i$ can be estimated by
$$p_i = \frac{c_i}{n}, \quad i = 1,2,\ldots,k.$$
It can easily be shown that
$$E(p_i) = P_i,\ i = 1,2,\ldots,k, \qquad Var(p_i) = \frac{N-n}{N-1}\cdot\frac{P_iQ_i}{n},$$
and
$$\widehat{Var}(p_i) = \frac{N-n}{N}\cdot\frac{p_iq_i}{n-1}.$$
For estimating the number of units in the $i$th class,
$$\hat C_i = Np_i, \qquad Var(\hat C_i) = N^2\,Var(p_i), \qquad \widehat{Var}(\hat C_i) = N^2\,\widehat{Var}(p_i).$$



The confidence intervals can be obtained based on a single $p_i$ as in the case of two classes. If $N$ is large, then the probability of observing $c_1, c_2, \ldots, c_k$ can be approximated by the multinomial distribution
$$P(c_1, c_2, \ldots, c_k) = \frac{n!}{c_1!c_2!\cdots c_k!}P_1^{c_1}P_2^{c_2}\cdots P_k^{c_k}.$$
For this distribution,
$$E(p_i) = P_i,\ i = 1,2,\ldots,k, \qquad Var(p_i) = \frac{P_i(1-P_i)}{n}, \qquad \widehat{Var}(p_i) = \frac{p_i(1-p_i)}{n}.$$
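The estimates for several classes are computed elementwise; a short sketch with hypothetical class counts and population size:

import numpy as np

# Hypothetical sample of n = 200 units classified into k = 4 classes
c = np.array([80, 60, 40, 20])        # observed class counts c_1, ..., c_k
n = c.sum()
N = 5000                              # illustrative population size

p = c / n                             # estimates of P_1, ..., P_k
var_hat_fpc = (N - n) / N * p * (1 - p) / (n - 1)   # with finite population correction
var_hat_multinomial = p * (1 - p) / n               # multinomial (large N) approximation
print(p, var_hat_fpc, var_hat_multinomial)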



Chapter 4
Stratified Sampling

An important objective in any estimation problem is to obtain an estimator of a population parameter which can take care of the salient features of the population. If the population is homogeneous with respect to the characteristic under study, then the method of simple random sampling will yield a homogeneous sample, and in turn, the sample mean will serve as a good estimator of the population mean. Thus, if the population is homogeneous with respect to the characteristic under study, then the sample drawn through simple random sampling is expected to provide a representative sample. Moreover, the variance of the sample mean depends not only on the sample size and sampling fraction but also on the population variance. In order to increase the precision of an estimator, we need a sampling scheme which can reduce the heterogeneity in the population. If the population is heterogeneous with respect to the characteristic under study, then one such sampling procedure is stratified sampling.

The basic idea behind stratified sampling is to
• divide the whole heterogeneous population into smaller groups or subpopulations such that the sampling units are homogeneous with respect to the characteristic under study within each subpopulation and heterogeneous with respect to the characteristic under study between/among the subpopulations; such subpopulations are termed strata;
• treat each subpopulation as a separate population and draw a sample by SRS from each stratum.

[Note: ‘Stratum’ is singular and ‘strata’ is plural.]

Example: Suppose we want to find the average height of the students in a school running classes 1 to 12. The heights vary a lot, as the students of class 1 are around 6 years of age while those of class 10 are around 16 years. So one can divide all the students into different subpopulations or strata, such as

Students of classes 1, 2 and 3 : Stratum 1
Students of classes 4, 5 and 6 : Stratum 2
Students of classes 7, 8 and 9 : Stratum 3
Students of classes 10, 11 and 12 : Stratum 4



Now draw the samples by SRS from each of the strata 1, 2, 3 and 4. All the drawn samples combined together will constitute the final stratified sample for further analysis.

Notations: We use the following symbols and notations:
$N$ : population size
$k$ : number of strata
$N_i$ : number of sampling units in the $i$th stratum, with $N = \sum_{i=1}^k N_i$
$n_i$ : number of sampling units to be drawn from the $i$th stratum
$n = \sum_{i=1}^k n_i$ : total sample size

[Schematic: the population of $N$ units is divided into Stratum 1 ($N_1$ units), Stratum 2 ($N_2$ units), ..., Stratum $k$ ($N_k$ units) with $N = \sum_{i=1}^k N_i$; from the $i$th stratum, Sample $i$ of $n_i$ units is drawn, and together the samples have size $n = \sum_{i=1}^k n_i$.]


Procedure of stratified sampling
• Divide the population of $N$ units into $k$ strata. Let the $i$th stratum have $N_i$, $i = 1,2,\ldots,k$, units. Strata are constructed such that they are non-overlapping and homogeneous with respect to the characteristic under study, with $\sum_{i=1}^k N_i = N$.
• Draw a sample of size $n_i$ from the $i$th stratum ($i = 1,2,\ldots,k$) using SRS (preferably WOR), independently from each stratum.
• All the sampling units drawn from the strata together constitute a stratified sample of size $n = \sum_{i=1}^k n_i$.
A minimal sketch of this procedure in code is given after this list.
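The sketch below draws an SRSWOR sample from each stratum; the strata, their data values and the sample sizes are hypothetical, used only to illustrate the mechanics.

import numpy as np

def stratified_srswor(strata, sizes, rng):
    """Draw an SRSWOR sample of sizes[i] units from each array in strata."""
    return [rng.choice(stratum, size=sizes[i], replace=False)
            for i, stratum in enumerate(strata)]

# Hypothetical population with k = 3 strata (values are illustrative only)
rng = np.random.default_rng(7)
strata = [rng.normal(10, 1, 100), rng.normal(20, 2, 200), rng.normal(40, 4, 300)]
samples = stratified_srswor(strata, sizes=[10, 20, 30], rng=rng)
print([s.mean() for s in samples])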

Difference between stratified and cluster sampling schemes
In stratified sampling, the strata are constructed such that they are
• homogeneous within and
• heterogeneous among themselves.
In cluster sampling, the clusters are constructed such that they are
• heterogeneous within and
• homogeneous among themselves.
[Note: We discuss cluster sampling later.]

Issue in the estimation of parameters in stratified sampling
Divide the population of $N$ units into $k$ strata, and let the $i$th stratum have $N_i$, $i = 1,2,\ldots,k$, units. There are $k$ independent samples, drawn through SRS, of sizes $n_1, n_2, \ldots, n_k$, one from each stratum. So one can have $k$ estimators of a parameter, based on the samples of sizes $n_1, n_2, \ldots, n_k$ respectively. Our interest is not in having $k$ different estimators of the parameter; the ultimate goal is to have a single estimator. In this case, an important issue is how to combine the information from the different samples into one estimator which is good enough to provide information about the parameter.

We now consider the estimation of the population mean and the population variance from a stratified sample.


Estimation of population mean and its variance
Let
$Y$ : characteristic under study,
$y_{ij}$ : value of the $j$th unit in the $i$th stratum, $j = 1,2,\ldots,n_i$; $i = 1,2,\ldots,k$,
$$\bar Y_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij} \ \text{: population mean of the } i\text{th stratum},$$
$$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} \ \text{: sample mean from the } i\text{th stratum},$$
$$\bar Y = \frac{1}{N}\sum_{i=1}^k N_i\bar Y_i = \sum_{i=1}^k w_i\bar Y_i \ \text{: population mean, where } w_i = \frac{N_i}{N}.$$

Estimation of population mean:
First we discuss the estimation of the population mean. Note that the population mean is defined as the weighted arithmetic mean of the stratum means, with weights given by the stratum sizes. Based on the expression $\bar Y = \frac{1}{N}\sum_{i=1}^k N_i\bar Y_i$, one may choose the sample mean
$$\bar y = \frac{1}{n}\sum_{i=1}^k n_i\bar y_i$$
as a possible estimator of $\bar Y$. Since the sample in each stratum is drawn by SRS, $E(\bar y_i) = \bar Y_i$; thus
$$E(\bar y) = \frac{1}{n}\sum_{i=1}^k n_iE(\bar y_i) = \frac{1}{n}\sum_{i=1}^k n_i\bar Y_i \ne \bar Y,$$



and $\bar y$ turns out to be a biased estimator of $\bar Y$. Based on this, one can modify $\bar y$ so as to obtain an unbiased estimator of $\bar Y$. Consider the stratified sample mean, defined as the weighted arithmetic mean of the stratum sample means with stratum sizes as weights:
$$\bar y_{st} = \frac{1}{N}\sum_{i=1}^k N_i\bar y_i.$$
Now
$$E(\bar y_{st}) = \frac{1}{N}\sum_{i=1}^k N_iE(\bar y_i) = \frac{1}{N}\sum_{i=1}^k N_i\bar Y_i = \bar Y.$$
Thus $\bar y_{st}$ is an unbiased estimator of $\bar Y$.

Variance of $\bar y_{st}$
$$Var(\bar y_{st}) = \sum_{i=1}^k w_i^2\,Var(\bar y_i) + \mathop{\sum\sum}_{i\ne j} w_iw_j\,Cov(\bar y_i,\bar y_j).$$
Since all the samples have been drawn independently from the strata by SRSWOR,
$$Cov(\bar y_i,\bar y_j) = 0\ (i\ne j), \qquad Var(\bar y_i) = \frac{N_i-n_i}{N_in_i}S_i^2, \qquad S_i^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(y_{ij}-\bar Y_i)^2.$$
Thus
$$Var(\bar y_{st}) = \sum_{i=1}^k w_i^2\,\frac{N_i-n_i}{N_in_i}S_i^2 = \sum_{i=1}^k w_i^2\left(1-\frac{n_i}{N_i}\right)\frac{S_i^2}{n_i}.$$
Observe that $Var(\bar y_{st})$ is small when $S_i^2$ is small. This observation suggests how to construct the strata: if $S_i^2$ is small for all $i = 1,2,\ldots,k$, then $Var(\bar y_{st})$ will also be small. That is why it was mentioned earlier that the strata should be constructed so that they are homogeneous within (i.e., $S_i^2$ is small) and heterogeneous among themselves.

For example, units in geographical proximity tend to be more alike. The consumption pattern of households will be similar within a lower-income-group housing society and within a higher-income-group housing society, whereas it will differ a lot between the two housing societies.

Estimate of variance
Since the samples have been drawn by SRSWOR, $E(s_i^2) = S_i^2$, where
$$s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-\bar y_i)^2, \qquad \widehat{Var}(\bar y_i) = \frac{N_i-n_i}{N_in_i}s_i^2,$$
so
$$\widehat{Var}(\bar y_{st}) = \sum_{i=1}^k w_i^2\,\widehat{Var}(\bar y_i) = \sum_{i=1}^k w_i^2\,\frac{N_i-n_i}{N_in_i}s_i^2.$$

Note: If SRSWR is used instead of SRSWOR for drawing the samples from each stratum, then
$$\bar y_{st} = \sum_{i=1}^k w_i\bar y_i, \qquad E(\bar y_{st}) = \bar Y,$$
$$Var(\bar y_{st}) = \sum_{i=1}^k \frac{w_i^2\sigma_i^2}{n_i} = \sum_{i=1}^k w_i^2\left(\frac{N_i-1}{N_in_i}\right)S_i^2, \qquad \widehat{Var}(\bar y_{st}) = \sum_{i=1}^k \frac{w_i^2s_i^2}{n_i},$$
where
$$\sigma_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i}(y_{ij}-\bar Y_i)^2.$$



Advantages of stratified sampling
1. Data of known precision may be required for certain parts of the population. This can be accomplished by a more careful investigation of a few strata. Example: In order to know the direct impact of a hike in petrol prices, the population can be divided into strata like lower income group, middle income group and higher income group. Obviously, the higher income group is more affected than the lower income group, so a more careful investigation can be made in the higher-income-group stratum.
2. Sampling problems may differ in different parts of the population. Example: To study the consumption pattern of households, the people living in houses, hotels, hospitals, prisons etc. are to be treated differently.
3. Administrative convenience can be exercised in stratified sampling. Example: In taking a sample of villages from a big state, it is administratively more convenient to consider the districts as strata, so that the administrative setup at the district level may be used for this purpose. Such administrative convenience and the convenience in organizing the field work are important aspects in national-level surveys.
4. A full cross-section of the population can be obtained through stratified sampling. In SRS, some large part of the population may remain unrepresented. Stratified sampling enables one to draw a sample representing different segments of the population to any desired extent; the desired degree of representation of specified parts of the population is also possible.
5. Substantial gain in efficiency is achieved if the strata are formed intelligently.
6. In the case of a skewed population, stratification is important since a larger weight may have to be given to the few extremely large units, which in turn reduces the sampling variability.
7. When estimates are required not only for the population but also for the subpopulations, stratified sampling is helpful.
8. When the sampling frame for the subpopulations is more easily available than the sampling frame for the whole population, stratified sampling is helpful.
9. If the population is large, it is convenient to sample separately from the strata rather than from the entire population.
10. The population mean or population total can be estimated with higher precision by suitably weighting the estimates obtained from each stratum.



Allocation problem and choice of sample sizes in different strata
Question: How to choose the sample sizes $n_1, n_2, \ldots, n_k$ so that the available resources are used effectively?
There are two aspects of choosing the sample sizes:
(i) Minimize the cost of the survey for a specified precision.
(ii) Maximize the precision for a given cost.
Note: The sample size cannot be determined by minimizing both the cost and the variability simultaneously. The cost function is directly proportional to the sample size, whereas the variability is inversely proportional to the sample size.

Based on different ideas, some allocation procedures are as follows:

1. Equal allocation
Choose the sample size $n_i$ to be the same for all strata, i.e., draw samples of equal size from each stratum. Let $n$ be the total sample size and $k$ the number of strata; then
$$n_i = \frac{n}{k} \quad\text{for all } i = 1,2,\ldots,k.$$

2. Proportional allocation
For fixed $k$, select $n_i$ proportional to the stratum size $N_i$, i.e.,
$$n_i \propto N_i, \qquad n_i = CN_i,$$
where $C$ is the constant of proportionality. Then
$$n = \sum_{i=1}^k n_i = C\sum_{i=1}^k N_i = CN \ \Rightarrow\ C = \frac{n}{N}.$$
Thus $n_i = \frac{n}{N}N_i$. Such an allocation arises from considerations like operational convenience.



3. Neyman or optimum allocation
This allocation considers the sizes of the strata as well as their variability:
$$n_i \propto N_iS_i, \qquad n_i = C^*N_iS_i,$$
where $C^*$ is the constant of proportionality. Then
$$n = \sum_{i=1}^k n_i = C^*\sum_{i=1}^k N_iS_i \ \Rightarrow\ C^* = \frac{n}{\sum_{i=1}^k N_iS_i}.$$
Thus
$$n_i = \frac{nN_iS_i}{\sum_{i=1}^k N_iS_i}.$$
This allocation arises when $Var(\bar y_{st})$ is minimized subject to the constraint $\sum_{i=1}^k n_i = n$ (prespecified). There are some limitations of the optimum allocation: knowledge of $S_i$ $(i = 1,2,\ldots,k)$ is needed to compute the $n_i$, and if there is more than one characteristic under study, the characteristics may lead to conflicting allocations.
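The three allocation rules can be compared side by side; below is a sketch with hypothetical stratum sizes and standard deviations (in practice the allocations are rounded to integers):

import numpy as np

def allocations(n, N_i, S_i):
    """Equal, proportional and Neyman allocations of total sample size n."""
    N_i, S_i = np.asarray(N_i, float), np.asarray(S_i, float)
    k = len(N_i)
    equal = np.full(k, n / k)
    proportional = n * N_i / N_i.sum()
    neyman = n * N_i * S_i / (N_i * S_i).sum()
    return equal, proportional, neyman

# Illustrative strata: the third stratum is large and highly variable
print(allocations(n=60, N_i=[100, 200, 300], S_i=[1.0, 2.0, 4.0]))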

Choice of sample size based on cost of survey and variability
The cost of a survey depends upon the nature of the survey. A simple choice of the cost function is
$$C = C_0 + \sum_{i=1}^k C_in_i,$$
where
$C$ : total cost,
$C_0$ : overhead cost, e.g., setting up the office, training people, etc.,
$C_i$ : cost per unit in the $i$th stratum,
$\sum_{i=1}^k C_in_i$ : total cost within the sample.
To find $n_i$ under this cost function, consider the Lagrangian function with Lagrangian multiplier $\lambda$:
$$\phi = Var(\bar y_{st}) + \lambda^2(C-C_0) = \sum_{i=1}^k w_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)S_i^2 + \lambda^2\sum_{i=1}^k C_in_i$$
$$= \sum_{i=1}^k \frac{w_i^2S_i^2}{n_i} + \lambda^2\sum_{i=1}^k C_in_i - \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i} = \sum_{i=1}^k \left(\frac{w_iS_i}{\sqrt{n_i}}-\lambda\sqrt{C_in_i}\right)^2 + \text{terms independent of } n_i.$$
Thus $\phi$ is minimum when
$$\frac{w_iS_i}{\sqrt{n_i}} = \lambda\sqrt{C_in_i} \ \text{for all } i, \qquad\text{i.e.,}\qquad n_i = \frac{1}{\lambda}\cdot\frac{w_iS_i}{\sqrt{C_i}}.$$

How to determine $\lambda$? There are two ways to determine $\lambda$:
(i) minimize the variability for fixed cost;
(ii) minimize the cost for given variability.
We consider both cases.

(i) Minimize variability for fixed cost
Let $C = C_0^*$ be the prespecified cost, which is fixed. Then
$$\sum_{i=1}^k C_in_i = C_0^* \ \Rightarrow\ \sum_{i=1}^k C_i\cdot\frac{w_iS_i}{\lambda\sqrt{C_i}} = C_0^* \ \Rightarrow\ \lambda = \frac{\sum_{i=1}^k \sqrt{C_i}\,w_iS_i}{C_0^*}.$$
Substituting this $\lambda$ in $n_i = \frac{1}{\lambda}\cdot\frac{w_iS_i}{\sqrt{C_i}}$, the optimum $n_i$ is obtained as
$$n_i^* = \frac{w_iS_i}{\sqrt{C_i}}\left(\frac{C_0^*}{\sum_{i=1}^k \sqrt{C_i}\,w_iS_i}\right).$$
The required sample size to estimate $\bar Y$ such that the variance is minimum for the given cost $C = C_0^*$ is
$$n = \sum_{i=1}^k n_i^*.$$
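This fixed-cost optimum allocation is a one-line computation; a sketch follows, with hypothetical weights, standard deviations, per-unit costs and budget:

import numpy as np

def optimum_allocation_fixed_cost(C0_star, w, S, C):
    """n_i proportional to w_i S_i / sqrt(C_i), scaled so total cost is C0_star."""
    w, S, C = map(lambda a: np.asarray(a, float), (w, S, C))
    return (w * S / np.sqrt(C)) * C0_star / np.sum(np.sqrt(C) * w * S)

# Illustrative inputs: 3 strata, budget of 600 cost units (overhead excluded)
n_i = optimum_allocation_fixed_cost(600, w=[1/6, 2/6, 3/6], S=[1, 2, 4], C=[4, 1, 9])
print(n_i, "total cost:", np.sum(np.array([4, 1, 9]) * n_i))  # cost recovers 600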

(ii) Minimize cost for given variability
Let $V(\bar y_{st}) = V_0$ be the prespecified variance. Now determine $n_i$ such that
$$\sum_{i=1}^k w_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)S_i^2 = V_0 \ \Rightarrow\ \sum_{i=1}^k \frac{w_i^2S_i^2}{n_i} = V_0 + \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}.$$
Substituting $n_i = \frac{w_iS_i}{\lambda\sqrt{C_i}}$ gives
$$\sum_{i=1}^k \lambda\sqrt{C_i}\,w_iS_i = V_0 + \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i} \ \Rightarrow\ \lambda = \frac{V_0+\sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}}{\sum_{i=1}^k w_iS_i\sqrt{C_i}}.$$
Thus the optimum $n_i$ is
$$n_i = \frac{w_iS_i}{\sqrt{C_i}}\left(\frac{\sum_{i=1}^k w_iS_i\sqrt{C_i}}{V_0+\sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}}\right).$$
So the required sample size to estimate $\bar Y$ such that the cost $C$ is minimum for the prespecified variance $V_0$ is
$$n = \sum_{i=1}^k n_i.$$



Sample size under proportional allocation for fixed cost and for fixed variance
(i) If the cost $C = C_0$ is fixed, then $C_0 = \sum_{i=1}^k C_in_i$. Under proportional allocation, $n_i = \frac{n}{N}N_i = nw_i$, so
$$C_0 = n\sum_{i=1}^k w_iC_i \ \Rightarrow\ n = \frac{C_0}{\sum_{i=1}^k w_iC_i}, \qquad n_i = \frac{C_0w_i}{\sum_{i=1}^k w_iC_i}.$$
The required sample size to estimate $\bar Y$ in this case is $n = \sum_{i=1}^k n_i$.

(ii) If the variance $V(\bar y_{st}) = V_0$ is fixed, then
$$\sum_{i=1}^k w_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)S_i^2 = V_0 \ \Rightarrow\ \sum_{i=1}^k \frac{w_i^2S_i^2}{n_i} = V_0 + \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}$$
$$\Rightarrow\ \sum_{i=1}^k \frac{w_i^2S_i^2}{nw_i} = V_0 + \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i} \quad (\text{using } n_i = nw_i)$$
$$\Rightarrow\ n = \frac{\sum_{i=1}^k w_iS_i^2}{V_0+\sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}}, \qquad n_i = w_i\cdot\frac{\sum_{i=1}^k w_iS_i^2}{V_0+\sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}}.$$
This is known as Bowley's allocation.



Variances under different allocations
Now we derive the variance of $\bar y_{st}$ under proportional and optimum allocations.

(i) Proportional allocation
Under proportional allocation, $n_i = \frac{n}{N}N_i$, and
$$Var(\bar y_{st}) = \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}w_i^2S_i^2$$
becomes
$$Var_{prop}(\bar y_{st}) = \sum_{i=1}^k \frac{N_i-\frac{n}{N}N_i}{N_i\cdot\frac{n}{N}N_i}\left(\frac{N_i}{N}\right)^2S_i^2 = \frac{N-n}{Nn}\sum_{i=1}^k \frac{N_i}{N}S_i^2 = \frac{N-n}{Nn}\sum_{i=1}^k w_iS_i^2.$$

(ii) Optimum allocation
Under optimum allocation, $n_i = \frac{nN_iS_i}{\sum_{i=1}^k N_iS_i}$, and
$$Var_{opt}(\bar y_{st}) = \sum_{i=1}^k w_i^2\left(\frac{1}{n_i}-\frac{1}{N_i}\right)S_i^2 = \sum_{i=1}^k \frac{w_i^2S_i^2}{n_i} - \sum_{i=1}^k \frac{w_i^2S_i^2}{N_i}$$
$$= \sum_{i=1}^k \left[w_i^2S_i^2\cdot\frac{\sum_{j=1}^k N_jS_j}{nN_iS_i}\right] - \frac{1}{N}\sum_{i=1}^k w_iS_i^2 = \frac{1}{n}\left(\sum_{i=1}^k w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^k w_iS_i^2.$$



Comparison of the variance of the sample mean under SRS with that of the stratified mean under proportional and optimum allocations:
(a) Proportional allocation:
$$Var_{SRS}(\bar y) = \frac{N-n}{Nn}S^2, \qquad Var_{prop}(\bar y_{st}) = \frac{N-n}{Nn}\sum_{i=1}^k w_iS_i^2.$$
In order to compare $Var_{SRS}(\bar y)$ and $Var_{prop}(\bar y_{st})$, first we express $S^2$ as a function of the $S_i^2$. Consider
$$(N-1)S^2 = \sum_{i=1}^k\sum_{j=1}^{N_i}(y_{ij}-\bar Y)^2 = \sum_{i=1}^k\sum_{j=1}^{N_i}\left[(y_{ij}-\bar Y_i)+(\bar Y_i-\bar Y)\right]^2$$
$$= \sum_{i=1}^k\sum_{j=1}^{N_i}(y_{ij}-\bar Y_i)^2 + \sum_{i=1}^k N_i(\bar Y_i-\bar Y)^2 = \sum_{i=1}^k (N_i-1)S_i^2 + \sum_{i=1}^k N_i(\bar Y_i-\bar Y)^2,$$
so
$$\frac{N-1}{N}S^2 = \sum_{i=1}^k \frac{N_i-1}{N}S_i^2 + \sum_{i=1}^k \frac{N_i}{N}(\bar Y_i-\bar Y)^2.$$
For simplification, we assume that $N_i$ is large enough to permit the approximations $\frac{N_i-1}{N_i}\approx 1$ and $\frac{N-1}{N}\approx 1$. Thus
$$S^2 = \sum_{i=1}^k \frac{N_i}{N}S_i^2 + \sum_{i=1}^k \frac{N_i}{N}(\bar Y_i-\bar Y)^2.$$
Premultiplying both sides by $\frac{N-n}{Nn}$,
$$\frac{N-n}{Nn}S^2 = \frac{N-n}{Nn}\sum_{i=1}^k w_iS_i^2 + \frac{N-n}{Nn}\sum_{i=1}^k w_i(\bar Y_i-\bar Y)^2,$$
i.e.,
$$Var_{SRS}(\bar y) = Var_{prop}(\bar y_{st}) + \frac{N-n}{Nn}\sum_{i=1}^k w_i(\bar Y_i-\bar Y)^2.$$
Since $\sum_{i=1}^k w_i(\bar Y_i-\bar Y)^2 \ge 0$, it follows that
$$Var_{prop}(\bar y_{st}) \le Var_{SRS}(\bar y).$$
A larger gain is achieved when the $\bar Y_i$ differ more from $\bar Y$.



(b) Optimum allocation:
$$Var_{opt}(\bar y_{st}) = \frac{1}{n}\left(\sum_{i=1}^k w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^k w_iS_i^2.$$
Consider
$$Var_{prop}(\bar y_{st}) - Var_{opt}(\bar y_{st}) = \left(\frac{1}{n}-\frac{1}{N}\right)\sum_{i=1}^k w_iS_i^2 - \left[\frac{1}{n}\left(\sum_{i=1}^k w_iS_i\right)^2 - \frac{1}{N}\sum_{i=1}^k w_iS_i^2\right]$$
$$= \frac{1}{n}\left[\sum_{i=1}^k w_iS_i^2 - \left(\sum_{i=1}^k w_iS_i\right)^2\right] = \frac{1}{n}\left[\sum_{i=1}^k w_iS_i^2 - \bar S^2\right] = \frac{1}{n}\sum_{i=1}^k w_i(S_i-\bar S)^2,$$
where $\bar S = \sum_{i=1}^k w_iS_i$. Hence
$$Var_{prop}(\bar y_{st}) - Var_{opt}(\bar y_{st}) \ge 0, \qquad\text{i.e.,}\qquad Var_{opt}(\bar y_{st}) \le Var_{prop}(\bar y_{st}).$$
A larger gain in efficiency is achieved when the $S_i$ differ more from $\bar S$. Combining the results in (a) and (b), we have
$$Var_{opt}(\bar y_{st}) \le Var_{prop}(\bar y_{st}) \le Var_{SRS}(\bar y).$$

Estimate of variance and confidence intervals
Under SRSWOR, an unbiased estimate of $S_i^2$ for the $i$th stratum $(i = 1,2,\ldots,k)$ is
$$s_i^2 = \frac{1}{n_i-1}\sum_{j=1}^{n_i}(y_{ij}-\bar y_i)^2.$$
In stratified sampling,
$$Var(\bar y_{st}) = \sum_{i=1}^k w_i^2\,\frac{N_i-n_i}{N_in_i}S_i^2.$$
So an unbiased estimate of $Var(\bar y_{st})$ is
$$\widehat{Var}(\bar y_{st}) = \sum_{i=1}^k w_i^2\,\frac{N_i-n_i}{N_in_i}s_i^2 = \sum_{i=1}^k \frac{w_i^2s_i^2}{n_i} - \sum_{i=1}^k \frac{w_i^2s_i^2}{N_i} = \sum_{i=1}^k \frac{w_i^2s_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^k w_is_i^2.$$
The second term in this expression represents the reduction due to the finite population correction. The confidence limits of $\bar Y$ can be obtained as
$$\bar y_{st} \pm t\sqrt{\widehat{Var}(\bar y_{st})},$$
assuming $\bar y_{st}$ is normally distributed and $\widehat{Var}(\bar y_{st})$ is well determined so that $t$ can be read from normal distribution tables. If only a few degrees of freedom are provided by each stratum, the $t$ values are obtained from the tables of Student's t-distribution.

The distribution of $\widehat{Var}(\bar y_{st})$ is generally complex. An approximate method of assigning an effective number of degrees of freedom $n_e$ to $\widehat{Var}(\bar y_{st})$ is
$$n_e = \frac{\left(\sum_{i=1}^k g_is_i^2\right)^2}{\sum_{i=1}^k \dfrac{g_i^2s_i^4}{n_i-1}}, \qquad g_i = \frac{N_i(N_i-n_i)}{n_i},$$
with
$$\min_i\,(n_i-1) \le n_e \le \sum_{i=1}^k (n_i-1),$$
assuming the $y_{ij}$ are normally distributed.
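A short sketch computing $n_e$ from hypothetical stratum summaries (sizes, sample sizes and sample variances are illustrative):

import numpy as np

def effective_df(N_i, n_i, s2_i):
    """Approximate effective degrees of freedom for Var-hat(y_st)."""
    N_i, n_i, s2_i = map(lambda a: np.asarray(a, float), (N_i, n_i, s2_i))
    g = N_i * (N_i - n_i) / n_i
    return (g * s2_i).sum() ** 2 / ((g**2 * s2_i**2) / (n_i - 1)).sum()

print(effective_df(N_i=[100, 200, 300], n_i=[10, 20, 30], s2_i=[1.2, 3.5, 15.0]))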



Modification of optimum allocation
Sometimes in the optimum allocation, the size of a subsample exceeds the stratum size. In such a case, replace that $n_i$ by $N_i$ and recompute the rest of the $n_i$'s by the revised allocation. For example, if $n_1 > N_1$, then take the revised $n_i$'s as
$$n_1 = N_1, \qquad n_i = \frac{(n-N_1)\,w_iS_i}{\sum_{i=2}^k w_iS_i},\ \ i = 2,3,\ldots,k,$$
provided $n_i \le N_i$ for all $i = 2,3,\ldots,k$. Suppose in the revised allocation we find that $n_2 > N_2$; then the revised allocation would be
$$n_1 = N_1, \quad n_2 = N_2, \qquad n_i = \frac{(n-N_1-N_2)\,w_iS_i}{\sum_{i=3}^k w_iS_i},\ \ i = 3,4,\ldots,k,$$
provided $n_i < N_i$ for all $i = 3,4,\ldots,k$. We continue this process until every $n_i < N_i$. In such cases, the formula for the minimum variance of $\bar y_{st}$ needs to be modified as
$$\min Var(\bar y_{st}) = \frac{\left(\sum^* w_iS_i\right)^2}{n^*} - \frac{\sum^* w_iS_i^2}{N},$$
where $\sum^*$ denotes the summation over the strata in which $n_i \le N_i$ and $n^*$ is the revised total sample size in those strata.
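This capping-and-redistributing rule can be written as a small loop; the following sketch (with hypothetical inputs) repeatedly fixes $n_i = N_i$ for over-allocated strata and reapplies the Neyman allocation to the remaining strata. It assumes $n \le \sum_i N_i$.

import numpy as np

def capped_neyman(n, N_i, S_i):
    """Neyman allocation with each n_i capped at N_i, redistributing the excess."""
    N_i, S_i = np.asarray(N_i, float), np.asarray(S_i, float)
    alloc = np.zeros(len(N_i))
    free = np.ones(len(N_i), dtype=bool)   # strata whose allocation is still adjustable
    while True:
        remaining = n - alloc[~free].sum()              # sample size left to allocate
        share = N_i[free] * S_i[free]
        alloc[free] = remaining * share / share.sum()   # Neyman over the free strata
        over = free & (alloc > N_i)
        if not over.any():
            return alloc
        alloc[over] = N_i[over]                         # take a census of those strata
        free &= ~over

# Illustrative example: Neyman over-allocates to a tiny but highly variable stratum
print(capped_neyman(n=50, N_i=[5, 200, 300], S_i=[50.0, 2.0, 1.0]))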



Stratified sampling for proportions
If the characteristic under study is qualitative in nature, then its values will fall into one of two mutually exclusive complementary classes $C$ and $C'$. Ideally, only two strata would be needed, into which all the units could be divided depending on whether they belong to $C$ or to its complement $C'$; this is difficult to achieve in practice. So the strata are constructed such that the proportion in $C$ varies as much as possible among the strata. Let
$$P_i = \frac{A_i}{N_i} \ \text{: proportion of units in } C \text{ in the } i\text{th stratum},$$
$$p_i = \frac{a_i}{n_i} \ \text{: proportion of units in } C \text{ in the sample from the } i\text{th stratum}.$$

An estimate of the population proportion based on stratified sampling is
$$p_{st} = \sum_{i=1}^k \frac{N_ip_i}{N} = \sum_{i=1}^k w_ip_i,$$
which is based on the indicator variable
$$y_{ij} = \begin{cases}1 & \text{if the } j\text{th unit in the } i\text{th stratum is in } C\\ 0 & \text{otherwise,}\end{cases}$$
with $\bar y_{st} = p_{st}$. Here
$$S_i^2 = \frac{N_i}{N_i-1}P_iQ_i, \qquad Q_i = 1-P_i.$$
Also,
$$Var(\bar y_{st}) = \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}w_i^2S_i^2,$$
so
$$Var(p_{st}) = \frac{1}{N^2}\sum_{i=1}^k \frac{N_i^2(N_i-n_i)}{N_i-1}\cdot\frac{P_iQ_i}{n_i}.$$
If the finite population correction can be ignored, then
$$Var(p_{st}) = \sum_{i=1}^k w_i^2\,\frac{P_iQ_i}{n_i}.$$



If proportional allocation is used for $n_i$, then the variance of $p_{st}$ is
$$Var_{prop}(p_{st}) = \frac{N-n}{Nn}\cdot\frac{1}{N}\sum_{i=1}^k \frac{N_i^2P_iQ_i}{N_i-1} \approx \frac{N-n}{Nn}\sum_{i=1}^k w_iP_iQ_i \quad (\text{taking } N_i-1\approx N_i),$$
and its estimate is
$$\widehat{Var}_{prop}(p_{st}) = \frac{N-n}{Nn}\sum_{i=1}^k w_i\,\frac{n_ip_iq_i}{n_i-1}.$$
The best choice of $n_i$, minimizing the variance for a fixed total sample size, is
$$n_i \propto N_iS_i = N_i\sqrt{\frac{N_iP_iQ_i}{N_i-1}} \approx N_i\sqrt{P_iQ_i},$$
so
$$n_i = n\,\frac{N_i\sqrt{P_iQ_i}}{\sum_{i=1}^k N_i\sqrt{P_iQ_i}}.$$
Similarly, the best choice of $n_i$ such that the variance is minimum for the fixed cost $C = C_0 + \sum_{i=1}^k C_in_i$ is
$$n_i = n\,\frac{N_i\sqrt{P_iQ_i/C_i}}{\sum_{i=1}^k N_i\sqrt{P_iQ_i/C_i}}.$$

Estimation of the gain in precision due to stratification
An obvious question that crops up is: what is the advantage of stratifying a population, i.e., of dividing the population into strata instead of using SRS? This is answered by estimating the variances of the estimators of the population mean under SRS (without stratification) and under stratified sampling, and evaluating
$$\frac{\widehat{Var}_{SRS}(\bar y) - \widehat{Var}(\bar y_{st})}{\widehat{Var}(\bar y_{st})}.$$
This gives an idea of the gain in efficiency due to stratification.



Since $Var_{SRS}(\bar y) = \frac{N-n}{Nn}S^2$, there is a need to express $S^2$ in terms of the $S_i^2$. How can $S^2$ be estimated based on a stratified sample? Consider
$$(N-1)S^2 = \sum_{i=1}^k\sum_{j=1}^{N_i}(y_{ij}-\bar Y)^2 = \sum_{i=1}^k\sum_{j=1}^{N_i}\left[(y_{ij}-\bar Y_i)+(\bar Y_i-\bar Y)\right]^2 = \sum_{i=1}^k (N_i-1)S_i^2 + \sum_{i=1}^k N_i(\bar Y_i-\bar Y)^2$$
$$= \sum_{i=1}^k (N_i-1)S_i^2 + N\left[\sum_{i=1}^k w_i\bar Y_i^2 - \bar Y^2\right].$$
In order to estimate $S^2$, we need estimates of $S_i^2$, $\bar Y_i^2$ and $\bar Y^2$. We consider their estimation one by one.

(I) For the estimate of $S_i^2$, we have $E(s_i^2) = S_i^2$, so $\hat S_i^2 = s_i^2$.

(II) For the estimate of $\bar Y_i^2$, we know $Var(\bar y_i) = E(\bar y_i^2) - [E(\bar y_i)]^2 = E(\bar y_i^2) - \bar Y_i^2$, so $\bar Y_i^2 = E(\bar y_i^2) - Var(\bar y_i)$. An unbiased estimate of $\bar Y_i^2$ is
$$\widehat{\bar Y_i^2} = \bar y_i^2 - \widehat{Var}(\bar y_i) = \bar y_i^2 - \frac{N_i-n_i}{N_in_i}s_i^2.$$

(III) For the estimation of $\bar Y^2$, we know $Var(\bar y_{st}) = E(\bar y_{st}^2) - [E(\bar y_{st})]^2 = E(\bar y_{st}^2) - \bar Y^2$, so $\bar Y^2 = E(\bar y_{st}^2) - Var(\bar y_{st})$, and an estimate of $\bar Y^2$ is
$$\widehat{\bar Y^2} = \bar y_{st}^2 - \widehat{Var}(\bar y_{st}) = \bar y_{st}^2 - \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}w_i^2s_i^2.$$

Substituting these estimates in the expression for $(N-1)S^2$, the estimate of $S^2$ is obtained as
$$\hat S^2 = \frac{1}{N-1}\sum_{i=1}^k (N_i-1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^k w_i\left(\bar y_i^2-\frac{N_i-n_i}{N_in_i}s_i^2\right) - \bar y_{st}^2 + \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}w_i^2s_i^2\right]$$
$$= \frac{1}{N-1}\sum_{i=1}^k (N_i-1)s_i^2 + \frac{N}{N-1}\left[\sum_{i=1}^k w_i(\bar y_i-\bar y_{st})^2 - \sum_{i=1}^k w_i(1-w_i)\frac{N_i-n_i}{N_in_i}s_i^2\right].$$
Thus
$$\widehat{Var}_{SRS}(\bar y) = \frac{N-n}{Nn}\hat S^2 = \frac{N-n}{nN(N-1)}\sum_{i=1}^k (N_i-1)s_i^2 + \frac{N-n}{(N-1)n}\left[\sum_{i=1}^k w_i(\bar y_i-\bar y_{st})^2 - \sum_{i=1}^k w_i(1-w_i)\frac{N_i-n_i}{N_in_i}s_i^2\right]$$
and
$$\widehat{Var}(\bar y_{st}) = \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}w_i^2s_i^2.$$
Substituting these expressions in
$$\frac{\widehat{Var}_{SRS}(\bar y)-\widehat{Var}(\bar y_{st})}{\widehat{Var}(\bar y_{st})},$$
the gain in efficiency due to stratification can be obtained. If any other particular allocation is used, then substituting the appropriate $n_i$ under that allocation, such gain can be estimated.



Interpenetrating subsampling
Suppose a sample consists of two or more subsamples which are drawn according to the same sampling scheme, such that each subsample yields an estimate of the parameter. Such subsamples are called interpenetrating subsamples.

The subsamples need not necessarily be independent. The assumption of independent subsamples helps in obtaining an unbiased estimate of the variance of the composite estimator. This is helpful even when the sample design is complicated and the expression for the variance of the composite estimator is complex.

Let there be $g$ independent interpenetrating subsamples and let $t_1, t_2, \ldots, t_g$ be $g$ unbiased estimators of the parameter $\theta$, where $t_j$ $(j = 1,2,\ldots,g)$ is based on the $j$th interpenetrating subsample. Then an unbiased estimator of $\theta$ is given by
$$\hat\theta = \frac{1}{g}\sum_{j=1}^g t_j = \bar t, \text{ say}.$$
Then
$$E(\hat\theta) = E(\bar t) = \theta$$
and
$$\widehat{Var}(\hat\theta) = \widehat{Var}(\bar t) = \frac{1}{g(g-1)}\sum_{j=1}^g (t_j-\bar t)^2.$$
Note that
$$E\left[\widehat{Var}(\bar t)\right] = \frac{1}{g(g-1)}E\left[\sum_{j=1}^g (t_j-\theta)^2 - g(\bar t-\theta)^2\right] = \frac{1}{g(g-1)}\left[\sum_{j=1}^g Var(t_j) - g\,Var(\bar t)\right] = \frac{1}{g(g-1)}(g^2-g)Var(\bar t) = Var(\bar t).$$
If the distribution of each estimator $t_j$ is symmetric about $\theta$, then a confidence interval for $\theta$ can be obtained from
$$P\left[\min(t_1,t_2,\ldots,t_g) < \theta < \max(t_1,t_2,\ldots,t_g)\right] = 1-\left(\frac{1}{2}\right)^{g-1}.$$

Implementation of interpenetrating subsamples in stratified sampling
Consider the setup of stratified sampling, and suppose each stratum provides $L$ independent interpenetrating subsamples, drawn according to the same sampling scheme.

Let $\hat Y_{ij(tot)}$ be an unbiased estimator of the total of the $j$th stratum based on the $i$th subsample, $i = 1,2,\ldots,L$; $j = 1,2,\ldots,k$. An unbiased estimator of the $j$th stratum total is given by
$$\hat Y_{j(tot)} = \frac{1}{L}\sum_{i=1}^L \hat Y_{ij(tot)},$$
and an unbiased estimator of the variance of $\hat Y_{j(tot)}$ is given by
$$\widehat{Var}(\hat Y_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^L \left(\hat Y_{ij(tot)} - \hat Y_{j(tot)}\right)^2.$$
Thus an unbiased estimator of the population total $Y_{tot}$ is
$$\hat Y_{tot} = \sum_{j=1}^k \hat Y_{j(tot)} = \frac{1}{L}\sum_{i=1}^L\sum_{j=1}^k \hat Y_{ij(tot)},$$
and an unbiased estimator of its variance is given by
$$\widehat{Var}(\hat Y_{tot}) = \sum_{j=1}^k \widehat{Var}(\hat Y_{j(tot)}) = \frac{1}{L(L-1)}\sum_{i=1}^L\sum_{j=1}^k \left(\hat Y_{ij(tot)} - \hat Y_{j(tot)}\right)^2.$$



Post stratification
Sometimes the stratum to which a unit belongs may be known only after the field survey. For example, the ages of persons, their educational qualifications etc. cannot be known in advance. In such cases, we adopt the post-stratification procedure to increase the precision of the estimates.
Note: This topic is to be read after the next module on the ratio method of estimation. Since it is related to stratification, it is given here.

In post stratification,
• draw a sample by simple random sampling from the population and carry out the survey;
• after the completion of the survey, stratify the sampling units to increase the precision of the estimates.

Assume that the stratum sizes $N_i$ are fairly accurately known. Let
$m_i$ : number of sampling units from the $i$th stratum, $i = 1,2,\ldots,k$, with $\sum_{i=1}^k m_i = n$.
Note that $m_i$ is a random variable (and that is why we are not using the symbol $n_i$ as earlier). Assume $n$ is large enough, or the stratification is such that the probability that some $m_i = 0$ is negligibly small. In case $m_i = 0$ for some strata, two or more strata can be combined to make the sample size non-zero before evaluating the final estimates.

A post-stratified estimator of the population mean $\bar Y$ is
$$\bar y_{post} = \frac{1}{N}\sum_{i=1}^k N_i\bar y_i.$$

Now
$$E(\bar y_{post}) = E\left[\frac{1}{N}\sum_{i=1}^k N_iE(\bar y_i\mid m_1,m_2,\ldots,m_k)\right] = E\left[\frac{1}{N}\sum_{i=1}^k N_i\bar Y_i\right] = \bar Y,$$
$$Var(\bar y_{post}) = E\left[Var(\bar y_{post}\mid m_1,m_2,\ldots,m_k)\right] + Var\left[E(\bar y_{post}\mid m_1,m_2,\ldots,m_k)\right]$$
$$= E\left[\sum_{i=1}^k w_i^2\left(\frac{1}{m_i}-\frac{1}{N_i}\right)S_i^2\right] + Var(\bar Y) = \sum_{i=1}^k w_i^2\left[E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}\right]S_i^2 \quad (\text{since } Var(\bar Y)=0).$$

To find $E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}$, proceed as follows. Consider the estimator of the ratio based on the ratio method of estimation,
$$\hat R = \frac{\bar y}{\bar x} = \frac{\sum_{j=1}^n y_j}{\sum_{j=1}^n x_j}, \qquad R = \frac{\bar Y}{\bar X} = \frac{\sum_{j=1}^N Y_j}{\sum_{j=1}^N X_j}.$$
We know that
$$E(\hat R) - R = \frac{N-n}{Nn}\cdot\frac{RS_X^2 - S_{XY}}{\bar X^2}.$$
Let
$$x_j = \begin{cases}1 & \text{if the } j\text{th unit belongs to the } i\text{th stratum}\\ 0 & \text{otherwise,}\end{cases}$$
and $y_j = 1$ for all $j = 1,2,\ldots,N$. Then $\hat R$, $R$, $S_X^2$ and $S_{XY}$ reduce to
$$\hat R = \frac{n}{n_i}, \qquad R = \frac{N}{N_i},$$
$$S_X^2 = \frac{1}{N-1}\left(\sum_{j=1}^N X_j^2 - N\bar X^2\right) = \frac{1}{N-1}\left(N_i - \frac{N_i^2}{N}\right) = \frac{N_i}{N-1}\left(1-\frac{N_i}{N}\right),$$
$$S_{XY} = \frac{1}{N-1}\left(\sum_{j=1}^N X_jY_j - N\bar X\bar Y\right) = \frac{1}{N-1}\left(N_i - N\cdot\frac{N_i}{N}\right) = 0.$$
Using these values in $E(\hat R)-R$, we have
$$E\left(\frac{n}{n_i}\right) - \frac{N}{N_i} = \frac{N(N-n)(N-N_i)}{nN_i^2(N-1)}.$$
Thus
$$E\left(\frac{1}{n_i}\right) = \frac{N}{nN_i} + \frac{N(N-n)(N-N_i)}{n^2N_i^2(N-1)} = \frac{N}{nN_i}\left[1+\frac{(N-n)(N-N_i)}{n(N-1)N_i}\right].$$
Replacing $n_i$ by $m_i$, we obtain
$$E\left(\frac{1}{m_i}\right) = \frac{N}{nN_i}\left[1+\frac{(N-n)(N-N_i)}{n(N-1)N_i}\right].$$
Now substitute this in the expression for $Var(\bar y_{post})$:

$$Var(\bar y_{post}) = \sum_{i=1}^k w_i^2S_i^2\left[E\left(\frac{1}{m_i}\right)-\frac{1}{N_i}\right] \approx \frac{N-n}{n^2(N-1)}\sum_{i=1}^k (nw_i+1-w_i)S_i^2.$$
Assuming $N-1\approx N$, this gives
$$Var(\bar y_{post}) \approx \frac{N-n}{Nn}\sum_{i=1}^k w_iS_i^2 + \frac{N-n}{Nn^2}\sum_{i=1}^k (1-w_i)S_i^2 = Var_{prop}(\bar y_{st}) + \frac{N-n}{Nn^2}\sum_{i=1}^k (1-w_i)S_i^2.$$
The second term is the contribution to the variance of $\bar y_{post}$ arising because the $m_i$'s are not proportionately distributed. If $S_i^2 \approx S_w^2$, say, for all $i$, then the last term becomes
$$\frac{N-n}{Nn^2}\sum_{i=1}^k (1-w_i)S_w^2 = \frac{N-n}{Nn^2}(k-1)S_w^2 \quad\left(\text{since } \sum_{i=1}^k w_i = 1\right) = \frac{k-1}{n}\cdot\frac{N-n}{Nn}S_w^2 = \frac{k-1}{n}\,Var_{prop}(\bar y_{st}).$$
The increase in the variance over $Var_{prop}(\bar y_{st})$ is small if the average sample size per stratum, $\bar n = n/k$, is reasonably large. Thus post stratification with a large sample produces an estimator which is almost as precise as the estimator in stratified sampling with proportional allocation.
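A sketch of the post-stratified estimator follows; the data, the stratum labels and the stratum sizes are hypothetical, and the labels become known only after the survey:

import numpy as np

def post_stratified_mean(y, labels, N_i):
    """y_post = (1/N) * sum over strata of N_i times the sample mean in stratum i."""
    N = sum(N_i.values())
    total = 0.0
    for stratum, Ni in N_i.items():
        y_i = y[labels == stratum]
        if len(y_i) == 0:
            raise ValueError(f"no sample units in stratum {stratum}; combine strata")
        total += Ni * y_i.mean()
    return total / N

rng = np.random.default_rng(3)
labels = rng.choice(["a", "b", "c"], size=60, p=[0.2, 0.3, 0.5])
y = rng.normal(0, 1, 60) + np.select([labels == "a", labels == "b"], [5.0, 10.0], 20.0)
print(post_stratified_mean(y, labels, N_i={"a": 200, "b": 300, "c": 500}))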



Chapter 5
Ratio and Product Methods of Estimation

An important objective in any statistical estimation procedure is to obtain estimators of the parameters of interest with more precision. It is also well understood that incorporating more information in the estimation procedure yields better estimators, provided the information is valid and proper. Use of such auxiliary information is made through the ratio method of estimation to obtain an improved estimator of the population mean. In the ratio method of estimation, auxiliary information on a variable which is linearly related to the variable under study is available and is utilized to estimate the population mean.

Let $Y$ be the variable under study and $X$ an auxiliary variable which is correlated with $Y$. The observations $x_i$ on $X$ and $y_i$ on $Y$ are obtained for each sampling unit. The population mean $\bar X$ of $X$ (or equivalently the population total $X_{tot}$) must be known. For example, the $x_i$'s may be the values of the $y_i$'s from
- some earlier completed census,
- some earlier surveys,
- some characteristic on which it is easy to obtain information, etc.
For example, if $y_i$ is the quantity of fruits produced in the $i$th plot, then $x_i$ can be the area of the $i$th plot or the production of fruit in the same plot in the previous year.

Let $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$ be a random sample of size $n$ on the paired variable $(X, Y)$ drawn, preferably by SRSWOR, from a population of size $N$. The ratio estimator of the population mean $\bar Y$ is
$$\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X = \hat R\bar X,$$
assuming the population mean $\bar X$ is known. The ratio estimator of the population total $Y_{tot} = \sum_{i=1}^N Y_i$ is
$$\hat Y_{R(tot)} = \frac{y_{tot}}{x_{tot}}X_{tot},$$
where $X_{tot} = \sum_{i=1}^N X_i$ is the population total of $X$, which is assumed to be known, and $y_{tot} = \sum_{i=1}^n y_i$ and $x_{tot} = \sum_{i=1}^n x_i$ are the sample totals of $Y$ and $X$ respectively. $\hat Y_{R(tot)}$ can be equivalently expressed as
$$\hat Y_{R(tot)} = \frac{\bar y}{\bar x}X_{tot} = \hat RX_{tot}.$$
Looking at the structure of the ratio estimators, note that the ratio method estimates the ratio $\frac{Y_{tot}}{X_{tot}}$, i.e., the relative change of $Y$ with respect to $X$. It is clear that if the ratios $y_i/x_i$ are nearly the same for all $i = 1,2,\ldots,n$, then the values of $\frac{y_{tot}}{x_{tot}}$ (or equivalently $\frac{\bar y}{\bar x}$) vary little from sample to sample and the ratio estimator will be of high precision.
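A minimal sketch of the ratio estimates of the mean and the total (all numbers hypothetical, e.g. $y$ = fruit yield and $x$ = plot area; the known $\bar X$ and $N$ are assumed inputs):

import numpy as np

def ratio_estimates(y, x, X_bar, N):
    """Ratio estimates of the population mean and total, given known X_bar."""
    R_hat = y.mean() / x.mean()
    return R_hat * X_bar, R_hat * X_bar * N     # (mean estimate, total estimate)

rng = np.random.default_rng(5)
x = rng.uniform(1, 4, size=30)                  # e.g. plot areas in a sample of n = 30
y = 2.5 * x + rng.normal(0, 0.3, size=30)       # yields, roughly proportional to area
print(ratio_estimates(y, x, X_bar=2.5, N=500))  # X_bar assumed known for the population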

Bias and mean squared error of the ratio estimator:
Assume that the random sample $(x_i, y_i)$, $i = 1,2,\ldots,n$, is drawn by SRSWOR and the population mean $\bar X$ is known. Then
$$E(\hat{\bar Y}_R) = \frac{1}{\binom{N}{n}}\sum_{\text{all }\binom{N}{n}\text{ samples}} \frac{\bar y}{\bar x}\bar X \ne \bar Y \ \text{(in general)}.$$
Moreover, it is difficult to find the exact expressions of $E\left(\frac{\bar y}{\bar x}\right)$ and $E\left(\frac{\bar y^2}{\bar x^2}\right)$. So we approximate them and proceed as follows. Let
$$\varepsilon_0 = \frac{\bar y-\bar Y}{\bar Y} \ \Rightarrow\ \bar y = (1+\varepsilon_0)\bar Y, \qquad \varepsilon_1 = \frac{\bar x-\bar X}{\bar X} \ \Rightarrow\ \bar x = (1+\varepsilon_1)\bar X.$$
Since SRSWOR is being followed,
$$E(\varepsilon_0) = 0, \qquad E(\varepsilon_1) = 0,$$
$$E(\varepsilon_0^2) = \frac{1}{\bar Y^2}E(\bar y-\bar Y)^2 = \frac{1}{\bar Y^2}\cdot\frac{N-n}{Nn}S_Y^2 = \frac{f}{n}\cdot\frac{S_Y^2}{\bar Y^2} = \frac{f}{n}C_Y^2,$$
where $f = \frac{N-n}{N}$, $S_Y^2 = \frac{1}{N-1}\sum_{i=1}^N (Y_i-\bar Y)^2$, and $C_Y = \frac{S_Y}{\bar Y}$ is the coefficient of variation related to $Y$. Similarly,
$$E(\varepsilon_1^2) = \frac{f}{n}C_X^2,$$
$$E(\varepsilon_0\varepsilon_1) = \frac{1}{\bar X\bar Y}E\left[(\bar x-\bar X)(\bar y-\bar Y)\right] = \frac{1}{\bar X\bar Y}\cdot\frac{N-n}{Nn}\cdot\frac{1}{N-1}\sum_{i=1}^N (X_i-\bar X)(Y_i-\bar Y) = \frac{f}{n}\cdot\frac{S_{XY}}{\bar X\bar Y} = \frac{f}{n}\cdot\frac{\rho S_XS_Y}{\bar X\bar Y} = \frac{f}{n}\rho C_XC_Y,$$
where $C_X = \frac{S_X}{\bar X}$ is the coefficient of variation related to $X$ and $\rho$ is the population correlation coefficient between $X$ and $Y$.

Writing $\hat{\bar Y}_R$ in terms of the $\varepsilon$'s, we get
$$\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X = \frac{(1+\varepsilon_0)\bar Y}{(1+\varepsilon_1)\bar X}\bar X = (1+\varepsilon_0)(1+\varepsilon_1)^{-1}\bar Y.$$
Assuming $|\varepsilon_1| < 1$, the term $(1+\varepsilon_1)^{-1}$ may be expanded as an infinite series and the series will be convergent. Such an assumption means that $\left|\frac{\bar x-\bar X}{\bar X}\right| < 1$, i.e., the possible estimate $\bar x$ of the population mean $\bar X$ lies between 0 and $2\bar X$. This is likely to hold true if the variation in $\bar x$ is not large. In order to ensure that the variation in $\bar x$ is small, assume that the sample size $n$ is fairly large. With this assumption,
$$\hat{\bar Y}_R = \bar Y(1+\varepsilon_0)(1-\varepsilon_1+\varepsilon_1^2-\ldots) = \bar Y(1+\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).$$
So the estimation error of $\hat{\bar Y}_R$ is
$$\hat{\bar Y}_R - \bar Y = \bar Y(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots).$$

When the sample size is large, $\varepsilon_0$ and $\varepsilon_1$ are likely to be small quantities, and the terms involving second and higher powers of $\varepsilon_0$ and $\varepsilon_1$ are negligibly small. In such a case,
$$\hat{\bar Y}_R - \bar Y \approx \bar Y(\varepsilon_0-\varepsilon_1), \qquad E(\hat{\bar Y}_R-\bar Y) = 0.$$
So the ratio estimator is an unbiased estimator of the population mean up to the first order of approximation. If we assume that only the terms of $\varepsilon_0$ and $\varepsilon_1$ involving powers higher than two are negligibly small (which is more realistic than assuming that powers higher than one are negligible), then the estimation error of $\hat{\bar Y}_R$ can be approximated as
$$\hat{\bar Y}_R - \bar Y \approx \bar Y(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1),$$
and the bias of $\hat{\bar Y}_R$ is given by
$$Bias(\hat{\bar Y}_R) = E(\hat{\bar Y}_R-\bar Y) = \bar Y\left(0-0+\frac{f}{n}C_X^2-\frac{f}{n}\rho C_XC_Y\right) = \frac{f}{n}\bar YC_X(C_X-\rho C_Y)$$
up to the second order of approximation. The bias generally decreases as the sample size grows large.

The bias of $\hat{\bar Y}_R$ is zero, i.e., $Bias(\hat{\bar Y}_R) = 0$, if
$$E(\varepsilon_1^2-\varepsilon_0\varepsilon_1) = 0,$$
i.e., if
$$\frac{Var(\bar x)}{\bar X^2} - \frac{Cov(\bar x,\bar y)}{\bar X\bar Y} = 0,$$
or if
$$Var(\bar x) - \frac{Cov(\bar x,\bar y)}{R} = 0 \quad (\text{assuming } \bar X \ne 0),$$
or if
$$R = \frac{Cov(\bar x,\bar y)}{Var(\bar x)},$$
which is satisfied when the regression line of $Y$ on $X$ passes through the origin.

Now, to find the mean squared error, consider
$$MSE(\hat{\bar Y}_R) = E(\hat{\bar Y}_R-\bar Y)^2 = E\left[\bar Y^2(\varepsilon_0-\varepsilon_1+\varepsilon_1^2-\varepsilon_0\varepsilon_1+\ldots)^2\right] \approx E\left[\bar Y^2(\varepsilon_0^2+\varepsilon_1^2-2\varepsilon_0\varepsilon_1)\right].$$
Under the assumption $|\varepsilon_1| < 1$ and neglecting the terms of $\varepsilon_0$ and $\varepsilon_1$ involving powers higher than two,
$$MSE(\hat{\bar Y}_R) = \bar Y^2\left[\frac{f}{n}C_X^2+\frac{f}{n}C_Y^2-2\frac{f}{n}\rho C_XC_Y\right] = \frac{\bar Y^2f}{n}\left[C_X^2+C_Y^2-2\rho C_XC_Y\right]$$
up to the second order of approximation.

Efficiency of ratio estimator in comparison to SRSWOR
The ratio estimator is a better estimator of $\bar Y$ than the sample mean based on SRSWOR if
$$MSE(\hat{\bar Y}_R) < Var_{SRS}(\bar y),$$
i.e., if
$$\frac{f}{n}\bar Y^2(C_X^2+C_Y^2-2\rho C_XC_Y) < \frac{f}{n}\bar Y^2C_Y^2,$$
or if $C_X^2 - 2\rho C_XC_Y < 0$, or if
$$\rho > \frac{1}{2}\cdot\frac{C_X}{C_Y}.$$
Thus the ratio estimator is more efficient than the sample mean based on SRSWOR if
$$\rho > \frac{1}{2}\cdot\frac{C_X}{C_Y} \ \text{ if } R > 0 \qquad\text{and}\qquad \rho < -\frac{1}{2}\cdot\frac{C_X}{C_Y} \ \text{ if } R < 0.$$
It is clear from this expression that the success of the ratio estimator depends on how close the auxiliary information is to the variable under study.

Upper limit of the bias of the ratio estimator:
Consider
$$Cov(\hat R,\bar x) = E(\hat R\bar x) - E(\hat R)E(\bar x) = E\left(\frac{\bar y}{\bar x}\cdot\bar x\right) - E(\hat R)E(\bar x) = \bar Y - E(\hat R)\bar X.$$
Thus
$$E(\hat R) = \frac{\bar Y}{\bar X} - \frac{Cov(\hat R,\bar x)}{\bar X} = R - \frac{Cov(\hat R,\bar x)}{\bar X},$$
so
$$Bias(\hat R) = E(\hat R) - R = -\frac{Cov(\hat R,\bar x)}{\bar X} = -\frac{\rho_{\hat R,\bar x}\,\sigma_{\hat R}\,\sigma_{\bar x}}{\bar X},$$
where $\rho_{\hat R,\bar x}$ is the correlation between $\hat R$ and $\bar x$, and $\sigma_{\hat R}$ and $\sigma_{\bar x}$ are the standard errors of $\hat R$ and $\bar x$ respectively. Thus, assuming $\bar X > 0$,
$$|Bias(\hat R)| = \frac{|\rho_{\hat R,\bar x}|\,\sigma_{\hat R}\,\sigma_{\bar x}}{\bar X} \le \frac{\sigma_{\hat R}\,\sigma_{\bar x}}{\bar X} \qquad (|\rho_{\hat R,\bar x}| \le 1),$$
so
$$\frac{|Bias(\hat R)|}{\sigma_{\hat R}} \le \frac{\sigma_{\bar x}}{\bar X},$$
the coefficient of variation of $\bar x$. If this coefficient of variation is smaller than 0.1, the bias in $\hat R$ may safely be regarded as negligible in relation to the standard error of $\hat R$.

Alternative form of MSE($\hat{\bar Y}_R$)
Consider
$$\sum_{i=1}^N (Y_i-RX_i)^2 = \sum_{i=1}^N \left[(Y_i-\bar Y)+(\bar Y-RX_i)\right]^2 = \sum_{i=1}^N \left[(Y_i-\bar Y)-R(X_i-\bar X)\right]^2 \quad (\text{using } \bar Y = R\bar X)$$
$$= \sum_{i=1}^N (Y_i-\bar Y)^2 + R^2\sum_{i=1}^N (X_i-\bar X)^2 - 2R\sum_{i=1}^N (X_i-\bar X)(Y_i-\bar Y),$$
so
$$\frac{1}{N-1}\sum_{i=1}^N (Y_i-RX_i)^2 = S_Y^2 + R^2S_X^2 - 2RS_{XY}.$$
The MSE of $\hat{\bar Y}_R$, already derived, can now be expressed as
$$MSE(\hat{\bar Y}_R) = \frac{f\bar Y^2}{n}\left(C_Y^2+C_X^2-2\rho C_XC_Y\right) = \frac{f\bar Y^2}{n}\left(\frac{S_Y^2}{\bar Y^2}+\frac{S_X^2}{\bar X^2}-\frac{2S_{XY}}{\bar X\bar Y}\right) = \frac{f}{n}\left(S_Y^2+R^2S_X^2-2RS_{XY}\right)$$
$$= \frac{f}{n(N-1)}\sum_{i=1}^N (Y_i-RX_i)^2 = \frac{N-n}{nN(N-1)}\sum_{i=1}^N (Y_i-RX_i)^2.$$

Estimate of MSE($\hat{\bar Y}_R$)
Let $U_i = Y_i - RX_i$, $i = 1,2,\ldots,N$; then the MSE of $\hat{\bar Y}_R$ can be expressed as
$$MSE(\hat{\bar Y}_R) = \frac{f}{n}\cdot\frac{1}{N-1}\sum_{i=1}^N (U_i-\bar U)^2 = \frac{f}{n}S_U^2, \qquad S_U^2 = \frac{1}{N-1}\sum_{i=1}^N (U_i-\bar U)^2.$$
Based on this, a natural estimator of $MSE(\hat{\bar Y}_R)$ is
$$\widehat{MSE}(\hat{\bar Y}_R) = \frac{f}{n}s_u^2,$$
where
$$s_u^2 = \frac{1}{n-1}\sum_{i=1}^n (u_i-\bar u)^2 = \frac{1}{n-1}\sum_{i=1}^n \left[(y_i-\bar y)-\hat R(x_i-\bar x)\right]^2 = s_y^2+\hat R^2s_x^2-2\hat Rs_{xy}, \qquad \hat R = \frac{\bar y}{\bar x}.$$
Based on the expression $MSE(\hat{\bar Y}_R) = \frac{f}{n(N-1)}\sum_{i=1}^N (Y_i-RX_i)^2$, an estimate of $MSE(\hat{\bar Y}_R)$ is
$$\widehat{MSE}(\hat{\bar Y}_R) = \frac{f}{n(n-1)}\sum_{i=1}^n (y_i-\hat Rx_i)^2 = \frac{f}{n}\left(s_y^2+\hat R^2s_x^2-2\hat Rs_{xy}\right).$$
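This MSE estimate translates directly into code; a sketch continuing the hypothetical ($x$, $y$) sample idea above:

import numpy as np

def mse_hat_ratio(y, x, N):
    """Estimate of MSE(Y_R-hat): (f/n) * (s_y^2 + R_hat^2 s_x^2 - 2 R_hat s_xy)."""
    n = len(y)
    f = (N - n) / N
    R_hat = y.mean() / x.mean()
    s_y2 = y.var(ddof=1)
    s_x2 = x.var(ddof=1)
    s_xy = np.cov(y, x, ddof=1)[0, 1]
    return (f / n) * (s_y2 + R_hat**2 * s_x2 - 2 * R_hat * s_xy)

rng = np.random.default_rng(5)
x = rng.uniform(1, 4, size=30)
y = 2.5 * x + rng.normal(0, 0.3, size=30)
print(mse_hat_ratio(y, x, N=500))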



Confidence interval of ratio estimator
If the sample is large enough that the normal approximation is applicable, then the $100(1-\alpha)\%$ confidence intervals of $\bar Y$ and $R$ are
$$\left(\hat{\bar Y}_R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar Y}_R)},\ \ \hat{\bar Y}_R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat{\bar Y}_R)}\right)$$
and
$$\left(\hat R - Z_{\alpha/2}\sqrt{\widehat{Var}(\hat R)},\ \ \hat R + Z_{\alpha/2}\sqrt{\widehat{Var}(\hat R)}\right)$$
respectively, where $Z_{\alpha/2}$ is the normal deviate corresponding to the confidence coefficient $(1-\alpha)$. If $(\bar x,\bar y)$ follows a bivariate normal distribution, then $(\bar y - R\bar x)$ is normally distributed. If SRS is followed for drawing the sample, then, assuming $R$ is known, the statistic
$$\frac{\bar y - R\bar x}{\sqrt{\dfrac{N-n}{Nn}\left(s_y^2+R^2s_x^2-2Rs_{xy}\right)}}$$
is approximately $N(0,1)$. This can also be used for finding confidence limits; see Cochran (1977, Chapter 6, page 156) for more details.

Conditions under which the ratio estimator is optimum
The ratio estimator $\hat{\bar Y}_R$ is the best linear unbiased estimator of $\bar Y$ when
(i) the relationship between $y_i$ and $x_i$ is linear and passes through the origin, i.e.,
$$y_i = \beta x_i + e_i,$$
where the $e_i$'s are independent with $E(e_i\mid x_i) = 0$ and $\beta$ is the slope parameter, and
(ii) the variance about this line is proportional to $x_i$, i.e.,
$$Var(y_i\mid x_i) = E(e_i^2) = Cx_i,$$
where $C$ is a constant.



Proof: Consider linear estimators of $\beta$ of the form $\hat\beta = \sum_{i=1}^n \ell_iy_i$, where $y_i = \beta x_i + e_i$ and the $\ell_i$'s are constants. Under the model, $E(\bar y) = \beta\bar X$, so $\bar Y = \beta\bar X$, and estimating $\beta$ amounts to estimating $R$. If the $n$ sample values $x_i$ are kept fixed, then in repeated sampling
$$E(\hat\beta) = \beta\sum_{i=1}^n \ell_ix_i \qquad\text{and}\qquad Var(\hat\beta) = \sum_{i=1}^n \ell_i^2\,Var(y_i\mid x_i) = C\sum_{i=1}^n \ell_i^2x_i.$$
So $E(\hat\beta) = \beta$, i.e., $\hat\beta$ is unbiased, when $\sum_{i=1}^n \ell_ix_i = 1$.

Consider the minimization of $Var(\hat\beta)$ subject to the unbiasedness condition $\sum_{i=1}^n \ell_ix_i = 1$ using a Lagrangian function with Lagrangian multiplier $\lambda$:
$$\varphi = C\sum_{i=1}^n \ell_i^2x_i - 2\lambda\left(\sum_{i=1}^n \ell_ix_i - 1\right).$$
Now
$$\frac{\partial\varphi}{\partial\ell_i} = 0 \ \Rightarrow\ C\ell_ix_i = \lambda x_i \ \Rightarrow\ \ell_i = \frac{\lambda}{C},\ i = 1,2,\ldots,n,$$
$$\frac{\partial\varphi}{\partial\lambda} = 0 \ \Rightarrow\ \sum_{i=1}^n \ell_ix_i = 1.$$
Using $\sum_{i=1}^n \ell_ix_i = 1$ with $\ell_i$ constant gives $\ell_i\sum_{i=1}^n x_i = 1$, i.e., $\ell_i = \frac{1}{n\bar x}$. Thus
$$\hat\beta = \frac{1}{n\bar x}\sum_{i=1}^n y_i = \frac{\bar y}{\bar x}.$$
Thus $\hat\beta = \bar y/\bar x$ is the best in the class of linear unbiased estimators under conditions (i) and (ii).



Alternative approach: This result can alternatively be derived as follows. The ratio estimator $\hat R = \frac{\bar y}{\bar x}$ is the best linear unbiased estimator of $R = \frac{\bar Y}{\bar X}$ if the following two conditions hold:
(i) For fixed $x$, $E(y) = \beta x$, i.e., the line of regression of $y$ on $x$ is a straight line passing through the origin.
(ii) For fixed $x$, $Var(y) \propto x$, i.e., $Var(y) = \lambda x$, where $\lambda$ is a constant of proportionality.

Proof: Let $y = (y_1, y_2, \ldots, y_n)'$ and $x = (x_1, x_2, \ldots, x_n)'$ be the vectors of observations on the $y$'s and $x$'s. Hence, for any fixed $x$,
$$E(y) = \beta x, \qquad Var(y) = \Omega = \lambda\,\mathrm{diag}(x_1, x_2, \ldots, x_n),$$
where $\mathrm{diag}(x_1, x_2, \ldots, x_n)$ is the diagonal matrix with $x_1, x_2, \ldots, x_n$ as the diagonal elements. The best linear unbiased estimator of $\beta$ is obtained by minimizing
$$S^2 = (y-\beta x)'\Omega^{-1}(y-\beta x) = \sum_{i=1}^n \frac{(y_i-\beta x_i)^2}{\lambda x_i}.$$
Solving
$$\frac{\partial S^2}{\partial\beta} = 0 \ \Rightarrow\ \sum_{i=1}^n (y_i-\hat\beta x_i) = 0 \ \Rightarrow\ \hat\beta = \frac{\bar y}{\bar x} = \hat R.$$
Thus $\hat R$ is the best linear unbiased estimator of $R$. Consequently, $\hat{\bar Y}_R = \hat R\bar X$ is the best linear unbiased estimator of $\bar Y$.


Ratio estimator in stratified sampling
Suppose a population of size $N$ is divided into $k$ strata. The objective is to estimate the population mean $\bar Y$ using the ratio method of estimation. In this situation, a random sample of size $n_i$ is drawn by SRSWOR from the $i$th stratum (of size $N_i$) on the variable under study $Y$ and the auxiliary variable $X$. Let
$y_{ij}$ : $j$th observation on $Y$ from the $i$th stratum,
$x_{ij}$ : $j$th observation on $X$ from the $i$th stratum, $i = 1,2,\ldots,k$; $j = 1,2,\ldots,n_i$.
An estimator of $\bar Y$ based on the philosophy of stratified sampling can be derived in the following two possible ways.

1. Separate ratio estimator
- First employ the ratio method of estimation separately in each stratum and obtain the ratio estimators $\hat{\bar Y}_{Ri}$, $i = 1,2,\ldots,k$, assuming the stratum means $\bar X_i$ to be known.
- Then combine all the estimates using a weighted arithmetic mean.
This gives the separate ratio estimator as
$$\hat{\bar Y}_{Rs} = \sum_{i=1}^k \frac{N_i\hat{\bar Y}_{Ri}}{N} = \sum_{i=1}^k w_i\hat{\bar Y}_{Ri} = \sum_{i=1}^k w_i\,\frac{\bar y_i}{\bar x_i}\bar X_i,$$
where
$$\bar y_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij} \ \text{: sample mean of } Y \text{ from the } i\text{th stratum},$$
$$\bar x_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij} \ \text{: sample mean of } X \text{ from the } i\text{th stratum},$$
$$\bar X_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij} \ \text{: mean of all the } X \text{ units in the } i\text{th stratum}.$$
No assumption is made that the true ratio remains constant from stratum to stratum; the estimator depends on information on each $\bar X_i$.


2. Combined ratio estimator:
- First find the stratified sample means of the $Y$'s and $X$'s as
$$\bar y_{st} = \sum_{i=1}^k w_i\bar y_i, \qquad \bar x_{st} = \sum_{i=1}^k w_i\bar x_i.$$
- Then define the combined ratio estimator as
$$\hat{\bar Y}_{Rc} = \frac{\bar y_{st}}{\bar x_{st}}\bar X,$$
where $\bar X$ is the population mean of $X$ based on all the $N = \sum_{i=1}^k N_i$ units. It does not depend on the individual stratum means $\bar X_i$ but only on $\bar X$.

Properties of separate ratio estimator:
Note that there is an analogy between $\bar Y = \sum_{i=1}^k w_i\bar Y_i$ and $\hat{\bar Y}_{Rs} = \sum_{i=1}^k w_i\hat{\bar Y}_{Ri}$. We have already derived the approximate bias of $\hat{\bar Y}_R = \frac{\bar y}{\bar x}\bar X$ as
$$E(\hat{\bar Y}_R) = \bar Y + \frac{\bar Yf}{n}\left(C_X^2-\rho C_XC_Y\right).$$
So for $\hat{\bar Y}_{Ri}$ we can write
$$E(\hat{\bar Y}_{Ri}) = \bar Y_i + \frac{\bar Y_if_i}{n_i}\left(C_{iX}^2-\rho_iC_{iX}C_{iY}\right),$$
where
$$\bar Y_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}, \quad \bar X_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_{ij}, \quad f_i = \frac{N_i-n_i}{N_i}, \quad C_{iX}^2 = \frac{S_{iX}^2}{\bar X_i^2}, \quad C_{iY}^2 = \frac{S_{iY}^2}{\bar Y_i^2},$$
$$S_{iX}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(x_{ij}-\bar X_i)^2, \qquad S_{iY}^2 = \frac{1}{N_i-1}\sum_{j=1}^{N_i}(y_{ij}-\bar Y_i)^2,$$
$\rho_i$ : correlation coefficient between the observations on $X$ and $Y$ in the $i$th stratum,
$C_{iX}$ : coefficient of variation of the $X$ values in the $i$th stratum.
Thus
$$E(\hat{\bar Y}_{Rs}) = \sum_{i=1}^k w_iE(\hat{\bar Y}_{Ri}) = \sum_{i=1}^k w_i\left[\bar Y_i + \frac{\bar Y_if_i}{n_i}\left(C_{iX}^2-\rho_iC_{iX}C_{iY}\right)\right] = \bar Y + \sum_{i=1}^k \frac{w_i\bar Y_if_i}{n_i}\left(C_{iX}^2-\rho_iC_{iX}C_{iY}\right).$$


$$Bias(\hat{\bar Y}_{Rs}) = E(\hat{\bar Y}_{Rs}) - \bar Y = \sum_{i=1}^k \frac{w_i\bar Y_if_i}{n_i}C_{iX}\left(C_{iX}-\rho_iC_{iY}\right)$$
up to the second order of approximation. Assuming the finite population correction to be approximately 1, $n_i = n/k$, and $C_{iX}$, $C_{iY}$ and $\rho_i$ to be the same for all strata ($C_X$, $C_Y$ and $\rho$, say), we have
$$Bias(\hat{\bar Y}_{Rs}) = \frac{k\bar Y}{n}\left(C_X^2-\rho C_XC_Y\right).$$
Thus the bias is negligible when the sample size within each stratum is sufficiently large, and $\hat{\bar Y}_{Rs}$ is unbiased when $C_{iX} = \rho_iC_{iY}$.

Now we derive the approximate MSE of $\hat{\bar Y}_{Rs}$. We have already derived the MSE of $\hat{\bar Y}_R$ as
$$MSE(\hat{\bar Y}_R) = \frac{\bar Y^2f}{n}\left(C_X^2+C_Y^2-2\rho C_XC_Y\right) = \frac{f}{n(N-1)}\sum_{i=1}^N (Y_i-RX_i)^2, \qquad R = \frac{\bar Y}{\bar X}.$$
Thus the MSE of the ratio estimator based on the $i$th stratum, up to the second order of approximation, is
$$MSE(\hat{\bar Y}_{Ri}) = \frac{\bar Y_i^2f_i}{n_i}\left(C_{iX}^2+C_{iY}^2-2\rho_iC_{iX}C_{iY}\right) = \frac{f_i}{n_i(N_i-1)}\sum_{j=1}^{N_i}(y_{ij}-R_ix_{ij})^2,$$
and so
$$MSE(\hat{\bar Y}_{Rs}) = \sum_{i=1}^k w_i^2\,MSE(\hat{\bar Y}_{Ri}) = \sum_{i=1}^k \frac{w_i^2f_i}{n_i}\bar Y_i^2\left(C_{iX}^2+C_{iY}^2-2\rho_iC_{iX}C_{iY}\right) = \sum_{i=1}^k \frac{w_i^2f_i}{n_i(N_i-1)}\sum_{j=1}^{N_i}(y_{ij}-R_ix_{ij})^2.$$
An estimate of $MSE(\hat{\bar Y}_{Rs})$ can be found by substituting the unbiased estimators $s_{ix}^2$, $s_{iy}^2$ and $s_{ixy}$ for $S_{iX}^2$, $S_{iY}^2$ and $S_{iXY}$, respectively, in the $i$th stratum, and estimating $R_i = \bar Y_i/\bar X_i$ by $r_i = \bar y_i/\bar x_i$:
$$\widehat{MSE}(\hat{\bar Y}_{Rs}) = \sum_{i=1}^k \frac{w_i^2f_i}{n_i}\left(s_{iy}^2+r_i^2s_{ix}^2-2r_is_{ixy}\right).$$
Also
$$\widehat{MSE}(\hat{\bar Y}_{Rs}) = \sum_{i=1}^k \frac{w_i^2f_i}{n_i(n_i-1)}\sum_{j=1}^{n_i}(y_{ij}-r_ix_{ij})^2.$$


Properties of combined ratio estimator:
Here
$$\hat{\bar Y}_{Rc} = \frac{\bar y_{st}}{\bar x_{st}}\bar X = \frac{\sum_{i=1}^k w_i\bar y_i}{\sum_{i=1}^k w_i\bar x_i}\bar X = \hat R_c\bar X.$$
It is difficult to find the exact expressions of the bias and the mean squared error of $\hat{\bar Y}_{Rc}$, so we find their approximate expressions. Define
$$\varepsilon_1 = \frac{\bar y_{st}-\bar Y}{\bar Y}, \qquad \varepsilon_2 = \frac{\bar x_{st}-\bar X}{\bar X},$$
$$E(\varepsilon_1) = 0, \qquad E(\varepsilon_2) = 0,$$
$$E(\varepsilon_1^2) = \sum_{i=1}^k \frac{N_i-n_i}{N_in_i}\cdot\frac{w_i^2S_{iY}^2}{\bar Y^2} = \sum_{i=1}^k \frac{f_i}{n_i}\cdot\frac{w_i^2S_{iY}^2}{\bar Y^2}$$
(recall that in the case of $\hat{\bar Y}_R$, $E(\varepsilon_1^2) = \frac{f}{n}\cdot\frac{S_Y^2}{\bar Y^2} = \frac{f}{n}C_Y^2$),
$$E(\varepsilon_2^2) = \sum_{i=1}^k \frac{f_i}{n_i}\cdot\frac{w_i^2S_{iX}^2}{\bar X^2}, \qquad E(\varepsilon_1\varepsilon_2) = \sum_{i=1}^k \frac{f_i}{n_i}\cdot\frac{w_i^2S_{iXY}}{\bar X\bar Y}.$$
Thus, assuming $|\varepsilon_2| < 1$,
$$\hat{\bar Y}_{Rc} = \frac{(1+\varepsilon_1)\bar Y}{(1+\varepsilon_2)\bar X}\bar X = \bar Y(1+\varepsilon_1)(1-\varepsilon_2+\varepsilon_2^2-\ldots) = \bar Y(1+\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2-\ldots).$$
Retaining the terms up to order two, for the same reason as in the case of $\hat{\bar Y}_R$,
$$\hat{\bar Y}_{Rc} \approx \bar Y(1+\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2), \qquad \hat{\bar Y}_{Rc}-\bar Y \approx \bar Y(\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2).$$
The approximate bias of $\hat{\bar Y}_{Rc}$ up to the second order of approximation is


$$Bias(\hat{\bar Y}_{Rc}) = E(\hat{\bar Y}_{Rc}-\bar Y) \approx \bar Y\,E(\varepsilon_1-\varepsilon_2-\varepsilon_1\varepsilon_2+\varepsilon_2^2) = \bar Y\left[0-0-E(\varepsilon_1\varepsilon_2)+E(\varepsilon_2^2)\right]$$
$$= \bar Y\sum_{i=1}^k \frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar X^2}-\frac{S_{iXY}}{\bar X\bar Y}\right) = \bar Y\sum_{i=1}^k \frac{f_i}{n_i}w_i^2\left(\frac{S_{iX}^2}{\bar X^2}-\frac{\rho_iS_{iX}S_{iY}}{\bar X\bar Y}\right)$$
$$= \frac{1}{\bar X}\sum_{i=1}^k \frac{f_i}{n_i}w_i^2S_{iX}\left(\frac{\bar Y}{\bar X}S_{iX}-\rho_iS_{iY}\right) = R\sum_{i=1}^k \frac{f_i}{n_i}w_i^2S_{iX}\left(\frac{S_{iX}}{\bar X}-\frac{\rho_iS_{iY}}{\bar Y}\right),$$
where $R = \bar Y/\bar X$, $\rho_i$ is the correlation coefficient between the observations on $Y$ and $X$ in the $i$th stratum, and $S_{iX}/\bar X$ and $S_{iY}/\bar Y$ play the role of the coefficients of variation of $X$ and $Y$ contributed by the $i$th stratum.

The mean squared error upto second order of approximation is

MSE (= YˆRc ) E (YˆRc − Y ) 2  Y 2 E (ε1 − ε 2 − ε1ε 2 + ε 2 ) 2  Y 2 E (ε12 + ε 22 − 2ε1ε 2 )  fi 2  SiX2 SiY2 2 SiXY   = Y ∑  wi  2 + 2 −  Y XY   i =1  ni X k   S 2 S 2 2ρ S S  f = Y 2 ∑  i wi2  iX2 + iY2 − i iX iY   Y X Y  i =1  ni X 2

k

 fi 2  Y 2 2  Y 2  wi  2 SiX + SiY − 2 ρi SiX SiY   ∑ X i =1  ni X  k f  = ∑  i wi2 ( R 2 SiX2 + SiY2 − 2 ρi RSiX SiY ) . i =1  ni 

=

Y2 Y2

k

An estimate of MSE (YRc ) can be obtained by replacing SiX2 , SiY2 and SiXY by their unbiased estimators six2 , siy2 and sixy respectively whereas R =

Y X

is replaced by r =

y . Thus the following estimate is x

obtained:  (Y ) = MSE Rc

 wi2 fi 2 2  r six + siy2 − 2rsixy )  ( ∑  i =1  ni  k

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 15


Comparison of combined and separate ratio estimators An obvious question arises that which of the estimates YˆRs or YˆRc is better. So we compare their MSEs. Note that the only difference in the term of these MSEs is due to the form of ratio estimate. It is yi − Ri = in MSE (YˆRs ) xi Y − R = in MSE (YˆRc ). X

Thus = ∆ MSE (YˆRc ) − MSE (YˆRs )  wi2 fi  ( R 2 − Ri2 ) SiX2 + 2( Ri − R) ρi SiX SiY   ∑  i =1  ni  2 k w f  = ∑  i i ( R − Ri ) 2 SiX2 + 2( R − Ri )( Ri SiX2 − ρi SiX SiY )  . i =1  ni 

=

k

The difference ∆ depends on The magnitude of the difference between the strata ratios ( Ri ) and whole population ratio

(i)

(R). The value of ( Ri Six2 − ρi Six Siy ) is usually small and vanishes when the regression line of y on

(ii)

x is linear and passes through origin within each stratum. See as follows:

Ri Six2 − ρi Six Siy = 0 Ri =

ρi Six Siy Six2

which is the estimator of the slope parameter in the regression of y on x in the ith stratum. In such a case

but

MSE (YˆRc ) > MSE (YˆRs ) Bias (Yˆ ) < Bias (Yˆ ). Rc

Rs

So unless Ri varies considerably, the use of YˆRc would provide an estimate of Y with negligible bias and precision as good as YˆRs . •

If Ri ≠ R, YˆRs can be more precise but bias may be large.

If

Ri  R, YˆRc can be as precise as YˆRs but its bias will be small. It also does not require

knowledge of X 1 , X 2 ,..., X k .

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 16


Ratio estimators with reduced bias: The ratio type estimators that are unbiased or have smaller bias than Rˆ , YˆR or YˆRc (tot ) are useful in sample surveys . There are several approaches to derive such estimators. We consider here two such approaches:

1. Unbiased ratio – type estimators: Under SRS, the ratio estimator has form

Y X to estimate the population mean Y . As an alternative to x

this, we consider following as an estimator of population mean 1 n Y YˆRo = ∑  i n i =1  X i

= Ri Let

 X . 

Yi , i 1, 2,.., N , = Xi

then 1 n YˆR 0 = ∑ Ri X n i =1 = rX where 1 n ∑ Ri n i =1 (YˆR 0 ) E (YˆR 0 ) − Y Bias= r=

= E (rX ) − Y = E (r ) X − Y . Since 1 n 1 N Ri ) ∑(N ∑ n i 1= = i 1 1 n = ∑R n i =1

E (r ) =

= R. So Bias (YˆR= RX − Y . 0)

Using the result that under SRSWOR, Cov( x , y ) =

N −n S XY , it also follows that Nn

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 17


N −n 1 N ∑ ( Ri − R )( X i − X ) Nn N − 1 i =1 N N −n 1 (∑ Ri X i − NRX ) = Nn N − 1 i =1 N Y N −n 1 (∑ i X i − NRX ) = n N − 1 i =1 X i Cov(r , x ) =

N −n 1 ( NY − NRX ) Nn N − 1 N −n 1 [− Bias (YˆR 0 )]. = n N −1

=

Thus using the result that in SRSWOR, Cov( x , y ) =

N −n N −n S XY , and therefore Cov(r , x ) = S RX , we Nn Nn

have n( N − 1) Bias (YˆRo ) = − Cov(r , x ) N −n n( N − 1) N − n S RX = − N − n Nn  N −1  = −  S RX  N  1 N S RX where = ∑ ( Ri − R )( X i − X ). N − 1 i =1

The following result helps in obtaining an unbiased estimator of population mean:. Since under SRSWOR set up,

E ( sxy ) = S xy 1 n ∑ ( xi − x )( yi − y ), n − 1 i =1 1 N = S xy ∑ ( X i − X )(Yi − Y ). N − 1 i =1

where= sxy

−( N − 1) S RX is obtained as follows: So an unbiased estimator of the bias in Bias (YˆR 0 ) =  (Yˆ ) = − ( N − 1) s Bias R0 rx N N −1 n = − ∑ (ri − r )( xi − x ) N (n − 1) i =1 N −1 n = − (∑ ri xi − n r x ) N (n − 1) i =1  N − 1  n yi = −  ∑ xi − nr x  N (n − 1)  i =1 xi  N −1 (ny − nr x ). = − N (n − 1)

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 18


So

( )

 n( N − 1)  (Yˆ ) = − Bias E YˆR 0 − Y = ( y − r x ). R0 N (n − 1) Thus E YˆR 0 − Bias (YˆR 0 )  = Y     n( N − 1) or E YˆR 0 + ( y − r x ) = Y. N (n − 1)   Thus n( N − 1) n( N − 1) YˆR 0 + (y − r x) = rX + (y − r x) N (n − 1) N (n − 1)

is an unbiased estimator of population mean.

2. Jackknife method for obtaining a ratio estimate with lower bias Jackknife method is used to get rid of the term of order 1/n from the bias of an estimator. Suppose the

E ( Rˆ ) can be expanded after ignoring finite population correction as a a E ( Rˆ ) = R + 1 + 22 + ... n n

Let n = mg and the sample is divided at random into g groups, each of size m. Then ga ga E ( gRˆ ) =gR + 1 + 2 2 2 + ... gm g m a a = gR + 1 + 2 2 + ... m gm * y Let Rˆi* = ∑ * i where the ∑ * denotes the summation over all values of the sample except the ith ∑ xi

group. So Rˆi* is based on a simple random sample of size m(g - 1), so we can express a1 a + 2 2 2 + ... E ( Rˆi* ) = R+ m( g − 1) m ( g − 1) or E ( g − 1) Rˆi*  = ( g − 1) R +

a1 a + 2 2 + ... m m ( g − 1)

Thus E  gRˆ − ( g − 1) Rˆi*  =R −

a2 + ... g ( g − 1)m 2

or a g E  gRˆ − ( g − 1) Rˆi*  =R − 22 + ... n g −1

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 19


1 Hence the bias of  gRˆ − ( g − 1) Rˆi*  is of order 2 . n

Now g estimates of this form can be obtained, one estimator for each group. Then the jackknife or Quenouille’s estimator is the average of these of estimators g

RˆQ = gRˆ − ( g − 1)

∑ Rˆ i =1

g

i

.

Product method of estimation: 1 C The ratio estimator is more efficient than the sample mean under SRSWOR if ρ > . x , if R > 0, 2 Cy

which is usually the case. This shows that if auxiliary information is such that

ρ<−

1 Cx , then we 2 Cy

cannot use the ratio method of estimation to improve the sample mean as an estimator of the population mean. So there is a need of another type of estimator which also makes use of information on auxiliary variable X. Product estimator is an attempt in this direction. The product estimator of the population mean Y is defined as yx YˆP = . X

assuming the population mean X to be known We now derive the bias and variance of Yˆp . Let ε 0 =

y −Y x−X = , ε1 , Y X

(i) Bias of

Yˆp .

We write Yˆp as yx Yˆp = =Y (1 + ε 0 )(1 + ε1 ) X = Y (1 + ε 0 + ε1 + ε 0ε1 ).

Taking expectation we obtain bias of Yˆp as 1 f Bias (Yˆp ) E= (ε 0ε1 ) Cov( y , x ) S xy , = X nX

which shows that bias of Yˆp decreases as n increases. Bias of Yˆp can be estimated by  (Yˆ ) = f s . Bias p xy nX

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 20


(ii) MSE of Yˆp : Writing Yˆp is terms of ε 0 and ε1 , we find that the mean squared error of the product estimator Yˆp upto second order of approximation is given by

MSE (= Yˆp ) E (Yˆp − Y ) 2 = Y 2 E (ε1 + ε 0 + ε1ε 2 ) 2 ≈ Y 2 E (ε12 + ε 02 + 2ε1ε 2 ). Here terms in (ε1 , ε 0 ) of degrees greater than two are assumed to be negligible. Using the expected values we find that f  SY2 + R 2 S X2 + 2 RS XY  . MSE (Yˆp ) = n

(iii) Estimation of MSE of Yˆp The mean squared error of Yˆp can be estimated by  (Yˆ ) = MSE p

f 2 2 2  s y + r sx + 2rsxy  n

where r = y / x .

(iv) Comparison with SRSWOR: From the variances of the sample mean under SRSWOR and the product estimator, we obtain f Var ( y ) SRS − MSE (Yˆp ) = − RS X (2 ρ SY + RS X ), n

where Var ( y ) SRS =

f 2 SY which shows that Yˆp is more efficient than the simple mean y for n

ρ<−

1 Cx if R > 0 2 Cy

ρ >−

1 Cx if R < 0. 2 Cy

and for

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 21


Multivariate Ratio Estimator Let y be the study variable and X 1 , X 2 ,..., X p be p auxiliary variables assumed to be corrected with y . Further it is assumed that X 1 , X 2 ,..., X p are independent. Let Y , X 1 , X 2 ,..., X p be the population means of the variables y , X 1 , X 2 ,..., X p . We assume that a SRSWOR of size n is selected from the population of N units. The following notations will be used. Si2 : the population mean sum of squares for the variate X i , si2 : the sample mean sum of squares for the variate X i , S02 : the population mean sum of squares for the study variable y, s02 : the sample mean sum of squares for the study variable y, Ci =

Si : coefficient of variation of the variate X i , Xi

S0 : coefficient of variation of the variate y, Y S ρi = iy : coefficient of correlation between y and X i , Si S 0

C0 =

y YˆRi = :ratio estimator of Y , based on X i Xi

where i = 1, 2,..., p. Then the multivariate ratio estimator of Y is given as follows.

= YˆMR

p

p

wiYˆRi , ∑ wi 1 ∑=

=i 1 =i 1 p

= y ∑ wi i =1

Xi . xi

(i) Bias of the multivariate ratio estimator: The approximate bias of YˆRi upto the second order of approximation is Bias (YˆRi ) =

f Y (Ci2 − ρi Ci C0 ). n

The bias of YˆMR is obtained as Bias (YˆMR ) = =

p

∑w i =1

Yf n

i

Yf (Ci2 − ρi Ci C0 ) n

p

∑ w C (C − ρ C ). i =1

i

i

i

i

0

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 22


(ii) Variance of the multivariate ratio estimator: The variance of YˆRi upto the second order of approximation is given by

Var (= YˆRi )

f 2 2 Y (C0 + Ci2 − 2 ρi C0Ci ). n

The variance of YˆMR upto the second order of approximation is obtained as Var (YˆMR ) =

f 2 p 2 2 Y ∑ wi (C0 + Ci2 − 2 ρi C0Ci ). n i =1

Sampling Theory| Chapter 5 | Ratio Product Method Estimation | Shalabh, IIT Kanpur Page 23


Chapter 6 Regression Method of Estimation The ratio method of estimation uses the auxiliary information which is correlated with the study variable to improve the precision which results in the improved estimators when the regression of Y on X is linear and passes through origin. When the regression of Y on X is linear, it is not necessary that the line should always pass through origin. Under such conditions, it is more appropriate to use the regression type estimator to estimate the population means.

In ratio method, the conventional estimator sample mean y was improved by multiplying it by a a factor

X where x is an unbiased estimator of population mean X which is chosen as population x

mean of auxiliary variable. Now we consider another idea based on difference.

0. Consider an estimator ( x − X ) for which E ( x − X ) = Consider an improved estimator of Y as Yˆ * =+ y µ(x − X )

which is an unbiased estimator of Y and µ is any constant. Now find µ such that the Var (Yˆ * ) is minimum

Var (Yˆ *) = Var ( y ) + µ 2 Var ( x ) + 2 µ Cov( x , y ) ∂Var (Y * ) =0 ∂µ Cov( x , y ) ⇒µ= − Var ( x ) N −n S XY Nn = − N −n 2 SX Nn S = − XY2 SX 1 N 1 N ( X i − X )(Yi − Y ), = S X2 ∑ ∑ ( X i − X ). N −1 i 1= N −1 i 1 = where= S XY

Consider a linear regression model = y xβ + e where y is the dependent variable, x is the independent variable and e is the random error component which takes care of the difference arising due to lack of exact relationship between x and y.

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 1


Note that the value of regression coefficient β in a linear regression model = y xβ + e of y on x n

obtained by minimizing

∑e i =1

2 i

n is β = based on n data sets ( xi , yi ), i = 1, 2,..,

Cov( x, y ) S xy = . Thus Var ( x) S x2

the optimum value of µ is same as the regression coefficient of y on x with a negative sign, i.e.,

µ = −β . So the estimator Yˆ * with optimum value of µ is

Yˆreg = y + β (X − x) which is the regression estimator of Y and the procedure of estimation is called as the regression method of estimation. The variance of Yˆreg is

(Yˆreg ) V ( y )[1 − ρ 2 ( x , y )] Var= where ρ ( x , y ) is the correlation coefficient between x and y . So Yˆreg would be efficient if x and y are highly correlated. The estimator Yˆreg

is more efficient than Y if

ρ ( x , y ) ≠ 0 which generally

holds.

Regression estimates with preassigned β : If value of β is known as β 0 (say), then the regression estimator is

Yˆreg = y + β0 ( X − x ) .

Bias of Yˆreg : Now, assuming that the random sample ( xi , yi ), i = 1, 2,.., n is drawn by SRSWOR, E (Yˆreg ) =E ( y ) + β 0  X − E ( x )  Y + β 0  X − X  = =Y

Thus Yˆreg is an unbiased estimator of Y when β is known.

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 2


Variance of Yˆreg (Yˆreg ) E Yˆreg − E (Yˆreg )  Var=  

2

= E  y + β 0 ( X − x ) − Y 

2

= E ( y − Y ) − β 0 ( x − X ) 

2

= E ( y − Y ) 2 + β 02 ( x − X ) 2 − 2 β 0 E ( x − X )( y − Y )  = Var ( y ) + β 02Var ( x ) − 2 β 0Cov( x , y ) f  SY2 + β 02 S X2 − 2 β 0 S XY  n f  SY2 + β 02 S X2 − 2 β 0 ρ S X SY  = n =

where

N −n N 1 N S X2 = ∑ ( X i − X )2 N − 1 i =1 1 N SY2 (Yi − Y ) 2 = ∑ N − 1 i =1 ρ : Correlation coefficient between X and Y . f =

Comparing Var (Yˆreg ) with Var ( y ) , we note that Var (Yˆreg ) < Var ( y )

if

β 02 S X2 − 2β 0 S XY < 0

or

β 0 S X2  β 0 −

 

2 S XY  <0 S X2 

which is possible when

 2S  2S either β 0 < 0 and  β 0 − 2XY  > 0 ⇒ 2XY < β 0 < 0 . SX  SX   2S  2S or β 0 > 0 and  β 0 − 2XY )  < 0 ⇒ 0 < β 0 < 2XY . SX  SX 

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 3


Optimal value of β Choose β such that Var (Yˆreg ) is minimum . So ∂Var (Yˆreg ) ∂β

∂  SY2 + β 2 S X2 − 2 βρ S X SY  = 0 ∂β S S ⇒ β= ρ Y= XY2 . SX SX =

ρS The minimum value of variance of Yˆreg with optimum value of β opt = Y is SX 2  S f  2 ˆ 2 SY Varmin (Yreg ) = SY + ρ 2 S X2 − 2 ρ Y ρ S X SY  n SX SX  f 2 SY (1 − ρ 2 ). = n

Since −1 ≤ ρ ≤ 1, so

Var (Yˆreg ) ≤ VarSRS ( y ) which always holds true. So the regression estimator is always better than the sample mean under SRSWOR.

Departure from β : If β 0 is the preassigned value of regression coefficient, then f  SY2 + β 02 S X2 − 2 β 0 ρ S X SY  Varmin (Yˆreg ) = n f  SY2 + β 02 S X2 − 2 ρβ 0 S X SY − ρ 2 SY2 + ρ 2 SY2  = n f 2 S X2  = (1 − ρ 2 ) SY2 + β 02 S X2 − 2β 0 S X2 β opt + β opt n f (1 − ρ 2 ) SY2 + ( β 0 − β opt ) 2 S X2  = n

where β opt =

ρ SY . SX

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 4


Estimate of variance An unbiased sample estimate of Var (Yˆreg ) is  (Yˆ ) Var = reg =

n f 2 [( yi − y ) − β0 ( xi − x )] ∑ n(n − 1) i =1

f n

n

∑ (s i =1

2 y

+ β 02 sx2 − 2 β 0 sxy ).

Note that the variance of Yˆreg increases as the difference between β 0 and β opt increases.

Regression estimates when β is computed from sample Suppose a random sample of size n on paired observations on ( xi , yi ), i = 1, 2,.., n is drawn by SRSWOR. When β is unknown, it is estimated as

s βˆ = xy2 sx and then the regression estimator of Y is given by Yˆreeg = y + βˆ ( X − x ).

It is difficult to find the exact expressions of E (Yreg ) and Var (Yˆreg ). So we approximate them using the same methodology as in the case of ratio method of estimation. Let y −Y ⇒ y = Y (1 + ε 0 ) Y x−X ⇒ x= X (1 + ε1 ) ε1= x s − S XY ε 2= xy ⇒ sxy= S XY (1 + ε 2 ) S XY

ε0 =

sx2 − S X2 ε 3= ⇒ sx2= S X2 (1 + ε 3 ) 2 SX

Then E (ε 0 ) 0,= E (ε1 ) 0, = E (ε 2 ) 0,= E (ε 3 ) 0, = f 2 CY , n f E (ε12 ) = C X2 , n f E (ε 0 ε1 ) = ρ C X CY n

E (ε 02 ) =

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 5


and sxy Yreg = y + 2 (X − x) sx = Y (1 + ε 0 ) +

S XY (1 + ε 2 ) (−ε1 X ) S x2 (1 + ε 3 )

The estimation error of Yˆreg is

(Yˆreg − Y ) = Y ε 0 − β X ε1 (1 + ε 2 )(1 + ε 3 ) −1 where β =

S XY is the population regression coefficient. S X2

Assuming ε 3 <1,

(Yˆreg − Y )= Y ε 0 − β X (ε1 + ε1ε 2 )(1 − ε 3 + ε 32 − ....) Retaining the terms upto second power of ε ' s and ignoring other terms, we have (Yˆreg − Y )  Y ε 0 − β X (ε1 + ε1ε 2 )(1 − ε 3 + ε 32 )  Y ε 0 − β X (ε1 − ε1ε 3 + ε1ε 2 )

Bias of Yˆreg Now the bias of Yˆreg upto the second order of approximation is E (Yˆreg − Y )  E Y ε 0 − β X (ε1 + ε1ε 2 )(1 − ε 3 + ε 32 ) 

µ  β Xf  µ21 = − − 302   n  XS XY XS X  where f =

N −n and (r , s)th cross product moment is given by N

µrs =E ( x − X ) r ( y − Y ) s  So that

µ21 =E ( x − X ) 2 ( y − Y )  = µ30 E ( x − X )3  . Thus

β f  µ21 µ30  − − E (Yˆreg ) =  . n  S XY S X2 

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 6


Also,

E (Yˆreg ) = E ( y ) + E[ βˆ ( X − x )] = Y + XE ( βˆ ) − E ( βˆ x ) = Y + E ( x ) E ( βˆ ) − E ( βˆ x ) = Y − Cov( βˆ , x ) Bias (Yˆ ) = E (Yˆ ) − Y = −Cov( βˆ , x ) reg

reg

MSE of Yˆreg To obtain the MSE of Yˆreg , consider 2 E (Yˆreg − Y ) 2 ≈ E ε 0Y − β X (ε1 − ε1ε 3 + ε1ε 2 ) 

Retaining the terms of ε ' s upto the second power second and ignoring others, we have E (Yˆreg − Y ) 2 ≈ E ε 02Y 2 + β 2 X 2ε12 − 2 β XY ε 0ε 1  = Y 2 E (ε 02 ) + β 2 X 2 E (ε12 ) − 2 β XYE (ε 0ε1 ) 2  2 SY2 S S  2 2 SX + − 2 β XY ρ X Y  Y X β  2 2 X XY   Y MSE (= Yˆreg ) E (Yˆreg − Y ) 2

=

f n

f 2 ( SY + β 2 S X2 − 2 βρ S X SY ) n S XY S Since= β = ρ Y , 2 SX SX =

so substituting it in MSE (Yˆreg ), we get

MSE (Yˆreg) =

f 2 SY (1 − ρ 2 ). n

So upto second order of approximation, the regression estimator is better than the conventional sample mean estimator under SRSWOR. This is because the regression estimator uses some extra information also. Moreover, such extra information requires some extra cost also. This shows a false superiority in some sense. So the regression estimators and SRS estimates can be combined if cost aspect is also taken into consideration.

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 7


Comparison of Yˆreg with ratio estimate and SRS sample mean estimate MSE= (Yˆreg )

f 2 SY (1 − ρ 2 ) n

f MSE (YˆR ) = ( SY2 + R 2 S X2 − 2 ρ RS X SY ) n f VarSRS ( y ) = SY2 . n

(Yˆreg ) VarSRS ( y )(1 − ρ 2 ) and because ρ 2 < 1, so Yˆreg is always superior to y . = (i) As MSE (ii ) Yˆreg is better than YˆR if MSE (Yˆreg ) ≤ MSE (YˆR ) f 2 f SY (1 − ρ 2 ) ≤ ( SY2 + R 2 S X2 − 2 ρ RS X SY ) n n 2 or if ( RS X − ρ SY ) ≥ 0 or if

which always holds true.

So regression estimate is always superior to the ratio estimate upto the second order of approximation.

Regression estimates in stratified sampling Under the set up of stratified sampling, let the population of N sampling units be divided into k k

strata. The strata sizes are

N1 , N 2 ,.., N k such that

∑N i =1

i

= N.

A sample of size ni on

( xij , yij ), j = 1, 2,.., ni , is drawn from ith strata (i = 1,2,..,k) by SRSWOR where xij and yij denote the jth unit from ith strata on auxiliary and study variables, respectively.

In order to estimate the population mean, there are two approaches.

1. Separate regression estimator •

Estimate regression estimator

Yˆreg = y + β0 ( X − x ) from each stratum separately, i.e., the regression estimate in the ith stratum is

Yˆreg (i ) = yi + βi ( X i − xi ).

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 8


Find the stratified mean as the weighted mean of Yˆreg (i ) i = 1, 2,.., k as k N Yˆ i reg ( i ) Yˆsreg = ∑ N i =1

=

k

∑ [w { y + β ( X i =1

= βi where

i

i

i

i

− xi )}]

Sixy Ni = , wi . 2 Six N

In this approach , the regression estimator is separately obtained in each of the stratum and then combined using the philosophy of stratified sample. So Yˆsreg is termed as separate regression estimator,

2. Combined regression estimator Another strategy is to estimate x and y in the Yˆreg as respective stratified mean. Replacing x k

k

i =1

i =1

by xst = ∑ wi xi and y by yst = ∑ wi yi , we have

Yˆcreg =yst + β ( X − xst ). In this case, all the sample information is combined first and then implemented in regression estimator, so Yˆreg is termed as combined regression estimator.

Properties of separate and combined regression In order to derive the mean and variance of Yˆsreg and Yˆcreg , there are two cases -

when β is pre-assigned as β 0

-

when β is estimated from the sample.

sxy We consider here the case that β is pre-assigned as β 0 . Other case when β is estimated as βˆ = 2 sx can be dealt with the same approach based on defining various ε ' s and using the approximation theory as in the case of Yˆreg .

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 9


1. Separate regression estimator Assume β is known, say β 0 . Then

Yˆs reg =

k

∑w [y + β

E (Yˆs= reg ) =

i

i =1

i

0i

( X i − xi )]

k

∑ w  E ( y ) + β ( X i

i =1

i

k

∑ w [Y + ( X i

i =1

i

i

0i

i

− E ( xi ) ) 

− X i )]

=Y. (Yˆs reg ) E Yˆs reg − E (Yˆs reg )  Var=  

2

k  k  = E  ∑ wi yi + i + ∑ wi β 0i ( X i − xi ) − Y  =  i 1 =i 1  k   k = E  ∑ wi ( yi − Y ) − ∑ wi β 0i ( xi − X i )  =  i 1 =i 1  k

k 2 2 i i i =i 1 =i 1

∑ w E( y − Y ) + ∑ w

=

k

2 i

2

2

k

β 02i E ( xi − X i )]2 − 2∑ wi2 β 0i E ( xi − X i )( yi − Yi ) =i 1

k

k

= ∑ wi2Var ( yi ) + ∑ wi2 β02iVar ( xi ) − 2∑ wi2 β0iCov( xi , yi )

=i 1 =i 1

=

k

∑ i =1

=i 1

2 i i

w f ( SiY2 + β 02i SiX2 − 2 β 0i SiXY )] ni

S Var (Yˆs reg ) is minimum when β 0i = iXY SiX2

and so substituting β 0i , we have

 wi2 fi 2  ( SiY − β 02i SiX2 )  ∑  i =1  ni  N −n where f i = i i . Ni

= Vmin (Yˆs reg )

k

Since SRSWOR is followed in drawing the samples from each stratum, so

E ( six2 ) = SiX2 E ( siy2 ) = SiY2 E ( sixy ) = SiXY Thus an unbiased estimator of variance can be obtained by replacing SiX2 and SiY2 by their respective unbiased estimators six2 and siy2 , respectively as

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 10


 (= Var Yˆs reg )

 wi2 fi 2  ( siy + β oi2 six2 − 2 β 0i sixy )  ∑  i =1  ni  k

and ˆ  = Var min (Ys reg )

 wi2 fi 2  ( siy − β oi2 six2 )  ∑  i =1  ni  k

2. Combined regression estimator: Assume β is known as β 0 . Then k

k

∑ w y + β (X − ∑ w x )

Yˆc reg =

0 i i =i 1 =i 1

( )

i i

k

k

∑ wi E ( yi ) + β0 [ X − ∑ wi E ( xi )]

E Yˆc= reg

=i 1 =i 1 k

k

∑ wiYi + β0 [ X − ∑ wi X i ]

=

=i 1 =i 1

= Y + β0 ( X − X ) =Y. Thus Yˆc reg is an unbiased estimator of Y .

Var (= Yˆc reg ) E[Yc reg − E (Yc reg )]2 k

k

= E[∑ wi yi + β 0 ( X − ∑ wi xi ) − Y ]2

=i 1 =i 1 k

k

= E[∑ wi ( yi − Y ) − β 0 ∑ wi ( xi − X i )]2

=i 1 =i 1 k

k

k

= ∑ wi2Var ( yi ) + β02 ∑ wi2Var ( xi ) − 2∑ wi2 β0Cov( xi , yi )

=i 1 =i 1 =i 1

=

wi2 fi 2  SiY + β 02 SiX2 − 2 β 0 SiXY . ∑ i =1 ni k

Var (Yˆc reg ) is minimum when

β0 =

Cov( xst , yst ) Var ( xst )

wi2 fi SiXY ∑ i =1 ni = k 2 wi fi 2 SiX ∑ i =1 ni k

and the minimum variance is given by

ˆ = Var min (Yc reg )

wi2 fi 2 ( SiY − β 02 SiX2 ). ∑ i =1 ni k

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 11


( )

( )

2 2 = E six2 S= Siy2 and Since SRSWOR is followed to draw the sample from strata, so using ix , E siy

E ( sixy ) = SiXY , we get the estimate of variance as

 (= Var Yˆc reg )

 wi2 fi 2  ( siy + β o2 six2 − 2 β 0i sixy )  ∑  i =1  ni  k

and ˆ  = Var min (Yc reg )

 wi2 fi 2  ( siy − β oi2 six2 )  ∑  i =1  ni  k

Comparison of Yˆs reg and Yˆc reg : The variance of Yˆs reg is minimum when β 0i = β 0 for all i. Cov( xst , yst ) The variance of Yˆc reg is minimum when β 0 = = β 0* . Var ( xst )

Cov( xst , yst ) . Var ( yst )(1 − ρ*2 ) where ρ* = The minimum variance is Var (Yˆc reg ) min = Var ( xst )Var ( yst ) 2

k

w f ( β 02i − β 02 ) i i SiX2 Var (Yˆc reg ) − Var (Yˆs reg ) = ∑ ni i =1 Var (Yˆc reg ) min − Var (Yˆs reg ) β

0 i = β0

k fi ( β 0i − β 0 ) 2 wi2 SiX2 = ∑ i =1 ni

≥0 which is always true. So if the regression line of y on x is approximately linear and the regression coefficients do not vary much among the strata, then separate regression estimate is more efficient than combined regression estimator.

Sampling Theory| Chapter 6 | Regression Method of Estimation | Shalabh, IIT Kanpur

Page 12


Chapter 7 Varying Probability Sampling The simple random sampling scheme provides a random sample where every unit in the population has equal probability of selection. Under certain circumstances, more efficient estimators are obtained by assigning unequal probabilities of selection to the units in the population. This type of sampling is known as varying probability sampling scheme. If Y is the variable under study and X is an auxiliary variable related to Y, then in the most commonly used varying probability scheme, the units are selected with probability proportional to the value of X, called as size. This is termed as probability proportional to a given measure of size (pps) sampling. If the sampling units vary considerably in size, then SRS does not takes into account the possible importance of the larger units in the population. A large unit, i.e., a unit with large value of Y contributes more to the population total than the units with smaller values, so it is natural to expect that a selection scheme which assigns more probability of inclusion in a sample to the larger units than to the smaller units would provide more efficient estimators than the estimators which provide equal probability to all the units. This is accomplished through pps sampling. Note that the â&#x20AC;&#x153;sizeâ&#x20AC;? considered is the value of auxiliary variable X and not the value of study variable Y. For example in an agriculture survey, the yield depends on the area under cultivation. So bigger areas are likely to have larger population and they will contribute more towards the population total, so the value of the area can be considered as the size of auxiliary variable. Also, the cultivated area for a previous period can also be taken as the size while estimating the yield of crop. Similarly, in an industrial survey, the number of workers in a factory can be considered as the measure of size when studying the industrial output from the respective factory.

Difference between the methods of SRS and varying probability scheme: In SRS, the probability of drawing a specified unit at any given draw is the same. In varying probability scheme, the probability of drawing a specified unit differs from draw to draw. It appears in pps sampling that such procedure would give biased estimators as the larger units are overrepresented and the smaller units are under-represented in the sample. This will happen in case of sample mean as an estimator of population mean where all the units are given equal weight. Instead of giving equal weights to all the units, if the sample observations are suitably weighted at the estimation stage by taking the probabilities of selection into account, then it is possible to obtain unbiased estimators. Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 1


In pps sampling, there are two possibilities to draw the sample, i.e., with replacement and without replacement.

Selection of units with replacement: The probability of selection of a unit will not change and the probability of selecting a specified unit is same at any stage. There is no redistribution of the probabilities after a draw.

Selection of units without replacement: The probability of selection of a unit will change at any stage and the probabilities are redistributed after each draw. PPS without replacement (WOR) is more complex than PPS with replacement (WR) . We consider both the cases separately.

PPS sampling with replacement (WR): First we discuss the two methods to draw a sample with PPS and WR.

1. Cumulative total method: The procedure of selection a simple random sample of size n consists of -

associating the natural numbers from 1 to N units in the population and

-

then selecting those n units whose serial numbers correspond to a set of n numbers where each number is less than or equal to N which is drawn from a random number table.

In selection of a sample with varying probabilities, the procedure is to associate with each unit a set of consecutive natural numbers, the size of the set being proportional to the desired probability. If X 1 , X 2 ,..., X N are the positive integers proportional to the probabilities assigned to the N units in the population, then a possible way to associate the cumulative totals of the units. Then the units are selected based on the values of cumulative totals. This is illustrated in the following table:

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 2


Units

Size

Cumulative

X1

T1  X 1

X2 

T2  X 1  X 2 

i 1

X i 1

Ti 1   X j

1 2

i 1

j 1

i

i 

Xi  N

N

XN   X j j 1

Select a random number R between 1 and TN by using random number table.

Ti   X j j 1

If Ti 1  R  Ti , then ith unit is selected with probability Xi , i = 1,2,…, N . TN

Repeat the procedure n times to get a sample of size n.

N

TN   X j j 1

In this case, the probability of selection of ith unit is Pi 

Ti  Ti 1 X i  TN TN

 Pi  X i . Note that TN is the population total which remains constant. Drawback : This procedure involves writing down the successive cumulative totals. This is time

consuming and tedious if the number of units in the population is large. This problem is overcome in the Lahiri’s method.

Lahiri’s method: M  Max X i , i.e., maximum of the sizes of N units in the population or some convenient

Let

i 1,2,..., N

number greater than M . The sampling procedure has following steps: 1.

Select a pair of random number (i, j) such that 1  i  N , 1  j  M .

2.

If

j  X i , then ith unit is selected otherwise rejected and another pair of random number is

chosen. 3.

To get a sample of size n , this procedure is repeated till n units are selected.

Now we see how this method ensures that the probabilities of selection of units are varying and are proportional to size.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 3


Probability of selection of ith unit at a trial depends on two possible outcomes – either it is selected at the first draw – or it is selected in the subsequent draws preceded by ineffective draws. Such probability is given by P (1  i  N ) P (1  j  M | i ) 1 X  . i  Pi * , say. N M Probability that no unit is selected at a trial 

1 N

N

Xi 

 1  M  i 1

1 NX  N   N M  X  1  Q, say. M

Probability that unit i is selected at a given draw (all other previous draws result in the non selection of unit i)

 Pi*  QPi*  Q 2 Pi*  ... Pi* 1 Q X / NM X Xi  i  i   Xi. X /M NX X total 

Thus the probability of selection of unit i is proportional to the size X i . So this method generates a pps sample.

Advantage: 1. It does not require writing down all cumulative totals for each unit. 2. Sizes of all the units need not be known before hand. We need only some number greater than the

maximum size and the sizes of those units which are selected by the choice of the first set of random numbers 1 to N for drawing sample under this scheme.

Disadvantage: It results in the wastage of time and efforts if units get rejected. The probability of rejection  1 

X . M

The expected numbers of draws required to draw one unit 

M . X

This number is large if M is much larger than X .

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 4


Example: Consider the following data set of 10 number of workers in the factory and its output. We illustrate the selection of units using the cumulative total method. Factory no.

Number of workers Industrial production (X) (in thousands)

Cumulative total of sizes

(in metric tonns) (Y)

1

2

30

T1  2

2

5

60

T2  2  5  7

3

10

12

T3  2  5  10  17

4

4

6

T4  17  4  21

5

7

8

T5  21  7  28

6

2

13

T6  28  2  30

7

3

4

T7  30  3  33

8

14

17

T8  33  14  47

9

11

13

T9  47  11  58

10

6

8

T10  58  6  64

Selection of sample using cumulative total method: 1. First draw: - Draw a random number between 1 and 64.

- Suppose it is 23 - T4  23  T5 - Unit Y is selected and Y5  8 enters in the sample . 2. Second draw:

-

Draw a random number between 1 and 64

-

Suppose it is 38

-

T7  38  T8

-

Unit 8 is selected and Y8  17 enters in the sample

-

and so on.

-

This procedure is repeated till the sample of required size is obtained.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 5


Selection of sample using Lahiri’s Method In this case M  Max X i  14 i 1,2,...,10

So we need to select a pair of random number (i, j ) such that 1  i  10, 1  j  14 . Following table shows the sample obtained by Lahiri’s scheme: Random no

Random no

Observation

Selection of unit

1  i  10

1  j  14

3

7

j  7  X 3  10

trial accepted ( y3 )

8

13

j  13  X 8  14

trial rejected

4

7

j  7  X4  4

trial rejected

2

9

j  9  X2  5

trial rejected

9

2

j  2  X 9  11

trial accepted ( y9 )

and so on. Here ( y3 , y9 ) are selected into the sample.

Varying probability scheme with replacement: Estimation of population mean Let

Yi : Value of study variable for the ith unit of the population, i = 1, 2,…,N. X i : Known value of auxiliary variable (size) for the ith unit of the population. Pi : Probability of selection of ith unit in the population at any given draw and is proportional to size X i . Consider the varying probability scheme and with replacement for a sample of size n. Let yr be the value of rth observation on study variable in the sample and pr be its initial probability of selection. Define

zr  then z 

yr , r  1, 2,..., n, Npr

1 n  zi is an unbiased estimator of population mean Y , variance of z is n i 1

 z2 n

where

2

 Y  s2 1 n ( zr  z ) 2 .    Pi  i  Y  and an unbiased estimate of variance of z is z   n n  1 r 1 i 1  NPi  2 z

N

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 6


Proof:

Note that zr can take any one of the N values out of Z1 , Z 2 ,..., Z N with corresponding initial probabilities P1 , P2 ,..., PN , respectively. So N

E ( zr )   Z i Pi i 1

N

 i 1

Yi Pi NPi

Y.

Thus E(z )  

1 n  E ( zr ) n i 1 1 n Y n i 1

Y. So z is an unbiased estimator of population mean Y . The variance of z is

1  n  Var ( z )  2 Var   zr  n  r 1  

1 n2

n

Var ( z ) r

r 1

( zr' s are independent in WR case).

Now

Var ( zr )  E  zr  E ( zr ) 

2

 E  zr  Y 

2

N

   Z i  Y  Pi 2

i 1

2

 Y     i  Y  Pi i 1  NPi  N

  z2 (say) . Thus Var ( z )  

1 n2



 z2

.

n

n

r 1

2 z

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 7


To show that

sz2 is an unbiased estimator of variance of z , consider n

 n  (n  1) E ( sz2 )  E   ( zr  z ) 2   r 1   n   E   zr2  nz 2   r 1   n     E ( zr2 )  nE ( z 2 )   r 1  n

2 2   Var ( zr )   E ( zr )   n Var ( z )   E ( z )      r 1

    Y n

r 1

2 z

2

  n

 z2 n

Y

2

2 N    Yi   using Var ( zr )     Y  Pi   z2    i 1  NPi   

 (n  1) z2 E ( sz2 )   z2  sz2   z2  Var ( z )  n n

or E 

2  sz2 1  n  yr  2     Var ( z )     nz  . n n(n  1)  r 1  Npr   

Note: If Pi 

1 , then z  y , N 2

    y2 1 1 N  Yi Y   Var ( z )   n N i 1  N . 1 n   N 

which is the same as in the case of SRSWR.

Estimation of population total: An estimate of population total is 1 n y  Yˆtot    r   N z . . n r 1  pr 

Taking expectation, we get

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 8


 Y Y 1 n Y E (Yˆtot )    1 P1  2 P2  ...  N PN  n r 1  P1 P2 PN  1 n N      Yi  n r 1  i 1  1 n  Ytot n r 1  Ytot . 

Thus Yˆtot is an unbiased estimator of population total. Its variance is Var (Yˆtot )  N 2Var ( z ) 2

 1 N 1 Y  N  2  i  NY  Pi n i 1 N  Pi  2

2

 1 N Y    i  Ytot  Pi n i 1  Pi   1  N Yi 2    Ytot2  . n  i 1 Pi 

An estimate of the variance 2  (Yˆ )  N 2 sz . Var tot n

Varying probability scheme without replacement In varying probability scheme without replacement, when the initial probabilities of selection are unequal, then the probability of drawing a specified unit of the population at a given draw changes with the draw. Generally, the sampling WOR provides a more efficient estimator than sampling WR. The estimators for population mean and variance are more complicated. So this scheme is not commonly used in practice, especially in large scale sample surveys with small sampling fractions.

Let U i : i th unit, Pi : Probability of selection of U i at the first draw, i  1, 2,..., N N

 P 1 i 1

i

Pi ( r ) : Probability of selecting U i at the r th draw

Pi (1)  Pi . Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 9


Consider Pi (2)  Probability of selection of U i at 2nd draw. Such an event can occur in the following possible ways: U i is selected at 2nd draw when - U1 is selected at 1st draw and U i is selected at 2nd draw - U 2 is selected at 1st draw and U i is selected at 2nd draw 

- U i -1 is selected at 1st draw and U i is selected at 2nd draw - U i 1 is selected at 1st draw and U i is selected at 2nd draw 

- U N is selected at 1st draw and U i is selected at 2nd draw

So Pi (2) can be expressed as Pi (2)  P1

Pi P Pi Pi Pi  P2 i  ...  Pi 1  Pi 1  ...  PN 1  P1 1  P2 1  Pi 1 1  Pi 1 1  PN

N

Pj

Pi 1  Pj

Pj

Pi P P  Pi i  Pi i 1  Pj 1  Pi 1  Pi

j (  i ) 1

N

j (  i ) 1 N

  Pj j 1

Pi P  Pi i 1  Pj 1  Pi

N P P   Pi   j  i   j 1 1  Pj 1  Pi 

Pi (2)  Pi (1) for all i unless Pi 

1 . N

y  Pi (2) will, in general, be different for each i = 1,2,…, N . So E  i  will change with successive draws.  pi 

This makes the varying probability scheme WOR more complex. Only estimator of Y . In general,

y1 will provide an unbiased Np1

yi (i  1) will not provide an unbiased estimator of Y . Npi

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 10


Ordered estimates To overcome the difficulty of changing expectation with each draw, associate a new variate with each draw such that its expectation is equal to the population value of the variate under study. Such estimators take into account the order of the draw. They are called the ordered estimates. The order of the value obtained at previous draw will affect the unbiasedness of population mean. We consider the ordered estimators proposed by Des Raj, first for the case of two draws and then generalize the result.

Des Raj ordered estimator Case 1: Case of two draws: Let

y1 and y2 denote the values of

units U i (1) and U i (2)

drawn at the first and second draws

respectively. Note that any one out of the N units can be the first unit or second unit, so we use the notations U i (1) and U i (2) instead of U1 and U 2 . Also note that y1 and y2 are not the values of the first two units in the population. Further, let p1 and p2 denote the initial probabilities of selection of Ui(1) and Ui(2), respectively. Consider the estimators z1 

y1 Np1

z2 

1 N

  y2  y1   p2 / (1  p1 )  

1 N

 (1  p1 )   y1  y2  p2  

z Note that

z1  z2 . 2

p2 is the probability P(U i (2) | U i (1) ). 1  p1

Estimation of Population Mean: First we show that z is an unbiased estimator of Y . E(z )  Y . Note that

N

 P  1. i 1

i

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 11


Consider E ( z1 )  

1  y1  E  N  p1  1 N

 Y  y1 Y Y can take any one of out of the N values 1 , 2 ,..., N   Note that p1 P1 P2 PN  

 Y1  YN Y2 PN   P1  P2  ...  P2 PN   P1

Y E ( z2 ) 

1  (1  p1 )  E  y1  y2  N  p2  1 N

   (1  P1 ) U i (1)  E ( y1 )  E1  E2  y2 p2   

     

(Using E (Y )  E X [ EY (Y | X )].

where E2 is the conditional expectation after fixing the unit U i (1) selected in the first draw. Since

Y y2 can take any one of the (N – 1) values (except the value selected in the first draw) j with p2 Pj

probability

Pj 1  P1

, so

 (1  P1 )  y  P  * Y E2  y2 U i (1)   (1  P1 ) E2  2 U i (1)   (1  P1 ) j  j . j  . p2    p2   Pj 1  P1  where the summation is taken over all the values of Y except the value y1 which is selected at the first draw. So  (1  P1 )  * E2  y2 U i (1)    j Y j  Ytot  y1. p2  

Substituting it in E ( z2 ), we have E ( z2 ) 

1  E ( y1 )  E1 (Ytot  y1 ) N

1  E ( y1 )  E (Ytot  y1 ) N

Y 1 E (Ytot )  tot  Y . N N

Thus E ( z1 )  E ( z2 ) 2 Y Y  2 Y.

E(z ) 

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 12


Variance: The variance of z for the case of two draws is given as  1 N 2  1 Var ( z )  1   Pi   2  2 i 1   2 N

Y  Pi  i  Ytot   i 1  Pi  N

2

 1  2  4 N

Y  Pi  i  Ytot   i 1  Pi  N

2

2

Proof: Before starting the proof, we note the following property N

N

N

 a b   a  b

i  j 1

i

j

i 1

i

 j 1

j

  bi  

which is used in the proof. The variance of z is Var ( z )  E ( z 2 )   E ( z ) 

2 2

 1  y1 y2 (1  p1 )   2  E   y1    Y 2 N p p 2  1   2

 y1 (1  p1 ) y2 (1  p1 )  1 2 E     Y 2 4N p1 p2     nature of

nature of

variable

variable

depends only on

depends

1st draw

upon1st and 2nd draw

2  1  N  Yi (1  Pi ) Y j (1  Pi )  PP i j  =  Y 2    Pj P  4 N 2  i  j 1  Pi 1  i     Y j2 (1  Pi ) 2 PP (1  Pi 2 ) PP 1  N  Yi 2 (1  Pi ) 2 PP i j i j i j  2 YY = 2     Y i j 2   2 2 Pi Pj PP 4 N  i  j 1  1  Pi 1  Pi 1  Pi  i j

=

1 4N 2

 N  Y 2 (1  P ) 2 Pj Y j2 (1  Pi ) 2 Pi  2 i i   2YY   i j (1  Pi )   Y . Pi Pj 1  Pi 1  Pi  i  j 1  

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 13


Using the property N N   a b a   i j i   b j  bi  , we can write i  j 1 i 1  j 1  N

2 2 N N  N 1  N Yi 2 (1  Pi ) 2  N  N Y j Yi  2       Var ( z )  P P P (1 P ) 2 Y (1 P )(   j i   i   i  i  i  Y j  Yi )]  Y 4 N 2  i 1 Pi (1  Pi )  j 1 P P i 1 j 1  j 1 j i   i 1  N N N   N Y j2 Yi 2  1  N Yi 2 2      (1 P 2 P ) P (1     Y 2 P Y P Y Y ) 2 (1 )( )          i i i j i i i i 2 4 N  i 1 Pi i 1 i 1 j 1  j 1 Pj Pi  

1 4N 2

N N N Y2 N N N Y2  N Yi 2 N 2 j j 2 2 2      Y P Y P Y P 2       i i i i i  i 1 i 1 j 1 Pj i 1 i 1 j 1 Pj  i 1 Pi i 1 N

N

i 1

j 1

N

N

N

N

i 1

i 1

j 1

i 1

2 2 2 2   PY i i  2 Yi  Y j  2 Yi Pi  2 Yi Pi  Y j  2 Yi ]  Y i

N N  1  N Yi 2 N 2 N Y 2 2 2 2  P  Y  2 Y  2 Y Y P    i  i tot tot  i i   Y 4 N 2  i 1 Pi i 1 P j 1 i 1 i 1 j  2 j

N 1 N 2  1 N  1  N Yi 2 2 2  2 2 2  2 1   Pi 2   Y  Y  Y  2 Y  2 Y tot tot  Yi Pi  4 N Y  tot tot  2  2  i i 1  2 i 1  4 N  i 1 Pi   4 N  i 1

 1 N  1  1   Pi 2  2  2 i 1  2 N

2

N N  Yi  1 2 2 2    P Y ( Y 2 Y  i tot  tot  Yi Pi  2Ytot  4Ytot ) 2  i i 1 i 1  Pi  4 N i 1 N

 1 N  1  1   Pi 2  Y2 2 tot 2 2 N i 1   2

 1 N  1  1   Pi 2  2  2 i 1  2 N

N N  Yi  1 2 2 2 2 2    P Y Y Y ( 2  i tot  tot  Yi Pi  2Ytot  2Ytot   Pi Ytot ) 2  i P N 4 i 1 i 1 i 1 i  i 

 1 N  1  1   Pi 2  2  2 i 1  2 N

Y  1 Pi  i  Ytot    2 i 1  Pi  4N

N

2

N

 1  1 N 2  N  Yi 1   P 1  P Y     i tot i  2 2  2 N  2 i 1  i 1  Pi  4N 2

 1 N  Y 1   Pi  i  Y   2 2 i 1  NPi  4N

2

 1 N  Y 1 Var ( z )   Pi  i  Y   2 2 i 1  NPi  4N  variance of WR case for n  2

 Y N

i

i 1

2

 2YtotYi Pi  Pi 2Ytot2 

Y  Pi  i  Ytot   i 1  Pi  N

2

Y  1 Pi   i  Ytot    2 i 1 i 1  Pi  4N N

2

N

2

Y  1 Pi   i  Ytot    2 i 1 i 1  Pi  4N N

2

N

2

2

Y  Pi  i  Ytot   i 1  Pi  N

2

2

`

2

Y  Pi  i  Ytot   i 1  Pi   reduction of variance in WR with varying probability N

2

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 14


Estimation of Var ( z ) Var ( z )  E ( z 2 )  ( E ( z )) 2  E(z 2 )  Y 2 Since E ( z1 z2 )  E  z1 E ( z2 | u1 )   E  z1Y   YE ( z1 )  Y 2. Consider E  z 2  z1 z2   E ( z 2 )  E ( z1 z2 )  E(z 2 )  Y 2  Var ( z )  ( z )  z 2  z z is an unbiased estimator of Var ( z )  Var 1 2

Alternative form (z )  z 2  z z Var 1 2 2

z z    1 2   z1 z2  2  

( z1  z2 ) 2 4

y y 1  p1  1  y   1  1 2  4  Np1 N N p2  1  4N 2

2

 y1 y2 (1  p1 )  (1  p1 )   p1 p2  

2

2

(1  p1 ) 2  y1 y2      . 4 N 2  p1 p2 

Case 2: General Case Let (U i (1) ,U i (2) ,...,U i ( r ) ,..., U i ( n ) ) be the units selected in the order in which they are drawn in n draws where

U i ( r ) denotes that the ith

unit is drawn at the rth draw.

Let ( y1 , y2 ,.., yr ,..., yn )

and

( p1 , p2 ,..., pr ,..., pn ) be the values of study variable and corresponding initial probabilities of selection, respectively.

Further,

let

Pi (1) , Pi (2) ,..., Pi ( r ) ,..., Pi ( n )

be

the

initial

probabilities

of

U i (1) ,U i (2) ,..., U i ( r ) ,...,U i ( n ) , respectively. Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 15


Further, let

z1 

y1 Np1

zr 

1 N

Consider z 

  yr  y1  y2  ...  yr 1  (1  p1  ...  pr 1 )  for r  2,3,..., n. pr  

1 n  zr as an estimator of population mean Y . n r 1

We already have shown in case 1 that E ( z1 )  Y . Now we consider E ( zr ) 

E ( zr ), r  2,3,..., n. We can write

1 E1 E2  zr U i (1) ,U i (2) ,..., U i ( r 1)  N

where E2 is the conditional expectation after fixing the units U i (1) ,U i (2) ,..., U i ( r 1)

drawn in the first (r -

1) draws.

Consider y  y   E  r (1  p  ...  p )   E E  r (1  p  ...  p ) U ,U ,...,U r 1  r  1 i(1) i(2) i(r  1)  1 1 2p 1 p  r   r   y   .  E (1  P  P ...  P ) E  r U ,U ,...,U i(1) i(2) i(r  1) 2  p i(1) i(2) i (r  1)   1  r  

Y y j r Since conditionally can take any one of the N - (r -1) values , j  1, 2,..., N with probabilities p P r j P

j

1  P  P ...  P i(1) i (2) i(r  1)

, so

  P y  N Yj j  E  r (1  p  ...  p )   E (1  P  P ...  P )  * . r 1  i(1) i(2) i(r  1) 1 1 P (1  P  P ...  P ) p j  1 j i(1) i(2) i(r  1)   r    N   E   *Y  j 1  j 1 

where

N * denotes that the summation is taken over all the values of y except the y values selected in the first (r -1) draws  j 1

like as

N , i.e., except the values y , y ,..., y which are selected in the first (r -1) draws.  1 2 r 1 j  1( i(1), i(2),..., i(r  1))

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 16


Thus now we can express

E ( zr ) 

  y 1 E1E2  y1  y2  ...  yr 1  r (1  p1  ...  pr 1 )  N pr  

N  1   E1 Yi (1)  Yi (2)  ...  Yi ( r 1)   *Y j  N  j 1 

N  1  E1 Yi (1)  Yi (2)  ...  Yi ( r 1)  Yj   N  j 1( i (1),i (2),...,i ( r 1)) 



1  E1 Yi (1)  Yi (2)  ...  Yi ( r 1)  Ytot  Yi (1)  Yi (2)  ...  Yi ( r 1)   N  1  E Y  N 1  tot  Y  tot N 

 Y

for all r  1, 2,..., n.

Then

1 n  E  zr  n r 1 1 n  Y n r 1

Ez  

Y. Thus z is an unbiased estimator of population mean Y . The expression for variance of z

in general case is complex but its estimate is simple.

Estimate of variance: Var ( z )  E ( z 2 )  Y 2 . Consider for r  s, E ( zr zs )  E  zr E ( zs | U1 ,U 2 ,...,U s 1 )   E  zrY   YE ( zr ) Y2

because for r  s, zr will not contribute and similarly for s  r , zs will not contribute in the expectation. Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 17


Further, for s  r , E ( zr zs )  E  zs E ( zr | U1 ,U 2 ,...,U r 1 )   E  zsY   YE ( zs )  Y 2.

Consider n n n n  1  1  E z z E ( zr z s )     r s ( 1) ( 1)   n n n n r s s r s s (  )  1  1 (  )  1  1   1  n(n  1)Y 2 n(n  1)

 Y 2. Substituting Y 2 in Var ( z ), we get Var ( z )  E ( z 2 )  Y 2 n n  1   E( z 2 )  E  zr z s     n(n  1) r (  s ) 1 s 1  n n 1   Var (z )  z 2    zr z s n(n  1) r (  s ) 1 s 1

2

n n n  n  Using   zr    zr2    zr zs r 1 r (  s ) 1 s 1  r 1 

n

n

n

 zr zs  n2 z 2   zr2 ,

r (  s ) 1 s 1

r 1

 ( z ) can be further simplified as The expression of Var

 (z )  z 2  Var

 

1  2 2 n 2 n z   zr  n(n  1)  r 1 

1  n 2  zr  nz 2    n(n  1)  r 1  n 1 ( zr  z ) 2 .  n(n  1) r 1

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 18


Unordered estimator: In ordered estimator, the order in which the units are drawn is considered. Corresponding to any ordered estimator, there exist an unordered estimator which does not depend on the order in which the units are drawn and has smaller variance than the ordered estimator. N In case of sampling WOR from a population of size N , there are   unordered sample(s) of size n . n Corresponding to any unordered sample(s) of size n units, there are n ! ordered samples. For example, for n  2 if the units are u1 and u2 , then -

there are 2! ordered samples - (u1 , u2 ) and (u2 , u1 )

-

there is one unordered sample (u1 , u2 ) .

Moreover,  Probability of unordered   Probability of ordered   Probability of ordered        sample (u1 , u2 )   sample (u1 , u 2 )   sample (u2 , u 1 ) 

For n  3, there are three units u1 , u2 , u3 and -there are following 3! = 6 ordered samples: (u1 , u2 , u3 ), (u1 , u3 , u2 ), (u2 , u1 , u3 ), (u2 , u3 , u1 ), (u3 , u1 , u2 ), (u3 , u2 , u1 ) - there is one unordered sample (u1 , u2 , u3 ). Moreover, Probability of unordered sample = Sum of probability of ordered sample, i.e. P(u1 , u2 , u3 )  P(u1 , u3 , u2 )  P(u2 , u1 , u3 )  P(u2 , u3 , u1 )  P(u3 , u1 , u2 )  P(u3 , u2 , u1 ),

N Let zsi , s  1, 2,..,   , i  1, 2,..., n !( M ) be an estimator of population parameter  based on ordered n sample si . Consider a scheme of selection in which the probability of selecting the ordered sample ( si ) is psi . The probability of getting the unordered sample(s) is the sum of the probabilities, i.e., M

ps   psi . i 1

For a population of size N with units denoted as 1, 2,…, N , the samples of size n are n  tuples. In the nth draw, the sample space will consist of N ( N  1)...( N  n  1) unordered sample points.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 19


psio  P selection of any ordered sample  

1 N ( N  1)...( N  n  1)

psiu  P selection of any unordered sample   then

ps 

M (  n!)

 i 1

psio 

selection of any  n!  n! P   N ( N  1)...( N  n  1)  ordered sample 

n !( N  n)! 1  . N! N   n

M N Theorem : If ˆ0  zsi , s  1, 2,...,   ; i  1, 2,..., M ( n !) and ˆu   zsi psi are the ordered and unordered i 1 n

estimators of  repectively, then (i) E (ˆu )  E (ˆ0 ) (ii) Var (ˆu )  Var (ˆ0 ) where zsi is a function of si th ordered sample (hence a random variable) and psi is the probability of selection of si th ordered sample and

psi 

psi . ps

N Proof: Total number of ordered sample = n !  n N   n M

(i ) E (ˆ0 )   zsi psi s 1 i 1

N   n

M  E (ˆu )     zsi psi  ps s 1  i 1   p     zsi si ps s  i   zsi psi s

  ps 

i

 E (ˆ0 )

N (ii) Since ˆ0  zsi , so ˆ02  zsi2 with probability psi , i  1, 2,..., M , s  1, 2,...,   . n 2

M M  Similarly, ˆu   zsi psi , so ˆu2    zsi psi  with probability ps i 1  i 1 

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 20


Consider Var (ˆ0 )  E (ˆ02 )   E (ˆ0 ) 

2

  zsi2 psi   E (ˆ0 )  s

2

i

Var (ˆu )  E (ˆu2 )   E (ˆu ) 

2

2

2       zsi psi  ps   E (ˆ0 )  s  i  2

  Var (ˆ0 )  Var (ˆu )   z psi     zsi psi  ps s i s  i  2 si

2

    z psi     zsi psi  ps s i s  i      2   zsi psi    zsi psi  ps s  i  i  2              zsi2 psi    zsi psi    psi   2   zsi psi    zsi psi  ps  s   i   i   i  i    i 2        2       zsi psi    zsi psi  psi  2   zsi psi  zsi psi  s  i   i   i     2 si

     ( zsi   zsi psi ) 2 psi   0 s i  i  ˆ ˆ  Var ( 0 )  Var (u )  0 or Var (ˆ )  Var (ˆ ) u

0

Estimate of Var (ˆu ) Since   Var (ˆ0 )  Var (ˆu )   ( zsi   zsi psi ) 2 psi  s i  i    2  (ˆ )  Var  (ˆ )  Var  0 u ( zsi   zsi psi ) psi  s i  i   (ˆ )  p (  p Var z  z p ) 2 . 

 i

si

0

 i

si

si

si

si

i

Based on this result, now we use the ordered estimators to construct an unordered estimator. It follows from this theorem that the unordered estimator will be more efficient than the corresponding ordered estimators.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 21


Murthy’s unordered estimator corresponding to Des Raj’s ordered estimator for the sample size 2 Suppose yi and y j are the values of units U i and U j selected in the first and second draws respectively with varying probability and WOR in a sample of size 2 and let pi and p j be the corresponding initial probabilities of selection. So now we have two ordered estimates corresponding to the ordered samples s1* and s2* as follows s1*  ( yi , y j ) with (U i , U j ) s2*  ( y j , yi ) with (U j , U i ) which are given as z ( s1* ) 

yj  yi 1  (1  pi )  (1  pi )  pi p j  2 N 

where the corresponding Des Raj estimator is given by yi y j (1  pi )  1   yi    2 N  pi pj  and z ( s2* ) 

yj yi  1   (1  p j )  (1  p j )  2 N  pj pi 

where the corresponding Des Raj estimator is given by y j yi (1  p j )  1   yj  . 2 N  pj pi  The probabilities corresponding to z ( s1* ) and z ( s2* ) are p ( s1* )  p ( s2* ) 

pi p j 1  pi p j pi 1 p j

p ( s )  p( s1* )  p ( s2* ) 

pi p j (2  pi  p j ) (1  pi )(1  p j ) p '( s1* )  p '( s2* ) 

1 p j 2  pi  p j 1  pi . 2  pi  p j

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 22


Murthy’s unordered estimate z (u ) corresponding to the Des Raj’s ordered estimate is given as z (u )  z ( s1* ) p '( s1 )  z ( s2* ) p '( s2 ) 

z ( s1* ) p ( s1* )  z ( s2* ) p ( s2* ) p ( s1* )  p ( s2* )

 1   2 N 

 y j   pi p j    1  yj yi yi   p j pi (1 ) (1 )     p p   (1  pi )  (1  pi )      j j pi p j   1  pi    2 N  pj pi   1 p j  pi p j p p  j i 1  pi 1  p j

1 2N

   y j  yj  y y   (1  pi ) i  (1  pi )  (1  p j )  (1  p j )  (1  p j ) i  (1  pi )  pi p j  pi pi      (1  p j )  (1  pi )

1 2N

  yj yi (1  p j ) (1  pi )  (1  pi )  (1  pi ) (1  p j )  (1  p j  pi pj   2  pi  p j

(1  p j ) 

y yi  (1  pi ) j pi pj

N (2  pi  p j )

    

.

Unbiasedness: Note that yi and pi can take any one of the values out of Y1 , Y2 ,..., YN and respectively. Then

y j and p j can take any one of the remaining values out of

P1 , P2 ,..., PN , Y1 , Y2 ,..., YN and

P1 , P2 ,..., PN , respectively, i.e., all the values except the values taken at the first draw. Now

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 23


  Y j   PP PP  Y i j  i j   (1  Pj ) i  (1  Pi )   Pi Pj  1  Pi 1  Pj    1 E  z (u )     2  Pi  Pj N i j   Y j   PP P P  Y i j  j i   (1  Pj ) i  (1  Pi )   Pi Pj  1  Pi 1  Pj    1 2   2 N i j 2  Pi  Pj   Y j   PP P P  Y i j  j i   (1  Pj ) i  (1  Pi )   Pi Pj  1  Pi 1  Pj    1   2  Pi  Pj 2 N i j

1 2N

  Y j   PP Yi  i j P P (1 ) (1 )         j i P P P P (1 )(1 )   i j     i j i j  

1 2N

 1  P  1  P 

 Yi Pj

 N N N  Using result  ai b j   ai  b j  bi , we have i  j 1 i 1  j 1  i j



Y j Pi 

i

j

E  z (u )  

1 2N

N N  N Y   N Y j  ( Pi Pj )    i ( Pj  Pi )      i 1 1  Pi j 1   j 1 1  Pj i 1 

1 2N

  N Yi   N Yj  P (1 ) (1  Pj )    i    i 1 1  Pi   j 1 1  Pj

N  1 N  Yi   Y j  2 N  i 1 j 1 

Y Y 2 Y.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 24


Variance: The variance of z (u ) can be found as 2

1 N (1  Pi  Pj )(1  Pi )(1  Pj )  Yi Y j  PP i j (2  Pi  Pj ) Var  z (u )       2 2 i  j 1 N (2  Pi  Pj )  Pi Pj  (1  Pi )(1  Pj ) 1 N PP (1  Pi  Pj )  Yi Y j    i 2j    2 i  j 1 N (2  Pi  Pj )  Pi Pj 

2

Using the theorem that Var (ˆu )  Var (ˆ0 ) we get

Var  z (u )   Var  z ( s1* )  and Var  z (u )   Var  z ( s2* ) 

Unbiased estimator of V  z (u ) An unbiased estimator of Var  z | u  is 2

  z (u )   (1  pi  p j )(1  pi )(1  p j )  yi  y j  . Var p p  N 2 (2  pi  p j ) 2 j   i

Horvitz Thompson (HT) estimate The unordered estimates have limited applicability as they lack simplicity and the expressions for the estimators and their variance becomes unmanageable when sample size is even moderately large. The HT estimate is simpler than other estimators. Let N be the population size and yi , (i  1, 2,..., N ) be the value of characteristic under study and a sample of size n is drawn by WOR using arbitrary probability of selection at each draw. Thus prior to each succeeding draw, there is defined a new probability distribution for the units available at that draw. The probability distribution at each draw may or may not depend upon the initial probability at the first draw. Define a random variable  i (i  1, 2,.., N ) as 1 if Yi is included in a sample ' s ' of size n

i  

0 otherwise.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 25


Let zi 

nyi , i  1...N assuming E ( i )  0 for all i NE ( i )

where E ( i )  1.P (Yi  s )  0.P (Yi  s )

 i is the probability of including the unit i in the sample and is called as inclusion probability.

The HT estimator of Y based on y1 , y2 ,..., yn is 1 n zn  YˆHT   zi n i 1 

1 N  i zi . n i 1

Unbiasedness 1 N E (YˆHT )   E ( zi i ) n i 1 1 N  zi E ( i ) n i 1 1 N nyi   E ( i ) n i 1 NE ( i ) 

1 N nyi  Y n i 1 N

which shows that HT estimator is an unbiased estimator of population mean.

Variance V (YˆHT )  V ( zn )  E ( zn2)   E ( zn ) 

2

 E ( zn2)  Y 2 . Consider 1 N  E ( z )  2 E    i zi  n  i 1 

2

2 n

N N  1 N 2 2 E z   i j zi z j      i i 2 n  i 1 i (  j ) 1 j 1 

N N  1 N 2 2 z E zi z j E ( i j )  .  ( )    i 2  i n  i 1 i (  j ) 1 j 1 

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 26


If S  s is the set of all possible samples and  i is probability of selection of ith unit in the sample s then E ( i )  1 P( yi  s )  0.P ( yi  s )  1. i  0.(1   i )   i E ( )  12. P( yi  s )  02.P( yi  s ) 2 i

 i.

So E ( i )  E ( i2 )   N N 1 N 2 E ( z )  2  zi  i     ij zi z j   n  i 1 i (# j ) i 1   2 n

where  ij is the probability of inclusion of ith and jth unit in the sample. This is called as second order inclusion probability.

Now Y 2   E ( zn ) 

2

1  2 n

  N   E    i zi      i 1

2

N N 1 N 2 2  z E ( ) zi z j E ( i ) E ( j )       i i  n 2  i 1  i (  j )1 j 1

N N  1 N 2 2   i  j zi z j  . z    2  i i n  i 1 i (  j ) 1 j 1 

Thus N N  1 N Var (YˆHT )  2    i zi2     ij zi z j  n  i 1 i (  j ) 1 j 1  N N  1 N  2    i2 zi2     i j zi z j  n  i 1 i (  j ) 1 j 1  N N  1 N  2    i (1   i ) zi2    ( ij   i i ) zi z j  n  i 1 i (  j ) 1 j 1  N N n 2 yi y j  n 2 yi2 1 N  2    i (1   i ) 2 2    ( ij   i i ) 2  N  i j  n  i 1 N  i i (  j ) 1 j 1

N N     1  N  1 i  2 ij i i y         i 2  i j N  i 1   i  i (  j ) 1 j 1 

   yi y j   

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 27


Estimate of variance n n      n 2 ij i j  (Yˆ )  1  yi (1   i )  Vˆ1  Var   2   HT  N 2  i 1 i (  j ) 1 j 1  i ij

 yi y j  .     i j 

This is an unbiased estimator of variance .

Drawback: It does not reduces to zero when all

yi

i

are same, i.e., when yi   i .

Consequently, this may assume negative values for some samples. A more elegant expression for the variance of yˆ HT has been obtained by Yates and Grundy.

Yates and Grundy form of variance Since there are exactly n values of  i which are 1 and ( N  n) values which are zero, so N

 i 1

i

 n.

Taking expectation on both sides N

 E ( )  n. i

i 1

Also 2

N N N  N  E    i    E ( i2 )    E ( i j ) i 1 i (  j ) 1 j 1  i 1  N

E  n    E ( i )  2

i 1

n2  n 

N

  E ( 

i (  j ) 1 j 1

  E ( 

N

  E ( 

i (  j ) 1 j 1

N

N

i (  j ) 1 j 1 N

N

i

J

i

J

i

J

) (using E ( i )  E ( i2 ))

)

)  n(n  1)

Thus E ( i j )  P( i  1,  j  1)  P ( i  1) P( j  1  i  1)  E ( i ) E ( j  i  1)

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 28


Therefore N

j (  i ) 1

 E ( i  j )  E ( i ) E ( j )  N

j (  i ) 1

 E ( i ) E ( j |  i  1)  E ( i ) E ( j ) 

 E ( i )

N

j (  i ) 1

 E ( j |  i  1)  E ( j ) 

 E ( i )  (n  1)  (n  E ( i )   E ( i ) 1  E ( i )    i (1   i )

(1)

Similarly N

i (  j ) 1

 E ( i  j )  E ( i ) E ( j )    j (1   j ).

(2)

We had earlier derived the variance of HT estimator as N N  1 N Var (YˆHT )  2    i (1   i ) zi2    ( ij   i j ) zi z j  n  i 1 i (  j ) 1 j 1 

Using (1) and (2) in this expression, we get N N N  1 N Var (YˆHT )  2    i (1   i ) zi2    j (1   j ) z 2j  2   ( i j   ij ) z i z j  2n  i 1 j 1 i  j 1 j 1 

 1  N  N  E ( i j )  E ( i ) E ( j )  zi2 2    2n  i 1  j ( i ) 1  N  N N n       E ( i j )  E ( i ) E ( j )  z 2j  2    E ( i ) E ( j )  E ( i j ) zi z j  j 1 i (  j ) 1 i (  j ) 1 j 1  

N N N N  N N  2 2       (    ) z (    ) z 2 ( ij   i i ) zi z j         ij i i i ij i i j i (  j ) 1 j 1 i (  j ) 1 j 1   i (  j ) 1 j 1 

1 2n 2

 1  N N ( i j   ij )( zi2  z 2j  2 zi z j )  . 2    2n  i (  j ) 1 j 1 

The expression for  i and  ij can be written for any given sample size.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 29


For example, for n  2 , assume that at the second draw, the probability of selecting a unit from the units available is proportional to the probability of selecting it at the first draw. Since E ( i )  Probability of selecting Yi in a sample of two  Pi1  Pi 2 where Pir is the probability of selecting Yi at r th draw (r  1, 2). If Pi is the probability of selecting the ith unit at first draw (i  1, 2,..., N ) then we had earlier derived that Pi1  Pi  yi is not selected   yi is selected at 2nd draw|  Pi 2  P  st  P st  at 1 draw   yi is not selected at 1 draw  N PP   j i j (  i ) 1 1  Pj N P P     j  i  Pi .  j 1 1  Pj 1  Pi  So N P P  E ( i )  Pi   j  i   Pi  j 1 1  Pj 1  Pi  Again E ( i j )  Probability of including both yi and y j in a sample of size two  Pi1 Pj 2|i  Pj1 Pi 2| j  Pi

Pj 1  Pi

 Pj

Pi 1  Pj

 1 1   =PP   Pi . i j  1  Pi 1  Pj 

Estimate of Variance The estimate of variance is given by

 (Yˆ )  1 Var HT 2n 2

n

 i j   ij ( zi z j ) 2 .  ij j 1 n



i( j )

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 30


Midzuno system of sampling: Under this system of selection of probabilities, the unit in the first draw is selected with unequal probabilities of selection (i.e., pps) and remaining all the units are selected with SRSWOR at all subsequent draws. Under this system E ( i )   i  P (unit i (U i ) is included in the sample)  P (U i is included in 1st draw) + P(U i is included in any other draw )  Probability that U i is not selected at the first draw and  Pi    is selected at any of subsequent ( n -1) draws   Pi  (1  Pi ) 

   

n 1 N 1

N n n 1 Pi  . N 1 N 1

Similarly, E ( i j )  Probability that both the units U i and U j are in the sample  Probability that U i is selected at the first draw and

   U is selected at any of the subsequent draws (n  1) draws   j 

 

 Probability that U j is selected at the first draw and

   U is selected at any of the subsequent (n  1) draws   i 

 

 Probability that neither U i nor U j is selected at the first draw but    both of them are selected during the subsequent (n  1) draws   

 

 Pi

(n  1)(n  2) n 1 n 1  Pj  (1  Pi  Pj ) ( N  1)( N  2) N 1 N 1

(n  1)  N  n n2  ( Pi  Pj )   ( N  1)  N  2 N  2 

 ij 

n 1  N  n n2  ( Pi  Pj )  .  N 1  N  2 N  2 

Similarly, E ( i j k )   ijk  Probability of including U i , U j and U k in the sample

n3  (n  1)(n  2)  N  n ( Pi  Pj  Pk )  .  N  3  ( N  1)( N  2)  N  3

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 31


By an extension of this argument, if U i , U j ,..., U r are the r units in the sample of size n(r  n), the probability of including these r units in the sample is E ( i j ... r )   ij ...r 

(n  1)(n  2)...(n  r  1)  N  n nr  ( Pi  Pj  ...  Pr )   N  r  ( N  1)( N  2)...( N  r  1)  N  r

Similarly, if U1 ,U 2 ,...,U q be the n units, the probability of including these units in the sample is E ( i j ... q )   ij ...q  

(n  1)(n  2)...1 ( Pi  Pj  ...  Pq ) ( N  1)( N  2)...( N  n  1)

1 ( Pi  Pj  ...  Pq )  N  1    n 1 

which is obtained by substituting r  n . Thus if Pi ' s are proportional to some measure of size of units in the population then the probability of selecting a specified sample is proportional to the total measure of the size of units included in the sample. Substituting these  i ,  ij ,  ijk etc. in the HT estimator, we can obtain the estimator of population’s mean and variance. In particular, an unbiased estimate of variance of HT estimator given by n n    i j ij  (Yˆ )  1 ( zi  z j ) 2 Var   HT 2 2n i  j 1 j 1  ij

where

 i j   ij 

N n ( N  1) 2

n 1   (1  Pi  Pj )  . i j  ( N  n) PP N 2 

The main advantage of this method of sampling is that it is possible to compute a set of revised probabilities of selection such that the inclusion probabilities resulting from the revised probabilities are proportional to the initial probabilities of selection. It is desirable to do so since the initial probabilities can be chosen proportional to some measure of size.

Sampling Theory| Chapter 7 | Varying Probability Sampling | Shalabh, IIT Kanpur

Page 32


Chapter 8 Double Sampling (Two Phase Sampling) The ratio and regression methods of estimation require the knowledge of population mean of auxiliary variable ( X ) to estimate the population mean of study variable (Y ). If information on the auxiliary variable is not available, then there are two options – one option is to collect a sample only on study variable and use sample mean as an estimator of population mean. An alternative solution is to use a part of the budget for collecting information on auxiliary variable to collect a large preliminary sample in which xi alone is measured. The purpose of this sampling is to furnish a good estimate of X . This method is appropriate when the information about xi is on file cards that have not been tabulated. After collecting a large preliminary sample of size n ' units from the population, select a smaller sample of size n from it and collect the information on y . These two estimates are then used to obtain an estimator of population mean Y . This procedure of selecting a large sample for collecting information on auxiliary variable x and then selecting a subsample from it for collecting the information on the study variable y is called double sampling or two phase sampling. It is useful when it is considerably cheaper and quicker to collect data on x than y and there is high correlation between x and y. In this sampling, the randomization is done twice. First a random sample of size n ' is drawn from a population of size N and then again a random sample of size n is drawn from the first sample of size n'.

So the sample mean in this sampling is a function of the two phases of sampling. If SRSWOR is utilized to draw the samples at both the phases, then -

number of possible samples at the first phase when a sample of size n is drawn from a

N population of size N is    M 0 , say.  n' -

number of possible samples at the second phase where a sample of size n is drawn from the first

 n ' phase sample of size n ' is    M1 , say. n

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 1


Population of X (N units)

Sample (Large) n ' units

M 0 samples

Subsample (small) n units

M1 samples

Then the sample mean is a function of two variables. If  is the statistic calculated at the second phase such that  ij , i  1, 2,..., M 0 , j  1, 2,..., M 1 with Pij being the probability that ith sample is chosen at first phase and jth sample is chosen at second phase, then E ( )  E1  E2 ( ) 

where E2 ( ) denotes the expectation over second phase and E1 denotes the expectation over the first phase. Thus M 0 M1

E ( )   Pij ij i 1 j 1

M 0 M1

  PP i j / i  ij

(using P( A  B)  P( A) P( B / A))

i 1 j 1 M0

M1

  Pi  Pj / i  ij i 1 j 1     st 1 stage

2nd stage

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 2


Variance of  Var ( )  E   E ( ) 

2

 E  (  E2 ( ))  ( E2 ( )  E ( )) 

2

 E   E2 ( )    E2 ( )  E ( )   0 2

2

 E1 E2   E2 ( )]2  [ E2 ( )  E ( ) 

2

 E1 E2   E2 ( )   E1 E2  E2 ( )  E ( )  2

2

 constant for E2  E1 V2 ( )   E1  E2 ( )  E1 ( E2 ( )) 

2

 E1 V2 ( )   V1  E2 ( )  Note: The two phase sampling can be extended to more than two phases depending upon the need and objective of the experiment. Various expectations can also be extended on the similar lines .

Double sampling in ratio method of estimation If the population mean

X is not known then double sampling technique is applied. Take a large

initial sample of size n ' by SRSWOR to estimate the population mean X as 1 n' Xˆ  x '   xi . n ' i 1

Then a second sample is a subsample of size n selected from the initial sample by SRSWOR. Let

y and x be the means of y and x based on the subsample. Then E ( x ')  X , E ( x )  X , E ( y )  Y . The ratio estimator under double sampling now becomes y YˆRd  x ' . x

The exact expressions for the bias and mean squared error of YˆRd are difficult to derive. So we find their approximate expressions using the same approach mentioned while describing the ratio method of estimation.

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 3


Let

0 

y Y xX x ' X , 1  , 2  Y X X

E ( 0 )  E (1 )  E ( 2 )  0 1 1  E (12 )     C x2 n N  1 E (1 2 )  2 E ( x  X )( x ' X ) X 1  2 E1  E2 ( x  X )( x ' X ) | n ' X 1  2 E1 ( x ' X ) 2  X 2 1 1S     x2  n' N  X 1 1     C x2  n' N   E ( 22 ).

1 Cov( y , x ') XY 1 1 Cov  E ( y | n '), E ( x ' | n ')   E Cov( y , x ') | n '  XY XY 1 1 Cov Y , X   E Cov( y ', x ')   XY XY 1  Cov  ( y ', x ' XY  1 1  S xy     n ' N  XY 1 1 S S    x y  n' N  X Y

E ( 0 2 ) 

1 1      CxC y  n' N  where y ' is the sample mean of y ' s based on the sample size n '.

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 4


1 Cov( y , x ) XY 1 1  S     xy n N  XY 1 1  S S    x y n N  X Y 1 1       Cx C y n N  1 E ( 02 )  2 Var ( y ) Y 1  2 V1  E2 ( y | n ')  E1 V2 ( yn | n ') Y  1 1   1   2 V1 ( yn' )  E1    s '2y  Y   n n '  

E ( 01 ) 

1 Y2

 1 1  2  1 1  2   n '  N  S y   n  n '  S y       2

1 1 S     y2  n N Y 1 1      C y2 n N  '2 where s y is the mean sum of squares of y based on initial sample of size n '.

1 Cov( x , x ') X2 1  2 Cov  E ( x | n '), E ( x ' | n ')  0  X 1  2 Var ( X ') X where Var ( X ') is the variance of mean of x based on initial sample of size n ' . E (1 2 ) 

Estimation error of YˆRd Write YˆRd as

(1   0 ) Y (1   2 ) X YˆRd  (1  1 ) X  Y (1   0 )(1   2 )(1  1 ) 1  Y (1   0 )(1   2 )(1  1  12  ...)  Y (1   0   2   0 2  1   o1  1 2  12 ) upto the terms of order two. Other terms of degree greater than two are assumed to be negligible. Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 5


Bias of YRd E (YˆRd )  Y 1  0  0  E ( 0 2 )  0  E ( 01 )  E (1 2 )  E (12 )  Bias(Yˆ )  E (Yˆ )  Y Rd

Rd

 Y  E ( 0 2 )  E ( 01 )  E (1 2 )  E (12 )   1 1  1 1  1 1 1 1    Y     CxC y      CxC y     Cx2     Cx2  n N   n' N  n N    n ' N  1 1   Y     Cx2   CxC y   n n' 1 1   Y    Cx (Cx   C y ).  n n' The bias is negligible if n is large and relative bias vanishes if C x2  C xy , i.e., the regression line passes through origin.

MSE of YˆRd : MSE (YˆRd )  E (YˆRd  Y ) 2`  Y 2 E ( 0   2  1 ) 2 (retaining the terms upto order two)  Y 2 E  02  12   22  2 0 2  2 01  21 2   Y 2 E  02  12   22  2 0 2  2 01  2 22   1 1  1 1 1 1   Y 2    C y2     C x2     n' N n N   n N 

 2 1 1  Cx  2     n' N

 1 1   CxC y  2    n N

    CxC y   

1 1  1 1  Y 2     C x2  C y2  2  C x C y   Y 2    C x (2  C y  C x ) n N   n' N   1 1  MSE (ratio estimator)  Y 2     2  C x C y  C x2  .  n' n 

The second term is the contribution of second phase of sampling. This method is preferred over ratio method if

2  CxC y  Cx2  0 or  

1 Cx 2 Cy

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 6


Choice of n and n ' Write V V' MSE (YˆRd )   n n'

where V and V ' contain all the terms containing n and n ' respectively. The cost function is C0  nC  n ' C ' where C and C ' are the costs per unit for selecting the samples

n and n ' respectively. Now we find the optimum sample sizes n and n ' for fixed cost C0 . The Lagrangian function is V V'    (nC  n ' C ' C0 ) n n'  V  0  C  2 n n  V'  0  C '  2 . n ' n'



Cn 2  V

Thus or

n

or

V C

 nC  VC .

Similarly

 n ' C '  V ' C '.

Thus



VC  V ' C ' C0

and so

Optimum n 

C0 V  nopt , say VC  V ' C ' C

C0 V' ' , say  nopt VC  V ' C ' C ' V V' Varopt (YˆRd )   ' nopt nopt Optimum n ' 

( VC  V ' C ') 2 C0

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 7


Comparison with SRS If X is ignored and all resources are used to estimate Y by y , then required sample size = Var ( y ) 

S y2

C0 / C

C0 . C

CS y2 C0

Relative effiiency =

CS y2 Var ( y )  2 Varopt (YˆRd ) ( VC  V ' C ')

Double sampling in regression method of estimation When the population mean of auxiliary variable X is not known, then double sampling is used as follows: -

A large sample of size n ' is taken from of the population by SRSWOR from which the population mean X is estimated as x ' , i.e. Xˆ  x '.

-

Then a subsample of size n is chosen from the larger sample and both the variables x and y are measured from it by taking x ' in place of X and treat it as if it is known.

Then E ( x ')  X , E ( x )  X , E ( y )  Y . The regression estimate of Y in this case is given by

Yˆregd  y  ˆ ( x ' x ) n

where

ˆ 

sxy sx2

 ( x  x )( y  y ) i

i 1

i

n

 (x  x ) i 1

2

is an estimator of  

S xy S x2

based on the sample of size n .

i

It is difficult to find the exact properties like bias and mean squared error of Yˆregd , so we derive the approximate expressions.

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 8


Let

y Y  y  (1   0 )Y Y xX 1   x  (1  1 ) X X x ' X 2   x '  (1   2 ) X X s S  3  xy xy  sxy  (1   3 ) S xy S xy

0 

4 

sx2  S x2  sx2  (1   4 ) S x2 2 Sx

E (1 )  0, E ( 2 )  0, E ( 3 )  0, E ( 4 )  0 Define

21  E ( x  X )2 ( y  Y )  30  E  x  X 

3

Estimation error: Then

Yˆregd  y  ˆ ( x ' x ) y

S xy (1   3 ) S x2 (1   4 )

yX

S xy S

2 x

( 2  1 ) X

(1   3 )( 2  1 )(1   4 )1

 y  X  (1   3 )( 2  1 )(1   4   42  ...) Retaining the powers of  ' s up to order two assuming  3  1, (using the same concept as detailed in the case of ratio method of estimation) Yˆregd  y  X  ( 2   2 3   2 4   1   1 3   1 4 ).

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 9


Bias: The bias of Yˆ upto the second order of approximation is regd E (Yˆregd )  Y  X   E ( 2 3 )  E ( 2 4 )  E (1 3 )  E (1 4 )  Bias (Yˆregd )  E (Yˆregd )  Y  1 1  1  X      n ' N  N

 ( x ' X )( sxy  S xy )       XS xy   

1 11     n' N  N

 ( x ' X )( sx2  S X2 )    XS X2  

1 1  1    n N N

 

1 1  1    n N N

 ( x  X )( sx2  S x2 )    XS x2  

 ( x  X )( sxy  S xy )   XS xy  

 1 1   1 1  1 1   1 1     X     21     302     21     302   n ' N  XS xy  n ' N  XS x  n N  XS xy  n N  XS x 

   1 1          21  302  .  n n '   S xy S x 

Mean squared error: MSE (Yˆregd )  E (Yregd  Y ) 2   y  ˆ ( x ' x )  Y 

2

 E ( y  Y )  X  (1   3 )( 2  1 )(1   4   42  ...) 

2

Retaining the powers of  ' s up to order two, the mean squared error up to the second order of approximation is

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 10


2 MSE (Yˆregd )  E  ( y  Y )  X  ( 2   2 3   2 4  1  1 3  1 4 ) 

 E ( y  Y ) 2  X 2  2 E (12   22  21 2 )  2 X  E[( y  Y )(1   2 )]  E ( y  Y ) 2  X 2  2 E (12   22  21 2 )  2 XY  E[ 0 (1   2 )] 2  1 1  S x2  1 1  S x2  1 1  Sx   Var ( y )  X     2     2  2    2   n' N  X n N  X   n N  X  1 1  S xy  1 1  S xy   2  XY         n ' N  XY  n N  XY  2

2

1 1  1 1   Var ( y )   2    S x2  2     S xy  n n'  n n' 1 1   Var ( y )       2 S x2  2  S xy   n n' 2  S  1 1   S xy 2  Var ( y )      4 S x  2 xy2 S xy   Sx  n n '   S x  2

1 1   1 1   S xy      S y2       n N   n n '   Sx  1 1  1 1      S y2      2 S y2 (using S xy   S x S y ) n N   n n' 

(1   2 ) S y2 n

 2 S y2 n'

. (Ignoring the finite population correction)

Clearly, Yˆregd is more efficient than sample mean SRS, i.e. when no auxiliary variable is used.

Now we address the issue that whether the reduction in variability is worth the extra expenditure required to observe the auxiliary variable. Let the total cost of survey is

C0  C1n  C2 n ' where C1 and C2 are the costs per unit observing the study variable y and auxiliary variable x respectively. Now minimize the MSE (Yˆregd ) for fixed cost C0

using Lagrangian function with Lagranagian

multiplier  as

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 11




S y2 (1   2 ) n

 2 S y2 n'

  (C1n  C2 n ' C0 )

 1  0   2 S y2 (1   2 )   C1  0 n n 1   0   2 S y2  2   C2  0 n' n '

Thus

n

and

n' 

S y2 (1   2 )

C1 Sy C2

.

Substituting these values in the cost function, we have

C0  C1n  C2 n '  C1

S y2 (1   2 ) C1

 C2

 2 S y2 C2

or C0   C1S y2 (1   2 )  C2  2 S y2 or  

2 1  2  . S C   S C   (1 ) 1 2 y y  C02 

Thus the optimum values of n and n ' are '  nopt

nopt 

 S y C0 C2  S y C1 (1   2 )   S y C2    C0 S y 1   2 C1  S y C1 (1   2 )   S y C2   

.

' The optimum mean squared error of Yˆregd is obtained by substituting n  nopt and n '  nopt as

MSE (Yˆregd )opt 

 

S y2 (1   2 )  C1 

C1S y2 (1   2 )   S y C2  

C0 S y2 (1   2 ) S y2  2 C2  S y 

C1 (1   2 )   S y C2    S y C0

2 1  S y C1 (1   2 )   S y C2   C0 

2 S y2  C1 (1   2 )   C2    C0 

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 12


The optimum variance of y under SRS for SRS where no auxiliary information is used is Var ( ySRS )opt 

C1S y2 C0

which is obtained by substituting   0, C2  0 in MSE (YˆSRS )opt . The relative efficiency is

Var ( ySRS )opt  RE  MSE (Yˆ ) regd opt

C1S y2 S y2  C1 (1   2 )   C2   

2

1

 C2  2  1     C1    1.

2

Thus the double sampling in regression estimator will lead to gain in precision if C1 2 .  C2 1  1   2  2  

Double sampling for probability proportional to size estimation: Suppose it is desired to select the sample with probability proportional to auxiliary variable x but information on x is not available. Then, in this situation, the double sampling can be used. An initial sample of size n ' is selected with SRSWOR from a population of size N , and information on x is collected for this sample. Then a second sample of size n is selected with replacement and with probability proportional to x from the initial sample of size n ' . Let x ' denote the mean of x for the initial sample of size n ' , Let x

and y denote means respectively of x and y for the second

sample of size n . Then we have the following theorem.

Theorem: (1) An unbiased estimator of population mean Y is given as

x' n  y  Yˆ  tot   i , n ' n i 1  xi  ' where xtot denotes the total for x in the first sample.

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 13


2

    N x y  1 1 ( ' 1) n  ˆ  i i   , where X tot and Ytot denote the totals  Y (2) Var (Y )     S y2   tot N ( N  1)nn ' i 1 X tot  xi   n' N  X   tot  of x and y respectively in the population. (3) An unbiased estimator of the variance of Yˆ is given by '2 ' n  ' n yi2 xtot  xtot ( A  B)  yi ˆ  1  (Yˆ )   1  1  1 Var x    Y     tot     n '(n  1)  n(n  1) i 1  n ' xi  n ' N  n( n ' 1)  i 1 xi  2

n  n y  y2 where A    i  and B   i2 i 1 xi  i 1 xi 

Proof. Before deriving the results, we first mention the following result proved in varying probability scheme sampling.

Result: In sampling with varying probability scheme for drawing a sample of size

n from a

population of size N and with replacement . (i) z 

1 n  zi is an unbiased estimator of population mean n i 1

y where zi 

yi , pi being the Npi

probability of selection of ith unit. Note that yi and pi can take anyone of the N values Y1 , Y2 ,..., YN with initial probabilities P1 , P2 ,..., PN , respectively.  1  N Yi 2 1 (ii) Var ( z )   N 2Y 2   2  2 nN  i 1 Pi  nN

2

Y  Pi  i  Y  . .  i 1  Pi  N

(iii) An unbiased estimator of variance of z is 2

(z )  Var

n  yi  1  z  ..   n( n  1) i 1  Npi 

Let E2 denote the expectation of Yˆ , when the first sample is fixed. The second is selected with probability proportional to x , hence using the result (i) with Pi 

xi , we find that ' xtot

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 14


 1 n y  Yˆ  E2    E2   i  n'  n i 1 n ' xi   '  xtot

    

 x' n  y  E2  tot   i  nn ' i 1  xi  y'

  

where y ' is the mean of y for the first sample. Hence

 

E (Yˆ )  E1  E2 Yˆ | n '     E1 ( yn ' )  Yˆ , which proves the part (1) of the theorem. Further,

     V ( y ')  E V Yˆ | n ' 1 1     S  E V Yˆ | n ' .  n' N 

Var (Yˆ )  V1 E2 Yˆ | n '  E1V2 Yˆ | n ' 1

1 2 2 y

1 2

Now, using the result (ii), we get

    n' x y 1 '  V2 Yˆ | n '  '2  ' i  i  ytot nn i 1 xtot  xi   x'   tot 

2

 

1  '2 nn

2

y y  xi x j  i  j  ,   xi x j  i 1 i  j   n'

n'

and hence

2

y y  1 n '( n ' 1) N n ' E1V2 Yˆ | n '  '2 xi x j  i  j  ,    xi x j  nn N ( N  1) i 1 i  j  

using the probability of a specified pair of units being selected in the sample is

n '(n ' 1) . So we can N ( N  1)

express

Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 15


2

 

1 n '( n ' 1) N E1V2 Yˆ | n '  '2  nn N ( N  1) i 1

    xi y  i  Ytot  . X tot  xtot  X   tot 

 

ˆ Substituting this in V2 Y | n ' , we get 2

N ( n ' 1) 1 1 Var (Yˆ )     S y2   nn ' N ( N  1) i 1  n' N 

    xi y  i  Ytot  . X tot  xi  X   tot 

This proves the second part (2) of the theorem. We now consider the estimation of Var (Yˆ ). Given the first sample, we obtain

 1 n y2  n' E2   i    yi2 ,  n i 1 pi  i 1 where pi 

xi . Also, given the first sample, ' xtot

2 n  1  yi   ˆ E2   Y    V2 (Yˆ )  E2 (Yˆ 2 )  y '2 .    n(n  1) i 1  n ' pi  

Hence 2 n    yi 1 ˆ ˆ 2 2 2  E2 Y  Y    y' .  n(n  1) i 1  n ' pi   

x' n  y Substituting Yˆ  tot   i n ' n i 1  xi

 xi  and pi  ' the expression becomes xtot 

  n y 2  n y 2   x '2 2 i i E2  '2       2    y ' ( 1) nn n x x    i 1 i   i 1 i    Using

 1 n yi2  n ' 2 E2      yi ,  n i 1 pi  i 1 we get Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 16


'2 1 n  n' x' xtot ( A  B)    yi2  n ' y '2 E2   yi2 tot  xi nn '(n  1)  n i 1  i 1

2

n  n yi  yi2 where A     , and B   2 which further simplifies to i 1 xi  i 1 xi 

 1  ' n y 2 xtot'2 ( A  B)  '2 i E2   xtot     s y , n '(n  1)   n(n ' 1)  i 1 xi where s '2y is the mean sum of squares of y for the first sample. Thus, we obtain '2  1  ' n yi2 xtot ( A  B)  '2 2  E1 E2  x  tot     E1 ( s y )  S y n '( n  1)    n( n ' 1)  i 1 xi

(1)

which gives an unbiased estimator of S y2 . Next, since we have 2

    N xi yi 1 ( n ' 1)   ,  E1V2 Yˆ | n '  Y  tot nn ' N ( N  1) i 1 X tot  xi  X   tot 

 

and from this result we obtain 2 ' n  1  yi xtot   ˆ E2   Y    V2 Yˆ | n ' .    n(n  1) i 1  n ' xi  

 

Thus 2 ' n N  1  xtot yi ˆ   ( n ' 1)  E1 E2  Y       n(n  1) i 1  n ' xi   nn ' N ( n  1) i 1

    xi y  i  Ytot  X tot  xi  X   tot 

2

(2)

when gives an unbiased estimator of 2

N (n ' 1)  nn ' N ( N  1) i 1

    xi y  i  Ytot  . X tot  xi  X   tot 

Using (1) and (2) an unbiased estimator of the variance of Yˆ is obtained as '2 ' n  ' n yi2 xtot  xtot ( A  B)  yi ˆ  1  (Yˆ )   1  1  1   Y  Var x    tot     n '( n  1)  n( n  1) i 1  n ' xi  n ' N  n( n ' 1)  i 1 xi 

2

Thus, the theorem is proved. Sampling Theory| Chapter 8 | Double Sampling | Shalabh, IIT Kanpur

Page 17


Chapter 9 Cluster Sampling It is one of the basic assumptions in any sampling procedure that the population can be divided into a finite number of distinct and identifiable units, called sampling units. The smallest units into which the population can be divided are called elements of the population. The groups of such elements are called clusters. In many practical situations and many types of populations, a list of elements is not available and so the use of an element as a sampling unit is not feasible. The method of cluster sampling or area sampling can be used in such situations. In cluster sampling -

divide the whole population into clusters according to some well defined rule.

-

Treat the clusters as sampling units.

-

Choose a sample of clusters according to some procedure.

-

Carry out a complete enumeration of the selected clusters, i.e., collect information on all the sampling units available in selected clusters.

Area sampling In case, the entire area containing the populations is subdivided into smaller area segments and each element in the population is associated with one and only one such area segment, the procedure is called as area sampling.

Examples: ď&#x201A;ˇ

In a city, the list of all the individual persons staying in the houses may be difficult to obtain or even may be not available but a list of all the houses in the city may be available. So every individual person will be treated as sampling unit and every house will be a cluster.

ď&#x201A;ˇ

The list of all the agricultural farms in a village or a district may not be easily available but the list of village or districts are generally available. In this case, every farm in sampling unit and every village or district is the cluster.

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 1


Moreover, it is easier, faster, cheaper and convenient to collect information

on clusters rather than on

sampling units. In both the examples, draw a sample of clusters from houses/villages and then collect the observations on all the sampling units available in the selected clusters.

Conditions under which the cluster sampling is used: Cluster sampling is preferred when (i)

No reliable listing of elements is available and it is expensive to prepare it.

(ii)

Even if the list of elements is available, the location or identification of the units may be difficult.

(iii)

A necessary condition for the validity of this procedure is that every unit of the population under study must correspond to one and only one unit of the cluster so that the total number of sampling units in the frame may cover all the units of the population under study without any omission or duplication. When this condition is not satisfied, bias is introduced.

Open segment and closed segment: It is not necessary that all the elements associated with an area segment need be located physically within its boundaries. For example, in the study of farms, the different fields of the same farm need not lie within the same area segment. Such a segment is called an open segment. In a closed segment, the sum of the characteristic under study, i.e., area, livestock etc. for all the elements associated with the segment will account for all the area, livestock etc. within the segment.

Construction of clusters: The clusters are constructed such that the sampling units are heterogeneous within the clusters and homogeneous among the clusters. The reason for this will become clear later. This is opposite to the construction of the strata in the stratified sampling. There are two options to construct the clusters â&#x20AC;&#x201C; equal size and unequal size. We discuss the estimation of population means and its variance in both the cases.

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 2


Case of equal clusters 

Suppose the population is divided into N clusters and each cluster is of size M .

Select a sample of n clusters from N clusters by the method of SRS, generally WOR.

So total population size = NM total sample size = nM . Let yij : Value of the characteristic under study for the value of j th element

( j  1, 2,..., M ) in the i th cluster

(i  1, 2,..., N ).

yi 

1 M

M

y j 1

ij

mean per element of i th cluster .

Population (NM units)

Cluster M units

Cluster M units

… … …

Cluster M units

Population N clusters

N Clusters

Cluster M units

Cluster M units

… … …

Cluster M units

Sample n clusters

n Clusters

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 3


Estimation of population mean: First select n clusters from N clusters by SRSWOR. Based on n clusters, find the mean of each cluster separately based on all the units in every cluster. So we have the cluster means as y1 , y2 ,..., yn . Consider the mean of all such cluster means as an estimator of

population mean as ycl 

1 n  yi . n i 1

Bias: 1 n  E ( yi ) n i 1

E ( ycl )  

1 n Y n i 1

(since SRS is used)

Y.

Thus ycl is an unbiased estimator of Y .

Variance: The variance of

ycl can be derived on the same lines as deriving the variance of sample mean in

SRSWOR. The only difference is that in SRSWOR, the sampling units are y1 , y2 ,..., yn whereas in case of ycl , the sampling units are y1 , y2 ,..., yn .

N n 2 N n 2    Note that is case of SRSWOR, Var ( y )  Nn S and Var ( y )  Nn s  , Var ( ycl )  E ( ycl  Y ) 2 

where

Sb2 

N n 2 Sb Nn

1 N  ( yi  Y )2 which is the mean sum of square between the cluster means in the N  1 i 1

population.

Estimate of variance: Using again the philosophy of estimate of variance in case of SRSWOR, we can find

 ( y )  N  n s2 Var cl b Nn where sb2 

1 n ( yi  ycl ) 2 is the mean sum of squares between cluster means in the sample .  n  1 i 1

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 4


Comparison with SRS : If an equivalent sample of nM units were to be selected from the population of NM units by SRSWOR, the variance of the mean per element would be

NM  nM S 2 . NM nM 2 f S  . n M N M 1 N -n where f  and S 2  ( yij  Y ) 2 .  N NM  1 i 1 j 1 Var ( ynM ) 

N n 2 Sb Nn f  Sb2 . n

Var ( ycl ) 

Also

Consider N

M

( NM  1) S 2   ( yij  Y ) 2 i 1 j 1

N

M

  ( yij  yi )  ( yi  Y ) 

2

i 1 j 1

N

M

N

M

  ( yij  yi ) 2   ( yi  Y ) 2 i 1 j 1

i 1 j 1

 N ( M  1) S  M ( N  1) Sb2 2 w

where S w2 

1 N

Si2 

1 M  ( yij  yi )2 is the mean sum of squares for the ith cluster. M  1 j 1

N

S i 1

2 i

is the mean sum of squares within clusters in the population

The efficiency of cluster sampling over SRSWOR is

E 

Var ( ynM ) Var ( ycl ) S2 MSb2

 N ( M  1) S w2  1   ( N  1)  .  2 ( NM  1)  M Sb  Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 5


Thus the relative efficiency increases when S w2 is large and Sb2 is small. So cluster sampling will be efficient if clusters are so formed that the variation the between cluster means is as small as possible while variation within the clusters is as large as possible.

Efficiency in terms of intra class correlation The intra class correlation between the elements within a cluster is given by



E ( yij  Y )( yik  Y ) E ( yij  Y )

2

;

1   1 M 1

N M M 1 ( yij  Y )( yik  Y )   MN ( M  1) i 1 j 1 k (  j ) 1  1 N M  ( yij  Y )2 MN i 1 j 1 N M M 1 ( yij  Y )( yik  Y )   MN ( M  1) i 1 j 1 k (  j )1   MN  1  2  S  MN  N

M

M

 

i 1 j 1 k (  j ) 1

( yij  Y )( yik  Y )

( MN  1)( M  1) S 2

.

Consider 2

1 M  ( yi  Y )     ( yij  Y )   i 1 i 1  M j 1  N  M 1 1    2  ( yij  Y ) 2  2 M i 1  M j 1 N

2

N

M

 

M

i 1 j 1 k (  j ) 1

N

 ( yij  Y )( yik  Y )  j 1 k (  j ) 1  M

M

 

N

N

M

i 1

i 1 j 1

( yij  Y )( yik  Y )  M 2  ( yi  Y ) 2   ( yij  Y ) 2

or

 ( MN  1)( M  1) S 2  M 2 ( N  1) Sb2  ( NM  1) S 2 or

Sb2 

( MN  1) 1   ( M  1) S 2 . M 2 ( N  1)

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 6


The variance of ycl now becomes

N n 2 Sb Nn N  n MN  1 S 2  1  ( M  1)  . Nn N  1 M 2

Var ( ycl ) 

For large N ,

MN  1 N n  1, N  1  N ,  1 and so MN N

Var ( ycl ) 

1 S2 1  ( M  1)  . nM

The variance of sample mean under SRSWOR for large N is Var ( ynM ) 

S2 . nM

The relative efficiency for large N is now given by E

Var ( ynM ) Var ( ycl ) S2 nM

S2 1  ( M  1)   nM 1 1 ;      1. 1  ( M  1)  M 1 

If M  1 then E  1, i.e., SRS and cluster sampling are equally efficient. Each cluster will consist of one unit, i.e., SRS.

If M  1, then cluster sampling is more efficient when E 1

If

or

( M  1)   0

or

  0.

  0, then E  1 , i.e., there is no error which means that the units in each cluster are arranged

randomly. So sample is heterogeneous. 

In practice,  is usually positive and  decreases as M increases but the rate of decrease in  is much lower in comparison to the rate of increase in M . The situation that   0 is possible when the nearby units are grouped together to form cluster and which are completely enumerated.

There are situations when   0.

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 7


Estimation of relative efficiency: The relative efficiency of cluster sampling relative to an equivalent SRSWOR is obtained as E

S2 . MSb2

An estimator of E can be obtained by substituting the estimates of S 2 and Sb2 . ycl 

Since

1 n  yi is the mean of n means yi from a population of N means yi , i  1, 2,..., N which n i 1

are drawn by SRSWOR, so from the theory of SRSWOR,

1 n  E ( sb2 )  E   ( yi  yc ) 2   n i 1  N 1 ( yi  Y ) 2   N  1 i 1  Sb2 . Thus sb2 is an unbiased estimator of Sb2 . Since sw2 

1 n 2 Si is the mean of n mean sum of squares Si2 drawn from the population of N mean  n i 1

sums of squares Si2 , i  1, 2,..., N , so it follows from the theory of SRSWOR that 1 n 1 1 n  1 n E ( sw2 )  E   Si2    E ( Si2 )    n i 1  N  n i 1  n i 1 1 N   Si2 N i 1

N

S i 1

2 i

  

 S w2 .

Thus sw2 is an unbiased estimator of S w2 . Consider S2 

N M 1 ( yij  Y ) 2  MN  1 i 1 j 1 N

M

or ( MN  1) S   ( yij  yi )  ( yi  Y )  2

2

i 1 j 1 N

M

  ( yij  yi ) 2  ( yi  Y ) 2  i 1 j 1

N

  ( M  1) Si2  M ( N  1) Sb2 i 1

 N ( M  1) S w2  M ( N  1) Sb2 .

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 8


An unbiased estimator of S 2 can be obtained as Sˆ 2 

1  N ( M  1) sw2  M ( N  1) sb2  . MN  1 

So

 ( y )  N  n s2 Var cl b Nn ˆ2 ( y )  N  n S Var nM Nn M n 1 where sb2   ( yi  ycl )2 . n  1 i 1

An estimate of efficiency E 

S2 is MSb2

N ( M  1) sw2  M ( N  1) sb2 . Eˆ  M ( NM  1) sb2 If N is large so that M ( N  1)  MN and MN  1  MN , then E

1  M  1  S w2   M  M  MSb2

and its estimate is 1  M  1  sw2 .  Eˆ   M  M  Msb2

Estimation of a proportion in case of equal cluster Now, we consider the problem of estimation of the proportion of units in the population having a specified attribute on the basis of a sample of clusters. Let this proportion be P . Suppose that a sample of n clusters is drawn from N clusters by SRSWOR. Defining yij  1 if the j th unit in the i th cluster belongs to the specified category (i.e. possessing the given attribute) and yij  0 otherwise, we find that

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 9


yi  Pi , 1 N  Pi  P, N i 1 MPQ i i , Si2  ( M  1)

Y 

N

S w2 

M  PQ i i i 1

N ( M  1) NMPQ , S2  NM  1) Sb2 

1 N  ( Pi  P)2 , N  1 i 1

1 N 2  Pi  NP 2    N  1  i 1 

N 1  N   Pi (1  Pi )   Pi  NP 2   ( N  1)  i 1 i 1 

where

,

N 1   NPQ PQ   i i,  ( N  1)  i 1 

Pi is the proportion of elements in the i th cluster, belonging to the specified category and

Qi  1  Pi , i  1, 2,..., N and Q  1  P. Then, using the result that ycl is an unbiased estimator of Y , we find that 1 n Pˆcl   Pi n i 1

is an unbiased estimator of P and N   NPQ PQ   i i  ( N  n)  i 1 . Var ( Pˆcl )  Nn ( N  1)

This variance of Pˆcl can be expressed as N  n PQ Var ( Pˆcl )  [1  ( M  1)  ], N  1 nM

where the value of 



can be obtained from

M ( N  1) Sb2  NS w2 ( M  1)( MN  1) S 2

and

( MN  1) S 2  N ( M  1) S w2  M ( N  1) Sb2

by substituting Sb2 , S w2 and S 2 in  , we obtain Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 10


N

1 M   1 ( M  1) N

 PQ i 1

i

PQ

i

.

The variance of Pˆcl can be estimated unbiasedly by  ( Pˆ )  N  n s 2 Var cl b nN n N n 1   ( Pi  Pˆcl )2 nN (n  1) i 1 

n N n  ˆ ˆ   nP Q PQ  cl cl i i  Nn(n  1)  i 1 

where Qˆ cl  I  Pˆcl . The efficiency of cluster sampling relative to SRSWOR is given by E 

M ( N  1) 1 ( MN  1) 1  ( M  1)   ( N  1) NPQ . N NM  1   i i  NPQ   PQ i 1  

If N is large, then E 

1 . M

An estimator of the total number of elements belonging to a specified category is obtained by multiplying

Pˆcl by NM , i.e. by NMPˆcl . The expressions of variance and its estimator are obtained by multiplying the corresponding expressions for Pˆcl by N 2 M 2 .

Case of unequal clusters: In practice, the equal size of clusters are available only when planned. For example,

in a screw

manufacturing company, the packets of screws can be prepared such that every packet contains same number of screws. In real applications, it is hard to get clusters of equal size. For example, the villages with equal areas are difficult to find, the districts with same number of persons are difficult to find, the number of members in a household may not be same in each household in a given area. Let there be N clusters and M i be the size of i th cluster, let

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 11


N

M0   Mi i 1

1 M N

N

M i 1

yi 

1 Mi

Y 

1 M0 N

 i 1

1 N

i

Mi

y

ij

j 1 N

: mean of i th cluster

Mi

 y i 1 j 1

ij

Mi yi M0 N

Mi

M i 1

yi

Suppose that n clusters are selected with SRSWOR and all the elements in these selected clusters are surveyed. Assume that M i ’s (i  1, 2,..., N ) are known.

Population

Cluster M1 units

Cluster M2 units

… … …

Cluster MN units

Population N clusters

N Clusters

Cluster M1 units

Cluster M2 units

… … …

Cluster Mn units

Sample n clusters

n Clusters

Based on this scheme, several estimators can be obtained to estimate the population mean. We consider four type of such estimators.

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 12


1. Mean of cluster means: Consider the simple arithmetic mean of the cluster means as

yc 

1 n  yi n i 1

E  yc  

1 N

N

y

i

i 1

N

 Y (where Y   i 1

Mi yi ). M0

The bias of yc is Bias  yc   E  yc   Y 

1 N



N

 Mi   yi i 1  0  N

 y   M i 1

i

M 1 N M i yi  0   M 0  i 1 N

N

 y  i 1

i

  N  N   M  i N    yi   1  i 1   i 1    M i yi    M 0  i 1 N     N 1   (M i  M )( yi  Y ) M 0 i 1  N 1     Smy  M0  Bias  yc   0 if M i and yi are uncorrelated .

The mean squared error is MSE  yc   Var  yc    Bias  yc  

2

2

N  n 2  N 1  2 Sb     S my Nn M 0   where Sb2 

1 N  ( yi  Y )2 N  1 i 1

S my 

1 N  (M i  M )( yi  Y ). N  1 i 1

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 13


An estimate of Var  yc  is   y   N  n s2 Var c b Nn

where sb2 

2 1 n yi  yc  .   n  1 i 1

2. Weighted mean of cluster means Consider the arithmetic mean based on cluster total as 1 n  M i yi nM i 1 1 n 1 E ( yc* )   E ( yi M i ) n i 1 M n 1 N   M i yi n M 0 i 1 yc* 

1 M0

N

Mi

 y i 1 j 1

ij

Y. Thus yc* is an unbiased estimator of Y . The variance of yc* and its estimate are given by

1 n M  Var ( yc* )  Var   i yi   n i 1 M  N  n *2  Sb Nn  ( y * )  N  n s*2 Var c b Nn where

Sb*2 

1 N  Mi  yi  Y    N  1 i 1  M 

sb*2 

1 n  Mi  yi  yc*    n  1 i 1  M 

2

2

E ( sb*2 )  Sb*2 .

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 14


Note that the expressions of variance of yc* and its estimate can be derived using directly the theory of SRSWOR as follows:

Let zi 

Mi 1 n yi , then yc*   zi  z . M n i 1

Since SRSWOR is followed, so Var ( yc* )  Var ( z ) 

N n 1 n  ( zi  Y )2 Nn N  1 i 1

N  n 1 N  Mi  yi  Y    Nn N  1 i 1  M  N  n *2  Sb . Nn

2

Since

 1 n  ( zi  z ) 2  E ( sb*2 )  E    n  1 i 1  2  1 n  Mi  *  E yi  yc        n  1 i 1  M

1 N  Mi    yi  Y  N  1 i 1  M

2

 Sb*2 So an unbiased estimator of variance can be easily derived.

3. Estimator based on ratio method of estimation Consider the weighted mean of the cluster means as n

y  ** c

M y i 1 n

i

M i 1

i

i

It is easy to see that this estimator is a biased estimator of population mean. Before deriving its bias and mean squared error, we note that this estimator can be derived using the philosophy of ratio method of estimation. To see this, consider the study variable U i and auxiliary variable Vi as

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 15


M i yi M M Vi  i i  1, 2,..., N M Ui 

N

1 N 1 V   Vi  N i 1 N n 1 u   ui n i 1

M i 1

M

i

1

1 n  vi . n i 1

v

The ratio estimator based on U and V is

u YˆR  V v n

u

i 1 n

i

v i 1

i

n

M i yi M  i 1n Mi  i 1 M

 n

M y i 1 n

i

 Mi

i

.

i 1

Since the ratio estimator is biased, so

yc** is also a biased estimator. The approximate bias and mean

squared errors of yc** can be derived directly by using the bias and MSE of ratio estimator. So using the results from the ratio method of estimation, the bias up to second order of approximation is given as follows N  n  Sv2 Suv   Bias ( y )   U Nn  V 2 UV  N  n  2 Suv    Sv  U Nn  U  ** c

where U 

1 N

N

U i  i 1

1 N  M i yi NM i 1

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 16


Sv2 

1 N  (Vi  V )2 N  1 i 1 2

1 N  Mi     1 N  1 i 1  M 1 N Suv   (U i  U )(Vi  V ) N  1 i 1 1 N  M i yi 1    N  1 i 1  M NM

 Ruv 

1 U U  V NM

 Mi

N

 M y   M i

i 1

i

  1 

N

M y . i 1

i

i

The MSE of yc** up to second order of approximation can be obtained as follows: MSE ( yc** ) 

N n 2  Su  R 2 Sv2  2 RSuv  Nn

1 N  M i yi 1  where S    N  1 i 1  M NM 2 u

 M i yi   i 1  N

2

Alternatively,

MSE ( yc** ) 

N n 1 N  U i  RuvVi  Nn N  1 i 1

2

N  n 1 N  M i yi  1    Nn N  1 i 1  M  NM

M  M i yi  i   i 1 M N

2

2

N   M i yi  2  N  n 1 N  Mi   i 1  .      yi  Nn N  1 i 1  M   NM   

An estimator of MSE can be obtained as 2

n  Mi  ** 2  ( y ** )  N  n 1 MSE  c   ( yi  yc ) . Nn n  1 i 1  M 

The estimator yc** is biased but consistent.

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 17


4. Estimator based on unbiased ratio type estimation Since yc 

1 1 n yi (where yi   Mi n i 1

Mi

 y ) is a biased estimator of population mean and i 1

ij

 N 1  Bias ( yc )     Smy  M0   N 1     Smy  NM  Since SRSWOR is used, so smy 

1 n  (M i  m)( yi  yc ), n  1 i 1

m

1 n  Mi n i 1

is an unbiased estimator of S my 

i.e.,

1 N  ( M i  M )( yi  Y ), N  1 i 1

E ( smy )  S my .

So it follow that  N 1  E ( yc )  Y     E ( smy )  NM 

or

  N 1   E  yc    smy   Y .  NM   

So  N 1  yc**  yc    smy  NM 

is an unbiased estimator of the population mean Y . This estimator is based on unbiased ratio type estimator. This can be obtained by replacing the study variable (earlier yi ) by

Mi yi and auxiliary variable (earlier xi ) by M

Mi . The exact variance of this M

estimate is complicated and does not reduces to a simple form. The approximate variance upto first order of approximation is N  M i 1   1 Var  y   yi  Y )      n( N  1) i 1  M   NM ** c

2

  yi  ( M i  M )  .  i 1   N

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 18


A consistent estimate of this variance is 2

n    M i    n  n M 1  1    y **    i yi  yc )     . Var yi   M i  i 1   c n(n  1) i 1  M n    nM i 1         ** ** The variance of ycc will be smaller than that of yc (based on the ratio method of estimation) provided

the regression coefficient of

M i yi M 1 on i is nearer to N M M

N

 yi than to i 1

1 M0

N

M y . i 1

i

i

Comparison between SRS and cluster sampling: In case of unequal clusters,

n

M i 1

i

is a random variable such that

 n  E   M i   nM .  i 1  Now if a sample of size nM is drawn from a population of size NM , then the variance of corresponding sample mean based on SRSWOR is NM  nM S 2 NM nM 2 N n S .  Nn M

Var ( ySRS ) 

This variance can be compared with any of the four proposed estimators. For example, in case of

yc* 

1 nM

n

M y i 1

Var ( yc* ) 

i

i

N  n *2 Sb Nn 2

N  n 1 N  Mi  yi  Y  .    Nn N  1 i 1  M  The relative efficiency of yc** relative to SRS based sample mean E 

Var ( ySRS ) Var ( yc* ) S2 . MSb*2

For Var ( yc* )  Var ( ySRS ), the variance between the clusters ( Sb*2 ) should be less. So the clusters should be formed in such a way that the variation between them is as small as possible. Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 19


Sampling with replacement and unequal probabilities (PPSWR) In many practical situations, the cluster total for the study variable is likely to be positively correlated with the number of units in the cluster. In this situation, it is advantageous to select the clusters with probability proportional to the number of units in the cluster instead of with equal probability, or to stratify the clusters according to their sizes and then to draw a SRSWOR of clusters from each of the stratum. We consider here the case where clusters are selected with probability proportional to the number of units in the cluster and with replacement. Suppose that n clusters are selected with ppswr, the size being the number of units in the cluster. Here Pi is the probability of selection assigned to the i th cluster which is given by Pi 

Mi Mi  , i  1, 2,..., N . M 0 NM

Consider the following estimator of the population mean: 1 n Yˆc   yi . n i 1

Then this estimator can be expressed as 1 N Yˆc    i yi n i 1

where  i denotes the number of times the

i th

cluster occurs in the sample. The random variables

1 ,  2 ,...,  N follow a multinomial probability distribution with E ( i )  nPi , Var ( i )  nPi (1  Pi ) Cov( i ,  j )  nPP i j , i  j. Hence, 1 N E (Yˆc )   E ( i ) yi n i 1 1 N   nPi yi n i 1 N M   i yi i 1 NM N

Mi

 y i 1 j 1

NM

ij

Y.

Thus Yˆc is an unbiased estimator of Y . Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 20


We now derive the variance of Yˆc . 1 N From Yˆc    i yi , n i 1 N  1 N Var (Yˆc )  2   Var ( i ) yi2   Cov( i ,  j ) yi y j  n  i 1 i j  N  1 N 2 (1 ) P P y PP y y       i i i i j i j n 2  i 1 i j  2  N   1 N  2   Pi yi2    Pi yi   n  i 1  i j    2 1 N  2  Pi  yi  Y  n i 1

1 nNM

N

 M (y Y ) . i 1

2

i

i

An unbiased estimator of the variance of Yˆc is  (Yˆ )  Var c

n 1 ( yi  Yˆc ) 2  n(n  1) i 1

which can be seen to satisfy the unbiasedness property as follows: Consider n  1  E ( yi  Yˆc ) 2    n(n  1) i 1 

 1  n  E ( yi2  nYˆc ) 2       n(n  1)  i 1 

1   n ˆ 2 2  E    i yi   nVar (Yc )  nY  n(n  1)   i 1  

where E ( i )  nPi , Var ( i )  nPi (1  Pi ), Cov ( i ,  j )   nPP i j ,i  j n  1  1 N 1 N  2 ( yi  Yˆc ) 2   E n P y n Pi ( yi  Y ) 2  nY 2      i i i  n i 1   n(n  1) i 1  n(n  1)  i 1 N N 1  1   Pi ( yi2  Y 2)   Pi ( yi  Y ) 2    (n  1)  i 1 n i 1 

1 N 1 N  2 ( )   P y Y Pi ( yi  Y ) 2    i i  (n  1)  i 1 n i 1 

1 N Pi ( yi  Y ) 2  (n  1) i 1  Var (Yˆ ). 

c

Sampling Theory| Chapter 9 | Cluster Sampling | Shalabh, IIT Kanpur

Page 21


Chapter 10 Two Stage Sampling (Subsampling) In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency in cluster sampling depends on size of the cluster. As the size increases, the efficiency decreases. It suggests that higher precision can be attained by distributing a given number of elements over a large number of clusters and then by taking a small number of clusters and enumerating all elements within them. This is achieved in subsampling.

In subsampling -

divide the population into clusters.

-

Select a sample of clusters [first stage}

-

From each of the selected cluster, select a sample of specified number of elements [second stage]

The clusters which form the units of sampling at the first stage are called the first stage units and the units or group of units within clusters which form the unit of clusters are called the second stage units or subunits.

The procedure is generalized to three or more stages and is then termed as multistage sampling.

For example, in a crop survey -

villages are the first stage units,

-

fields within the villages are the second stage units and

-

plots within the fields are the third stage units.

In another example, to obtain a sample of fishes from a commercial fishery -

first take a sample of boats and

-

then take a sample of fishes from each selected boat.

Two stage sampling with equal first stage units: Assume that -

population consists of NM elements.

-

NM elements are grouped into N first stage units of M second stage units each, (i.e., N

clusters, each cluster is of size M ) -

Sample of n first stage units is selected (i.e., choose n clusters)

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 1


-

Sample of m second stage units is selected from each selected first stage unit (i.e., choose m units from each cluster).

-

Units at each stage are selected with SRSWOR.

Cluster sampling is a special case of two stage sampling in the sense that from a population of N clusters of equal size m = M , a sample of n clusters are chosen. If further M= m= 1, we get SRSWOR. If n = N , we have the case of stratified sampling.

yij : Value of the characteristic under study for the j th second stage units of the i th first

stage

unit; i 1,= = 2,..., N ; j 1, 2,.., m.

Yi =

= Y

yi =

= y

1 M

M

∑y j =1

ij

: mean per 2nd stage unit of i th 1st stage units in the population.

1 N M 1 N = y ∑∑ ij N= ∑ yi YMN : mean per second stage unit in the population MN =i 1 =j 1 =i 1

1 m yij : mean per second stage unit in the i th first stage unit in the sample. ∑ m j =1

1 n m 1 n = y ∑∑ ij n= ∑ yi ymn : mean per second stage in the sample. mn=i 1 =j 1 =i 1

Advantages: The principle advantage of two stage sampling is that it is more flexible than the one stage sampling. It reduces to one stage sampling when m = M but unless this is the best choice of

m , we have the

opportunity of taking some smaller value that appears more efficient. As usual, this choice reduces to a balance between statistical precision and cost. When units of the first stage agree very closely, then consideration of precision suggests a small value of m . On the other hand, it is sometimes as cheap to measure the whole of a unit as to a sample. For example, when the unit is a household and a single respondent can give as accurate data as all the members of the household.

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 2


A pictorial scheme of two stage sampling scheme is as follows:

Population (MN units)

Cluster M units

Cluster M units

… … …

Cluster M units

Population N clusters (large in number)

N clusters

Cluster M units

Cluster M units

… … …

Cluster M units

n clusters

Cluster m units

Cluster m units

… … …

Cluster m units

mn units

First stage sample n clusters (small in number)

Second stage sample m units n clusters (large number of elements from each cluster)

Note: The expectations under two stage sampling scheme depend on the stages. For example, the expectation at second stage unit will be dependent on first stage unit in the sense that second stage unit will be in the sample provided it was selected in the first stage.

To calculate the average -

First average the estimator over all the second stage selections that can be drawn from a fixed set of n units that the plan selects.

-

Then average over all the possible selections of n units by the plan.

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 3


In case of two stage sampling, E (θˆ) =

E1[ E2 (θˆ)]

average over

average over all

average over

all samples

1st stage samples

selections from a fixed set of units

all possible 2nd stage

In case of three stage sampling,

{

}

E (θˆ) = E1  E2 E3 (θˆ)  .  

To calculate the variance, we proceed as follows: In case of two stage sampling, Var (= θˆ) E (θˆ − θ ) 2 = E E (θˆ − θ ) 2 1

2

Consider 2 E2 (θˆ − θ )= E2 (θˆ 2 ) − 2θ E2 (θˆ) + θ 2

{

}

. 2 =  E2 (θˆ + V2 (θˆ)  − 2θ E2 (θˆ) + θ 2  

Now average over first stage selection as 2

ˆ − θ ) 2 E  E (θˆ)  + E V (θˆ)  − 2θ E E (θˆ) + E (θ 2 ) E1 E2 (θ= 1 2 1 2 1 2 1   2 = E1  E2 (θˆ) − θ 2  + E1 V2 (θˆ)    Var (θˆ) V1  E2 (θˆ)  + E1 V2 (θˆ)  . =

{

}

In case of three stage sampling,

{

}

{

}

{

}

Var (θˆ) = V1  E E3 (θˆ)  + E1 V2 E3 (θˆ)  + E1  E2 V3 (θˆ)  .      2 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 4


Estimation of population mean: Consider y = ymn as an estimator of the population mean Y .

Bias: Consider E ( y ) = E1 [ E2 ( ymn ) ] = E1  E2 ( yim i ) 

(as 2nd stage is dependent on 1st stage)

= E1  E2 ( yim i ) 

(as yi is unbiased for Yi due to SRSWOR)

1 n  = E1  ∑ Yi   n i =1  1 N = ∑ Yi N i =1 =Y . Thus ymn is an unbiased estimator of the population mean.

Variance Var ( y = ) E1 V2 ( y i )  + V1 [ E2 ( y / i ) ]  1 n  1 n   E1 V2  ∑ yi i  + V1  E2  ∑ yi / i  =     n i 1=  n i 1 1 n  1 n  E1  2 ∑ V ( yi i )  + V1  ∑ E2 ( yi / i )  =  n i 1=  n i 1  n n 1 1  1 1   = E1  2 ∑  −  Si2  + V1  ∑ Yi  m M   = n i 1   n i 1= 1 n 1 1  − E1 ( Si2 ) + V1 ( yc ) 2 ∑ n i =1  m M                                        (where yc is based on cluster means as in cluster sampling)

=

N −n 2 1 1  n  −  S w2 + Sb Nn m M  1 1 1  2 1 1  2 =  −  S w +  −  Sb nm M  n N  =

1 n2

where = S w2

N M 1 N 2 1 = S (Yij − Yi ) ∑ i N (M − 1)=∑∑ N =i 1 i 1 =j 1

2

1 N (Yi − Y ) 2 = S ∑ N − 1 i =1 2 b

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 5


Estimate of variance An unbiased estimator of variance of

y can be obtained by replacing Sb2 and S w2 by their unbiased

estimators in the expression of variance of y .

Consider an estimator of S w2 =

1 N

N

∑S i =1

2 i

1 M ∑ ( yij − Yi ) M − 1 j =1

where Si2 =

2

1 n 2 ∑ si n i =1 1 m where si2 = ∑ ( yij − yi )2 . m − 1 j =1 as

sw2 =

So

E ( sw2 ) = E1 E2 ( sw2 i ) 1 n  = E1 E2  ∑ si2 i   n i =1  n 1 = E1 ∑  E2 ( si2 i )  n i =1 = E1 =

1 n 2 ∑ Si n i =1

(as SRSWOR is used)

1 n E1 ( Si2 ) ∑ n i =1

1 N  1 N 2 Si  ∑  N ∑ N i 1= = i 1 

=

=

1 N

=S

N

∑S i =1

2 i

2 w

so sw2 is an unbiased estimator of S w2 .

Consider = sb2

1 n ( yi − y ) 2 ∑ n − 1 i =1

as an estimator of = Sb2

1 N (Yi − Y ) 2 . ∑ N − 1 i =1

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 6


So E ( sb2 ) =

1  n  E  ∑ ( yi − y ) 2  n − 1  i =1 

 n  (n − 1) E ( sb2 ) = E  ∑ yi2 − ny 2   i =1  n   = E  ∑ yi2  − nE ( y 2 )  i =1    n  2 = E1  E2  ∑ yi2   − n Var ( y ) + { E ( y )}      i =1    1 1    n   1 1 1 = E1  ∑ E2 ( yi2 ) i )  − n  −  Sb2 +  −  S w2 + Y 2  m M n  i =1   n N  

{

}

 1 1    n 2   1 1 1 = E1  ∑ Var ( yi ) + ( E ( yi )  − n  −  Sb2 +  −  S w2 + Y 2  m M n  i =1   n N    n  1 1 = E1  ∑  −  i =1  m M

 1 1  2  1 1  2 2   Si + Yi  − n  −  Sb +  −  m M   n N 

1  n  1 1 = nE1  ∑  −  n  i =1  m M

1 2 2  Sw + Y  n 

 1 1  2  1 1  2 2  Si + Yi  − n  −  Sb +  −  m M   n N 

1 2 2  Sw + Y  n 

1 N  1 1  1 N   1 1    1 1 1 = n  −  ∑ Si2 + ∑ Yi 2  − n  −  Sb2 +  −  S w2 + Y 2  M  N i 1= N i1  m M n  m =  n N   N 1  1 1    1 1    1 1 1 = n  −  S w2 + ∑ Yi 2  − n  −  Sb2 +  −  S w2 + Y 2  N i =1  m M n  m M   n N   1 1 (n − 1)  − = m M 1 1 (n − 1)  − = m M

 2 n N 2 1 1  2 2  S w + ∑ Yi − nY − n  −  Sb N i =1  n N   2 nN 2 1 1  2 2  S w +  ∑ Yi − NY  − n  −  Sb N  i =1  n N   n 1 1  1 1  = (n − 1)  −  S w2 + ( N − 1) Sb2 − n  −  Sb2 N m M  n N  1 1  = (n − 1)  −  S w2 + (n − 1) Sb2 . m M  1 1  2 2 ⇒ E ( sb2 ) =  −  S w + Sb m M   1 1   or E  sb2 −  −  sw2  = Sb2 . m M    

Thus  ( y )= 1  1 − 1  Sˆ 2 +  1 − 1  Sˆ 2 Var   ω   b nm M  n N  1  1 1  2  1 1  2  1 1 =  −  sw +  −   sb −  − nm M   n N  m M =

11 1  − N m M

 2  sw   

 2 1 1  2  sw +  −  sb .  n N 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 7


Allocation of sample to the two stages: Equal first stage units: The variance of sample mean in the case of two stage sampling is  ( y )= 1  1 − 1  S 2 +  1 − 1  S 2 . Var   w   b nm M  n N  Sb2 , S w2 , n and m. So the cost of survey of units in the two stage sample depends on

It depends on n and m.

Case 1. When cost is fixed We find the values of n and m so that the variance is minimum for given cost.

(I) When cost function is C = kmn Let the cost of survey be proportional to sample size as C = knm

where C is the total cost and k is constant. When cost is fixed as C = C0 . Substituting m = Var ( y = )

1  2 S w2  Sb2 1 kn 2 Sw +  Sb −  − n M  N n C0

=

1  2 S w2   Sb2 kS w2  .  Sb − − − n M   N C0 

C0 in Var ( y ), we get kn

 S2  This variance is monotonic decreasing function of n if  Sb2 − w  > 0. M  

The variance is minimum

when n assumes maximum value, i.e., nˆ

C0 = corresponding to m 1. k

 S2  If  Sb2 − w  < 0 (i.e., intraclass correlation is negative for large N ) , then the variance is a monotonic M   increasing function of n ,

It reaches minimum when n assumes the minimum value, i.e.,

nˆ =

C0 kM

(i.e., no subsampling).

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 8


(II) When cost function is = C k1n + k2 mn Let cost C be fixed as C= k1n + k2 mn where k1 and k2 are positive constants. The terms k1 and k2 0 denote the costs of per unit observations in first and second stages respectively. Minimize the variance of sample mean under the two stage with respect to m subject to the restriction C= k1n + k2 mn . 0

We have    S2  S2  S2  k S2 C0 Var ( y ) + b  = k1  Sb2 − w  + k2 S w2 + mk2  Sb2 − w  + 1 w . N M M m     S2  When  Sb2 − w  > 0, then M   S2  C0 Var ( y ) + b=  N 

2

     S2  S2  k S2   k1  Sb2 − w  + k2 S w2  +  mk2  Sb2 − w  − 1 w  M M m       

2

which is minimum when the second term of right hand side is zero. So we obtain S w2 k1 . mˆ = k2  2 S w2   Sb −  M  The optimum n follows from C= k1n + k2 mn as 0 nˆ =

C0 . k1 + k2 mˆ

 S2  When  Sb2 − w  ≤ 0 then M     S2  S2  S2  k S2 C0 Var ( y ) + b  = k1  Sb2 − w  + k2 S w2 + mk2  Sb2 − w  + 1 w N M M m    is minimum if

m

is the greatest attainable integer. Hence

C0 ≥ k1 + k2= M ; mˆ M = and nˆ ˆ If C0 ≥ k1 + k= 2 M ; then m

If N is large, then

in this case,

when

C0 . k1 + k2 M

C0 − k1 = and nˆ 1. k2

S w2 ≈ S 2 (1 − ρ )

S w2 − mˆ ≈

S w2 ≈ ρS2 M k1  1   − 1 . k2  ρ 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 9


Case 2: When variance is fixed Now we find the sample sizes when variance is fixed, say as V0 . 1 1 1  2 1 1  2  −  S w +  −  Sb nm M  n N  1 1  Sb2 +  −  S w2 m M ⇒n=  Sb2 V0 + N

V0=

So  2 S w2  Sb − M C kmn = = km  2  V + Sb  0 N 

  kS w2 . + 2  V + Sb  0 N 

 2 S w2  If  Sb −  > 0, C attains minimum when m assumes the smallest integral value, i.e., 1. M   S2  If  Sb2 − w  < 0 , C attains minimum when mˆ = M . M 

Comparison of two stage sampling with one stage sampling One stage sampling procedures are comparable with two stage sampling procedures when either (i) sampling mn elements in one single stage or (ii) sampling

mn first stage units as cluster without sub-sampling. M

We consider both the cases.

Case 1: Sampling mn elements in one single stage The variance of sample mean based on - mn elements selected by SRSWOR (one stage) is given by 1  2  1 V ( y= − SRS )  S  mn MN 

- two stage sampling is given by V ( yTS )=

1 1 1  − nm M

 2 1 1  2  S w +  −  Sb .  n N 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 10


The intraclass correlation coefficient is 2  N − 1  2 Sw 2 2   Sb − 1 N  M M ( N − 1) Sb − NS w  = = ;− ≤ ρ ≤1 ρ 2 ( MN − 1) S M −1  NM − 1  2  S  NM 

(1)

and using the identity N

M

)2 ∑∑ ( yij − Y=

=i 1 =j 1

N

M

N

M

∑∑ ( yij − Yi )2 + ∑∑ (Yi − Y )2

=i 1 =j 1

=i 1 =j 1

( NM − 1) S =( N − 1) MS + N ( M − 1) S w2 2

where Y =

2 b

(2)

1 N M 1 M , = y Y yij . ∑∑ ij i M =∑ MN =i 1 =j 1 j 1

Now we need to find Sb2 and S w2 from (1) and (2) in terms of S 2 . From (1), we have  MN − 1   N −1  2 2 − S w2 =  MS ρ +   MSb . N N    

(3)

Substituting it in (2) gives  N − 1   MN − 1  2 2  ( NM − 1) S 2 =( N − 1) MSb2 + N ( M − 1)   MSb −   MS ρ   N   N   2 2 2 = ( N − 1) MSb + ( M − 1)( N − 1) Sb − ρ M ( M − 1)( MN − 1) S = ( N − 1) MSb2 [1 + ( M − 1)] − ρ M ( M − 1)( MN − 1) S 2 = ( N − 1) MSb2 − ρ M ( M − 1)( MN − 1) S 2 = ⇒ Sb2

( MN − 1) S 2 [1 + ( M − 1) ρ ] M 2 ( N − 1)

Substituting it in (3) gives N ( M − 1) S w2= ( NM − 1) S 2 − ( N − 1) MSb2  ( MN − 1) S 2  = ( NM − 1) S 2 − ( N − 1) M  2 [1 + ( M − 1) ρ ]   M ( N − 1)   M − 1 − ( M − 1) ρ  = ( NM − 1) S 2   M  = ( NM − 1) S 2 ( M − 1)(1 − ρ )  MN − 1  2 S w2  ⇒=  S (1 − ρ ).  MN 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 11


Substituting Sb2 and S w2 in Var ( yTS ) 2 m(n − 1) M − m   MN − 1  S  N −n m = V ( yTS )  1− +ρ ( M − 1) −  .   M   MN  mn  M ( N − 1)  N −1 M

When subsampling rate

m is small, MN − 1 ≈ MN and M − 1 ≈ M , then M

S2 mn S2   N −n  V ( yTS ) =1 + ρ  m − 1  . mn   N −1  V ( ySRS ) =

The relative efficiency of the two stage in relation to one stage sampling of SRSWOR is

Var ( yTS )  N −n  RE = m − 1 . = 1+ ρ  Var ( ySRS )  N −1  If N − 1 ≈ N and finite population correction is ignorable, then

N −n N −n ≈ ≈ 1, then N −1 N

RE = 1 + ρ (m − 1).

Case 2: Comparison with cluster sampling Suppose a random sample of

mn clusters, without further subsampling is selected. M

The variance of the sample mean of equivalent mn / M clusters is M 1 2 Var (= ycl )  −  Sb .  mn N 

The variance of sample mean under the two stage sampling is Var ( yTS )=

1 1 1  − nm M

 2 1 1  2  S w +  −  Sb .  n N 

So Var ( ycl ) exceedes Var ( yTS ) by 1M  2 1 2   − 1   Sb − S w  n m M  

which is approximately  2 S w2  1M  2 for large and N 1 S − ρ  Sb −  > 0.   M n m   MN − 1 S 2 [1 + ρ ( M − 1)] M ( N − 1) M MN − 1 2 = S w2 S (1 − ρ ) MN = Sb2 where

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 12


So smaller the m / M , larger the reduction in the variance of two stage sample over a cluster sample.

 S2  When  Sb2 − w  < 0 then the subsampling will lead to loss in precision. M 

Two stage sampling with unequal first stage units: Consider two stage sampling when the first stage units are of unequal size and SRSWOR is employed at each stage. Let yij : value of j th second stage unit of the i th first stage unit.

M i : number of second stage units in i th first stage units (i = 1, 2,..., N ) . N

M 0 = ∑ M i : total number of second stage units in the population. i =1

mi : number of second stage units to be selected from i th first stage unit, if it is in the sample. n

m0 = ∑ mi : total number of second stage units in the sample. i =1

yi ( mi ) = Yi = Y =

1 mi

mi

∑y j =1

Mi

1 Mi

∑y j =1

ij

1 N = ∑ yi N i =1 N

ij

Mi

∑∑ yij =i 1 =j 1 Y = = N ∑ Mi

YN N

∑M Y

i i

1 = MN N

i =1

N

∑u Y i =1

i i

i =1

Mi M 1 N M = ∑ Mi N i =1

ui =

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 13


The pictorial scheme of two stage sampling with unequal first stage units case is as follows:

Population (MN units)

Cluster M1 units

Cluster M2 units

… … …

Cluster MN units

Population N clusters

N clusters

Cluster M1 units

Cluster m1 units

Cluster M2 units

Cluster m2 units

… … … n clusters

… … …

Cluster Mn units

Cluster mn units

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

First stage sample n clusters (small)

Second stage sample n clusters (small)

Page 14


Now we consider different estimators for the estimation of population mean.

1. Estimator based on the first stage unit means in the sample: 1 n ˆ = Y y= ∑ yi ( mi ) S2 n i =1

Bias: 1 n  E ( yS 2 ) = E  ∑ yi ( mi )   n i =1  n 1  = E1  ∑ E2 ( yi ( mi ) )   n i =1  1 n  = E1  ∑ Yi   n i =1  1 N = ∑ Yi N i =1

[Since a sample of size mi is selected out of M i units by SRSWOR]

=YN ≠ Y. So yS 2 is a biased estimator of Y and its bias is given by Bias= ( yS 2 ) E ( yS 2 ) − Y 1 N 1 N Y − M iYi ∑ i NM ∑ N i 1= = i 1

=

1 N 1  N  N  = −  ∑ M iYi −  ∑ Yi  ∑ M i   NM N  i 1= = =  i 1  i 1 N 1 = ∑ (M i − M )(Yi − YN ). NM i =1 This bias can be estimated by n N −1 (y ) = Bias − ∑ (M i − m)( yi ( mi ) − yS 2 ) S2 NM (n − 1) i =1

which can be seen as follows: N −1  1 n   ( y ) = E  Bias E1  E2 {( M i − m)( yi ( mi ) − yS 2 ) / n} − ∑ 2 S   NM  n − 1 i =1  N −1  1 n  E ( M i − m)(Yi − yn )  = − ∑ NM  n − 1 i =1  1 = − NM

N

∑ (M i =1

i

− M )(Yi − YN )

= YN − Y

where yn =

1 n ∑ Yi . n i =1

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 15


An unbiased estimator of the population mean Y is thus obtained as yS 2 +

N −1 1 n ∑ (M i − m)( yi ( mi ) − yS 2 ) . NM n − 1 i =1

Note that the bias arises due to the inequality of sizes of the first stage units and probability of selection of second stage units varies from one first stage to another.

Variance: Var ( yS= Var  E ( yS 2 n)  + E Var ( yS 2 n)  2) 1 n  1 n  = Var  ∑ yi  + E  2 ∑ Var ( yi ( mi ) i )  =  n i 1=  n i 1  1 1 1  2 =  −  Sb + E  2 n N  n

n

 1

∑ m i =1

i

1 Mi

1 1 1  2 1 N  1 =  − ∑  −  Sb + Nn i =1  mi M i n N 

(

1 N where S = ∑ Yi − YN N − 1 i =1 2 b

)

 2  Si   

 2  Si 

2

2

1 Mi = S ∑ ( yij − Yi ) . M i − 1 j =1 2 i

The MSE can be obtained as MSE = ( yS 2 ) Var ( yS 2 ) + [ Bias ( yS 2 ) ] . 2

Estimation of variance: Consider mean square between cluster means in the sample = sb2

(

)

2 1 n yi ( mi ) − yS 2 . ∑ n − 1 i =1

It can be shown that

1 E ( sb2 ) = Sb2 + N Also si2 =

N

 1

i =1

∑ m

i

1 Mi

 2 Si . 

1 mi ( yij − yi ( mi ) ) 2 ∑ mi − 1 j =1

1 Mi ) S= ( yij − Yi ) 2 E( s = ∑ M i − 1 j =1 2 i

2 i

1 n  1 1  2 1 N  1 1  2 So E  ∑  −  si  =  −  Si . ∑ n m M N m M 1 1 = i i i  i   i  i   Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 16


Thus 1 n  1 1  2 E ( sb2 ) = Sb2 + E  ∑  −  si  n m M 1 = i i   i  

and an unbiased estimator of Sb2 is

1 n  1 1  2 2 2 ˆ Sb = sb − ∑  −  si . n i =1  mi M i  So an estimator of the variance can be obtained by replacing Sb2 and Si2 by their unbiased estimators as

1  1 1  ˆ2 1 N  1 (y ) = Var  − ∑ S2  −  Sb + Nn i =1  mi M i n N 

 ˆ2 Si . 

2. Estimation based on first stage unit totals: 1 n M i yi ( mi ) * = Yˆ y= ∑ S2 n i =1 M 1 n = ∑ ui yi ( mi ) n i =1 where ui =

Mi . M

Bias 1 n  E ( yS* 2 ) = E  ∑ ui yi ( mi )   n i =1  n 1  = E  ∑ ui E2 ( yi ( mi ) i )   n i =1  n 1  = E  ∑ uiYi   n i =1  =

1 N

N

∑u Y i =1

i i

=Y. Thus yS* 2 is an unbiased estimator of Y .

Variance: = Var ( yS* 2 ) Var  E ( yS* 2 n)  + E Var ( yS* 2 n)  1 n  1 n  = Var  ∑ uiYi  + E  2 ∑ ui2Var ( yi ( mi ) i )  =  n i 1=  n i 1   1 1  *2 1 =−   Sb + nN n N 

N

∑u i =1

2 i

 1 1  2  −  Si  mi M i 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 17


where Si2 =

1 Mi ( yij − Yi ) 2 ∑ M i − 1 j =1

= Sb*2

1 N (uiYi − Y ) 2 . ∑ N − 1 j =1

3. Estimator based on ratio estimator: n

** = Yˆ y= S2

∑M y

i ( mi )

i

i =1

n

∑M i =1

i

n

=

∑u y i

i =1

n

∑u i =1

= where = ui

i ( mi )

i

yS* 2 un

Mi 1 n , un = ∑ ui . M n i =1

This estimator can be seen as if arising by the ratio method of estimation as follows: Let yi* = ui yi ( mi ) = xi*

Mi = , i 1, 2,..., N M

be the values of study variable and auxiliary variable in reference to the ratio method of estimation. Then 1 n * = yi yS* 2 ∑ n i =1 1 n * = x* = ∑ xi un n i =1 1 N * = X* = ∑ X i 1. N i =1 = y*

The corresponding ratio estimator of Y is yS* 2 y* = YˆR = X* = 1 yS**2 . x* un So the bias and mean squared error of yS**2 can be obtained directly from the results of ratio estimator. Recall that in ratio method of estimation, the bias and MSE of the ratio estimator upto second order of approximation is Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 18


N −n Bias ( yˆ R ) ≈ Y (C x2 − 2 ρ C xC y ) Nn Var ( x ) Cov( x , y )  = Y − 2  XY  X MSE (YˆR ) ≈ Var ( y ) + R 2Var ( x ) − 2 RCov( x , y )  where R =

Y . X

Bias: The bias of yS**2 up to second order of approximation is

Var ( xS*2 ) Cov( xS*2 , yS* 2 )  = Bias ( yS**2 ) Y  −  2 XY  X  where xS*2 is the mean of auxiliary variable similar to yS* 2 as xS*2 =

1 n ∑ xi ( mi ) . n i =1

Now we find Cov( xS*2 , yS* 2 ).  1 n  1 n 1 n  1 n  Cov( xS*2 , yS* 2 ) Cov  E  ∑ ui xi ( mi ) , ∑ ui yi ( mi )   + E Cov  ∑ ui xi ( mi ) , ∑ ui yi ( mi )   ni1 ni1 =  =  n i 1=    n i 1=  1 n 1 n  1 n  Cov  ∑ ui E ( xi ( mi ) ), ∑ ui E ( yi ( mi ) )  + E  2 ∑ ui2Cov( xi ( mi ) , yi ( mi ) ) i  ni1 = =  n i 1=  n i 1  1 n  1 1 n 1   1 n  = Cov  ∑ ui X i , ∑ uiYi  + E  2 ∑ ui2  − Sixy  ni1 =  n i 1=  =  n i 1  mi M i   1 1 1  * =  −  Sbxy + nN n N 

N

∑u i =1

2 i

 1 1   − Sixy  mi M i 

where * = Sbxy

1 N ∑ (ui X i − X )(uiYi − Y ) N − 1 i =1

1 Mi = Sixy ∑ ( xij − X i )( yij − Yi ). M i − 1 j =1 Similarly, Var ( xS*2 ) can be obtained by replacing x in place of y in Cov( xS*2 , yS* 2 ) as

 1 1  *2 1 Var ( xS*2 ) =−   Sbx + nN n N  where Sbx*2 =

1 N ∑ (ui X i − X )2 N − 1 i =1

= Six*2

1 Mi ∑ ( xij − X i )2 . M i − 1 i =1

N

∑u i =1

2 i

 1 1  −  mi M i

 2 Six 

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 19


Substituting Cov( xS*2 , yS* 2 ) and Var ( xS*2 ) in Bias ( yS**2 ), we obtain the approximate bias as *  1 1   Sbx*2 Sbxy  1 Bias ( y ) ≈ Y  −   2 − +  XY  nN  n N   X ** S2

N



∑ u i =1

2 i

 1 1   Six2 Sixy    −  2 −   . XY    mi M i   X

Mean squared error MSE ( yS**2 ) ≈ Var ( yS* 2 ) − 2 R*Cov( xS*2 , yS* 2 ) + R*2Var ( xS*2 )  1 1  *2 1 Var ( yS**2 ) =−   Sby + nN n N 

N

∑u

2 i

i =1

 1 1  2  −  Siy  mi M i 

 1 1  2  −  Six i =1  mi M i  1 N 2 1 1 1 1  * Cov( xS*2 , yS**2 ) = S ui  − − + ∑   bxy nN i =1  mi M i n N 

 1 1  *2 1 Var ( xS**2 ) =−   Sbx + nN n N 

N

∑u

2 i

  Sixy 

where Sby*2 =

1 N ∑ (uiYi − Y )2 N − 1 i =1

Siy*2 =

1 Mi ∑ ( yij − Yi )2 M i − 1 j =1

* R=

Y = Y. X

Thus 1 1 1  * + R*2 Sbx*2 ) + MSE ( yS**2 ) ≈  −  ( Sby*2 − 2 R* Sbxy nN n N 

N

∑ u i =1

2 i

  1 1  2 * *2 2  −  ( Siy − 2 R Sixy + R Six ) .  mi M i  

Also  2 1 N  2 1 1  2 1 1  1 N 2 * *2 2 MSE ( yS**2 ) ≈  −  ui (Yi − R* X i ) + ui  −  ( Siy − 2 R Sixy + R Six ) . ∑ ∑ nN i 1   mi M i   n N  N −1 i 1= = 

Estimate of variance Consider 1 n ( ui yi ( mi ) − yS* 2 )( ui xi ( mi ) − xS*2 )  ∑  n − 1 i =1  1 n s= ∑ ( xij − xi ( mi ) )( yij − yi ( mi ) ). ixy mi − 1 j =1 

* s= bxy

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 20


It can be shown that

 1 1 N 1  * * + ∑ ui2  − E ( sbxy Sbxy )= Sixy N i =1  mi M i  E ( sixy ) = Sixy . So 1 n  1 1   1 N  2 1 1 E  ∑ ui2  −  ui  −  sixy= ∑  mi M i   N i 1   mi M i  n i 1=

   Sixy .  

Thus  1 1 n 1  * * = − ∑ ui2  − Sˆbxy sbxy  sixy n i =1  mi M i   1 1 n 1  2 *2 − ∑ ui2  − Sˆbx*2 = sbx  six n i =1  mi M i   1 1 n 1  2 *2 − ∑ ui2  − Sˆby*2 = sby  siy . n i =1  mi M i  Also  1 n   1 1  2  1 N  2  1 1  2 E  ∑ ui2  −  ui  −  six =  Six  ∑   mi M i   N i 1   mi M i    n i 1 =  1 n   1 1  2  1 N  2  1 1  2 E  ∑ ui2  −  ui  −  siy =  Siy . ∑   mi M i   N i 1   mi M i    n i 1 = A consistent estimator of MSE of

yS**2 can be obtained by substituting the unbiased estimators of

respective statistics in MSE ( yS**2 ) as  ( y ** ) ≈  1 − 1  ( s*2 − 2r * s* + r *2 s*2 ) MSE S2 bxy bx   by n N  1 n 2 1 1  2 * *2 2 + ui  −  ( siy − 2r sixy + r six ) ∑ nN i =1  mi M i  2 1 1  1 n ≈ −  yi ( mi ) − r * xi ( mi ) ) ( ∑  n N  n − 1 i =1  1 n  2 1 1  2 * *2 2 + ui  −  ( siy − 2r sixy + r six )  ∑ nN i =1   mi M i  

yS* 2 where r * = . xS*2

Sampling Theory| Chapter 10 | Two Stage Sampling | Shalabh, IIT Kanpur

Page 21


Chapter 11 Systematic Sampling The systematic sampling technique is operationally more convenient than the simple random sampling. It also ensures at the same time that each unit has equal probability of inclusion in the sample. In this method of sampling, the first unit is selected with the help of random numbers and the remaining units are selected automatically according to a predetermined pattern. This method is known as systematic sampling. Suppose the N units in the population are numbered 1 to N in some order. Suppose further that N is expressible as a product of two integers n and k , so that N = nk .

To draw a sample of size n , -

select a random number between 1 and k .

-

Suppose it is i .

-

Select the first unit whose serial number is i .

-

Select every k th unit after i th unit.

-

Sample will contain i, i + k ,1 + 2k ,..., i + (n − 1)k serial number units.

So first unit is selected at random and other units are selected systematically. This systematic sample is called kth systematic sample and k is termed as sampling interval. This is also known as linear systematic sampling. The observations in the systematic sampling are arranged as in the following table: 1

2

3

i

k

1

y1

y2

y3

yi

yk

2

yk +1 

yk + 2 

yk + 3 

 

 

y( n −1) k +1

y( n −1) k + 2

y( n −1) k +3

yk + i 

y( n −1) k +i

y2k  ynk

Systematic sample number Sample composition 

n Probability

1 k

1 k

1 k

1 k

1 k

Sample mean

y1

y2

y3

yi

yk

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 1


Example: Let N = 50 and n = 5. So k = 10. Suppose first selected number between 1 and 10 is 3. Then systematic sample consists of units with following serial number 3, 13, 23, 33, 43.

Systematic sampling in two dimensions: Assume that the units in a population are arranged in the form of m rows and each row contains nk units. A sample of size mn is required. Then -

select a pair of random numbers (i, j ) such that i ≤  and j ≤ k .

-

Select the (i, j )th unit, i.e., j th unit in i th row as the first unit.

-

Then the rows to be selected are i, i + , i + 2,..., i + (m − 1)

and columns to be selected are j , j + k , j + 2k ,..., j + (n − 1)k .

-

The points at which the m selected rows and n selected columns intersect determine the position of mn selected units in the sample.

Such a sample is called an aligned sample. Alternative approach to select the sample is independently select n random integers i1 , i2 ,..., in such that each of them is less than or equal to . - Independently select m random integers j1 , j2 ,..., jm such that each of them is less than or equal to k . - The units selected in the sample will have following coordinates: (i1 + r , jr +1 ), (i2 + r , jr +1 + k ), (i3 + r , jr +1 + 2k ),..., (in + r , jr +1 + (n − 1)k ) . Such a sample is called an unaligned sample. -

Under certain conditions, an unaligned sample is often superior to an aligned sample as well as a stratified random sample.

Advantages of systematic sampling: 1.

It is easier to draw a sample and often easier to execute it without mistakes. This is more advantageous when the drawing is done in fields and offices as there may be substantial saving in time.

2.

The cost is low and the selection of units is simple. Much less training is needed for surveyors to collect units through systematic sampling .

3.

The systematic sample is spread more evenly over the population. So no large part will fail to be represented in the sample. The sample is evenly spread and cross section is better. Systematic sampling fails in case of too many blanks.

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 2


Relation to the cluster sampling The systematic sample can be viewed from the cluster sampling point of view. With n = nk , there are k possible systematic samples. The same population can be viewed as if divided into k large sampling

units, each of which contains n of the original units. The operation of choosing a systematic sample is equivalent to choosing one of the large sampling unit at random which constitutes the whole sample. A systematic sample is thus a simple random sample of one cluster unit from a population of k cluster units.

Estimation of population mean : When N = nk: Let yij : observation

on

the

unit

bearing

the

serial

number

i + ( j − 1)k

in

the

population,

i = 1, 2,..., k , j = 1, 2,..., n.

Suppose the drawn random number is i ≤ k . Sample consists of i th column (in earlier table). Consider the sample mean given by

ysy= y= i

1 n ∑ yij n j =1

as an estimator of the population mean given by Y = =

1 k n ∑∑ yij nk=i 1 =j 1 1 k ∑ yi . k i =1

Probability of selecting i th column as systematic sample =

1 . k

So ( ysy ) = E

1 k = ∑ yi Y . k i =1

Thus ysy is an unbiased estimator of Y .

Further, ( ysy ) = Var

1 k ( yi − Y ) 2 . ∑ k i =1

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 3


Consider

( N − 1)= S2

k

n

∑∑ ( y

=i 1 =j 1 k

ij

− Y )2

n

∑∑ ( yij − yi ) + ( yi − Y ) 

=

2

=i 1 =j 1 k

n

k

∑∑ ( yij − yi )2 + n∑ ( yi − Y )2

=

=i 1 =j 1

=i 1

k

2 = k (n − 1) S wsy + n∑ ( yi − Y ) 2 i =1

where 2 = S wsy

k n 1 ∑∑ ( yij − yi )2 k (n − 1)=i 1 =j 1

is the variation among the units that lie within the same systematic sample . Thus N − 1 2 k (n − 1) 2 S − S wsy N N N − 1 2 (n − 1) 2 S − S wsy = N n ↓ ↓

Var = ( ysy )

Variation as a whole

Pooled within variation of the k systematic sample

with N = nk . This expression indicates that when the within variation is large, then Var ( yi ) becomes smaller. Thus higher heterogeneity makes the estimator more efficient and higher heterogeneity is well expected in systematic sample.

Alternative form of variance: 1 k ( ysy ) ( yi − Y ) 2 Var = ∑ k i =1  1 k 1 n = ∑  ∑ yij − Y  k i 1= = n j 1  =

2

 1 k  n ( yij − Y )  2 ∑ ∑ kn=i 1 = j 1 

n n  n  2 y Y − + ( ) ( yij − Y )( yi − Y )  ∑ ∑ ∑  ∑ ij i= 1  j= 1 j ( ≠ )= 1 = 1  k n n  1  2 nk S ( 1) ( yij − Y )( yi − Y )  . = − + ∑ ∑ ∑  2 kn  i = 1 j ( ≠ )= 1 = 1 

=

1 kn 2

k

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 4


The intraclass correlation between the pairs of units that are in the same systematic sample is

ρw =

E ( yij − Y )( yi − Y ) E ( yij − Y )

2

;

1 ≤ ρw ≤ 1 nk − 1

k n n 1 ∑ ∑ ∑ ( yij − Y )( yi − Y ) nk (n − 1) i = 1 j ( ≠  ) = 1  = 1 . =  nk − 1  2  S  nk 

So substituting k

n

n

∑ ∑ ∑(y i = 1 j ( ≠ )= 1 = 1

ij

− Y )( yi − Y ) = (n − 1)(nk − 1) ρ w S 2

in Var ( yi ) gives nk − 1 S 2 Var (= ysy ) [1 + ρ w (n − 1)] nk n N −1 S 2 = [1 + ρ w (n − 1)]. N n

Comparison with SRSWOR: For a SRSWOR sample of size n , N −n 2 S Nn nk − n 2 = S Nn k −1 2 = S . N

Var ( ySRS ) =

Since

N −1 2 n −1 2 S − S wsy N n N = nk

= Var ( ysy )

 k −1 N −1  2 n −1 2 − Var ( ySRS ) − Var ( ysy ) = S wsy  S + N  n  N n −1 2 = ( S wsy − S 2 ). n Thus ysy is -

2 more efficient than ySRS when S wsy > S2 .

-

2 < S2. less efficient than ySRS when S wsy

-

2 equally efficient as ySRS when S wsy = S 2.

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 5


Also, the relative efficiency of ysy relative to ySRS is RE =

Var ( ySRS ) Var ( ysy ) N −n 2 S Nn

=

N −1 2 S [1 + ρ w (n − 1) ] Nn  N −n  1 =   N − 1 1 + ρ w (n − 1)   n(k − 1)  1  ; (nk − 1) 1 + ρ w (n − 1) 

1 ≤ ρ ≤ 1. nk − 1

Thus ysy is -

more efficient than ySRS when ρ w < −

-

less efficient than ySRS when ρ w > −

-

equally efficient as ySRS when ρ w = −

1 nk − 1

1 nk − 1

1 . nk − 1

Comparison with stratified sampling: The systematic sample can also be viewed as if arising as a stratified sample. If population of N = nk units is divided into n strata and suppose one unit is randomly drawn from each of the strata. Then we get a stratified sample of size n . In doing so, just

consider each row of the following arrangement as a

stratum.

1

2

3

i

k

1

y1

y2

y3

yi

yk

2

yk +1 

yk + 2 

yk + 3 

 

 

y( n −1) k +1

y( n −1) k + 2

y( n −1) k +3

yk + i 

y( n −1) k +i

y2k  ynk

Systematic sample number Sample composition 

n Probability

1 k

1 k

1 k

1 k

1 k

Sample mean

y1

y2

y3

yi

yk

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 6


Recall that in case of stratified sampling with k strata, the stratum mean

yst =

1 N

k

∑N j =1

j

yj

is an unbiased estimator of population mean.

Considering the set up of stratified sample in the set up of systematic sample, we have -

Number of strata = n

-

Size of strata = k (row size)

-

Sample size to be drawn from each stratum = 1

and yst becomes yst = =

1 n ∑ ky j nk j =1 1 n ∑ yj n j =1

Var ( yst ) =

1 n2

n

∑Var ( y ) j =1

j

N −n 2 1 n k −1 2  S j  using Var ( ySRS ) S  = 2 ∑ n j =1 k .1 Nn   =

k −1 n 2 ∑Sj kn 2 j =1

k −1 2 S wst nk N −n 2 S wst = Nn =

where = S 2j

1 k ∑ ( yij − y j )2 k − 1 i =1

is the mean sum of squares of units in the j th stratum. 2 = S wst

k n 1 n 2 1 = S ( yij − y j ) 2 ∑ ∑∑ j n=j 1 n(k − 1)=i 1 =j 1

is the mean sum of squares within strata (or rows). The variance of systematic sample mean is

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 7


1 k ( yi − Y ) 2 ∑ k i =1

( ysy ) Var =

 1 k 1 n 1 n y yj  = − ∑ ∑  ∑ ij k i 1= n j1  = n j 1 =  1 k  n = 2 ∑  ∑ ( yij − y j )  n k=i 1 = j 1  =

2

2

k n n  1  k n 2 ( ) ( )( ) y y y y y y − + − − ∑∑ ∑∑∑  .   ij j ij j i n 2 k  =i 1 =j 1 =i 1 j ≠ = 1 

Now we simplify and express this expression in terms of intraclass correlation coefficient. The intraclass correlation coefficient between the pairs of deviations of units which lie along the same row measured from their stratum means is defined as

ρ wst =

E ( yij − Y )( yi − Y ) E ( yij − Y ) 2

k n n 1 ∑∑∑ ( yij − y j )( yi − y ) nk (n − 1)=i 1 j ≠= 1 = 1 k n ( yij − y j ) 2 ∑∑ nk=i 1 =j 1 k

=

n

n

∑∑∑

=i 1

j ≠= 1

( yij − y j )( yi − y )

2 ( N − 1)(n − 1) S wst

So 1 2 2 ( N − n) S wst  + ( N − n)(n − 1) ρ wst S wst 2  nk N −n 2 S wst [1 + (n − 1) ρ wst ] . (using N nk ) = = Nn Var ( y= sy )

Thus Var ( ysy ) − Var ( yst ) =

N −n 2 (n − 1) ρ wst S wst Nn

and the relative efficiency of systematic sampling relative to equivalent stratified sampling is given by

= RE

Var ( yst ) 1 = . Var ( ysy ) 1 + (n − 1) ρ wst

So the systematic sampling is -

more efficient than the corresponding equivalent stratified sample when ρ wst > 0 .

-

less efficient than the corresponding equivalent stratified sample when ρ wst < 0

-

equally efficient than the corresponding equivalent stratified sample when ρ wst = 0.

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 8


Comparison of systematic sampling, stratified sampling and SRS with population with linear trend: We assume that the values of units in the population increase according to linear trend.

So the values of successive units in the population increase in accordance with a linear model so that yi =+ a bi, i = 1, 2,..., N .

Now we determine the variances of ySRS , ysy and yst under this linear trend.

Under SRSWOR V ( ySRS ) =

N −n 2 S . Nn

Here N = nk 1 N ∑i N i =1 1 N ( N + 1) = a+b N 2 N +1 = a+b 2

Y= a + b

1 N ∑ ( yi − Y )2 N − 1 i =1

= S2 =

N + 1 1 N  a + bi − a − b ∑  N − 1 i =1  2 

b2 N  N + 1  = ∑  i − 2  N − 1 i =1  =

2

2

2 b2  N 2  N +1   i N − ∑    N − 1  i =1  2  

b 2  N ( N + 1)(2 N + 1) N ( N + 1) 2  −  N − 1  6 4  = b2

N ( N + 1) 12

nk − n 2 nk (nk + 1) b 12 nk .n 2 b (k − 1)(nk + 1). = 12

Var ( ySRS ) =

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 9


Under systematic sampling Earlier yij denoted the value of study variable with the j th unit in the i th systematic sample. Now yij represents the value of [i + ( j − 1)k ] unit of the population, so th

yij =a + b [i + ( j − 1)k ] , i =1, 2,..., k ; j =1, 2,..., n. ysy = yi Var = ( ysy ) yi =

1 k ( yi − Y ) 2 ∑ k i =1

1 n ∑ yij n j =1 1 n ∑ a + b {i + ( j − 1)k} n j =1 

=

 n −1  a + bi + k = 2  

 nk + 1   n −1  ( yi − Y= ) ∑ a + b  i + k −a −b ∑ 2  2   =i 1 =i 1  k

k

2

2

k  k +1  = b2 ∑  i −  2  i =1  2  k 2 k +1 k   k +1  2 2 =b  ∑ i + k  − ∑ i  2 i 1   2  =  i 1 = 2

 k (k + 1)(2k + 1)  k + 1 2 k (k + 1)  = b2  +   − (k + 1) 6 2   2   b2 k (k 2 − 1) = 12

1 b2 k (k 2 − 1) k 12 b2 2 = (k − 1). 12

= Var ( ysy )

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 10


Under stratified sampling yij =a + b [i + ( j − 1)k ] , i =1, 2,..., k , j =1, 2,..., n yst =

k

1 N

∑N y i =1

i

N −n 2 k −1 2 = S wst S wst Nn nk

= Var ( yst )

1 n 2 ∑Sj n j =1

2 where S wst =

=

i

k n 1 ∑∑ ( yij − y j )2 n(k − 1)=i 1 =j 1

k n  1  k +1  = a + b {i + ( j − 1)k} − a − b  + ( j − 1)k  ∑∑  n(k − 1)=i 1 =j 1   2  k n b2  k +1 = ∑∑ i −  2  n(k − 1)=i 1 =j 1 

2

2

b 2 nk (k 2 − 1) 12 n(k − 1) k (k + 1) = b2 12 =

k − 1 2 k (k + 1) b 12 nk b2  k 2 − 1  =   12  n 

Var ( yst ) =

If k is large, so that

1 is negligible, then comparing Var ( yst ), Var ( ysy ) and V ( ySRS ), k

Var ( yst ) :

Var ( ysy ) :

Var ( ySRS )

or

k 2 −1 n

:

k 2 −1

:

(k − 1)(1 + nk )

or

k +1 n

:

k +1

:

nk + 1

or

k +1 : n(k + 1)

k +1 k +1

:

nk + 1 k +1

1

:

n

1 n

Thus Var ( yst ) : Var ( ysy ) : Var ( ySRS )

::

1 : 1 : n n

So stratified sampling is best for linearly trended population. Next best is systematic sampling. Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 11


Estimation of variance: As such there is only one cluster, so variance in principle, cannot be estimated. Some approximations have been suggested. 1.

Treat systematic sample as if it were a random sample. In this case, an estimate of variance is 1 1  2 (y = Var sy )  −  swc  n nk  1 n −1 2 where swc ( yi + jk − yi ) 2 . = ∑ n − 1 j =0

This estimator under-estimates the true variance. 2.

Use of successive differences of the values gives the estimate of variance as 2

n −1 1 1  1 (y ) = − Var ∑ ( yi + jk − yi +( j +1) k ) . sy    n nk  2(n − 1) j =0

This estimator is a biased estimator of true variance. 3.

Use the balanced difference of y1 , y2 ,..., yn to get the estimate of variance as n−2 yi + 2   yi  ( y ) = 1 − 1  1 − + Var y ∑ sy i + 1   2   n nk  5(n − 2) i  2 or

2

2

n−4 y  1  yi  ( y ) = 1 − 1  − yi +1 + yi + 2 − yi +3 + i + 4  . Var ∑ sy    2   n nk  15(n − 4) i  2

4.

The interpenetrating subsamples can be utilized by dividing the sample into C groups each of size y=

n . Then the group means are y1 , y2 ,..., yc . Now find c 1 c ∑ yt c t =1

(y ) = Var sy

c 1 ( yt − y ) 2 . ∑ c(c − 1) t =1

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 12


Systematic sampling when N ≠ nk . When N is not expressible as nk then suppose N can be expressed as N= nk + p; p < k .

Then consider the following sample mean as an estimator of population mean

 1 n +1  n + 1 ∑ yij j =1  ysy= y= i n 1  ∑ yij  n j =1

if i ≤ p if i > p.

In this case  1  p  1 n +1  n  1 n yij  + ∑  ∑ yij   ∑  ∑ k  i = 1  n + 1 j= 1 1 p +1  n j =  i=  

= E ( yi )

≠ Y. So ysy is a biased estimator of Y .

An unbiased estimator of Y is ysy* = =

k N

∑y

ij

j

k Ci N

where Ci = nyi is the total of values of the i th column. k E (Ci ) N k 1 k = . ∑ Ci N k i =1

E ( ysy* ) =

=Y

Var ( ysy* ) =

k 2  k − 1  *2   Sc N2  k  2

where Sc*2 =

1 k  NY  ∑  nyi −  . k − 1 i =1  k 

Now we consider another procedure which is opted when N ≠ nk . [Reference: Theory of Sample Surveys, A.K. Gupta, D.G. Kabe, 2011, World Scientific Publishing Co.]

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 13


When population size N is not expressible as the product of n and k , then let N = nq + r. Then take the sampling interval as  q k = q + 1 

n 2. n if r > 2

if r ≤

M  M Let   denotes the largest integer contained in . g g * If = k q= ( q or q + 1) , then the

 N  N  N   *  with probability  *  + 1 −  *   q  q  q  number of units expected in sample =    N  + 1 with probability  N  −  N  .  *   *  *   q  q   q 

If q = q* , then we get  r  r r  n +   with probability   + 1 −    q q q . n* =  n +  r  + 1 with probability  r  −  r        q   q  q 

Similarly, if q*= q + 1, then  n−r   (n − r )  n−r  +1−  n −   with probability     (q + 1)   q +1    q +1  * n = n +  n − r  + 1 with probability  n − r  −  (n − r )  .          q + 1   (q + 1)    q + 1  

n Example: Let N = 17 and n = 5. Then q = 3 and r = 2 . Since r < , k = q= 3. 2

Then sample sizes would be  r  r  r = = +1−   n +   5 with probability    q q q n* =   r  r  n +  r  + 1 6 with probability = =  −    q   q  q 

1 3 2 . 3

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 14


This can be verified from the following example: Systematic sample number

Systematic sample

Probability

1

Y1 , Y4 , Y7 , Y10 , Y13 , Y16

1/3

2

Y4 , Y5 , Y8 , Y11 , Y14 , Y17

1/3

3

Y3 , Y6 , Y9 , Y12 , Y15

1/3

We now prove the following theorem which shows how to obtain an unbiased estimator of the population mean when N ≠ nk . Theorem: In systematic sampling with sampling interval k from a population with size N ≠ nk , an unbiased estimator of the population mean Y is given by k  n'  Yˆ =  ∑ y  N i where i stands for the i th systematic sample, i = 1, 2,..., k and n' denotes the size of i th systematic sample. Proof. Each systematic sample has probability

1 . Hence k

k 1 k  n'  E (Yˆ ) = ∑ .  ∑ y  i =1 k N  i

=

1 N

 n'  ∑ ∑ y . i =1  i k

Now, each unit occurs in only one of the k possible systematic samples. Hence N  n'  Yi , ∑  ∑ y  = ∑ =i 1 = i 1 i k

which on substitution in E (Yˆ ) proves the theorem. When N ≠ nk , the systematic samples are not of the same size and the sample mean is not an unbiased estimator of the population mean. To overcome these disadvantages of systematic sampling when N ≠ nk , circular systematic sampling is proposed. Circular systematic sampling consists of selecting a

random number from 1 to N and then selecting

the unit corresponding to this random number.

Thereafter every k th unit in a cyclical manner is selected till a sample of n units is obtained, k being the nearest integer to

N . n

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 15


In other words, if i is a number selected at random from 1 to N , then the circular systematic sample consists of units with serial numbers i + jk , if i = jk ≤ N  j 0,1, 2,..., (n − 1). = i + jk − N , if i = jk > N  This sampling scheme ensures equal probability of inclusion in the sample for every unit.

Example: Let N = 14 and n = 5. Then, k = nearest integer to

14 = 3. Let the first number selected at random 5

from 1 to 14 be 7. Then, the circular systematic sample consists of units with serial numbers 7,10,13,

16-14=2,

19-14=5.

This procedure is illustrated diagrammatically in following figure.

1 12 2 13 3

12 4

11 5

6

10 7

9 8

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 16


Theorem: In circular systematic sampling, the sample mean is an unbiased estimator of the population mean. Proof: If i is the number selected at random, then the circular systematic sample mean is 1 n  y = ∑ y , n i  n  where  ∑ y  denotes the total of y values in the i th circular systematic sample, i = 1, 2,..., N . We  i

note here that in circular systematic sampling, there are N circular systematic samples, each having probability

1 of its selection. Hence, N

1 n  1 1 N  n  = × y ∑ n  ∑  N Nn ∑ ∑ y i 1 =i 1 = i i = E( y )

N

Clearly, each unit of the population occurs in n of the N possible circular systematic sample means. Hence, N  n  y n Yi , = ∑  ∑  ∑ 1 =i 1 = i i N

which on substitution in E ( y ) proves the theorem.

What to do when N ≠ nk One of the following possible procedures may be adopted when N ≠ nk . (i)

Drop one unit at random if sample has (n + 1) units.

(ii)

Eliminate some units so that N = nk .

(iii)

Adopt circular systematic sampling scheme.

(iv)

Round off the fractional interval k .

Sampling Theory| Chapter 11 | Systematic Sampling | Shalabh, IIT Kanpur

Page 17


Chapter 12 Sampling on Successive Occasions Many times, we are interested in measuring a characteristic of a population on several occasions to estimate the trend in time of population means as a time series of the current value of population mean or the value of population mean over several points of time.

When the same population is sampled repeatedly, the opportunities for a flexible sampling scheme are greatly enhanced. For example, on the hth occasion we may have a part of sample that are matched with (common to) the sample at (h − 1)th occasion, parts matching with both (h − 1)th and (h − 2)th occasions, etc. Such a partial matching is termed as sampling on successive occasions with partial replacement of units or rotation sampling or sampling for a time series.

Notations: Let P be the fixed population with N units. yt : value of certain dynamic character which changes with time t and can be measured for each unit on a number of occasions, t = 1, 2,.., n . yij : value of y on j th unit in the population at the i th occasion, = i 1,= 2,..., h, j 1,..., N .

Yi =

= Si2

1 N

∑y

ij

: population mean for the i th occasion

j

1 N ∑ ( yij − Yi )2 : population variance for the ith occasion. N − 1 j =1

Generally we assume S12= S 22= ...= S 2 . = ρ ii*

1 N ∑ ( yij − Yi )( yi* j − Yi* ) . N − 1 j =1

is the population correlation coefficient between the observations at the occasions i and

i*

(i < i* = 1, 2,..., h) .

ρ = ρ12 si* : sample of size ni selected at the i th occasion * sim : part of si* which is common to (i.e. matched with) si*−1 ,

* * * = sim s= 2,3,..., = h ( s1m s2 m ) i  si −1 , i

Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 1


Note that s1m and s2 m are of the sizes n1* and n2* respectively. * . siu* : set of units in si* not obtained by the selection in sim

Often siu*= si*−c1  si (i= 2,..., h) ( s1*u= P − s1*m ) . Note that siu* is of size ni**=( ni − ni* ) .

yi = sample mean of units in i th occasion. * on the i th occasion. yi* = sample mean of the units in sim

yi** = sample mean of units in siu on the i th occasion. yi*** = sample mean of units in sim on the (i − 1)th occasion, i = 2,3,..., h

(y

*** 2

= y1* , yi*** depends on yi −1 and yi**−1 )

Sampling on two occasions Assume that ni = n ni* = m 1, 2 ni** = u (= n − m), i = Suppose that the sample s1* is an SRSWOR from P. The sample s2* = s2*m  s2*u where s2*m is an SRSWOR sample of size m from s1* and

s2*u is an SRSWOR sample of size u from ( P − s1* ) .

Estimation of Population mean Two types of estimators are available for the estimation of population mean: 1. Type 1 estimators: They are obtained by taking a linear combination of estimators obtained from s2*u and s2*m .

2. Type 2 estimators: They are obtained by considering the best linear combination of sample means.

Type 1 estimators: Two estimators are available for estimating Y2 (i) t2u = y2** S 22 1 with Var ( y = ) = (say) u Wu ** 2

Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 2


(ii) t2m = linear regression estimate of Y2 based on the regression of y2 j on y1 j = y2* + b( y1 − y1* )

∑ (y

1j

where b =

i∈s*2m

− y1* )( y2 j − y2* ) ( y1 j − y1* ) 2

is the sample regression coefficient.

j∈s1*m

Recall in case of double sampling, we had

1 1  1 1  Var (Yˆregd )= S y2  −  − ρ 2 s y2  − *  n N  n n  2 Sy 1 1  = − ρ 2 S y2  − *  . n n n  2 2 (1 − ρ 2 ) 2 ρ S y 1 (ignoring term of order ). = Sy + * n n N So in this case S 22 (1 − ρ 2 ) ρ 2 S 22 + m n 1 (say). = Wm

= Var ( t2 m )

If there are two uncorrelated unbiased estimators of a parameter, then the best linear unbiased estimator of parameter can be obtained by combining them using a linear combination with suitably chosen weights. Now we discuss how to choose weights in such a linear combination of estimators.

Let

θˆ1 and θˆ2 be two uncorrelated and unbiased estimators of

(θˆ1 ) E= (θˆ2 ) θ and θ , i.e., E=

2 ˆ ˆ Var (θˆ1 ) == 0. σ 12 , Var (θˆ2 ) σ= 2 , Cov (θ1 , θ 2 )

Consider

θˆ= ωθˆ1 + (1 − ω )θˆ2 where

0 ≤ ω ≤ 1 is the weight. Now choose ω such that Var (θˆ) is

minimum. ˆ) ω 2σ 2 + (1 − ω ) 2 σ 2 Var (θ= 1 2 ∂Var (θˆ) =0 ∂ω ⇒ 2ωσ 12 − 2(1 − ω )σ 22 = 0

ω ⇒ =

σ 22 ω * , say = 2 2 σ1 + σ 2

∂ 2Var (θˆ) > 0. ∂ω 2 ω =ω* Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 3


The minimum variance achieved by θˆ is 2 = ω * σ 12 + (1 − ω * ) 2 σ 22 Var (θˆ) Min

= =

σ 24σ 12 2 1

+ σ 22 )

2

+

σ 2σ 2

2 1 = 2 σ 1 + σ 22

σ 14σ 22 2 1

+ σ 22 )

1 1

σ 22

+

1

2

.

σ 12

Now we implement this result in our case.

Consider the linear combination of t2u and t2m as Yˆ2= ωt2u + (1 − ω )t2 m

where the weights ω are obtained as

ω=

Wu Wu + Wm

so that Yˆ2 is the best combined estimate. The minimum variance with this choice of ω is Var = (Yˆ2 )

S 22 (n − u ρ 2 ) 1 . = Wu + Wm (n 2 − u 2 ρ 2 )

S2 For u = 0 (complete matching), Var (Yˆ2 ) = 2 . n S2 For u = n (no matching), Var (Yˆ2 ) = 2 . n

Type II estimators: We now consider the minimum variance linear unbiased estimator of Y2 under the same sampling scheme as under Type I estimator.

A best linear (linear in terms of observed means) unbiased estimator of Yˆ2 is of the form Yˆ2* = ay1** + by1* + cy2* + dy2**

 m n −1 where constants a, b, c, d and matching fraction λ=  =  are to be suitably chosen so as to n   n

minimize the variance. Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 4


Assume S12 = S 22 . Now E (Yˆ2* ) = (a + b)Y1 + (c + d )Y2 .

If Yˆ2* has to be an unbiased estimator of Y2 , i.e. E (Yˆ2* ) = Y2 ,

it requires a+b = 0 c+d = 1.

Since a minimum variance unbiased estimator would be uncorrelated with any unbiased estimator of zero, we must have Cov(Yˆ2* , y1** − y1* ) = 0 Cov(Yˆ * , y * − y ** ) = 0. 2

2

2

(1) (2)

Since Cov( y2* , y1** )= 0= Cov( y2* , y2** ) Cov( y2* , y1* ) =

ρS2

m Cov = ( y , y ) Cov = ( y2** , y2* ) 0 ** 2

** 1

S2 Var ( y ) = m S2 Var ( y2** ) = . u * 2

Now solving (1) and (2) by neglecting terms of order

1 , we have N

Cov(Yˆ2* , y1** −= y1* ) Cov(ay1** + by1* + cy2* + dy2** , y1** − y1* ) = aVar ( y1** ) + bC ov( y1* , y1** ) + cCov( y2* , y1** ) + dC ov( y2** , y1** ) − aC ov( y1** , y1* ) − bVar ( y1* ) − cC ov( y1* , y2* ) − dCov( y2** , y1* ) or

ρS2

cS 2 (1 − c) S 2 −a + = . m m u

(3)

Similarly, from (2), we have Cov( y2* , y2* − y2** ) = 0 ⇒−

aS 2 c ρ S 2 aS 2 + = . m m u

(4)

Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 5


Solving (3) and (4) gives = a

λµρ

λ

= , c 1 − ρ 2µ 2 1 − ρ 2µ 2

u n−u where µ == 1− λ, λ = n n b= − a, d = 1 − c.

Substituting a, b, c, d , the best linear unbiased estimator of Y2 is

 µ (1 − ρ 2 µ ) y2**  . = Yˆ2* λµρ ( y1** − y1* ) + λ y2* + (1 − ρ 2 µ 2 )   For these values of a and c ,

 1 − ρ 2µ S 2  . Var (Yˆ2* ) =  2 2   1− ρ µ n  Alternatively, minimize Var (Yˆ2* ) with respect to a and c and find optimum values of a and c . Then find the estimator and its variance.

Till now, we used SRSWOR for the two occasions. We now consider unequal probability sampling schemes on two occasions for

estimating Yˆ2 . We use the

same notations as defined in varying

probability scheme.

Des Raj Scheme: Let s1* be the sample selected by PPSWR from P using x as a size (auxiliary) variable.

Then pi =

xi is the size measure of i , where X tot is the population total of auxiliary variable. X tot

s2* = s2*m  s2*u * * where s2m is an SRSWR( m ) from s1* and s2u is an independent sample selected from P by PPSWR

using u draws (m + u = n).

The estimator is Yˆ2 des= ω t2 m + (1 − ω )t2u ; 0 ≤ ω ≤ 1 where Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 6


 y2 j − y1 j   y1 j  + ∑  mp j  j∈s1*  np j j∈s2* m  y  t2u = ∑  2 j  .   j∈s2* u  up j  Assuming

∑ 

= t2 m

  

2

2

N  Y1 j   Y2 j  − = − Y= P Y Pj     V0 (say) . ∑ ∑ 1 2 j  P =j 1 = j 1  Pj   j  For the optimum sampling fraction N

λ=

m , n

Var (Y2 des ) =

V0 (1 + 2(1 − δ ) 2n

where  Y1i

 − Y2   i  i  σ pps ( y1 )σ pps ( y2 )

N

δ=

  Y2i

∑ P  P −Y  P i

i =1

1

2

Z  2 ( z ) ∑ Pi  i − Z= Varpps=  σ pps ( z ) i =1  Pi  N

N

Z = ∑ Zi . i =1

(ii) Chaudhuri-Arnab sampling scheme Let s1* be a sample selected by Midzuno’s sampling scheme, s2* = s2*m  s2*u * where s2m = SRSWOR (m) sample from s1* * = sample of size u from P by Midzuno’s sampling scheme. s2u

Then an estimator of Y is

Yˆ2 ca= α t2 m + (1 − α )t2u ;

0 ≤α ≤1

where t2 m =

t2u =

 ( y2 j − y1 )n   y1 j  + ∑  mπ j j∈s2* m   j∈s1*  π j

∑ 

  

y2 j

∑π

j∈s2 u

* j

π j = np j π *j = up j . Similarly other schemes are also there. Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 7


Sampling on more than two occasions When there are more than two occasions, one has a large flexibility in using both sampling procedures and estimating the character. Thus on occasion i •

one may have parts of the sample that are matched with occasion (i − 1)

parts that are matched with occasion (i − 2) j

and so on.

One may consider a single multiple regression of all previous matchings on the current occasion. However, it has been seen that the loss of efficiency incurred by using the information from the latest two or three occasions only is fairly small in many occasions.

Consider the simple sampling design where * si* = sim  siu* , * is a sample by SRSWOR of size mi from s(*i−1) , where sim

siu* is a sample by SRSWOR of size ui (= n − mi ) from the units not already sampled.

Assume= ni n= , S 22 S 2 for all i . On the i th occasion, we have therefore two estimators ) tiu = yi** with Var (tiu=

S2 1 = ui Wiu

tim = yi* + b(i −1)i (Yˆ(i −1) − yi*** ) where b(i−1),i is the regression of yij on y(i−1) j

∑(y

− yi*** )( yij − yi* )

( i −1) j

b(i−1),i =

si**

∑(y

( i −1) j

− yi*** ) 2

si**

and S 2 (1 − ρ 2 ) 1 Var (tim ) = + ρ 2Var (Yˆ(i −1) ) = mi Wim assuming that ρ(i−= ρ= , i 2,3,...,. and terms of order 1),i

1 are negligible. N

The expression of Var (tim ) has been obtained from the variance of regression estimator under the double sampling

Sampling Theory| Chapter 12 | Sampling on Successive Occassions | Shalabh, IIT Kanpur

Page 8


1 1  1 1  Var ( yˆ regd )= Su2  −  − ρ 2 S y2  − *  n N  n n  2 2 2 2 S y (1 − ρ ) ρ S y = − * n n

which is obtained after ignoring the terms of

ρ 2 S y2 n

*

( = β V ( x *) by 2

1 by using mi for n and replacing N

ρ 2Var (Yˆ(i −1) ) since β = ρ and Si2 is constant. Using weights as the inverse of

Using weights as the inverse of the variance, the best weighted estimator from \( t_{iu} \) and \( t_{im} \) is
\[
\hat{Y}_i = \omega_i\, t_{iu} + (1 - \omega_i)\, t_{im}, \qquad \omega_i = \frac{W_{iu}}{W_{iu} + W_{im}} .
\]
Then
\[
Var(\hat{Y}_i) = \frac{1}{W_{iu} + W_{im}} = \frac{g_i S^2}{n} \text{ (say)}, \qquad i = 1, 2, \ldots, \quad (g_1 = 1).
\]
Substituting \( 1/W_{iu} = S^2 / u_i \) in \( Var(\hat{Y}_i) = g_i S^2 / n \), we have
\[
\frac{n}{g_i} = u_i + \frac{1}{\dfrac{1 - \rho^2}{m_i} + \dfrac{\rho^2 g_{i-1}}{n}} .
\]
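For concreteness, here is a minimal Python sketch of one occasion's computation, assuming the matched and unmatched samples and the previous occasion's estimate are available; all inputs are hypothetical, and \( \rho \) and \( b \) are estimated from the matched sample.

import numpy as np

def composite_estimate(y_prev_m, y_curr_m, y_curr_u, Yhat_prev, var_prev, S2):
    """One occasion of sampling on successive occasions.
    y_prev_m, y_curr_m : previous/current values on the matched sample
    y_curr_u           : current values on the unmatched sample
    Yhat_prev, var_prev: estimate and its variance from occasion i-1
    S2                 : assumed common variance S^2"""
    m, u = len(y_curr_m), len(y_curr_u)
    rho = np.corrcoef(y_prev_m, y_curr_m)[0, 1]     # sample estimate of rho
    b = np.cov(y_prev_m, y_curr_m)[0, 1] / np.var(y_prev_m, ddof=1)
    t_iu = y_curr_u.mean()
    t_im = y_curr_m.mean() + b * (Yhat_prev - y_prev_m.mean())
    var_iu = S2 / u                                 # 1 / W_iu
    var_im = S2 * (1 - rho**2) / m + rho**2 * var_prev   # 1 / W_im
    w = (1 / var_iu) / (1 / var_iu + 1 / var_im)    # omega_i
    est = w * t_iu + (1 - w) * t_im
    var = 1 / (1 / var_iu + 1 / var_im)
    return est, var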

Now maximize \( n/g_i \) with respect to \( m_i \) so as to minimize \( Var(\hat{Y}_i) \). Differentiating \( n/g_i \) with respect to \( m_i \) and equating to zero gives
\[
\frac{1 - \rho^2}{m_i^2} = \left( \frac{1 - \rho^2}{m_i} + \frac{\rho^2 g_{i-1}}{n} \right)^2
\;\Rightarrow\; \hat{m}_i = \frac{n \sqrt{1 - \rho^2}}{g_{i-1} \left( 1 + \sqrt{1 - \rho^2} \right)} .
\]



The optimum sampling fraction \( \hat{m}_i / n \) can now be determined successively for \( i = 2, 3, \ldots \) for given values of \( \rho \). Substituting it in the expression for \( n/g_i \), we have
\[
\frac{1}{g_i} = 1 + \frac{\left( 1 - \sqrt{1 - \rho^2} \right)^2}{g_{i-1}\, \rho^2}
\]
or
\[
q_i = 1 + a\, q_{i-1}
\]
where
\[
q_i = \frac{1}{g_i}, \qquad q_1 = 1, \qquad a = \frac{1 - \sqrt{1 - \rho^2}}{1 + \sqrt{1 - \rho^2}}, \quad 0 < a < 1 .
\]
Repeated use of this relation gives
\[
q_i = 1 + a\, q_{i-1} = 1 + a (1 + a\, q_{i-2}) = 1 + a + a^2 q_{i-2} = \cdots = 1 + a + a^2 + \cdots + a^{i-1} = \frac{1 - a^i}{1 - a} \to \frac{1}{1 - a} \ \text{as} \ i \to \infty .
\]

For sampling an infinite number of times, the limiting variance factor \( g_\infty \) is
\[
g_\infty = 1 - a = \frac{2 \sqrt{1 - \rho^2}}{1 + \sqrt{1 - \rho^2}} .
\]
The limiting value of \( Var(\hat{Y}_i) \) as \( i \to \infty \) is
\[
Var(\hat{Y}_\infty) = \lim_{i \to \infty} Var(\hat{Y}_i) = \frac{2 S^2 \sqrt{1 - \rho^2}}{n \left( 1 + \sqrt{1 - \rho^2} \right)} .
\]
The limiting value of the optimum sampling fraction as \( i \to \infty \) is
\[
\lim_{i \to \infty} \frac{\hat{m}_i}{n} = \frac{\hat{m}_\infty}{n} = \frac{\sqrt{1 - \rho^2}}{g_\infty \left( 1 + \sqrt{1 - \rho^2} \right)} = \frac{1}{2} .
\]
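A short numerical check of these recursions in Python (the value of \( \rho \) is arbitrary):

import numpy as np

def successive_factors(rho, occasions=10):
    """Iterate q_i = 1 + a q_{i-1}; report g_i = 1/q_i and the
    optimum matched fraction m_i/n on each occasion."""
    r = np.sqrt(1 - rho**2)
    a = (1 - r) / (1 + r)
    q, g = 1.0, 1.0                       # q_1 = g_1 = 1
    for i in range(2, occasions + 1):
        m_frac = r / (g * (1 + r))        # optimum m_i/n uses g_{i-1}
        q = 1 + a * q
        g = 1 / q
        print(f"i={i}: m_i/n = {m_frac:.3f}, g_i = {g:.3f}")
    print(f"limits: m/n -> 0.5, g_inf = {2 * r / (1 + r):.3f}")

successive_factors(rho=0.8)

For \( \rho = 0.8 \) the matched fraction climbs from 0.375 towards 0.5 and \( g_i \) settles at \( g_\infty = 0.75 \), confirming the limits above.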

Thus, for the estimation of the current population mean by this procedure, one would never have to match more than 50% of the sample drawn on the last occasion. Unless \( \rho \) is very high, say more than 0.8, the reduction in variance \( (1 - g_\infty) \) is only modest.


Type II estimation
Consider
\[
\hat{Y}_i = a_i \hat{Y}_{i-1} + b_i \bar{y}_{i-1}^{**} + c_i \bar{y}_i^{***} + d_i \bar{y}_i^{**} + e_i \bar{y}_i^* .
\]
Now
\[
E(\hat{Y}_i) = (a_i + b_i + c_i)\, Y_{(i-1)} + (d_i + e_i)\, Y_i ,
\]
so for unbiasedness,
\[
c_i = -(a_i + b_i), \qquad d_i = 1 - e_i .
\]
An unbiased estimator is therefore of the form
\[
\hat{Y}_i = a_i \hat{Y}_{(i-1)} + b_i \bar{y}_{i-1}^{**} - (a_i + b_i) \bar{y}_i^{***} + d_i \bar{y}_i^{**} + (1 - d_i) \bar{y}_i^* .
\]
To find the optimum weights, minimize \( Var(\hat{Y}_i) \) with respect to \( a_i, b_i, d_i \).

Alternatively, one can require that
\[
\hat{Y}_i = a_i \hat{Y}_{i-1} + b_i \bar{y}_{i-1}^{**} - (a_i + b_i) \bar{y}_i^{***} + d_i \bar{y}_i^{**} + (1 - d_i) \bar{y}_i^*
\]
be uncorrelated with all unbiased estimators of zero. Thus
\[
Cov(\hat{Y}_i,\, \bar{y}_{i-1}^{**} - \bar{y}_i^{***}) = 0, \qquad
Cov(\hat{Y}_{i-1},\, \bar{y}_{i-1}^{**} - \bar{y}_i^{***}) = 0, \qquad
Cov(\hat{Y}_i,\, \bar{y}_{i-2}^{**} - \bar{y}_{i-1}^{***}) = 0 .
\]
Using these restrictions, find the constants and obtain the estimator.



Chapter 13
Non Sampling Errors

It is a general assumption in sampling theory that the true value of each unit in the population can be obtained and tabulated without error. In practice, this assumption may be violated due to several reasons and practical constraints, and this results in errors in the observations as well as in the tabulation. Errors which are due to factors other than sampling are called non-sampling errors.

Non-sampling errors are unavoidable in censuses and surveys. The data collected by complete enumeration in a census are free from sampling error but do not remain free from non-sampling errors. The data collected through sample surveys can have both sampling and non-sampling errors. Non-sampling errors arise because of factors other than the inductive process of inferring about the population from a sample. In general, sampling errors decrease as the sample size increases, whereas non-sampling errors increase as the sample size increases. In some situations, the non-sampling errors may be large and deserve greater attention than the sampling error.

In any survey, it is assumed that the value of the characteristic to be measured has been defined precisely for every population unit; such a value exists and is unique. This is called the true value of the characteristic for the population unit. In practice, the data collected on the selected units are called survey values, and they differ from the true values. The difference between the true and observed values is termed the observational error or response error. Such an error arises mainly from lack of precision in measurement techniques and from variability in the performance of the investigators.

Sources of non-sampling errors:
Non-sampling errors can occur at every stage of the planning and execution of a survey or census: at the planning stage, at the field-work stage, and at the tabulation and computation stage. The main sources of non-sampling errors are
 lack of proper specification of the domain of study and scope of investigation,
 incomplete coverage of the population or sample,
 faulty definitions,
 defective methods of data collection, and
 tabulation errors.



More specifically, one or more of the following reasons may give rise to non-sampling errors or indicate their presence:
• The data specification may be inadequate and inconsistent with the objectives of the survey or census.
• Due to imprecise definition of the boundaries of area units, incomplete or wrong identification of units, faulty methods of enumeration etc., the data may be duplicated or omitted.
• The methods of interview and observation may be inaccurate or inappropriate.
• The questionnaire, definitions and instructions may be ambiguous.
• The investigators may be inexperienced or not trained properly.
• Recall errors may pose difficulty in reporting the true data.
• The scrutiny of data may not be adequate.
• The coding, tabulation etc. of the data may be erroneous.
• There can be errors in presenting and printing the tabulated results, graphs etc.
• In a sample survey, non-sampling errors may also arise due to a defective frame and faulty selection of sampling units.

These sources are not exhaustive but indicate the possible sources of error.

Non-sampling errors may be broadly classified into three categories.

(a) Specification errors: These occur at the planning stage due to various reasons, e.g., inadequate and inconsistent specification of data with respect to the objectives of the survey/census, omission or duplication of units due to imprecise definitions, faulty methods of enumeration/interview, ambiguous schedules, etc.

(b) Ascertainment errors: These occur at the field stage due to various reasons, e.g., lack of trained and experienced investigators, recall errors and other types of errors in data collection, lack of adequate inspection and supervision of primary staff, etc.

(c) Tabulation errors: These occur at the tabulation stage due to various reasons, e.g., inadequate scrutiny of data, errors in processing the data, errors in publishing the tabulated results, graphs, etc.



Ascertainment errors may be further sub-divided into
(i) coverage errors, owing to over-enumeration or under-enumeration of the population or the sample, resulting from duplication or omission of units and from non-response, and
(ii) content errors, relating to wrong entries due to errors on the part of investigators and respondents.

The same division can be made for tabulation errors as well. There is a possibility of missing or repeated data at the tabulation stage, which gives rise to coverage errors, and of errors in coding, calculations etc., which give rise to content errors.

Treatment of non-sampling errors:
Some conceptual background is needed for the mathematical treatment of non-sampling errors.

Total error: The difference between the sample survey estimate and the true parametric value being estimated is termed the total error.

Sampling error: If complete accuracy can be ensured in procedures such as the determination, identification and observation of sample units and the tabulation of collected data, then the total error consists only of the error due to sampling, termed the sampling error.

A measure of the sampling error is the mean squared error (MSE). The MSE is the expected squared difference between the estimator and the true value and has two components:
- the square of the sampling bias, and
- the sampling variance.

If the results are also subject to non-sampling errors, then the total error contains both sampling and non-sampling errors.

Total bias: The difference between the expected value of the estimator and the true value being estimated is termed the total bias. It consists of the sampling bias and the non-sampling bias.



Non-sampling bias: For the sake of simplicity, assume that the following two steps are involved in the randomization:
(i) selection of the sample of units, and
(ii) selection of the survey personnel.

Let \( \hat{Y}_{sr} \) be the estimate of the population mean \( \bar{Y} \) based on the \( s \)-th sample of units, supplied by the \( r \)-th sample of the survey personnel. The conditional expected value of \( \hat{Y}_{sr} \), taken over the second step of randomization for a fixed sample of units, is
\[
E_r(\hat{Y}_{sr}) = \hat{Y}_{s0},
\]
which may be different from \( \hat{Y}_s \), the estimate based on the true values of the units in the sample.

The expected value of \( \hat{Y}_{s0} \) over the first step of randomization gives
\[
E_s(\hat{Y}_{s0}) = Y^*,
\]
which is the value for which an unbiased estimator can be had by the specified survey process. The value \( Y^* \) may be different from the true population mean \( \bar{Y} \), and the total bias is given by
\[
Bias_t(\hat{Y}_{sr}) = Y^* - \bar{Y} .
\]
The sampling bias is given by
\[
Bias_s(\hat{Y}_s) = E_s(\hat{Y}_s) - \bar{Y} .
\]
The non-sampling bias is
\[
Bias_r(\hat{Y}_{sr}) = Bias_t(\hat{Y}_{sr}) - Bias_s(\hat{Y}_s) = Y^* - E_s(\hat{Y}_s) = E_s(\hat{Y}_{s0} - \hat{Y}_s),
\]
which is the expected value of the non-sampling deviation.

In the case of complete enumeration, there is no sampling bias and the total bias consists only of the non-sampling bias. In the case of sample surveys, the total bias consists of both the sampling and the non-sampling bias.



The non-sampling bias in a census can be estimated by surveying a sample of units of the population using better techniques of data collection and compilation than those adopted under general census conditions. Such surveys, called post-enumeration surveys, are usually conducted just after the census for studying the quality of the census data and may be used for this purpose. In a large-scale sample survey, the ascertainment bias can be estimated by re-surveying a sub-sample of the original sample using better survey techniques. Another method of checking survey data is to compare the values of the units obtained in the two surveys and to reconcile the discrepant figures by further investigation. This method of checking is termed a reconciliation (check) survey.

Non-sampling variance: The MSE of \( \hat{Y}_{sr} \), based on the \( s \)-th sample of units and supplied by the \( r \)-th sample of the survey personnel, is
\[
MSE(\hat{Y}_{sr}) = E_{sr}(\hat{Y}_{sr} - \bar{Y})^2,
\]
where \( \bar{Y} \) is the true value being estimated. This takes into account both the sampling and the non-sampling errors, i.e.,
\[
MSE(\hat{Y}_{sr}) = Var(\hat{Y}_{sr}) + \left[ Bias(\hat{Y}_{sr}) \right]^2 = E(\hat{Y}_{sr} - Y^*)^2 + (Y^* - \bar{Y})^2,
\]
where \( Y^* \) is the expected value of the estimator taken over both steps of randomization.

Taking the variance over the two steps of randomization, we get
\[
Var_{sr}(\hat{Y}_{sr}) = Var_s\!\left[ E_r(\hat{Y}_{sr}) \right] + E_s\!\left[ Var_r(\hat{Y}_{sr}) \right]
= \underbrace{Var_s\!\left[ \hat{Y}_{s0} \right]}_{\text{sampling variance}} + \underbrace{E_s\!\left[ E_r(\hat{Y}_{sr} - \hat{Y}_{s0})^2 \right]}_{\text{non-sampling variance}} .
\]
Note that
\[
\hat{Y}_{sr} - \hat{Y}_{s0} = (\hat{Y}_{sr} - \hat{Y}_{s0} - \hat{Y}_{0r} + Y^*) + (\hat{Y}_{0r} - Y^*),
\]
where \( \hat{Y}_{0r} = E_s(\hat{Y}_{sr}) \).



Then
\[
E(\hat{Y}_{sr} - \hat{Y}_{s0})^2 = \underbrace{E_{sr}(\hat{Y}_{sr} - \hat{Y}_{s0} - \hat{Y}_{0r} + Y^*)^2}_{\text{interaction between sampling and non-sampling errors}} + \underbrace{E_r(\hat{Y}_{0r} - Y^*)^2}_{\text{variance between survey personnel}} .
\]
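To see the decomposition operate, here is a minimal simulation sketch under entirely hypothetical assumptions: an SRSWOR design (step 1) and an additive, randomly drawn interviewer effect (step 2). Because the two steps are independent here, the variance splits cleanly.

import numpy as np

rng = np.random.default_rng(7)
N, n, reps = 1000, 50, 4000
Y = rng.normal(50, 10, N)                    # hypothetical true values

est = np.empty(reps)        # estimates subject to response errors
est_true = np.empty(reps)   # same samples scored with the true values
for k in range(reps):
    s = rng.choice(N, n, replace=False)      # step 1: SRSWOR sample of units
    b_r = rng.normal(-1.0, 2.0)              # step 2: interviewer effect
    responses = Y[s] + b_r + rng.normal(0, 3, n)
    est[k] = responses.mean()
    est_true[k] = Y[s].mean()

print("total bias        :", est.mean() - Y.mean())       # ~ -1.0
print("sampling bias     :", est_true.mean() - Y.mean())  # ~ 0 for SRSWOR
print("sampling variance :", est_true.var())
# with independent steps the remainder approximates the non-sampling
# variance E_s[Var_r(.)] ~ 2^2 + 3^2 / n
print("non-sampling var. :", est.var() - est_true.var())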

The MSE of an estimator thus consists of
- the sampling variance,
- the interaction between the sampling and the non-sampling errors,
- the variance between survey personnel, and
- the square of the sum of the sampling and non-sampling biases.

In a complete census, the MSE is composed only of the non-sampling variance and the square of the non-sampling bias.

Non-response error: Non-response errors may occur due to refusal by respondents to give information, or because sampling units are inaccessible. The error arises because the set of units excluded may have characteristics so different from those of the units actually surveyed as to make the results biased. It is termed non-response error since it arises from the exclusion of some of the anticipated units in the sample or population. One way of dealing with the problem of non-response is to make every effort to collect the information from a sub-sample of the units not responding in the first attempt.

Measurement and control of errors: Suitable methods and adequate procedures for control can be adopted before initiating the main census or sample survey, and separate programmes for estimating the different types of non-sampling errors may also be required. Some such procedures are as follows:

1. Consistency checks: Certain items can be added to the questionnaire to serve as a check on the quality of the collected data. To locate doubtful observations, the data can be arranged in increasing order of some basic variable and plotted against each sample unit. Such a graph is expected to follow a certain pattern, and any deviation from this pattern helps in spotting discrepant values.
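A crude sketch of such a consistency check in Python; the moving-average pattern and the cut-off are hypothetical choices for illustration, not a standard prescription.

import numpy as np

def flag_discrepant(basic, item, window=5, z_cut=3.0):
    """Sort records by the basic variable and flag item values that
    deviate strongly from a moving-average pattern."""
    order = np.argsort(basic)
    y = np.asarray(item, float)[order]
    trend = np.convolve(y, np.ones(window) / window, mode="same")
    resid = y - trend
    z = (resid - resid.mean()) / resid.std()
    return order[np.abs(z) > z_cut]        # indices of doubtful units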



2. Sample check: An independent duplicate census or sample survey can be conducted on a comparatively smaller group by trained and experienced staff. If the sample is properly designed and the checking operation is efficiently carried out, it is possible to detect the presence of non-sampling errors and to get an idea of their magnitude. This procedure is termed the method of sample check.

3. Post-census and post-survey checks: This is a type of sample check in which a sample (or sub-sample) of the units covered in the census (or survey) is selected and re-enumerated or re-surveyed using better trained and more experienced survey staff than those employed in the main investigation. This procedure is called a post-census or post-survey check. The effectiveness of such check surveys can be increased by
- re-enumerating or re-surveying immediately after the main census to avoid recall errors, and
- taking steps to minimize the conditioning effect that the main survey may have on the work of the check survey.

4. External record check: Take a sample of relevant units from a different source, if available, and check whether all the units have been enumerated in the main investigation and whether there are discrepancies between the matched values. The list from which the check sample is drawn for this purpose need not be a complete one.

5. Quality control techniques: Tools of statistical quality control, such as control charts and acceptance sampling techniques, can be used for assessing the quality of the data and for improving the reliability of the final results in large-scale surveys and censuses.

6. Study of recall error: Response errors arise due to various factors, like the attitude of respondents towards the survey, the method of interview, the skill of the investigators, and recall errors. Recall error depends on the length of the reporting period and on the interval between the reporting period and the date of the survey. One way of studying recall error is to collect and analyze data relating to more than one reporting period in a sample (or sub-sample) of the units covered in the census or survey.



7. Interpenetrating sub-samples: The interpenetrating sub-sample technique helps in appraising the quality of information, as the sub-samples can be used to secure information on non-sampling errors such as differences arising from differential interviewer bias, different methods of eliciting information, etc. After the sub-samples have been surveyed by different groups of investigators and processed by different teams of workers at the tabulation stage, a comparison of the final estimates based on the sub-samples provides a broad check on the quality of the survey results.

