
Chapter 5 Cluster Sampling with Equal Probabilities

5.1 If the nonresponse can be ignored, then p̂ is the ratio estimate of the proportion. The variance estimate given in the problem, though, assumes that an SRS of voters was taken. But this was a cluster sample: the sampling unit was a residential telephone number, not an individual voter. Because voters in the same household are likely to have similar opinions, the variance estimated under simple random sampling is probably too small.
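A rough way to see how large the understatement could be is the usual design-effect approximation for clusters of equal size. The symbols M (voters per household) and ρ (intraclass correlation of opinions within a household) are introduced here as assumptions; neither is given in the problem.

\[
\frac{V(\text{cluster sample})}{V(\text{SRS})} \approx 1 + (M - 1)\,\rho
\]

For instance, with M = 2 and ρ = 0.5 (illustrative values only), the ratio is 1.5, so the SRS formula would report only about two thirds of the actual variance.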

5.2 (a) This is a cluster sample because there are two sizes of sampling units: the clinics are the psus and parents who attended the clinics are ssus. It is a one-stage sample because questionnaires were distributed to all parents visiting the selected clinics.

(b) This is not a probability sample. The clinics were chosen conveniently. It is certainly not a representative sample of households with children because, even if the clinics were a representative sample, many parents do not take their children to a clinic.

5.3 (a) This is a cluster sample because there are two levels of sampling units: the wetlands are the psus and the sites are the ssus.

(b) The analysis is not appropriate. A two-sample t test assumes that all observations are independent. This is a cluster sample, however, and sites within the same wetland are expected to be more similar than sites selected at random from the population.

5.4 (a) This is a cluster sample because the primary sampling unit is the journal, and the secondary sampling unit is an article in the journal from 1988.


(b) Let M_i = number of articles in journal i and t_i = number of articles in journal i that use non-probability sampling designs.

From the data file, and

Then, using (5.20),

The estimated variance of the residuals is using (5.17),
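For a one-stage cluster sample with unequal cluster sizes, the ratio estimate of a proportion and its standard error can be sketched in base R as follows. This is the standard textbook form, assumed here to correspond to the equations cited above. The function name `ratio_cluster` and the five-journal counts are hypothetical and are not the values from the data file; only N = 1285 is taken from the problem.

```r
# Ratio estimate of a proportion from a one-stage cluster sample.
# t_i: count with the attribute in each sampled cluster
# M_i: size of each sampled cluster
# N:   number of clusters in the population
ratio_cluster <- function(t_i, M_i, N) {
  n     <- length(t_i)
  y_hat <- sum(t_i) / sum(M_i)             # ratio estimate
  e     <- t_i - y_hat * M_i               # residuals
  s2_e  <- sum(e^2) / (n - 1)              # variance of the residuals
  # the sample mean of M_i stands in for the population mean cluster size
  se    <- sqrt((1 - n / N) * s2_e / (n * mean(M_i)^2))
  c(estimate = y_hat, se = se)
}

# Hypothetical counts for five journals, for illustration only
M_i <- c(10, 20, 15, 5, 30)   # articles per journal
t_i <- c(8, 15, 12, 5, 20)    # articles using non-probability designs
res <- ratio_cluster(t_i, M_i, N = 1285)
print(res)
```

The residuals t_i - y_hat * M_i sum to zero by construction, which is a quick check that the ratio was computed from the same clusters as the variance.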

Here is SAS code:

    data journal;
      infile journal delimiter=',' firstobs=2;
      input numemp prob nonprob;
      sampwt = 1285/26; /* weight = N/n since this is a one-stage cluster sample */
    proc surveymeans data=journal total=1285 mean clm sum clsum;
      weight sampwt;
      var numemp nonprob;
      ratio 'nonprob/(number of articles)' nonprob/numemp;
    run;

5.5 Here is code and output from R:

    library(survey)
    library(SDAResources)  # assumed source of the spanish data set
    data(spanish)
    spanish$popsize <- rep(72,nrow(spanish))
    dspan <- svydesign(id=~class,fpc=~popsize,data=spanish)
    degf(dspan)
    ## [1] 9
    mscore<-svymean(~trip+score,dspan)
    mscore
    ##           mean     SE
    ## trip   0.32143 0.0733
    ## score 66.79592 2.7091
    confint(mscore,df=degf(dspan))
    ##            2.5 %     97.5 %
    ## trip   0.1555907  0.4872665
    ## score 60.6675152 72.9243216
    svytotal(~trip+score,dspan)
    ##         total      SE
    ## trip    453.6  111.82
    ## score 94262.4 6775.43

The following code for SAS software gives the same results. Since this is a one-stage cluster sample, there is no contribution to variance from subsampling. The fpc computed by the SURVEYMEANS procedure gives the correct without-replacement variance.

    data spanish;
      set datalib.spanish;
      sampwt = 72/10; /* weight = N/n = 72/10 since one-stage cluster sample */
    proc surveymeans data=spanish total=72 mean clm sum clsum;
      weight sampwt;
      cluster class;
      var trip score;
    run;

Note that the sum of the weights is 1411.2.

5.6 (a) The R code and output below were used to calculate summary statistics and the ANOVA table.

    worms<-data.frame(
      case = rep(c(1:12),each=3),
      can = rep(c(1:3),times=12),
      worms = c(1,5,7,4,2,4,0,1,2,3,6,6,4,9,8,0,7,3,
                5,5,1,3,0,2,7,3,5,3,1,4,4,7,9,0,0,0)
    )
    worms$Mi<-rep(24,nrow(worms))
    worms$N<-rep(580,nrow(worms))
    # worms # check data entry--using rep can be tricky!
    dworms<-svydesign(id=~case+can,fpc=~N+Mi,data=worms)
    mworms<-svymean(~worms,dworms)
    mworms
    ##         mean     SE
    ## worms 3.6389 0.6102
    confint(mworms,df=degf(dworms))
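The mean and standard error reported above for the worm data can be cross-checked by hand. The base-R sketch below applies the usual two-stage variance formula (a between-case term plus a within-case term) to the same 36 counts, with N = 580 cases, M = 24 cans per case, and 3 cans sampled from each of 12 cases. The variable names are illustrative, and the confidence interval assumes n - 1 = 11 degrees of freedom, which is what `degf()` is expected to return for 12 psus.

```r
# Worm counts copied from the data frame above: 12 cases, 3 cans per case
y    <- c(1,5,7, 4,2,4, 0,1,2, 3,6,6, 4,9,8, 0,7,3,
          5,5,1, 3,0,2, 7,3,5, 3,1,4, 4,7,9, 0,0,0)
case <- rep(1:12, each = 3)
N <- 580; n <- 12   # cases in the population / in the sample
M <- 24;  m <- 3    # cans per case / cans sampled per case

ybar_i  <- as.numeric(tapply(y, case, mean))  # sample mean within each case
s2_i    <- as.numeric(tapply(y, case, var))   # sample variance within each case
t_hat_i <- M * ybar_i                         # estimated total for each case

t_hat     <- N / n * sum(t_hat_i)             # estimated population total
v_between <- N^2 * (1 - n / N) * var(t_hat_i) / n
v_within  <- N / n * sum((1 - m / M) * M^2 * s2_i / m)
se_total  <- sqrt(v_between + v_within)

mean_hat <- t_hat / (N * M)                   # mean worms per can
se_mean  <- se_total / (N * M)
ci       <- mean_hat + c(-1, 1) * qt(0.975, df = n - 1) * se_mean

print(round(c(mean = mean_hat, se = se_mean), 4))
print(ci)
```

Comparing `v_between` with `v_within` shows how much of the variance comes from differences among cases and how much from subsampling cans within cases.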
