
CHAPTER 5. CLUSTER SAMPLING WITH EQUAL PROBABILITIES
5.14 (a) Treating the proportions as means, and letting Mi and mi be the number of female students in the school and interviewed, respectively, we have the following summaries.
(b) We construct a sample ANOVA table from the summary information in part (a). Note that
Source          df    SS       MS
Between psus     3
Within psus     96    0.837    0.0087
Total           99
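As a quick consistency check on the table above, the within-psu mean square should equal the within-psu sum of squares divided by its degrees of freedom, and the degrees of freedom should add to the total. The sketch below assumes the three numbers in the within-psu row are df, SS, and MS, in that order; the variable names are illustrative only.

```r
# Values read from the ANOVA table (column interpretation is an assumption)
df_between <- 3
df_within  <- 96
df_total   <- 99
ss_within  <- 0.837

# Mean square = sum of squares / degrees of freedom
ms_within <- ss_within / df_within

# Degrees of freedom should be additive
df_check <- (df_between + df_within) == df_total

round(ms_within, 4)   # compare with the tabled value 0.0087
df_check
```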
5.15 (a) A cluster sample was used for this study because Arizona has no list of all elementary school teachers in the state. All schools would have to be contacted to construct a sampling frame of teachers, and this would be expensive. Taking a cluster sample also makes it easier to distribute surveys. It’s possible that distributing questionnaires through the schools might improve cooperation with the survey and give respondents more assurance that their data are kept confidential.
(b) The means and standard deviations, after eliminating records with missing values, are in Table SM5.5.1.
There appears to be large variation among the school means. An ANOVA table for the data is shown below.
(c) There appears to be a wider range of standard deviations for schools with higher means.
(d) Calculations are in Table SM5.5.2.
5.16 (a) Summary quantities for estimating y and its variance are given in the table below. Here, ki denotes the number sampled in school i. We use the number of respondents in school i as mi. An approximate 95% confidence interval for the percentage of parents who returned the questionnaire is
(c) If the clustering were (incorrectly!) ignored, we would have had $\hat{p} = 160/281 = 0.569$ with
$$\hat{V}(\hat{p}) = \frac{(0.569)(1 - 0.569)}{280} = 0.000876.$$
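The arithmetic in part (c) can be reproduced with a few lines of R. This is a minimal sketch of the simple-random-sample calculation that ignores clustering, using only the counts 160 and 281 quoted in the solution; the variable names are illustrative, and no finite population correction is applied.

```r
# Counts quoted in the solution text
n <- 281   # number of respondents
x <- 160   # number of respondents with the characteristic

# Sample proportion
p_hat <- x / n

# Estimated variance treating the data as an SRS (clustering ignored, no fpc)
v_srs <- p_hat * (1 - p_hat) / (n - 1)

# Corresponding standard error
se_srs <- sqrt(v_srs)

round(c(p_hat = p_hat, v_srs = v_srs), 6)
```

Rounded to three significant digits, `p_hat` and `v_srs` agree with the values 0.569 and 0.000876 given above. Because this calculation ignores the clustering, it should not be used for inference.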
5.16 (a) Table SM5.3 gives summary quantities; the column yi gives the estimated proportion of children who had previously had measles in each school.
Table 5.3: Calculations for Exercise 5.16
5.18 Here is R code and output for analyzing reading:
library(survey)        # svydesign, svymean, svytotal, degf
library(SDAResources)  # assumed source of the schools data set
data(schools)
# calculate with-replacement variance; no fpc argument
# include psu variable in id; include weights
dschools<-svydesign(id=~schoolid,weights=~finalwt,data=schools)
readmean<-svymean(~reading,dschools)
readmean
##           mean     SE
## reading 30.604 1.5533
confint(readmean,df=degf(dschools))
##            2.5 %   97.5 %
## reading 27.09028 34.11797
# estimate proportion and total number of students with readlevel=2
rlevel2<-svymean(~factor(readlevel),dschools)
r2total<-svytotal(~factor(readlevel),dschools)
confint(r2total,df=degf(dschools))
## 2.5 % 97.5 %
# without replacement
schools$studentid<-1:(nrow(schools))
dschoolwor<-svydesign(id=~schoolid+studentid,fpc=~rep(75,nrow(schools))+Mi,
                      data=schools)
readmeanwor<-svymean(~reading,dschoolwor)
readmeanwor
##           mean     SE
## reading 30.604 1.4612
confint(readmeanwor,df=degf(dschoolwor))
##            2.5 %   97.5 %
## reading 27.29872 33.90953
# estimate proportion and total number of students with readlevel=2
svymean(~factor(readlevel),dschoolwor)
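To see where the confidence limits printed above come from, the intervals can be rebuilt by hand as estimate ± t × SE. This sketch copies the mean and standard errors from the output. The value df = 9 is an assumption (number of sampled schools minus 1, taking 10 sampled schools) and is not shown in the output itself.

```r
# Values copied from the svymean output
est    <- 30.604
se_wr  <- 1.5533   # with-replacement standard error
se_wor <- 1.4612   # without-replacement standard error

# ASSUMPTION: 10 sampled psus, so degf() would return 10 - 1 = 9
df <- 9
t_crit <- qt(0.975, df)

# t-based 95% confidence intervals
ci_wr  <- est + c(-1, 1) * t_crit * se_wr
ci_wor <- est + c(-1, 1) * t_crit * se_wor

round(rbind(with_replacement = ci_wr, without_replacement = ci_wor), 2)
```

Under that assumption the hand-computed limits agree with the printed `confint()` results to about two decimal places; the small differences come from rounding in the printed mean and standard errors. The without-replacement interval is slightly narrower because the finite population correction reduces the estimated variance.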
Problem 5.20
library(nlme)
readmixed <- lme(fixed=reading~1,random=~1|factor(schoolid),data=schools)
summary(readmixed)
## Linear mixed-effects model fit by REML
##  Data: schools
##       AIC     BIC   logLik
##   1404.03 1413.91 -699.015
##
## Random effects:
##  Formula: ~1 | factor(schoolid)
##         (Intercept) Residual