guidelines for impact evaluation in education using experimental design by IDB

groups and no operative restrictions or differentiated costs. It is best to increase the sample size of groups for which a loss of observations across rounds of data collection is expected. Usually, control units are less likely to be willing to participate than treated units. Attrition is discussed in Section 5.1.3. For details on the derivation of the formula with different costs for data collection in treatment and control groups, see Duflo, Glennerster, and Kremer (2008).

Some teachers may have a class size smaller than the estimated size as determined by the procedure described in equation 13. In these cases, data should be collected on all students in the class. Omitting small classes or schools biases impact estimates toward the effect in larger schools. Information on the number of students in the class should be collected so that results can be weighted in estimations related to the student-level distribution.

Two considerations should be kept in mind when planning an impact evaluation with a small number of observations. First, the rule of thumb is that at least 30 observations are needed to invoke the central limit theorem, which justifies the normality assumption used to reject the null hypothesis. Approximate normality of the estimator is what makes the standard test statistics valid. Second, power calculations based on 15 treatment and 15 control units would lead to a minimum detectable effect of about one standard deviation, which is roughly equivalent to one school year in the United States (Hill et al., 2007). Given that education interventions take time to bring large results, it is unlikely that effects will be found with such a sample size.

On the other hand, there have been successful impact evaluations with a limited number of observations. For example, Banerjee and Somanathan (2007) include 55 schools in year 2 and 56 schools in year 3 to evaluate a computer-assisted learning component. They find effects of 0.42 and 0.27 standard deviations for students in the bottom and top tertiles, respectively. They also find that the effect fell after a year to 0.09 standard deviations. Other examples are Basinga et al. (2010), who used randomization at the district level on eight pairs for health outcomes, and Bloom et al. (2011), who used randomization with 20 experimental textile treatment plants.

The factors listed in Table 6 should be taken into account when thinking about the minimum number of observations required to detect a given effect. Section 5 includes a discussion on techniques that can be applied to evaluations with randomization using a small group.
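The one-standard-deviation figure quoted above for 15 treatment and 15 control units can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions the text does not state: a two-sided test at the 5 percent level, 80 percent power, randomization at the level of the individual unit (no clustering), equal outcome variances in both groups, no covariates, and a normal approximation. The function name `minimum_detectable_effect` and its parameters are hypothetical and are not taken from the guidelines.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_treatment, n_control, alpha=0.05, power=0.80):
    """Minimum detectable effect, in standard-deviation units, for a
    difference in means between two groups.

    Assumptions (not from the source text): two-sided test, equal outcome
    variance in both groups, no clustering, no covariates, normal
    approximation to the test statistic.
    """
    standard_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    # Quantile corresponding to the desired statistical power.
    z_power = standard_normal.inv_cdf(power)
    # Standard error of the difference in means when the outcome SD is 1.
    standard_error = sqrt(1 / n_treatment + 1 / n_control)
    return (z_alpha + z_power) * standard_error


mde = minimum_detectable_effect(15, 15)
print(f"MDE with 15 treated and 15 control units: {mde:.2f} SD")
```

Under these assumptions the result is close to one standard deviation, in line with the text. Using the t distribution instead of the normal, or accounting for clustering, would give a somewhat larger minimum detectable effect, so the value should be read as a lower bound for designs of this size.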

3.2 Assigning Treatment and Control Groups: Strata Randomization

Block or stratified randomization is based on the idea of creating groups that are as similar as possible and then randomizing within these groups. An example is the paper by Banerjee and Somanathan (2007) that looks at the impact of a program on test scores. The researchers stratified according to class size, language of instruction, school, gender, and pre-intervention test scores. Blocking improves precision to the extent that the variables used for blocking explain the variation in test scores (Duflo,