SRP Follow-up 2 Report copy by Kampuchean Action for Primary Education (KAPE)

WORKING PAPERS: EXPANDED BASIC EDUCATION PROGRAM & EQUIP 1 (FINAL DRAFT)

School Readiness Program Evaluation Follow-up for Year 2: Terminal Achievement Testing

Prepared by: Kampuchean Action for Primary Education (KAPE)

August, 2006 Phnom Penh, Cambodia

Funded with Support from UNICEF, the Swedish International Development Agency (Sida), and the United States Agency for International Development (USAID)

ACKNOWLEDGEMENTS

This assessment was generously funded by UNICEF/Swedish International Development Agency (Sida) and the US Agency for International Development in collaboration with the American Institutes for Research and World Education.

Affiliated Projects
o Expanded Basic Education Project (EBEP)
o Child Friendly School Initiative/KAPE
o EQUIP 1/Educational Support to Children in Underserved Populations (ESCUP)


Table of Contents

Abstract
1. INTRODUCTION
   1.1 Background
   1.2 About the Current Assessment
2. ASSESSMENT METHODOLOGY
   2.1 Overall Assessment Design
   2.2 Sampling Considerations
   2.3 Test Development and Administration
3. RESEARCH RESULTS
   3.1 Comparison of Mean Scores & Mean Differences for Experimental and Control Groups
      3.1.1 General Trends in Achievement
      3.1.2 Trends in Khmer Language Achievement
      3.1.3 Trends in Mathematics Achievement
   3.2 Comparison of Current Performance with Student Achievement in the Pilot Year
   3.3 Performance Patterns in Individual Provinces
   3.4 Relationships between Test Performance and Ascribed Characteristics
4. DISCUSSION OF RESULTS AND CONCLUSIONS
   4.1 General Overview and Commentary
   4.2 Constraints to Consider
   4.3 Implications and Conclusions
References
ATTACHMENTS
   Attachment 1: Tables of Specifications (Khmer Language and Mathematics)
   Attachment 2: T-test Probability Values

List of Tables

Table 2.1: Equivalency between Research Conditions by Sub-sample (Number of Schools)
Table 2.2: Characteristics of the Test Sample with respect to Age and Sex
Table 2.3: Test Content Specifications
Table 3.1: Mean Score Test Results for Khmer Language
Table 3.2: Mean Score Test Results for Mathematics
Table 3.3: Comparison of Test Performance in Pilot and Current Years across Selected Parameters
Table 3.4: Comparison of SRP and Control Group Mean Scores by Province
Table 3.5: Correlation Coefficients for Total Test Score and Ascribed Student Characteristics



Abstract

This study is the fourth and final in a series of on-going investigations of the learning impacts generated by the School Readiness Program (SRP) in Cambodian state schools supported by selected donors. These studies, including the present one, have spanned a period of two years and were commissioned by the donor to provide feedback about program implementation. The SRP was designed to improve academic performance among Grade 1 children and reduce repetition rates. The primary research question addressed by the current study was whether exposure to the experimental condition (i.e., study in an SRP classroom) would have any impact on terminal learning achievement among Grade 1 beneficiaries in the core curriculum areas of Khmer Language and Mathematics. Ancillary questions included: (i) the identification of topical areas where interventions had the most and least impact; (ii) the degree to which performance advantages observed in the pilot year have been maintained; (iii) variations in test performance across the four provinces participating in the study; and (iv) variations in test performance with respect to ascribed characteristics such as age and sex. The study employed a pre-experimental static group comparison design that generated comparative data from a terminal test administration in two independent samples of children in the closing months of the academic year. The total sample comprised 2,484 children in 66 primary schools across four provinces. Results showed statistically significant mean differences in favor of the experimental group in 17 of 25 sub-topical areas. Overall mean differences with the control group were larger for Mathematics (+5%) than for Language (+3%), though both were statistically significant.
The study also found that although impacts continued to be significant in favor of the experimental group across a majority of topical areas, these impacts appear to be diminishing in comparison to those registered during the pilot year of implementation. In addition, there appeared to be major performance differences between the provinces with local support networks and high exposure to project-affiliated interventions and the one province without such exposure (Kratie). Children in Kratie did not exhibit differences with control group children to the extent found in the other provinces, suggesting the crucial role played by support networks in ensuring effective program implementation.


1. INTRODUCTION

1.1 Background

The current study is the last in a series of four assessments to monitor on-going implementation of the School Readiness Program (SRP) in selected school sites supported by UNICEF/Sida. The program has been a major initiative of the Ministry of Education, Youth, and Sport (MoEYS) and a range of cooperating donors to address rates of student repetition among Grade 1 children, which have been steadily increasing over the last several years (see Table 1.1). Such rates have stood at over 20% for the whole kingdom in recent years and have shown little sign of decreasing (MoEYS, 2006). The School Readiness Program was introduced as a pilot in the 2004/5 academic year to help reverse these trends in student repetition. After an earlier series of empirical assessments commissioned by UNICEF in the first year of SRP implementation (i.e., 2004/5) gave grounds for cautious optimism with respect to a wide range of programmatic outcomes (e.g., Bredenberg, 2004), the MoEYS decided to embark on a rapid program of expansion. This culminated in a series of large-scale teacher trainings across the country at the beginning of the academic year just ending (2005/6). Nominal coverage of the program claimed by government currently stands at 30% of all Grade 1 teachers (MoEYS, 2005). The School Readiness Program is based on the premise that interventions that increase learning readiness skills among young children and improve the classroom practices employed by teachers should help to address a large part of the causality underlying high repetition rates.

Table 1.1: Student Repetition Patterns in Selected Provinces, 2000-03
Province | 2000/01 (Base Yr) | 2001/02 | 2002/03 | 2003/04 | % Change from Base Year
Kampong Thom | 13.3% | 15.7% | 17.1% | 20.0% | +50.4%
Kampong Cham | 21.3% | 22.7% | 24.9% | 29.4% | +38.0%
Kratie | 18.8% | 16.2% | 17.3% | 21.9% | +16.5%
Prey Veng | 24.5% | 24.5% | 26.1% | 27.5% | +12.2%
National | 17.5% | 17.7% | 19.0% | 23.6% | +34.9%
Source: EMIS, 2000-05
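The "% Change from Base Year" column in Table 1.1 is a relative change, not a difference in percentage points: Kampong Thom's rate rose by 6.7 points (from 13.3% to 20.0%), which is about 50.4% of its base-year rate. The short Python sketch below recomputes that column from the 2000/01 and 2003/04 figures in the table; the function name is illustrative only.

```python
# Repetition rates (%) from Table 1.1: (2000/01 base year, 2003/04).
repetition_rates = {
    "Kampong Thom": (13.3, 20.0),
    "Kampong Cham": (21.3, 29.4),
    "Kratie": (18.8, 21.9),
    "Prey Veng": (24.5, 27.5),
    "National": (17.5, 23.6),
}


def pct_change_from_base(base: float, latest: float) -> float:
    """Relative change from the base year, in percent (not percentage points)."""
    return (latest - base) / base * 100


# Round to one decimal place, matching the precision used in Table 1.1.
pct_changes = {
    name: round(pct_change_from_base(base, latest), 1)
    for name, (base, latest) in repetition_rates.items()
}

for name, change in pct_changes.items():
    print(f"{name}: {change:+.1f}%")
```

A percentage-point view (latest minus base) would rank the provinces differently, which is why the table's basis matters when comparing provinces with very different starting rates.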
Although it is recognized that high rates of repetition reflect a highly complex set of problems in the educational system, including chronic teacher shortages, high pupil-teacher ratios, poor infrastructure, and irregular attendance patterns among children, it is hoped that SRP can at least effectively address problems relating to curricular appropriateness and teaching methodology. Thus, SRP is seen as an intervention that attempts to make needed changes in classroom practice and learning content; by itself this is perhaps not sufficient to reduce repetition, but it nonetheless constitutes a necessary set of interventions for doing so.

1.2 About the Current Assessment

Each year, UNICEF has asked KAPE, a local NGO, to help monitor the implementation of SRP interventions through empirical investigations, such as the present one, that can help guide future programming. These assessments generally comprise two steps each year. The first set of investigations, which occurs at the beginning of the school year, focuses on a review of classroom organization, teaching and learning, and the development of key skills among children. Systematic data collection in this context relies on classroom observations using standardized assessment tools and on teacher interviews, in order to generate qualitative data that can provide useful insights not afforded by quantitative data sets. These investigations are then followed by a complementary study such as the current one, which focuses more narrowly on student achievement in Khmer Language and Mathematics competencies at the end of the school year. Since the beginning of the program in 2004, four studies (including the present one) have been conducted by KAPE to provide information to government and UNICEF about successes and challenges encountered each year.

The testing activities conducted in the current investigation have focused on newly trained SRP teachers who began teaching in the academic year just ending. In general, the selection of teachers and classrooms for study has been guided by the areas where UNICEF is supporting school improvement activities, including Prey Veng, Kampong Thom, and Kampong Cham Provinces. An important change in the scope of SRP investigations from the previous year has been the decision to include Kratie Province, where School Readiness interventions were recently introduced under USAID's Educational Support to Children in Underserved Populations (ESCUP) Program. With support from World Education, which implements the ESCUP program in cooperation with KAPE, investigators have therefore also carried out testing activities in this fourth province. The addition of Kratie to the sample of teachers and students studied has proven interesting because, unlike the other three provinces, it has never received large-scale technical support in the past. It thus provides some insight into the challenges of expanding SRP activities into areas where education officials and teachers have had little exposure to some of the new initiatives now supported by MoEYS.

2. ASSESSMENT METHODOLOGY

2.1 Overall Assessment Design

The design of the current assessment is very similar to that of the assessment conducted at the end of 2005. The study employed a pre-experimental static group comparison design that generated comparative data from a terminal test administration in two independent samples of children in the closing months of the academic year. These comparisons derive from the test scores of children studying in an experimental condition (i.e., those learning in SRP classrooms) and those in a control condition in non-SRP program sites. Approximately 1,200 children participated in each research condition, and the total number of children tested was 2,484 (see Table 2.2). Research questions were also similar to those addressed in 2005 and are summarized in Box 1. An additional question included in this year's investigation, however, compares the performance of SRP children in the current year with that of children tested in the first year of the pilot. Researchers were interested in whether SRP children would maintain their performance advantage over non-SRP children and, if so, to what extent. This question is of some importance as the program mounts a large expansion under the direction of the Ministry and assorted donors.

Box 1: Research Questions
Primary Research Question:
o Does exposure to the experimental condition (i.e., SRP classroom environments) have any impact on overall learning achievement in core curriculum areas among beneficiaries in Grade 1?
Secondary Research Questions:
o In which topical areas, if any, do SRP interventions have the most and least impact?
o To what degree has the performance advantage demonstrated by SRP children in Year 1 been maintained into the current year?
o What variations in performance were observed across the four provinces?
o Are there variations in test performance with regard to ascribed characteristics such as sex and age?

Determinations of impact in this study are based on comparisons of mean scores for each core subject, with a breakdown of scores for specific sub-topical areas also provided. Inter-group comparisons were made using a t-test for samples of unequal variance to determine whether any observed differences in mean scores were significant at a probability level of p<.05. Variations in performance by age and sex were also investigated using Pearson's r to determine the direction and strength of relationships, if any, between test scores and ascribed characteristics.

Tests were administered during June and July of 2006, when many schools in Cambodia start closing. The research team had to walk a fine line between waiting long enough for as much curriculum content as possible to be covered and testing before teachers stop teaching and schools close to accommodate the monsoon planting season, which begins in June (depending on rainfall). Test sites were spread across the four provinces where the School Readiness Program had recently been expanded: Kampong Thom, Prey Veng, Kampong Cham, and Kratie.

2.2 Sampling Considerations

As noted earlier, the current study has focused on newly trained SRP teachers across the four provinces where the two donors are supporting schools. These considerations defined the research population of the study. Sample construction used a judgmental (purposive) sampling technique in which schools were chosen from this population to form a mix of children from both rural and urban schools as well as poor and affluent communes. The selection criteria used to generate this mix included the demographic characteristics of school settings and the poverty rates reported by the Census for the commune where each school is situated. Schools in the control condition were matched with SRP schools on the basis of these criteria (see Table 2.1). A total of 33 schools were tested in each research condition, for a total sample of 66 primary schools. Schools were spread roughly equally across the provinces, with about eight schools selected in each province for each research condition.

Table 2.1: Equivalency between Research Conditions by Sub-sample (Number of Schools)
Sample Characteristic | Experimental Group | Control Group
Demographic Background
  Urban | 30 | 30
  Rural | 3 | 3
Poverty Tier
  33% poor or less | 13 | 14
  34-66% poor | 16 | 15
  66% or more poor | 4 | 4
Total Schools | 33 | 33

When tests were conducted in any given school, all children in the SRP classroom situated in that school were interviewed to prevent the selection bias that usually occurs when teachers or administrators are asked to choose students for testing. The same selection rule was applied to Grade 1 classes in non-SRP schools. This approach necessitated a highly labor-intensive testing design, considering that each child was interviewed for approximately 15 minutes for each of the two subject tests. In all, 1,234 children were tested from SRP classrooms in comparison to 1,250 children from non-SRP classrooms (see Table 2.2). The control group sample was slightly older than the experimental group, by a margin of 0.38 years. The distribution of boys and girls in each group, however, was roughly equivalent. The number of children interviewed in each province ranged between about 250 and 370 for each research condition, and within each province the number of children in the two conditions differed by only four to thirteen students.

Table 2.2: Characteristics of the Test Sample with respect to Age and Sex
Characteristic | Experimental Group | Control Group
Age
  Average Age | 7.84 years | 8.22 years
Sex
  Boys | 637 | 619
  Girls | 597 | 631
Province
  Kampong Cham | 356 | 369
  Kratie | 256 | 260
  Kampong Thom | 330 | 334
  Prey Veng | 292 | 287
Total | 1,234 | 1,250

2.3 Test Development and Administration

The same achievement test batteries were used during this year's test administration, with some slight modifications based on a review of performance on last year's tests. As implied above, tests were administered in an interview format to address children's young age and their inexperience with paper-and-pencil tests. Each subject interview was designed to last 10 to 15 minutes, requiring up to 30 minutes per child; in total, approximately 1,242 person-hours (2,484 children × 30 minutes) were required to complete all student interviews.

Each subject test covered a number of sub-topical areas, which are summarized in Table 2.3. The Khmer Language test covered 10 sub-topics outlined in the curriculum, while the Mathematics test included 15; in total, therefore, test interviews made inquiries across 25 sub-topical areas from the core curriculum. Together, the two subject tests comprised 69 discrete tasks (40 for Language and 29 for Mathematics), consisting primarily of questions requiring oral responses or physical manipulation of test material. These task characteristics allowed test content to be covered quickly.

Table 2.3: Test Content Specifications
Topical Area | Approximate Weighting | Sub-topical Areas
KHMER LANGUAGE
  Listening and Speaking | 29% | Counting Syllables; Syntax; Oral Word Usage
  Reading | 33% | Word Meanings; Sound Letter Discrimination; Reading Aloud; Reading Comprehension
  Writing | 39% | Spelling; Writing Words; Sentence Composition
MATHEMATICS
  Number Concepts | 41% | Math Notation; Sequencing Numbers 1-20; Sequencing Numbers 1-100; Concept of Tens & Units; Comparing Number Size; Writing Numbers
  Addition & Subtraction Algorithms | 12% | Adding & Subtracting Numbers less than 10; Adding & Subtracting Numbers less than 20
  Problem Solving | 9% | Oral Problem Solving with Numbers 1-10; Oral Problem Solving with Numbers 1-20
  Geometry | 12% | Shape Recognition
  Volume and Weight | 9% | Comparing Volume; Concept of Weight
  Time | 12% | Knowing Days of the Week
  Money | 6% | Using Money

The development of questions for each test followed field-testing of question items and appropriate revisions based on classical item analysis techniques. Most test items fall within a range of moderate difficulty. As noted above, test questions were diverse in format and included task work requiring manipulation of letter and number cards, oral responses, and slate writing. Test developers tried to formulate questions in a way that both isolated specific skills and minimized the confounding influence of associated skills that might block or hinder assessment of the target area. For example, students were asked to spell words not by writing them out but by arranging letter cards in meaningful strings. Thus, even if a child had limited writing skills, s/he could still arrange letters in meaningful groupings simply by manipulating the letter cards provided.
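Section 2.3 mentions classical item analysis without describing it. A common version computes, for each item, its difficulty (the proportion of children answering correctly) and its discrimination (the correlation between the item and the total score on the remaining items). The Python sketch below assumes dichotomously scored (0/1) items; the response matrix and the 0.3-0.7 "moderate difficulty" band are hypothetical illustrations, not the study's data or criteria.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; returns nan when either variable has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def item_analysis(responses):
    """Return (difficulty, corrected item-total discrimination) for each item.

    responses: one row per child, one 0/1 score per item.
    """
    n_items = len(responses[0])
    results = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total score excluding item j
        difficulty = sum(item) / len(item)               # proportion answering correctly
        results.append((difficulty, pearson_r(item, rest)))
    return results


def is_moderate(difficulty, low=0.3, high=0.7):
    """Hypothetical 'moderate difficulty' band, used only for illustration."""
    return low <= difficulty <= high


# Hypothetical responses of six children to four items (not study data).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]

for j, (p, d) in enumerate(item_analysis(responses), start=1):
    print(f"item {j}: difficulty={p:.2f} discrimination={d:.2f} moderate={is_moderate(p)}")
```

Excluding the item from the total ("corrected" item-total correlation) avoids inflating discrimination, since an item always correlates with a total that contains it.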

A total of 32 interviewers (eight per site) were recruited locally to administer subject tests in each provincial site. Although many had participated in similar testing exercises in previous years, some were new, particularly in Kratie, where testing of this nature had never before been conducted. All proctors received a one-day training from the visiting research team in an explicit behavioral protocol to ensure standardized testing conditions at each site. This protocol included guidelines on the set-up of test stations that were suitably separated from each other to prevent cuing, greetings to be used with children to put them at ease, a prohibition on any commentary on student performance that might inhibit future responding, and guidance on other behaviors that might affect children's ability to answer to the best of their knowledge.
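Section 2.1 states that relationships between test scores and the ascribed characteristics of age and sex were examined with Pearson's r. For a two-category variable such as sex, coding it as 0/1 and applying the ordinary Pearson formula gives the point-biserial correlation, whose sign depends on the coding chosen. The Python sketch below illustrates both cases; the eight records and the 0 = boy, 1 = girl coding are hypothetical, not drawn from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical records (NOT study data): age in years, sex coded 0 = boy, 1 = girl,
# and total test score as a percentage.
ages = [7.0, 7.5, 8.0, 8.5, 9.0, 7.2, 8.8, 7.9]
sexes = [0, 1, 0, 1, 0, 1, 1, 0]
scores = [48, 52, 45, 50, 41, 55, 47, 49]

r_age = pearson_r(ages, scores)   # negative here: older children score lower in this toy data
r_sex = pearson_r(sexes, scores)  # point-biserial r: positive means code 1 (girls) scored higher

print(f"r(age, score) = {r_age:.2f}")
print(f"r(sex, score) = {r_sex:.2f}")
```

Because r only captures linear association, a near-zero value for age would not rule out a non-linear pattern (for example, both the youngest and oldest children scoring lower).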

3. RESEARCH RESULTS

3.1 Comparison of Mean Scores & Mean Differences for Experimental and Control Groups

3.1.1 General Trends in Achievement

In general, SRP children continued to outperform their peers in unsupported schools in both Khmer Language and Mathematics. The overall mean difference (MD) between the two research conditions was +3% for Language and +5% for Mathematics, both in favor of the experimental group (see Tables 3.1 and 3.2). Both mean differences were statistically significant at the p<.05 probability level. In absolute percentage terms, SRP children performed better in Mathematics than in Language (53% versus 37%), and the same held true for control group children (48% versus 34%). With some exceptions, statistically significant mean differences for the various sub-topics did not vary greatly in magnitude and were mainly in the single digits. In addition, the overall performance advantage demonstrated by SRP children did not hold uniformly across all sub-topical areas. In Language, SRP children demonstrated statistically significant performance advantages in seven of ten sub-topical areas, while in Mathematics this was true of ten of 15 sub-topical areas. In the remaining three Language and five Mathematics sub-topics, differences between the groups were not statistically significant (including the two Language sub-topics where control group children held a slight numerical advantage), suggesting that performance in these areas was about the same.

3.1.2 Trends in Khmer Language Achievement

The overall mean score for Language achievement among SRP children was 37% compared with 34% among control group children. Although the overall mean score for Language among children studying in SRP classrooms was significantly higher than that of the control group, the margin was small (+3%), and performance in individual sub-topics was highly varied (see Table 3.1). In general, SRP children showed the largest performance advantage in Listening and Speaking (MD=+6%), followed by Writing (MD=+4%). Mean differences were highest in the sub-topical areas of Counting Syllables (MD=+7%), Syntax (MD=+6%), Spelling (MD=+5%), and Reading Comprehension (MD=+5%). Differences were smallest in Word Meanings (MD=-1%), Sound Letter Discrimination (MD=-1%), and Reading Aloud (MD=+2%). Although control group children registered a slight numerical advantage in their mean scores for Word Meanings and Sound Letter Discrimination, these differences were not statistically significant. Indeed, mean scores were generally not significantly different between the two groups for Reading tasks, with the exception of Reading Comprehension, where a relatively strong performance advantage was observed as noted above. This equivalence between experimental and control group children in Reading nevertheless represents a notable change from last year's testing, where the former exhibited a significant performance advantage in the double digits.
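The significance decisions in this section rest on what the report calls a t-test for samples of unequal variance, usually known as Welch's t-test. The Python sketch below computes Welch's t and its Welch-Satterthwaite degrees of freedom from group summary statistics, using the Language means (37% and 34%) and group sizes (1,234 and 1,250) reported above. Two assumptions should be flagged: the standard deviations (20 points in each group) are hypothetical, because they are not reported in this section, and the p-value uses a normal approximation to the t distribution, which is reasonable only because each group has well over a thousand children. An exact t-distribution p-value, for example from SciPy's `scipy.stats.ttest_ind` with `equal_var=False`, would be preferable with the raw scores.

```python
from math import erfc, sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal approximation to t (large df only)."""
    return erfc(abs(t) / sqrt(2))


# Means and group sizes from the report; the SDs of 20 points are hypothetical.
t, df = welch_t(37.0, 20.0, 1234, 34.0, 20.0, 1250)
p = two_sided_p_normal(t)
significant = p < 0.05

print(f"t = {t:.2f}, df = {df:.0f}, p = {p:.5f}, significant at p<.05: {significant}")
```

With samples this large, even a 3-point difference clears the p<.05 threshold unless the spread of scores is very wide, which is one reason the report pairs significance with the size of each mean difference.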


In terms of absolute percentage scores, children in both research conditions appeared to perform best on tasks relating to Listening and Speaking and Reading. Among SRP children, the mean score across all sub-topics was 49% for Listening and Speaking and 48% for Reading. Although SRP children scored significantly better on Writing tasks than their control group counterparts, the level of scoring was still relatively low in absolute percentage terms: SRP children showed an overall mean score of only 19% for Writing. Sub-topic scores included 22% for writing dictated words on a slate, 19% for correctly spelling dictated words using letter cards, and 16% for arranging word cards to form a sentence. The latter may have been particularly challenging for children, as they were asked to invent their own sentences using a number of given words. Still, the requested sentences were relatively simple, involving as few as three words in Khmer (e.g., ‘Phhu tao srae’ or ‘Uncle goes to the field’). These low scores echo trends from earlier assessments, in which tested children evinced their lowest level of achievement in writing.

Box 2: Sorted Language Scores for SRP Children by Sub-topic
1. Reading Comprehension: 62%
2. Counting Syllables: 61%
3. Word Meanings:* 55%
4. Sound Letter Discrimination:* 50%
5. Oral Word Usage: 44%
6. Syntax: 41%
7. Writing Words: 22%
8. Reading Aloud:* 19%
9. Spelling: 19%
10. Sentence Composition: 16%
*Score not significantly different from Control Group.

When confining comparisons to within the SRP group, one observes that the areas where children performed best included Reading Comprehension, which was the top-scoring sub-topic, Counting Syllables, and Word Meanings (see Box 2). The comparatively high mean score for Reading Comprehension, which was also significantly higher than that of the control group, qualifies the observation made above that reading scores as a whole were about the same as those of the control group. On the other hand, the areas where SRP children continue to perform poorly relative to other sub-topical areas include Sentence Composition, Spelling, and Reading Aloud. Indeed, at only 16%, Sentence Composition is the lowest score registered by children receiving SRP interventions.

Table 3.1: Mean Score Test Results for Khmer Language

Content Area / Sub-topic | Experimental Group Mean Score | Control Group Mean Score | Mean Difference (MD) | Difference Significant at p<.05
Listening and Speaking
  Counting Syllables | 61% | 54% | +7% | Yes
  Syntax | 41% | 35% | +6% | Yes
  Oral Word Usage | 44% | 40% | +4% | Yes
  Subtotal | 49% | 43% | +6% | Yes
Reading
  Word Meanings | 55% | 56% | -1% | No
  Sound Letter Discrimination | 50% | 51% | -1% | No
  Reading Aloud | 19% | 17% | +2% | No
  Reading Comprehension | 62% | 57% | +5% | Yes
  Subtotal | 48% | 47% | +1% | No
Writing
  Spelling | 19% | 14% | +5% | Yes
  Writing Words | 22% | 20% | +2% | Yes
  Sentence Composition | 16% | 12% | +4% | Yes
  Subtotal | 19% | 15% | +4% | Yes
GRAND TOTAL | 37% | 34% | +3% | Yes
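Box 2 and the count of seven significant Language sub-topics can both be derived mechanically from Table 3.1. The Python sketch below encodes the ten sub-topic rows, recomputes each mean difference from the two group means, counts the significant rows, and sorts SRP scores from highest to lowest. Python's sort is stable even with `reverse=True`, so Reading Aloud stays ahead of Spelling at 19%, as in Box 2.

```python
# Table 3.1 sub-topic rows: (sub-topic, SRP mean %, control mean %, significant at p<.05)
language_rows = [
    ("Counting Syllables", 61, 54, True),
    ("Syntax", 41, 35, True),
    ("Oral Word Usage", 44, 40, True),
    ("Word Meanings", 55, 56, False),
    ("Sound Letter Discrimination", 50, 51, False),
    ("Reading Aloud", 19, 17, False),
    ("Reading Comprehension", 62, 57, True),
    ("Spelling", 19, 14, True),
    ("Writing Words", 22, 20, True),
    ("Sentence Composition", 16, 12, True),
]

# Mean difference (MD) in percentage points: SRP minus control.
mean_differences = {name: srp - ctrl for name, srp, ctrl, _ in language_rows}

# Number of sub-topics with a statistically significant difference.
n_significant = sum(1 for row in language_rows if row[3])

# Box 2 ordering: SRP scores, highest first; ties keep table order (stable sort).
box2_order = [row[0] for row in sorted(language_rows, key=lambda row: row[1], reverse=True)]

print(f"{n_significant} of {len(language_rows)} sub-topics significant")
for rank, name in enumerate(box2_order, start=1):
    print(rank, name)
```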


3.1.3 Trends in Mathematics Achievement

The performance advantage exhibited by SRP children over the control group tended to be somewhat larger in Mathematics than in Language (see Table 3.2). The overall mean score for the experimental group was 53% versus 48% for the control group, a mean difference of +5% in favor of SRP children. Of the 15 sub-topical areas tested in Mathematics, SRP children demonstrated a statistically significant performance advantage in ten. The advantage was greatest in Shape Recognition (Geometry; MD=+10%), Sequencing Numbers 1-20 (MD=+10%), and Comparing Number Size (MD=+7%). The sub-topical areas where this advantage was smallest (though still significant) included Math Notation (MD=+3%) and Comparing Volume (MD=+3%). It should be noted, however, that in most cases SRP performance advantages were in the single digits and were therefore relatively small in magnitude, albeit statistically significant. Areas where SRP children demonstrated no significant performance advantage included Writing Numbers, Oral Problem Solving (with Numbers 1-10 and 1-20), Concept of Weight, and Using Money.

Table 3.2: Mean Score Test Results for Mathematics

Content Area / Sub-topic | Experimental Group Mean Score | Control Group Mean Score | Mean Difference (MD) | Difference Significant at p<.05
Number Concepts
  Math Notation | 46% | 43% | +3% | Yes
  Sequencing Nos. 1-20 | 42% | 32% | +10% | Yes
  Sequencing Nos. 1-100 | 71% | 66% | +5% | Yes
  Concept of Tens & Units | 42% | 38% | +4% | Yes
  Comparing Number Size | 32% | 25% | +7% | Yes
  Writing Numbers | 48% | 46% | +2% | No
  Subtotal | 46% | 42% | +4% | Yes
Addition & Subtraction Algorithms
  Adding & Subtracting Numbers less than 10 | 47% | 43% | +4% | Yes
  Adding & Subtracting Numbers less than 20 | 34% | 28% | +6% | Yes
  Subtotal | 40% | 35% | +5% | Yes
Problem Solving
  Oral Problem Solving with Numbers 1-10 | 59% | 57% | +2% | No
  Oral Problem Solving with Numbers 1-20 | 46% | 44% | +2% | No
  Subtotal | 51% | 48% | +3% | …
Geometry
  Shape Recognition | 66% | 56% | +10% | Yes
  Subtotal | 66% | 56% | +10% | Yes
Volume & Weight
  Comparing Volume | 90% | 87% | +3% | Yes
  Concept of Weight | 94% | … | … | No
  Subtotal | 91% | 89% | +2% | Yes
Time
  Knowing Days of the Week | 50% | 44% | +6% | Yes
  Subtotal | 50% | 44% | +6% | Yes
Money
  Using Money | 48% | … | … | No
  Subtotal | 48% | … | … | No
GRAND TOTAL | 53% | 48% | +5% | Yes
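The current-year column of Table 3.3 (17 of 25 sub-topics significant, two of them with double-digit differences) follows from Tables 3.1 and 3.2. The Python sketch below tallies the 15 Mathematics rows of Table 3.2 and adds the Language count of seven. The two mean differences that are not legible in Table 3.2 (Concept of Weight and Using Money) are entered as `None`, an assumption that does not affect the tallies because neither difference was significant.

```python
# Table 3.2 sub-topic rows: (sub-topic, mean difference in points, significant at p<.05).
# None marks the two mean differences that are not legible in the source table.
math_rows = [
    ("Math Notation", 3, True),
    ("Sequencing Nos. 1-20", 10, True),
    ("Sequencing Nos. 1-100", 5, True),
    ("Concept of Tens & Units", 4, True),
    ("Comparing Number Size", 7, True),
    ("Writing Numbers", 2, False),
    ("Adding & Subtracting Numbers less than 10", 4, True),
    ("Adding & Subtracting Numbers less than 20", 6, True),
    ("Oral Problem Solving with Numbers 1-10", 2, False),
    ("Oral Problem Solving with Numbers 1-20", 2, False),
    ("Shape Recognition", 10, True),
    ("Comparing Volume", 3, True),
    ("Concept of Weight", None, False),
    ("Knowing Days of the Week", 6, True),
    ("Using Money", None, False),
]

math_significant = sum(1 for _, _, sig in math_rows if sig)
# Table 3.3 counts only significant differences of two digits (10 points or more).
math_double_digit = sum(1 for _, md, sig in math_rows if sig and md is not None and md >= 10)

# Current-year Language figures from Table 3.1: 7 significant sub-topics, none double-digit.
lang_significant, lang_double_digit = 7, 0
TOTAL_SUBTOPICS = 25

total_significant = lang_significant + math_significant
share_current = 100 * total_significant / TOTAL_SUBTOPICS
share_pilot = 100 * 22 / TOTAL_SUBTOPICS  # pilot-year total from Table 3.3

print(f"significant: {total_significant}/{TOTAL_SUBTOPICS} ({share_current:.0f}%), pilot {share_pilot:.0f}%")
print(f"double-digit: {lang_double_digit + math_double_digit}")
```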

Within-group comparisons of the absolute percentage scores of SRP children in Mathematics indicate that the areas where they performed best included Concept of Weight, Concept of Volume, and Sequencing Numbers 1 to 100, with mean scores of 94%, 90%, and 72%, respectively (see Box 3). To be sure, not all of these mean scores differed significantly from those of the control group, suggesting that it may simply be natural for children from the chosen population to perform well in such areas. The areas where SRP children seemed to have the most difficulty included Comparing Number Size, Adding and Subtracting Numbers less than 20, and Sequencing Numbers 1 to 20, with mean scores of 32%, 34%, and 42%, respectively. Although these sub-topics represent the lowest-scoring areas in a relative sense among children studying in SRP learning environments, SRP children nevertheless performed significantly better in them than control group children did. In addition, these low scores are in a numerical range that is more acceptable than the comparable low scores for Language. This once again suggests that language competencies are where children seem to have the most difficulty with respect to their achievement.

Box 3: Sorted Mathematics Scores for SRP Children by Sub-topic
1. Concept of Weight:* 94%
2. Concept of Volume: 90%
3. Sequencing Nos. 1-100: 72%
4. Shape Recognition: 66%
5. Problem Solving Nos. 1-10:* 59%
6. Days of the Week: 50%
7. Using Money:* 48%
8. Writing Numbers:* 48%
9. Adding/Subtracting Nos <10: 47%
10. Problem Solving Nos 1-20:* 46%
11. Math Notation: 46%
12. Concept of Tens/Units: 42%
13. Sequencing Nos. 1-20: 42%
14. Adding/Subtracting Nos <20: 34%
15. Comparing Number Size: 32%
*Score not significantly different from Control Group.

3.2 Comparison of Current Performance with Student Achievement in the Pilot Year

In view of the rapidly expanding nature of the School Readiness Program nationally, an interesting question that this study hoped to address, at least in part, is to what degree SRP populations have maintained their performance advantage in comparison to last year's very encouraging results. A review of test results along several parameters is shown in Table 3.3.

Table 3.3: Comparison of Test Performance in Pilot and Current Years across Selected Parameters
Parameter | Pilot Year | Current Year | Change
Total Percentage Test Scores for SRP Children
  Khmer Language | 38% | 37% | -1%
  Mathematics | 59% | 53% | -6%
Sub-topics Showing Statistically Significant Difference in Favor of SRP Group (Number of Instances)
  Khmer Language | 10 | 7 | -3
  Mathematics | 12 | 10 | -2
  Total | 22 | 17 | -5
Sub-topics Showing Statistically Significant Mean Differences of Two Digits (Number of Instances)
  Khmer Language | 8 | 0 | -8
  Mathematics | 10 | 2 | -8
  Total | 18 | 2 | -16
*Note: Total number of sub-topical areas is 25.

It can be seen that mean scores for Khmer Language have been surprisingly constant over the last two years (38% in the pilot year versus 37% in the current year). Similarly, mean scores for Mathematics fell into the same general range over both years of testing, though with evidence of some slippage in the current year (59% in the pilot year versus 53% currently). A review of mean scores for sub-topical areas presents a more complex picture of changes in student performance over the last two years. The number of sub-topical areas where mean differences between experimental and control groups were statistically significant has slipped from 22 of 25 in the pilot year to only 17 of 25 currently. In percentage terms, SRP children outperformed control group counterparts in 88% of the sub-topical areas tested in the pilot year compared to 68% in the current year. Similarly, the number of sub-topical areas with double-digit mean differences has slipped from 18 in the pilot year to only two in the current year: no such instances emerged in Language (compared to eight in the pilot year), while two emerged in Mathematics (compared to ten in the pilot year). Overall, it would appear that while SRP children continue to maintain a broad-based performance advantage over control group children in both core subjects, this advantage has shrunk somewhat in both breadth and magnitude in the current year of testing.

3.3 Performance Patterns in Individual Provinces

Researchers also examined performance trends in each individual province in order to detect possible differences. As noted earlier, the discovery of different patterns of performance was seen as a potentially important finding given the addition of Kratie Province to the study's purview and its contrasting background in comparison to the other three provinces.
That is, Kratie is a province with very little history of technical support or exposure to the new technical initiatives currently supported by MoEYS. Test results by province, subject, and major topical area are summarized in Table 3.4. In general, all provinces except Kratie demonstrated significantly better performance by SRP children across both subject tests. For Language, Kampong Cham exhibited the highest margin of difference at 9%, followed by Prey Veng and Kampong Thom with 7% and 4%, respectively. For Mathematics, Kampong Thom led with the largest margin of difference at 12%, followed by Prey Veng and Kampong Cham at 6% and 4%, respectively. It should be noted, however, that only one province (Kampong Cham) maintained a performance advantage in Reading of the kind registered by the cross-province sample of children tested last year, and even then by a much smaller margin. This helps to confirm that the poorer performance of children in Kratie was not solely responsible for depressing Language scores in this year's test results.

Test results were less encouraging in Kratie, where SRP and control group mean differences were the inverse of what was expected. That is, control group children actually did better than children from SRP classes, albeit by an extremely small margin. Although this inverse mean difference was not statistically significant in Mathematics, it was significant in Language, where the control group's advantage was slight, at 1%. Various explanations are being examined to account for this disappointing outcome. These include the observation that MoEYS was not able to field any trainers in Kratie until the end of October, leading to a very late start for SRP children and the loss of 14 study days. In contrast, training of teachers (by local trainers) in the other provinces occurred well before the start of the school year and did not result in any loss of study days. These outcomes suggest the value of planning arrangements with MoEYS to build the capacity of local trainers in each province. Other possible factors to consider include the inexperience of the province and district in supporting the intervention, inadequate technical support during the year, and the highly remote terrain that characterizes many of the schools there (e.g., island schools).

Table 3.4: Comparison of SRP and Control Group Mean Scores by Province
(Each cell: SRP Mean Score / Control Group Mean Score / Mean Diff)
Topical Area | Kampong Cham | Kampong Thom | Prey Veng | Kratie
Language
  Listening & Speaking | 58% / 40% / 18% | 44% / 45% / -1%* | 44% / 31% / 13% | 62% / 52% / 10%
  Reading | 48% / 47% / 1% | 57% / 54% / 3%* | 37% / 34% / 3%* | 41% / 49% / -8%
  Writing | 20% / 14% / 6% | 28% / 19% / 9% | 14% / 10% / 4% | 13% / 19% / -6%
  Total | 42% / 33% / 9% | 43% / 39% / 4% | 32% / 25% / 7% | 39% / 40% / -1%
Mathematics
  Number Concepts | 45% / 38% / 7% | 56% / 45% / 11% | 35% / 32% / 3%* | 39% / 45% / -7%
  Addition & Subtraction Algorithms | 42% / 35% / -- | 52% / 30% / 22% | 32% / 29% / 3%* | 36% / 39% / -3%*
  Problem Solving | 49% / 46% / 3%* | 70% / 63% / -- | 49% / 40% / -- | 39% / 55% / -17%
  Geometry | 63% / 55% / 8% | 73% / 65% / 8% | 71% / 60% / 11% | 61% / 61% / 0%*
  Volume & Weight | 95% / 91% / 4%* | 97% / 91% / 7% | 93% / 87% / 6%* | 93% / 73% / 20%
  Time | 46% / 42% / 4%* | 61% / 44% / 17% | 50% / 43% / 7% | 47% / 49% / -2%*
  Money | 38% / 42% / -4% | 60% / 50% / 10% | 58% / 52% / 6%* | 45% / 53% / -8%
  Total | 54% / 50% / 4% | 67% / 55% / 12% | 55% / 49% / 6% | 51% / 53% / -2%*
Note: Kampong Cham: N=725; Kampong Thom: N=664; Prey Veng: N=579; Kratie: N=506. *Indicates mean differences are not statistically significant at p<.05.

Although each of the three high-performing provinces registered statistically significant mean differences for both subject tests, an examination of mean differences for the various topical sections in each subject test provides additional insights into performance patterns. These are noted in Box 4. Echoing the results shared earlier, Kampong Cham appeared to lead the other provinces in Language, with statistically significant mean differences in all three topics tested, while Kampong Thom demonstrated such mean differences in all seven topical areas tested in Mathematics. Performance among children in Prey Veng was moderately good, with significant mean differences for two of three topics in Language and three of seven in Mathematics. Once again, performance in Kratie was more marginal, with significant mean differences in favor of SRP children in only one topic in each subject test.

Box 4: No. of Significant Mean Differences by Topical Area in Favor of SRP Groups (p<.05)
Khmer Language: Kampong Cham 3; Kampong Thom 1; Prey Veng 2; Kratie 1
Mathematics: Kampong Cham 3; Kampong Thom 7; Prey Veng 3; Kratie 1
Note: Total topics for Khmer Language: 3. Total topics for Mathematics: 7.

3.4 Relationships between Test Performance and Ascribed Characteristics

Analyses of differential test performance with respect to ascribed characteristics did not yield any compelling findings that suggest a strong relationship between sex and test performance in either research condition.
Values for r were not statistically significant for Khmer Language in either of the tested groups and were exceedingly weak (albeit statistically significant) for Mathematics among SRP children. The Pearson product-moment correlation coefficient in this case was only r = -0.06, indicating that boys tended to perform slightly better than girls (see Table 3.5). In contrast, similar analyses did yield significant relationships of moderate magnitude between test performance and age across all research conditions. These relationships were strongest for Mathematics, where Pearson product-moment coefficients were 0.28 for the experimental group and 0.31 for the control group. Values for r were weaker for Khmer Language, at 0.16 and 0.18 for the experimental and control groups, respectively. These coefficient values suggest a moderate positive relationship between a child's age and test score, which is intuitively logical. Coefficient values were highest among control group children; it should also be remembered that the mean age of testees in this group was slightly higher than among SRP children.

Table 3.5: Correlation Coefficients for Total Test Score and Ascribed Student Characteristics
(Each cell: Correlation Coefficient / Significant at p<.05)
Parameter | Experimental Group: Khmer Language | Experimental Group: Mathematics | Control Group: Khmer Language | Control Group: Mathematics
Sex | -0.02 / No | -0.06 / Yes | 0.05 / No | 0.00 / --
Age | 0.16 / Yes | 0.28 / Yes | 0.18 / Yes | 0.31 / Yes
N=1,234 (Experimental Group); N=1,250 (Control Group)
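Because sex is a two-valued variable, the Pearson coefficient reported for it in Table 3.5 is in effect a point-biserial correlation, and its sign depends entirely on how sex was coded. The report does not state the coding; reading the negative value as favoring boys is consistent with, for example, boys coded 0 and girls coded 1. The minimal Python sketch below assumes that coding and uses made-up scores (not study data) to show how a negative r arises and that reversing the coding flips the sign without changing the magnitude.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical coding (assumption, not stated in the report): 0 = boy, 1 = girl.
sex = [0, 0, 0, 1, 1, 1]
# Made-up total test scores (%), for illustration only.
score = [60, 55, 50, 58, 52, 47]

r = pearson_r(sex, score)
print(f"r = {r:.2f}")  # negative: the children coded 0 (boys) average higher here

# Reversing the coding (1 = boy, 0 = girl) flips the sign but not the magnitude.
r_flipped = pearson_r([1 - s for s in sex], score)
print(f"r with reversed coding = {r_flipped:.2f}")
```

A coefficient as small as the reported -0.06 can still reach significance at p<.05 with more than a thousand children per group, which is why the text treats it as statistically significant yet exceedingly weak.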


4. DISCUSSION OF RESULTS AND CONCLUSIONS

4.1 General Overview and Commentary

With some qualifications, the results of this year's testing activities generally continue to parallel those found during the pilot year. In the main, SRP children still tend to outperform control group children from comparable schools by statistically significant margins in both Khmer Language and Mathematics. This suggests a high likelihood that these differences did not occur by chance but were due at least in part to program interventions. Although there were some exceptions when reviewing individual sub-topical scores in each subject test (see below), the overall trend in performance suggests a clear advantage enjoyed by SRP children in most (though not all) areas. This stronger performance among SRP children is all the more significant given the moderate positive relationship found between children's age and test scores in both research conditions, which would tend to give a slight advantage to the control group, whose testees were on average 0.38 years older. Performance advantages enjoyed by SRP children seemed to be greatest in Mathematics, both in terms of the mean difference between total subject test scores in each research condition and in terms of absolute percentage values. This result is highly consistent with other achievement test results (e.g., Bredenberg, 2004; KAPE, 2005) in which Language scores tend to lag behind those in Mathematics. Indeed, in this year's test administration, mean scores for Reading showed a precipitous decline from last year across all tested provinces. This suggests that the acquisition of Language skills is where young Cambodian children are encountering the most difficulty during their first year at school. The above observations must, however, be qualified in a number of respects.
In this regard, although overall percentage scores, which were higher for SRP children, present a simple picture of children's performance, a review of learning outcomes for individual sub-topical areas suggests a more complex pattern. For example, tasks requiring higher order thinking skills, such as sentence composition or problem solving in Mathematics, tended to register low scores relative to other areas, although in some cases these were still significantly better than control group scores (e.g., Sentence Composition). This pattern echoes the findings of case study observers of SRP classrooms earlier in the school year, who noted that ‘teachers preferred to focus on the acquisition of basic skills in literacy and numeracy where right/wrong questions and uni-dimensional task work tended to predominate [thereby limiting] the scope for children to engage in activities that emphasized inquiry or task work’ (Pigott and Bredenberg, 2006). Thus, SRP seems to be most successful in facilitating the acquisition of basic skills but somewhat less so in areas where higher order thinking is required. Divergent patterns in performance were also observed between provinces, suggesting that low exposure to school improvement projects may greatly undermine SRP outcomes as they relate to student achievement. This tentative conclusion primarily reflects the very disappointing results registered by children studying in SRP classrooms in Kratie Province in comparison to other provinces, which have a long history of association with projects supported by UNICEF and other NGOs. Although this study did not systematically investigate specific variables in Kratie in relation to test score results, it seems likely that factors relating to remoteness, levels of exposure to development projects, the availability of local trainers, and the intensity of technical support for teachers play a large role in the success of activities such as SRP.

Finally, a review of performance patterns in the current and pilot years suggests some slippage in achievement scores. Although total scores for both subject tests were remarkably similar year-on-year, slippage was reflected in other, more subtle ways. These included fewer sub-topics where mean differences between experimental and control groups were statistically significant, as well as fewer mean differences of two digits or more. In both cases, the change from year to year was marked. Although the low scores of children in Kratie clearly helped to depress total scores, children from Kratie constituted only about 21% of the experimental group, and achievement outcomes in individual provinces also seem to suggest a genuine diminution in test performance. For example, of the three provinces that participated in last year's testing exercise, only one was able to maintain a statistically significant performance advantage in an area as important as Reading, and even this advantage was very slight. This is in marked contrast to last year's results, in which a double-digit advantage was registered in Reading. It therefore seems unlikely that the low achievement of children from Kratie alone could have been entirely responsible for the diminution in performance described above. Rather, there is a high probability that the expansion of program activities is causing outcomes to diminish, at least in part, as often happens when pilots are scaled up into large national programs.
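The year-on-year slippage described above can be traced directly to the counts in Table 3.3. As a minimal Python sketch, using only those counts and assuming the 25 sub-topical areas noted under the table apply to both years, the following reproduces the totals, the changes, and the 88% and 68% shares quoted in Section 3.2.

```python
# Counts taken from Table 3.3; 25 sub-topical areas per year, per the table note.
TOTAL_SUBTOPICS = 25

# (pilot year, current year) number of sub-topics per subject
significant = {"Khmer Language": (10, 7), "Mathematics": (12, 10)}
double_digit = {"Khmer Language": (8, 0), "Mathematics": (10, 2)}

def totals(counts):
    """Sum pilot-year and current-year counts across subjects."""
    pilot = sum(p for p, _ in counts.values())
    current = sum(c for _, c in counts.values())
    return pilot, current

def share(count, total=TOTAL_SUBTOPICS):
    """Percentage of sub-topical areas, rounded to a whole number."""
    return round(100 * count / total)

sig_pilot, sig_current = totals(significant)
dd_pilot, dd_current = totals(double_digit)

print(f"Significant differences: {sig_pilot} -> {sig_current} "
      f"({share(sig_pilot)}% -> {share(sig_current)}%), change {sig_current - sig_pilot}")
print(f"Double-digit differences: {dd_pilot} -> {dd_current}, "
      f"change {dd_current - dd_pilot}")
```

Keeping the per-subject counts separate makes it easy to see that the drop in double-digit differences (from 18 to 2) is far steeper than the drop in significant differences (from 22 to 17), which is the sense in which the advantage shrank more in magnitude than in breadth.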
4.2 Constraints to Consider

The use of achievement testing as a means to assess program outcomes is an inherently risky way to determine the effectiveness of any set of interventions. Such tests often oversimplify the complexities of children's learning, are highly reductionist in their approach to measurement, and are often the basis for rash conclusions. Even under the best of circumstances, a great deal can go wrong when conducting achievement tests, which suggests the need for caution in interpreting their results. Unfortunately, the need for expediency when assessing programs and the desire for simple answers always seem to militate against such cautionary advice. Nevertheless, the reader should bear the above points in mind when considering the results of the current investigation. It is equally true that the greatest credence should probably be placed in those trends that are most consistent with findings from earlier studies, since the probability that such findings arose by chance or from factors that compromise test validity is much lower. Divergent findings, on the other hand, should be treated with much greater caution. Another constraint to consider in light of the outcomes of this study is the degree to which its findings can be generalized to the School Readiness Program as a whole. In this regard, it must be remembered that the sample used for this assessment was drawn from a population of schools that receive both material and technical support from long-term projects supported by UNICEF and USAID. Since the majority of areas where the government has recently expanded the School Readiness Program do not have such advantages, it would be dangerous to infer that the outcomes found in this study apply to schools where the conditions of SRP implementation are very different.
Indeed, it might be highly advisable for the donors to consider an investigation similar to the present one in unsupported SRP schools to determine the extent to which project support has contributed to the success the program has achieved to date. The findings in Kratie should be of particular concern in this regard.


4.3 Implications and Conclusions

The above observations have several important implications. Perhaps the most important of these is not that SRP children continue to perform better than control group children in comparable settings (this finding now seems well established), but that they are doing so by smaller margins than were previously observed. Although this outcome is partly due to the addition to the study sample of a province with little experience in SRP implementation, it also seems highly likely that a genuine diminution is in play, particularly in Language. As the program expands and more and more teachers require assistance from local technical support networks, lower levels of efficiency are perhaps unavoidable. Nevertheless, it should be of some concern that if these trends are occurring in provinces with project-affiliated technical support networks, efficiency levels may be much lower in provinces where such networks either do not exist or are severely under-resourced. Once again, this suggests the need to assist MoEYS in carrying out systematic investigations of the effectiveness of SRP instruction and student learning in provinces outside UNICEF-targeted areas.¹ This is really the only way to properly inform decisions about whether the School Readiness Program should continue to be expanded or whether a pause is now advisable in order to consolidate earlier outcomes. The experience of SRP implementation in Kratie may be indicative of what is happening in other unsupported provinces. Although the ESCUP program provided material assistance for the refurbishment of SRP classrooms in this province, as well as financial support to Ministry facilitators to train teachers there (26 in all), it was understood that technical support would be the responsibility of the Ministry. Given the heavy demands for technical support placed on the limited number of Ministry staff available and the absence of local support networks, it is highly likely that few SRP teachers received the technical assistance they required during the past year. In contrast, UNICEF and KAPE have provided for the development of local technical support networks in the six UNICEF provinces and Kampong Cham. Given the experience in Kratie in comparison to the other three provinces, such networks appear to be an essential element in muting the deleterious effects on student achievement that go hand in hand with an expansion of the program.

A final conclusion that should not be forgotten echoes those of earlier studies: the need for SRP interventions to better address learning deficits in higher order thinking skills, particularly in Language. Although it is certainly an important achievement that SRP interventions appear to be helping children to master the basics, the program cannot properly be placed under the Child Friendly Schools banner, as the Ministry maintains, unless it also improves achievement in critical and creative thinking skills, as reflected in the ability to compose sentences and solve problems in Mathematics. As noted earlier, children seem to perform most poorly in these areas. If Ministry planners decide to focus on consolidating the School Readiness Program in the future, this would be a good place to concentrate such efforts.

¹ According to MoEYS reports, 18 provinces are now fielding SRP teachers, of which only nine have project-affiliated support networks.


References
1. Bredenberg, K. (2004) School Readiness Program: Assessment Report. Phnom Penh: KAPE.
2. KAPE (2005) School Readiness Program Evaluation Follow-up: Terminal Achievement Testing. Phnom Penh.
3. MoEYS (2005) School Readiness Program Teacher Statistics Report. Phnom Penh: Primary Education Dept.
4. MoEYS (2006) Educational Statistics and Indicators. Phnom Penh: Educational Management Information System.
5. Pigott, F. and Bredenberg, K. (2006) School Readiness Program Evaluation: Phase II. Phnom Penh: KAPE.


ATTACHMENTS


ATTACHMENT 1: Tables of Specification

KHMER LANGUAGE (Number of Points by Topic and Skill Domain)
Skill

Memory

Understanding

Application

Analysis

Total Pts

Writing

Reading

Listening and Speaking

Content Counting Syllables Syntax Oral Word Usage Subtotal* Word Meanings Sound Letter Discrimination Reading Aloud Reading Comprehension Subtotal* Spelling Writing Words Sentence Composition Subtotal* GRAND TOTAL

10.2%

---

-4

---

5 --

---

5 --

---

-5

5 4 14 5 5

10.2% 8.1% 29% 10.2% 10.2%

3 --

-3

---

3 3

6.1% 6.1%

5 ---

----

-6 8

----

16 5 6 8

33% 10.2% 12.2% 16.3%

8 16%

12 24%

14 29%

15 31%

19 49 100%

39% 100%**

*Rounded **Includes rounding error of 1%
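The "Includes rounding error of 1%" note can be checked from the point totals. The Python sketch below assumes the topic subtotals (Listening and Speaking 14, Reading 16, Writing 19 points) and skill-domain totals (Memory 8, Understanding 12, Application 14, Analysis 15 points) as read from the Khmer Language table above, each out of a grand total of 49 points. Rounding each share to a whole percentage is enough to make the topic shares sum to 101%.

```python
GRAND_TOTAL = 49  # total points on the Khmer Language test

# Point totals as read from the Khmer Language table of specification
topic_points = {"Listening and Speaking": 14, "Reading": 16, "Writing": 19}
skill_points = {"Memory": 8, "Understanding": 12, "Application": 14, "Analysis": 15}

def rounded_shares(points, total=GRAND_TOTAL):
    """Each entry's share of the total, rounded to a whole percentage."""
    return {name: round(100 * p / total) for name, p in points.items()}

for label, points in (("Topics", topic_points), ("Skills", skill_points)):
    # The raw points add up exactly; only the rounded percentages can overshoot.
    assert sum(points.values()) == GRAND_TOTAL
    shares = rounded_shares(points)
    print(label, shares, "sum =", sum(shares.values()))
```

The topic shares come out as 29%, 33%, and 39%, which sum to 101%, while the skill-domain shares happen to sum to exactly 100%.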


MATHEMATICS (Number of Points by Topic and Skill Domain) Skill

Money

Time

Volume & Weight

Geometry

Problem Solving

Addition & Subtraction Algorithms

Number Concepts

Content Math Notation Sequencing Numbers 1-20 Comparing Number Size Concept of Tens & Units Sequencing Numbers 1-100 Writing Numbers Subtotal* Adding & Subtracting Numbers less than 10 Adding & Subtracting Numbers less than 20 Subtotal* Oral Problem Solving with Numbers 1-10 Oral Problem Solving with Numbers 1-20 Subtotal Shape Recognition

Memory

Understanding

Calculation

Application

-2

---

Total Pts

4 --

4 2

11.8% 5.9%

2.9%

5.9%

2.9%

4 14 2

11.8% 41% 5.9%

5.9%

12%

2.9%

5.9%

3 4

9% 11.8%

12%

Subtotal* Comparing Volume

5.9%

Concept of Weight

2.9%

3 4

9% 11.8%

4 2

12% 5.9%

34 100%

100%**

Subtotal* Knowing Days of the Week Subtotal* Using Money Subtotal* GRAND TOTAL

7 21%

10 29%

4 12%

13 38%

*Rounded **Includes rounding error of 1%


ATTACHMENT 2: T-TEST PROBABILITY VALUES

KHMER LANGUAGE
Topic | Probability Value
Listening & Speaking
  Counting Syllables | 0.0002
  Syntax | 0.000001
  Oral Word Usage | 0.007
  Overall Topical Value | 0.000001
Reading
  Letter Sound Discrimination | 0.46
  Reading Aloud | 0.14
  Reading Comprehension | 0.001
  Overall Topical Value | 0.49
Writing
  Spelling | 0.000002
  Writing Words | 0.07
  Sentence Completion | 0.001
  Overall Topical Value | 0.0002
Total Subject Test Value | 0.0003

MATHEMATICS
Topic | Probability Value
Number Concepts
  Math Notation | 0.04
  Sequencing 1 to 20 | 0.000000001
  Sequencing 1 to 100 | 0.02
  Concept of Tens | 0.02
  Comparison of Number Size | 0.001
  Writing Numbers | 0.20
  Overall Topical Value | 0.0002
Addition and Subtraction: Overall Topical Value | 0.0004
Problem Solving: Overall Topical Value | 0.16
Geometry: Overall Topical Value | 0.00000000000001
Volume and Weight: Overall Topical Value | 0.03
Time: Overall Topical Value | 0.000011
Money: Overall Topical Value | 0.72
Total Subject Test Value | 0.000001
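The significance judgments used throughout the report compare each probability value with the p<.05 threshold. A minimal Python sketch, using the overall topical values listed above, flags which topics meet that threshold. It is only a lookup against the reported values and does not recompute the underlying t-tests.

```python
ALPHA = 0.05  # significance threshold used in the report (p<.05)

# Overall topical probability values as listed in Attachment 2
khmer = {"Listening & Speaking": 0.000001, "Reading": 0.49, "Writing": 0.0002}
mathematics = {
    "Number Concepts": 0.0002,
    "Addition and Subtraction": 0.0004,
    "Problem Solving": 0.16,
    "Geometry": 0.00000000000001,
    "Volume and Weight": 0.03,
    "Time": 0.000011,
    "Money": 0.72,
}

def significant_topics(p_values, alpha=ALPHA):
    """Return the topics whose probability value falls below alpha, in listed order."""
    return [topic for topic, p in p_values.items() if p < alpha]

for subject, p_values in (("Khmer Language", khmer), ("Mathematics", mathematics)):
    sig = significant_topics(p_values)
    print(f"{subject}: {len(sig)} of {len(p_values)} topics significant -> {sig}")
```

These are values for the pooled sample, so the counts need not match the province-level counts in Box 4. The non-significant pooled results for Reading, Problem Solving, and Money are consistent with the mixed, and in places inverse, provincial differences in Table 3.4.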