RA5.2 Reliability and Validity of EPP Assessments
Alignment to National Standard: CAEP RA5.2 Data Quality. The provider's quality assurance system from RA5.1 relies on relevant, verifiable, representative, cumulative, and actionable measures to ensure that interpretations of data are valid and consistent.
How Alignment Is Assured: The Assessment Coordinator, in consultation with Program/Discipline Chairs, aligns the evaluation measures and assessment tasks with CAEP, InTASC, and the appropriate technology standards, and maintains alignment with applicable Louisiana state laws and policy regulations. All standards are maintained in Watermark Taskstream. The Assessment Coordinator maintains this standards database so that alignments can accommodate updates to standards, program competencies, courses, or assessments.
Evidence Overview
Evidence for this compendium is presented in the following manner: (1) the process for conducting validity and reliability studies, (2) presentation of reliability evidence, and (3) presentation of validity evidence. The evidence documents that EPP-created assessments have met the minimum threshold of 80% (.80) or above to establish content validity and 75% (.75) or above to establish inter-rater reliability (agreement).
Evidence and Analysis
Process for Designing and Developing Assessments: Once an evaluation measure has been established, program leads work with a team of subject matter experts (SMEs) to create the individual activities, assessment prompts, and associated rubric, all aligned with the SPA standards. The SME team is a crucial component of this process because its subject matter expertise ensures that content validity is built into the final design.
After completing the work instructions, the team turns to the rubric. Although the content determines the precise criteria for each level of a particular rubric element, GSU provides basic rules for how each rubric level should be constructed. These definitions are listed in Table 1.
Table 1: Performance Indicators and Descriptions

Performance Indicator | Description
Novice | This rating is equivalent to having emerging performance skills/content knowledge that can be enriched with additional coursework.
Effective: Emerging | This rating is equivalent to having the performance skills/content knowledge needed to move forward into student teaching; however, additional remediation might be needed to hone the candidate's performance.
Effective: Proficient (Target) | This rating is equivalent to having the performance skills/content knowledge needed to be an effective student teacher, where additional skills will be practiced.
Highly Effective | This rating is equivalent to having the performance skills/content knowledge needed as a highly effective first-year teacher.
Each evidence item tagged to Standards 1 and 4 includes a short section labeled "Assurance of Reliability and Validity" that reports information from GSU-created assessments. In addition, the Continuous Improvement/Actionability of Outcomes section of Standard Five Compendium 3 underscores how data insights are made actionable at GSU. The EPP takes a systematic approach to data quality: validity reviews are scheduled every three years unless substantive changes are made to an instrument, and reliability is likewise examined every three years unless the evaluators/instructors in the relevant courses change (Data Quality Review Table).
Program leads and faculty (SMEs) participate in training and calibration exercises to ensure that evaluators interpret and apply rubrics consistently, which is necessary for inter-rater reliability in evaluating candidate performance on assessments (IRR and Norming Training). During calibration, all evaluators in a given area use the scoring rubric to evaluate a selected candidate submission. To promote consistency among raters, evaluators then receive personalized feedback showing where they converge with and diverge from the broader team.
Faculty members are also periodically chosen to take part in a formal inter-rater reliability study, in which faculty members individually score the same pre-selected work sample from a course they actively teach. Internal and external subject matter and content experts are invited to participate in content validity studies of common, EPP-created key assessments on a three-year cycle or following instrument or description revisions.
Formal content validity and reliability studies are conducted electronically via Google Forms surveys using the format presented by Drs. Monaco and Horne at CAEPCon Spring 2022 (Monaco & Horne, 2022). Reliability study forms (Sample: ED 545 Action Research Implementation Assessment Reliability Study Form) provide student work samples to teams of faculty members along with assessment rubrics and assignment directions. Percent agreement is then calculated from the faculty members' scores to gauge inter-rater reliability; GSU seeks 75% or higher agreement (Sample: ED 545 Action Research Implementation Assessment Percentage of Agreement Worksheet). Newly revised assessments are piloted upon completion of the reliability and validity studies and are adopted following review by the QAS Review Panel.
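To illustrate the percent agreement calculation, the sketch below uses a common pairwise formulation in which every pair of raters is compared element by element; the function name and sample scores are illustrative assumptions, not GSU data or the exact worksheet formula.

    from itertools import combinations

    def percent_agreement(scores_by_rater):
        """Percent of rater-pair comparisons with identical rubric scores.

        scores_by_rater: one list of scores per rater, aligned by rubric
        element, e.g., [[3, 4, 3, 2], [3, 4, 2, 2]].
        """
        agreements = 0
        comparisons = 0
        for a, b in combinations(scores_by_rater, 2):  # every pair of raters
            for score_a, score_b in zip(a, b):         # element by element
                comparisons += 1
                agreements += (score_a == score_b)
        return 100.0 * agreements / comparisons

    # Hypothetical calibration: three faculty raters, four rubric elements (1-4 scale)
    raters = [[3, 4, 3, 2], [3, 4, 2, 2], [3, 4, 3, 2]]
    print(f"{percent_agreement(raters):.0f}% agreement")  # 83%, above the 75% threshold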
References
Monaco, M., & Horne, E. T. (2022, March 9). Data quality: Deconstructing CAEP R5.2 [Conference presentation]. CAEPCon Spring 2022. https://caepnet.org/
Reliability of Assessments (Initial Programs)
Compendium | CAEP Standard | Assessment | Inter-Rater Reliability (Percent Agreement, Across All Programs)
Standard One, Compendium 2 (Applications of Data Literacy) | RA1.1 | ED 505: Analysis of Reading Difficulties, Word Study | AY 2022-23: 85%
Standard One, Compendium 3 (Collaborative Activities) | RA1.1 | ED 545: Evaluation and Assessment of P-12 Students in Educational Settings, Action Research Implementation Assessment | AY 2022-23: 86%
Standard One, Compendium 4 (Use of Research) | RA1.1 | SPED 542: Methods & Materials for Teaching Children with Exceptional Learning Needs, Inclusive Lesson Planning | AY 2022-23: 100%
Standard R1, Compendium 2 (Application of Content) and Compendium 3 (Instructional Practices) | RA1.1 | ED 549: Introduction to Techniques of Research, Action Research Proposal Assessment | AY 2022-23: 85%
Standard R1, Compendium 2 (Application of Content) and Compendium 3 (Instructional Practices) | R1.2, R1.3 | Praxis II SPED (Praxis 5543, proprietary) | Proprietary assessment scored outside of GSU by ETS
Standard One, Compendium 5 (Provider Responsibilities) | RA1.2 | SPED 543: Humanistic Approaches, Behavioral Intervention | AY 2022-23: 100%
Standard One, Compendium 5 (Provider Responsibilities) | RA1.2 | ED 505: Analysis of Reading Difficulties, Informal Reading Inventory | AY 2022-23: 90%
The EPP uses the proprietary Educator Dispositions Assessment developed by Almerico, Johnston, and Wilson (2015). Ratings were completed for each candidate by two reviewers who knew or had taught the candidate, and those ratings were then compared for inter-rater agreement using an online calculator (https://calculator.academy/interrater-reliability-calculator/). Over the past three years, the EPP obtained the following data:


Compendium | CAEP Standard | Assessment | Inter-Rater Reliability
Standard R1, Compendium 4 (Professional Responsibility) | R1.4 | Educator Disposition Assessment | Fall 2020: N = 1, IRR = 78% (Reading); Fall 2021-Spring 2022: N = 1, IRR = 78% (Reading); Fall 2022-Spring 2023: N = 4, IRR range = 67-78% (Reading)*; N = 1, IRR = 88% (Special Education)
* Although the ratings for two candidates in 2022 showed 67% agreement, the disagreements were generally between rating the candidate as "Meets Expectations" (3) versus "Exceeds Expectations" (4). Only one disagreement between scorers was between "Developing" (2) and "Meets Expectations" (3).
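The footnote's distinction between exact agreement and adjacent-category disagreement can be made concrete with a short sketch. The ratings below are hypothetical, not the EPP's actual EDA data, and the within-one-category measure is one assumed way to summarize near-misses on the footnote's scale (2 = Developing, 3 = Meets Expectations, 4 = Exceeds Expectations).

    def exact_and_adjacent_agreement(rater1, rater2):
        """Exact and within-one-category agreement for two reviewers' ratings."""
        pairs = list(zip(rater1, rater2))
        exact = sum(a == b for a, b in pairs) / len(pairs)
        adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
        return 100 * exact, 100 * adjacent

    # Hypothetical ratings for one candidate across nine disposition indicators
    r1 = [3, 3, 4, 3, 2, 3, 4, 3, 3]
    r2 = [3, 4, 4, 3, 3, 3, 3, 3, 3]
    exact, adjacent = exact_and_adjacent_agreement(r1, r2)
    print(f"exact: {exact:.0f}%, within one category: {adjacent:.0f}%")  # 67% / 100%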
Validity Evidence: CAEP recommends establishing content validity using Lawshe's approach. To determine the content validity of EPP-created assessments, GSU uses a panel of subject matter experts (SMEs) to determine how well the elements included within an assessment align with its intended outcomes. Following the Lawshe method, SMEs are provided with a copy of the assessment's directions and rubric and asked to rate each element as essential, useful but not essential, or not necessary (Sample: ED 545 Action Research Implementation Assessment Content Validity Study Form). The content validity ratio (CVR) is calculated for each element, and the content validity index (CVI) is calculated for the instrument, using an Excel worksheet (Sample: ED 545 Action Research Implementation Assessment CVR and CVI Outcomes) formatted with the following formulas:
CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs who rate an element "essential" and N is the total number of SMEs on the panel.

S-CVI = the scale-level content validity index (CVI) for the instrument, computed by averaging the element-level CVR values.
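As a worked illustration of these formulas, the following is a minimal sketch assuming the instrument-level CVI is the simple mean of element-level CVRs; the panel size and "essential" counts are hypothetical, not taken from a GSU study.

    def cvr(n_essential, n_panelists):
        """Lawshe content validity ratio for a single rubric element."""
        return (n_essential - n_panelists / 2) / (n_panelists / 2)

    def cvi(essential_counts, n_panelists):
        """Scale-level content validity index: mean CVR across elements."""
        return sum(cvr(ne, n_panelists) for ne in essential_counts) / len(essential_counts)

    # Hypothetical panel of 10 SMEs rating a 5-element rubric; each entry is
    # the number of SMEs who marked that element "essential"
    essential = [10, 9, 8, 10, 7]
    print([round(cvr(ne, 10), 2) for ne in essential])  # [1.0, 0.8, 0.6, 1.0, 0.4]
    print(round(cvi(essential, 10), 2))                 # 0.76, just below the .80 target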
Program leads and faculty members review and discuss the experts' feedback to determine what modifications and updates might be necessary, particularly for items or instruments that fail to meet the acceptable CAEP sufficiency of evidence standards.
Validity of GSU Assessments: Initial Programs (ITP)

Compendium | Assessment | Validity Study Results
Standard One, Compendium 2 (Applications of Data Literacy) | ED 505: Analysis of Reading Difficulties, Word Study | Items 4, 11, and 12 do not meet content validity, with a CVR of .60
Standard Three, Compendium 3 (Continuous Improvement): Surveys
Questions or topics are explicitly aligned with aspects of the EPP's mission as well as CAEP, InTASC, national/professional, and state standards. Each item addresses a single subject, and the language is unambiguous. Leading questions are avoided, and items are stated in terms of behaviors or practices rather than opinions whenever possible. Surveys of dispositions make clear to candidates how the survey is related to effective teaching.
The Educator Disposition Assessment was presented by the University of Tampa at the CAEP Conference (September 17-19, 2015, Washington, D.C.) in a session entitled "Educator Disposition Assessment: A Research-Based Measure of Teacher Dispositional Behaviors." The developers indicated that the instrument had already gone through the validity process.
Focus Area(s): GSU will continue to follow its schedule for reviewing EPP-created assessments so that content validity and inter-rater reliability results meet or exceed the CAEP sufficiency of evidence requirements.
Template for the Presentation of Evidence by Dr. Michele Brewer and Dr. Amber Vraim is licensed under Attribution 4.0 International. "College of Education Office of Technology, Assessment, and Compliance: Template for the Presentation of Evidence." Copyright 2020 by Wilmington University.