

A Vision for Using an Argument-Based Framework for Validity Applied to a Comprehensive System of Assessments for English Learners in Secondary Grades

By Margaret Heritage, Caroline Wylie, Molly Faulkner-Bond, and Aída Walqui


Appendix A: Reliability


Reliability, a necessary but not sufficient condition for validity, refers to the consistency of assessment results across settings, students, and users. For example, if Juanita completes a writing task related to a specific prompt today, tomorrow, or next Wednesday, then we would expect her ability to respond to the prompt to be essentially the same on all three occasions. Without such consistency, we cannot have confidence that student scores are meaningful representations of student knowledge and skills. How high the reliability of an assessment needs to be depends on the consequences and stakes of the use of the results (see Figure A-1 for characteristics of higher and lower stakes decisions).


For assessments that take the form of tests (e.g., those administered at the end of the year to assess achievement of standards), this consistency is usually quantified by calculating a reliability coefficient⁸ to determine how well assessment results agree over repeated uses of the assessment. The expectation is that a student who takes the same test on different occasions or in different settings would earn roughly the same score. The higher the stakes of the decisions made from test results, the higher the level of reliability will need to be.

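To make the idea of a reliability coefficient concrete, the sketch below estimates test-retest reliability as the correlation between two administrations of the same test. This is a minimal, hypothetical illustration rather than part of the framework described here: the student scores and the function name are invented, and operational programs would use larger samples and typically report additional indices (e.g., internal consistency).

```python
# A minimal, hypothetical sketch of a test-retest reliability coefficient.
# The function name and scores are invented for illustration.
import numpy as np

def test_retest_reliability(scores_t1, scores_t2):
    """Pearson correlation between two administrations of the same test.

    Values near 1.0 mean students were ranked almost identically on both
    occasions; coefficients at or above 0.80 are often treated as adequate
    for decisions about individuals (see footnote 8).
    """
    return float(np.corrcoef(scores_t1, scores_t2)[0, 1])

# Hypothetical scores for eight students on two testing occasions.
occasion_1 = [12, 15, 9, 18, 14, 11, 16, 10]
occasion_2 = [13, 14, 10, 17, 15, 10, 17, 11]

r = test_retest_reliability(occasion_1, occasion_2)
print(f"test-retest reliability: {r:.2f}")  # ~0.94 for these made-up data
```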

Figure A-1. Characteristics of Higher and Lower Stakes Decisions

In the case of classroom-based assessment, including formative assessment, Smith (2003) proposes sufficiency of information as the basis for determining reliability. He suggests that teachers can guide instructional next steps by asking, "does this assessment provide me with enough information to make a judgment about each student's level of accomplishment with regard to this learning?" (Smith, 2003, p. 26). As with reliability coefficients, the amount of evidence a teacher needs will vary with the intended use of the assessment. Contrast three cases: (1) a quick poll of student ideas, used to adjust the lesson in the moment based on immediate feedback; (2) an end-of-lesson exit ticket asking students to identify something they understand, are puzzled by, and are curious about, which adds to teacher observations and supports a more comprehensive plan for the next lesson; and (3) a longer, more formal task, such as a problem-solving task in mathematics, that helps the teacher plan the remainder of a unit. In each case, the evidence must be sufficient for teachers to feel confident in their judgment about students' learning status.

 

For assessments that take other forms, such as portfolios, consistency is studied through the lens of generalizability (Webb & Shavelson, 2005). For generalizability, consistency stems primarily from the prompts, raters, and rating tools (e.g., rubrics) used to produce scores. An important goal is confirming that scores are meaningful representations of student knowledge, rather than idiosyncratic representations of, say, rater preference or the features of a particular assignment. In the case of Juanita's writing task: whether her teacher reads Juanita's written response tonight, tomorrow, or next Wednesday, we would expect the teacher to draw the same conclusions about Juanita's strengths and needs (Herman, Aschbacher, & Winters, 1992).
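As a concrete illustration of the generalizability lens, the sketch below estimates a one-facet G coefficient for a persons-by-raters design in the spirit of Webb and Shavelson (2005). It is a hypothetical example: the rubric scores and the function name are invented, and a full G study would usually model additional facets such as tasks or occasions.

```python
# A minimal sketch of a one-facet generalizability (G) study for a
# persons-x-raters design; the score matrix is hypothetical.
import numpy as np

def g_coefficient(scores):
    """Relative G coefficient for a persons-x-raters score matrix.

    Rows are students, columns are raters. Variance components are
    estimated from the two-way ANOVA mean squares; the coefficient is the
    share of score variance attributable to true differences among
    students rather than student-by-rater inconsistency.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    person_means = scores.mean(axis=1)
    rater_means = scores.mean(axis=0)

    # Mean squares for persons and for the residual (person x rater, error).
    ms_person = n_r * np.sum((person_means - grand) ** 2) / (n_p - 1)
    resid = scores - person_means[:, None] - rater_means[None, :] + grand
    ms_resid = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

    var_person = max((ms_person - ms_resid) / n_r, 0.0)  # sigma^2_p
    var_resid = ms_resid                                 # sigma^2_pr,e
    return var_person / (var_person + var_resid / n_r)

# Hypothetical rubric scores: five students rated by three raters.
ratings = [[3, 4, 3],
           [2, 2, 3],
           [4, 4, 4],
           [1, 2, 1],
           [3, 3, 4]]
print(f"G coefficient: {g_coefficient(ratings):.2f}")  # ~0.92 for these data
```

Averaging over more raters shrinks the error term (σ²_pr,e / n_r), which is why adding a second or third rater is a common way to raise the dependability of portfolio scores.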


Appendix B: Protocol to Guide a COP Validity Discussion Focused on a Single Proposition for a Specific Type of Assessment

Purpose: To help COP members develop familiarity with the process, members will engage in an in-depth review of the claims and evidence associated with each proposition.

 

Process: The process is divided into three parts. The Initial Planning stage identifies the targeted proposition, ensures a common understanding, and identifies the types of relevant evidence that may need to be collected so that the evaluation of the validity argument can take place. The Validity Review is the core work of the COP and entails reviewing evidence for proposition claims and identifying strengths and areas to improve. The final step is the Application Review, which examines whether recommended changes were made, or reflects on how the discussion from the Validity Review influenced ongoing assessment development.

Figure B-1. Three-Part Validity Review

Documentation: We recommend that COP members develop approaches to document discussions and action steps, both to ensure that identified revisions are made to already-reviewed assessments and to ensure that plans to revise future assessments are documented and periodically reviewed. This documentation might be in the form of a shared Google doc that all COP members have access to, or individual teacher journals in which they can capture reflections and action steps.

 

1. Initial Planning

Evidence Sources


  • Documentation of classroom practice (e.g., learning goals presented to students, directions for portfolio selection)

  • Individual teacher reflection (e.g., teacher reflection on whether the questions, tasks, and activities are accessible to the range of students' zones of proximal development (ZPD) present within the class)

  • Peer observation (e.g., how effectively teachers communicate learning goals to students)

  • Student feedback (e.g., surveys or interviews)

  • Peer feedback (e.g., peer review and feedback on the alignment between the breadth and depth of cognitive complexity and language usage represented by the unit goals)

  • Teacher collaborative scoring of student work to help ensure a common understanding of quality and to calibrate scoring

Step 1: Identify the type of assessment for the COP discussion.


Step 2: Select the specific proposition for the discussion (we recommend that a COP work through the propositions sequentially).


Step 3: Ensure a common understanding among group members of the proposition, claims, and evidence.


Step 4: Review the list of potential evidence sources against the specific evidence described for the targeted proposition.


Step 5: Determine who will present evidence, how and when evidence can be collected, and schedule the Validity Review.

2. Validity Review

Example of Teacher Presentation

(based on Proposition 2 for Formative Assessment)


  • Reflect on how your lesson plans articulated goals and success criteria derived from the standards, along with the questions, tasks, or activities selected or developed

  • Reflect on the alignment between goals/success criteria and questions, tasks, or activities; the position of the goals in a trajectory of learning (i.e., do the goals build on prior learning, and can they be extended to new learning?); and the breadth of cognitive complexity represented by the goals

  • Reflect on your own or a peer's observations of the alignment of goals/criteria with formative assessment opportunities, the clarity of performance criteria relative to standards, and the communication of goals and criteria to students to ensure common understanding

  • Reflect on your own or a peer's observations of the breadth of cognitive complexity represented by goals and assessment questions, tasks, and activities

Step 1: Remind the group of the ground rules for discussion. The focus of the discussion should always be on the assessment and ways in which it can be improved or how strengths in an assessment can be applied to other assessments. The critique should never be of an individual.


Step 2: Invite a group member to present evidence for the claims associated with the targeted proposition, drawing on the sources of evidence identified during the initial planning. While the group member makes the initial presentation, other group members do not interrupt.


Step 3: Once the initial review is complete, group members can ask clarifying questions to help the presenter provide more specific details from a particular source of evidence or to make connections between specific claims and supporting evidence more explicit. This stage of the discussion is focused only on clarification, not evaluation of the evidence.


Step 4: Develop a consensus judgment of the validity evidence for the proposition. Review the proposition and summarize the evidence for each claim. Where is the evidence convincing? Where could the assessment or assessment process be improved? Where does the evidence need to be strengthened? How can we do this?


Step 5: Discuss and document how understandings about the proposition and/or the nature of the assessment could be applied both to revisions of the specific assessment under review and to future assessments of this format. Whether in a shared Google doc or in individual teacher journals, documenting reflections and action steps is critical for moving to the Application Review stage.

3. Application Review

 

Step 1: Reconvene to review the revisions identified at the Validity Review meeting.


Step 2: Confirm whether revisions were made to the assessment as planned. For a formative assessment question or probe, this might be simply making a note in a lesson plan about a revision for the following year. An end-of-unit assessment that is used by several teachers might require a more substantive revision.


Step 3: Provide an opportunity for COP members to discuss any ways in which the previous Validity Review discussion has impacted their assessment work since the meeting.


Appendix C: Protocol to Guide a COP Validity Discussion Focused on the Set of Propositions for a Specific Type of Assessment

Purpose: To help the COP members engage in a comprehensive review of the claims and evidence associated with the full set of propositions for a specific assessment.

 

Process: The process is divided into three parts. The Initial Planning stage identifies the targeted assessment, ensures a common understanding, and identifies the types of relevant evidence that may need to be collected so that the evaluation of the validity argument can take place. The Validity Review is the core work of the COP and entails reviewing evidence for proposition claims and identifying strengths and areas to improve. The final step is the Application Review, which examines whether recommended changes were made, or reflects on how the discussion from the Validity Review influenced ongoing assessment development.

Figure C-1. Three-Part Validity Review

Documentation: We recommend that COP members develop approaches to document discussions and action steps, both to ensure that identified revisions are made to already-reviewed assessments and to ensure that plans to revise future assessments are documented and periodically reviewed. This documentation might be in the form of a shared Google doc that all COP members have access to, or individual teacher journals in which they can capture reflections and action steps.

 

1. Initial Planning

Evidence Sources


  • Documentation of classroom practice (e.g., learning goals presented to students, directions for portfolio selection)

  • Individual teacher reflection (e.g., teacher reflection on whether the questions, tasks, and activities are accessible to the range of students' zones of proximal development (ZPD) present within the class)

  • Peer observation (e.g., how effectively teachers communicate learning goals to students)

  • Student feedback (e.g., surveys or interviews)

  • Peer feedback (e.g., peer review and feedback on the alignment between the breadth and depth of cognitive complexity and language usage represented by the unit goals)

  • Teacher collaborative scoring of student work to help ensure a common understanding of quality and to calibrate scoring

Step 1: Identify the specific assessment for the COP discussion.


Step 2: Identify which claims across all the propositions are the most critical to address.


Step 3: Ensure a common understanding among group members of the propositions, claims, and evidence.


Step 4: Review the list of potential evidence sources against the specific evidence described for the targeted claims.


Step 5: Determine who will present evidence, how and when evidence can be collected, and schedule the Validity Review.

2. Validity Review

Example of Teacher Presentation

(based on all claims for Formative Assessment)


  • Reflect on your lesson plan and how successful you think you were at articulating lesson-sized goals from the standards. Did they lead to the learning you [the teacher] expected, or not?

  • Reflect on one question/task/activity that successfully helped you [the teacher] ascertain the current learning status of individual students in terms of both language and academic content. What was a less successful question/task/activity?

  • Reflect on your lesson and how successful you [the teacher] were at integrating your knowledge of students' language learning, conceptual understanding, analytical practices, and funds of knowledge into the assessment. Was there one time that was more successful than others? Was there another that was less successful?

  • What does this work reveal about the status of student learning relative to goals? Are there other modalities that might be more effective at revealing specific students’ learning status? What would be our next steps for each student based on the evidence?

  • What judgments were you able to make about student learning based on your evidence? Which judgments do you feel confident about and which are you less confident about?

Step 1: Remind the group of the ground rules for discussion. The focus of the discussion should always be on the assessment and ways in which it can be improved or how strengths in an assessment can be applied to other assessments. The critique should never be of an individual.


Step 2: Invite a group member to present evidence for the focus claims for the assessment, drawing on the sources of evidence identified during the initial planning. While the group member makes the initial presentation, other group members do not interrupt.


Step 3: Once the initial review is complete, group members can ask clarifying questions to help the presenter provide more specific details from a particular source of evidence or to make connections between specific claims and supporting evidence more explicit. This stage of the discussion is focused only on clarification, not evaluation of the evidence.


Step 4: Develop a consensus judgment of the validity evidence for the assessment. Review the set of propositions and summarize the evidence for the targeted claims. Where is the evidence convincing? Where could the assessment or assessment process be improved? Where does the evidence need to be strengthened? How can we do this?


Step 5: Discuss and document how understandings about the propositions, claims, and evidence and/or the nature of the assessment could be applied both to revisions of the specific assessment under review and to future assessments of this format. Whether in a shared Google doc or in individual teacher journals, documenting reflections and action steps is critical for moving to the Application Review stage.

3. Application Review

 

Step 1: Reconvene to review the revisions identified at the Validity Review meeting.


Step 2: Confirm whether revisions were made to the assessment as planned. For a formative assessment question or probe, this might be simply making a note in a lesson plan about a revision for the following year. An end-of-unit assessment that is used by several teachers might require a more substantive revision.

 

Step 3: Provide an opportunity for COP members to discuss any ways in which the previous Validity Review discussion has impacted their assessment work since the meeting.

8. Coefficients at or above 0.80 are often considered sufficiently reliable to make decisions about individuals. A higher value, perhaps 0.90, is preferable if decisions have a significant consequence (Webb, Shavelson, & Haertel, 2006).
