Report

Instructionally Relevant Assessment Systems: What is the Role of Performance Assessments?

Published
By Aneesha Badrinarayan

Since the rise of state assessments whose primary function is to yield scores that can be used to compare schools and groups of students, most states have developed their state assessment programs under the assumption that either (a) state tests are not intended to meaningfully shape instruction, or (b) if they are, the information provided in score reports is sufficient to support instruction. Indeed, the prevailing guidance about large-scale assessments is that they should serve a program monitoring role and not be used to guide instruction. This approach reflects sound reasoning. It is hard for an external, efficient, infrequent assessment to play a meaningful role in guiding instruction, and many measurement experts suggest that state assessments should be supplemented by other supports, closer to the classroom, to provide real instructional support. State tests have been designed accordingly, making trade-offs that value efficiency and reliability over impact on teaching and learning.

While these recommendations to keep the summative assessment separate from instruction might reflect some conceptions of best practice, they unfortunately do not reflect real practice. When policymakers and researchers listen to teachers and local leaders, they routinely hear not only that teachers are changing what they teach to better match test content, but that teachers are often encouraged to change instruction in ways that actively trade off features of high-quality learning experiences for those that reflect testing experiences. 

While state officials may not intend for their assessments to have this impact, the footprint of state assessments grows each time a district chooses an interim assessment that promises to predict performance on the state test; purchases access to formative assessment item banks with questions that mirror (or reproduce) released state assessment questions; or provides guidance about curriculum choices, scope and sequence, and time for content-specific teaching and learning that is driven by state assessment design and scores. The result is a cascade of signals that position state assessments as a major driver shaping the learning experiences that students have in the classroom.

Faced with this reality, many state leaders are reconsidering their approach to assessment programs. Instead of designing assessment systems under the assumption that state assessments are not influencing teaching and learning—or that those influences are an unfortunate “cost of doing business”—state leaders are asking: If we know teachers and local leaders take cues from state-provided assessments, how can we create instructionally relevant assessments that incentivize shifts toward better teaching and learning?

Designing Assessments With an Emphasis on Positive Instructional Impact

Designing state assessments with instructional impact in mind requires reconsidering what features and values to prioritize in assessment design. For example, many developers of current state assessments view assessment tasks that require human scoring (tasks like constructed response items and student-written essays) as an unnecessary burden. Such tasks cost more in both money and time than single-select multiple choice questions and can require a great deal of coordination and capacity from state education agencies. In addition, the points given for these kinds of items generally get combined with points from much more efficient multiple choice questions in ways that limit the impact of student performance on human-scored items on test scores or achievement levels. It makes sense, then, to devalue these kinds of items if the only important outcome of the test is a reliable numerical score. If, however, an important outcome of the assessment is building teacher capacity around understanding standards and disciplinary pedagogy (as it is when instructional impact is centered), human-scored performance-based assessment items and tasks become much more valuable. Participating in scoring activities can:

  • allow teachers to participate in facilitated professional learning connecting standards to expectations on the assessment;

  • provide opportunities to practice analyzing student work;

  • encourage teachers to collaborate with colleagues across classrooms, schools, and districts;

  • provide examples of the kinds of performances students may need to practice during the course of instruction;

  • foster teachers’ understanding of their own students’ strengths (if scoring their own students’ work); and

  • disrupt deficit narratives about what their students can and cannot do.

This kind of information is much more valuable for teaching and learning than decontextualized test scores that often prompt teachers to turn to ineffective reteaching and remediation strategies. High-quality performance assessments increase the validity of scores resulting from an assessment by providing better insight into what students know and can do relative to the standards being measured. Moreover, they are a particularly compelling component of assessment programs when leaders shift from centering reliable and comparable scores as the only important outcome of an assessment toward including impact on instruction as an equally important outcome.

When state leaders treat instructional impact as just as important as surfacing data that can serve program monitoring functions, performance assessments consistently emerge as an essential element of many large-scale system designs. Done well, performance assessments surface evidence of what students know and can do in deeply authentic and meaningful ways. This leads to better alignment between assessments and state standards like the Common Core and the Next Generation Science Standards and, as a consequence, more valid assessment scores. High-quality performance assessments are also more relevant and meaningful to students than a diet of decontextualized selected-response questions. Relevance can increase student engagement and perseverance through complex tasks and improve the assessments’ ability to surface the range of sophisticated understanding that diverse learners may possess. Perhaps most importantly to many state and district leaders, performance assessment can position large-scale assessments as tools that support high-quality teaching and learning by signaling features of effective learning and assessment environments and by providing actual classroom experiences (in the case of curriculum-embedded tasks).

When they are designed appropriately and used in conjunction with other measures, performance assessments can be reliably scored, generating trustworthy and comparable scores at the student and aggregate levels. This report focuses on authentic performance tasks used together with more standardized kinds of assessment, as this is what most systems are exploring. Many large-scale systems—states and districts as well as national and international curricular and assessment programs like Advanced Placement and International Baccalaureate—use performance tasks as part of their systems because they elicit evidence of student thinking that is not readily surfaced through selected-response items and influence instruction in positive ways. When designing performance tasks for use in larger-scale systems, developers and leaders often emphasize certain features of assessment design, implementation, and scoring—such as common tasks and rubrics, calibrated scoring, and rigorous assessment development processes—that produce tasks that contribute to trustworthy student scores that can be aggregated and compared as needed. These features distinguish performance tasks that can be used to generate trustworthy and comparable scores from those that are often developed locally as part of meaningful instruction but lack features that would allow them to be used within large-scale systems (e.g., a project or task developed by an individual teacher to be used as part of coursework).

Recommendations for System Leaders

Performance assessments can transform assessment systems into forces for improved teaching and learning. Doing so requires that system leaders position performance assessments—and the supports needed for their design and use—as a valued element of both instruction and student performance. As leaders consider how to reorient their assessment systems toward instructional relevance, it may be useful to consider the following recommendations:

  • Demand assessments that measure what matters. Ensure that assessments actually measure the higher-order thinking and problem-solving, disciplinary practices, and other deeper learning competencies that students need to be ready for college, careers, and citizenship.

  • Recognize the transformative potential of signaling as it shapes student learning experiences. Large-scale assessment systems frequently make their biggest mark on instruction through their signaling function, influencing decisions about what gets taught, how students experience learning, and what success should look like. Including performance assessments in assessment systems can be transformative because doing so encourages instructional shifts toward deeper learning.

  • Leverage performance assessments strategically. Many large-scale systems that leverage performance assessments do so in conjunction with other assessment instruments, such as on-demand selected-response items. Combining the two approaches allows assessment designs that sample wider content coverage while still providing substantial and sufficient evidence of students’ ability to reason and demonstrate learning in sophisticated ways within and (if appropriate) across disciplines. The key is striking a strategic and effective balance and ensuring that performance assessments count for enough of students’ final scores that stakeholders pay attention to the knowledge, skills, and abilities needed to complete these tasks.

  • Make the assessment worth the investment. A major element of the value proposition of performance assessments lies in their authenticity and educative nature—that is, how assessments build educators’ understanding of content and pedagogy in their disciplines. Leaders should prioritize the development and use of authentic, relevant, and sophisticated tasks that motivate students and provide a beacon of what their routine experiences in the classroom should look like. Leaders also should engage all classroom teachers in the development and interpretation of these tasks so that teachers can have access to the rich information about student thinking that such tasks produce and to build support for the pedagogy that enables deep learning.

  • Consider creative resource allocation to support professional learning. Many systems that center performance assessments emphasize the impact on teaching and learning for students. Leaders should also consider the potential for meaningful and sustained professional learning when making resource allocation and budgeting decisions. While performance assessments cost more to design and score than multiple choice questions, resources can be reallocated from test preparation and interim assessments that are essentially practice tests. Professional development time can also be allocated for design and scoring, as teachers consistently note the benefits of designing and reviewing performance assessments for their own learning and planning. Resources spent on meaningful performance task development and scoring can contribute to improved teacher practice and student learning experiences, which is likely to lead to better student outcomes than simply practicing the questions on a superficial final exam.

As leaders in a growing number of states and systems consider how to break the cycle of assessments being used to limit learning opportunities for students, they are considering a different purpose—and a different set of trade-offs—for their assessment system designs. Many leaders are drawing a line in the sand. Teaching and learning are paramount, and any assessment system that does not have a positive impact on teaching and learning cannot be acceptable. When leaders make positive instructional impact a necessary condition of high-quality assessment systems, performance assessments routinely emerge as an important element of system designs. Done well, and often in conjunction with other measures, performance assessments can provide better evidence of what students know and can do while helping students and teachers alike better understand how meaningful instruction should look and feel.


Instructionally Relevant Assessment Systems: What is the Role of Performance Assessments? by Aneesha Badrinarayan  is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

This research was supported by the Carnegie Corporation of New York, Chan Zuckerberg Initiative, William and Flora Hewlett Foundation, and Walton Family Foundation. The Heising-Simons Foundation, Raikes Foundation, Sandler Foundation, Skyline Foundation, and MacKenzie Scott provided additional core operating support for LPI. The ideas voiced here are those of the author and not those of funders.