What Standardized Testing Reform Misses

The standardized testing reform debate is framed around two extreme poles. It’s either, “why there can be no changes to standardized testing,” or, “why we must completely remove standardized testing.” Yet this framing misses the simple fact that these tests are broken due to a flaw in their design, and this flaw can be fixed with Pattern-Based Items (PBIs). If this flaw remains, however, the same problems will occur in assessment solution after assessment solution despite claims to the contrary.

The jargon around standardized testing, like Item Response Theory and Dichotomous Items, disguises a critical flaw in the legacy testing system: legacy tests measure test-taking ability rather than student knowledge. The most tragic real-world consequence is that teachers have almost no impact on legacy test scores. For decades, numerous organizations have published peer-reviewed studies concluding that the tests are not sensitive to instruction (Linn, 1990; Polikoff & Porter, 2014; American Statistical Association, 2014). These legacy tests are like a pencil with no graphite: it looks like a pencil at a glance, but it can’t write anything. Legacy tests look like they assess a student’s knowledge, but they are as useless as that pencil.

Most teachers detest legacy tests, yet they are contractually forced to pour time, money, and emotion into the system. On average, a teacher spends 26 days per school year teaching towards the test. A student graduating from high school has lost nearly 10,000 hours and around 260 days of classroom time to legacy testing. Based on the U.S. Foreign Service Institute’s language difficulty rankings, one could learn to speak fluent Russian in 260 days of class. In monetary terms, teachers will be forced to invest 300,000 dollars of their time into test prep over the course of one student’s journey through the public school system. We provide the first real alternative to legacy testing, one that gives teachers increased agency in their classrooms and will help reform the US education system.

The Solution

Legacy testing should be scrapped immediately and replaced with PBIs, which are needed to reform education at a grassroots level. The central problem with the design of standardized tests is that a student gets each question either completely wrong or completely right. By allowing for partial credit on each question, the tests examine student knowledge rather than test-taking skill. Rather than demanding that teachers “get scores up,” GenEd can provide exact, actionable information to teachers.

The crux of the standardized testing problem is the coin flip where a student gets all the points or none of the points. In reality, student understanding is much more nuanced than a one-hundred percent understanding of the topic or a zero percent understanding of the topic. 

The solution is simple: allow for partial credit on each item. If a student bubbles ¾ of the correct answers, they get ¾ of that question’s total points. If they bubble ½ of the correct answers, they get ½ of the points.
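The partial-credit rule can be sketched in a few lines of Python. This is a minimal illustration, not GenEd’s scoring code: the function name `partial_credit` and its parameters are hypothetical, and it assumes that incorrect selections are simply ignored, since the rule as stated does not say how they are handled.

```python
from fractions import Fraction


def partial_credit(selected, correct, points=1):
    """Return the share of `points` earned on one item.

    Assumption (not stated in the article): wrong selections are ignored,
    so only the fraction of correct options bubbled affects the score.
    """
    selected, correct = set(selected), set(correct)
    if not correct:
        raise ValueError("an item needs at least one correct option")
    hits = len(selected & correct)  # correct options the student bubbled
    return Fraction(hits, len(correct)) * points


# An item with four correct options, worth 4 points.
print(partial_credit("ABC", "ABCD", points=4))  # 3 of 4 correct options -> 3
print(partial_credit("AB", "ABCD", points=4))   # 2 of 4 correct options -> 2
```

Using `Fraction` keeps scores such as ¾ exact instead of rounding them to floating-point values.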

Below is a fraction equivalence problem for 3rd grade students authored by a teacher in our teacher network.

Which of the following choices are equivalent to ⅖ ? 

     A) ½ + ⅓ 

     B) 1/10 + 1/10 + 1/10 + 1/10

     C) ⅕ + ⅕ 

     D) .4

The correct answer is BCD, which means that students get all of the points only if they select B, C, and D.

Option A tests if students can multiply the denominator, without turning the question into a vocabulary test by using the term “denominator” in the question.

Option B has a similar theme to Option A, as this is a common problem in student understanding.

Option C is designed to check if students are guessing or struggling.

Option D tests if students can convert decimals to fractions, demonstrating comprehensive understanding of fraction equivalence.

If a student selects BCD, teachers can be confident that the student understands fraction equivalence. 

If they select B and C, they understand the general concept but need help converting decimals to fractions.

If they select only D, they are likely guessing, as they selected the hardest option but none of the easier ones.
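The three readings above amount to a lookup from answer pattern to feedback. The sketch below encodes only the patterns described for this fraction item; the function name `diagnose` and the fallback message are hypothetical, and any pattern the text does not discuss is deliberately left uninterpreted.

```python
CORRECT = frozenset("BCD")  # answer key for the fraction equivalence item


def diagnose(selected):
    """Map a student's selected options to the interpretation given above."""
    chosen = frozenset(selected)
    if chosen == CORRECT:
        return "understands fraction equivalence"
    if chosen == frozenset("BC"):
        return "understands the general concept; needs help converting decimals to fractions"
    if chosen == frozenset("D"):
        return "likely guessing"
    # Hypothetical fallback: the article gives no reading for other patterns.
    return "pattern not covered here; review with the student"


for pattern in ("BCD", "BC", "D"):
    print(pattern, "->", diagnose(pattern))
```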

Patterns form in student answers that are designed to give teachers the most actionable information in the least amount of time. Over a test, a clear snapshot of student understanding forms as each question turns speculation into an objective fact. 

Let’s compare PBI feedback to the feedback offered by legacy testing. From a logistical perspective, our feedback arrives instantly, so low-stakes corrections can happen in the classroom. Legacy results, when they are finally delivered, offer little useful information because the response data is so ambiguous. A student may have guessed the right answer, may have understood the concept but been tricked, or may have lost focus after multiple hours of testing. These ambiguous responses are then bundled into a scale score that disguises the shoddy quality of the data in the same way that soup can disguise poor-quality ingredients. PBIs detect when a student is guessing, are written to eliminate tricky questions, and can be completed in 30 minutes rather than the 3 hours a legacy test takes. The student data is then structured so that a student, teacher, parent, or administrator can compare it to class, school, district, and state data.

In a pilot delivered to more than 400,000 students across the state of Texas, PBIs were able to provide a clear picture of student comprehension. Below are the responses from nearly 70,000 5th graders to a question that assesses their ability to interpret a poem. Responses A and B call for basic interpretation, response C calls for comprehensive interpretation, and response D indicates that they are guessing or struggling.

From the data, the majority of students selected at least AB, meaning that the majority of the students assessed meet reading level standards. In Texas, numerous alarmist articles have debated the drop in reading scores. From our data, we can provide an accurate, rather than alarmist, assessment of the situation. Students grasp the basics of interpretation and would benefit from increased time with topics that allow for comprehensive understanding. To those who suspect that students were simply guessing on this question, the data indicates that relatively few did, as only a tiny fraction selected response D. The rest of the test confirms this pattern, as the students could identify key details or vocabulary.
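The kind of tally behind this reading can be sketched as follows. The function name `summarize` is hypothetical, and the sample responses are made up purely for illustration; they are not the pilot data.

```python
from collections import Counter


def summarize(responses):
    """Tally answer patterns and the two shares discussed above.

    `responses` holds one string per student, such as "AB" or "ABC".
    """
    # Normalize each response so "BA" and "AB" count as the same pattern.
    patterns = Counter("".join(sorted(set(r))) for r in responses)
    total = sum(patterns.values())
    if total == 0:
        raise ValueError("no responses to summarize")
    at_least_ab = sum(n for p, n in patterns.items() if {"A", "B"} <= set(p))
    picked_d = sum(n for p, n in patterns.items() if "D" in p)
    return {
        "patterns": dict(patterns),
        "share_at_least_AB": at_least_ab / total,
        "share_with_D": picked_d / total,
    }


# Made-up responses for illustration only; these are not the pilot results.
sample = ["AB", "ABC", "BA", "D", "ABC", "A", "AB", "ABC"]
summary = summarize(sample)
print(summary["share_at_least_AB"])  # 6 of 8 responses -> 0.75
print(summary["share_with_D"])       # 1 of 8 responses -> 0.125
```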

If one wanted to reform reading in Texas based on the data, additional class time would be devoted to creative and engaging stories that add nuance to a student’s reading ability, rather than to key details or vocab work. In addition to this top-down perspective, GenEd gives teachers the tools to identify the needs of their specific class. Each class has unique needs and learning challenges that have largely been shoved to the side in favor of a top-down model of legacy testing. Perhaps a teacher’s particular class would be better served by focusing on a particular set of skills rather than a statewide trend. A teacher can use the data to justify personalized lesson plans that address the needs of their class. With legacy tests, a teacher cannot stray too far from the prescribed curriculum. With PBIs, teachers have the agency to address their students’ needs.

Administrators also have increased agency, as they’re supplied with data to make accurate decisions. Without quality data, decisions can feel like a shot in the dark. Local administrators all face the same impossible task: address the needs of their students or lose their job. GenEd provides administrators with a comprehensive one-pager on their student body that can inform decisions about curriculum, education products, and areas of need.
