Why Do Schools Keep Changing the Way They Grade Teachers?

Can teachers really improve when the way they’re evaluated changes every year?


Last week, the District of Columbia Public Schools announced major changes to its teacher evaluation program, called IMPACT. Again. Administrators have tweaked the program constantly since implementing it in 2009, mirroring a troubling national trend: The way teachers are graded in the U.S. has changed so many times in so few years that teachers have lost faith in a system meant to make them better. You couldn’t blame parents for giving up on it, either.

Since 2009, more than 25 states have overhauled their teacher-evaluation systems, some making changes every single year. One Wisconsin elementary school teacher, in the classroom for just six years, told Slate last year that she’s been evaluated using three completely different systems. Last spring, New York Gov. Andrew Cuomo implemented the use of outside evaluators and a 50 percent reliance on test scores in teachers’ evaluations; by December, after 20 percent of students across the state sat out state math and reading tests in protest, the state Board of Regents, following Cuomo’s recommendation, eliminated the use of test scores altogether (at least through 2018–19).

This kind of seesawing within education is common. No single education reform has seen a more “dramatic transformation” in the past decade than teacher evaluations, according to a National Council on Teacher Quality report that tracks the state of teacher evaluations in all 50 states.* Many of those changes came as states scrambled to qualify for President Obama’s now-defunded Race to the Top grant program or to adjust to the Common Core educational standards. But the changes were often hasty, leading to years of tweaks if not outright overhauls and further intensifying the reform fatigue that many teachers have felt for years. Despite a decade of change and experimentation, most teachers are still rated as effective, which was a major criticism of the old, insufficiently stringent evaluation systems. On the other hand, some teachers complain that the new wave of evaluations became too punitive and was ultimately unhelpful in improving their practice.

D.C.’s IMPACT system emerged under the leadership of controversial education-reform leader Michelle Rhee. Since its 2009 debut, the school district has used a team of outside “master educators” to evaluate teachers, a method praised by some researchers. The latest changes will eliminate the master educators completely, returning responsibility for evaluations to principals, a method that teachers, unions and researchers alike have criticized at times. The district will also reintroduce the controversial “value-added” model, which uses student test scores to judge teacher performance (the district suspended the method while adjusting to Common Core–aligned tests); student surveys, increasingly being used in districts across the country, will also be part of the new evaluations.

Proponents of strong teacher evaluation systems are defending the changes, saying any effective evaluation system must be periodically examined for effectiveness and altered as necessary. “It’s just not possible to create a system of this complexity without having to make midcourse corrections,” says Kate Walsh, president of the National Council on Teacher Quality, an organization that advocates for the use of strong teacher evaluations. While she recognizes that the changes have caused “insecurities” and “frustrate” teachers, she doesn’t believe they result in any harm to teachers or their practice.

“Good teaching is good teaching, and they are either going to do it or not,” Walsh says.

Elizabeth Davis, the president of the Washington Teachers’ Union, disagrees. She believes the rocky history of changing evaluations creates an unhealthy culture of skepticism in a system that’s supposed to support teachers. “Now we have yet another complete reinvention. This only suggests that their models and methods continue to not work and it continues to wreak havoc among teachers and schools,” one teacher wrote to Davis in an email she shared with Slate.

Other teachers echoed the sentiment: “How are we supposed to trust them with another evaluation system when so many of the others have failed?” said another teacher in an email to Davis.

Davis says the union wasn’t invited to participate in the redesign, an oversight she called an act of disrespect. And Thomas Dee, a Stanford-based education researcher, said such exclusion can contribute to teachers’ “perpetual feeling of disenfranchisement.”

In New Mexico, for example, a judge issued a temporary injunction against the state’s teacher evaluation system in December, citing a lack of transparency and describing it as “not objective and uniform.” The ruling prevents the state’s department of education from stripping teachers of their licenses for low scores, or issuing merit wage increases for high ratings. The state’s two unions initially tried to block the use of the evaluations altogether; a spokesman for the New Mexico Public Education Department told local reporters the lawsuit was a “disappointing distraction.”

There is no easy consensus on exactly what type of evaluation works best—outside evaluators have earned praise for their objectivity and criticism for their lack of context, for example. Some parties want peer evaluations, and some want teachers to be evaluated by more than one person. Many oppose tying test scores to evaluations, a method prominent researchers have called unreliable. What we do know is that one way or another, teachers must be evaluated, that they need to be involved in the design of these systems (particularly if the evaluations impact job security), and that the ultimate point of any evaluation should be to identify and ameliorate teachers’ weaknesses. For decades, a narrative that blames the many failures of public education on teachers alone has prevailed; the politicization of the field has caused nasty fights between unions, teachers, state and federal education officials, and reformers. And as always, when teachers feel unsupported, their most vulnerable students pay the biggest price.

While some changes have come at the demand of teachers themselves (just Wednesday, Tennessee officials announced they’d eliminate the mandatory use of test scores in teacher evaluations, which teachers opposed), the fluctuations can create anxiety among teachers, making it difficult for them to gauge their effectiveness over time against ever-changing metrics. And that, according to Davis, is the biggest danger of them all. “Instability is the biggest consequence,” she says. “Teachers learning not to trust the evaluation process.”

*Correction, Feb. 17, 2016: This post originally misidentified the National Council on Teacher Quality as the National Center for Teacher Quality.