New York Gov. Andrew M. Cuomo’s proposal for overhauling how teachers are evaluated, compensated, and protected from losing their jobs appears to be meeting stiff resistance from the state legislature, which may create a special commission as a next step. The most controversial element of Cuomo’s plan would tie 50 percent of reading and math teachers’ evaluations to their students’ standardized test scores.
If the ultimate goal is to improve the quality of teaching, depending so much on highly unreliable scores is likely to be counterproductive. A much more effective and proven approach is to promote team-based, data-driven support systems to help teachers become better at their jobs, day in and day out.
Currently, teacher ratings are primarily based on principals’ evaluations, sometimes in combination with parent or student surveys. Few would argue for retaining the status quo, which provides minimal useful feedback to teachers. But the purported objectivity of standardized test scores is a false promise. Abundant research shows that test scores are a poor proxy for teacher quality and that adopting them will make it harder to implement changes that are proven to improve student outcomes.
New York is running behind the national stampede toward incorporating “objective” measures of student performance in evaluating teachers. Drawing from the vernacular of economics and business, such “value-added” measures suggest that the worth of teachers can be derived from comparing how their students’ standardized test scores change over the course of a school year. The Obama administration has used its Race to the Top grants and waivers of No Child Left Behind requirements to push states to adopt value-added provisions, which are championed by well-heeled foundations as well as conservative critics of teachers’ unions. From 2010 to 2013, the number of states requiring that teacher evaluations include standardized test scores soared from 16 to 41, according to the National Council on Teacher Quality. As of September 2013, 35 states and the District of Columbia require that standardized test scores be counted as either a significant or the most significant factor in public school teacher evaluations.
Bad ideas tend to blossom in vacuums, and no one can cogently defend the old-fashioned teacher evaluation system that held sway throughout most of the past century. Typically, a principal would conduct one or two visits to each classroom a year and then grade the entire faculty as satisfactory or better. In New York last year, 96 percent of teachers were rated either effective or highly effective, much like Lake Wobegon’s entirely above-average student body.
Clearly teachers need to be held to higher standards and receive meaningful feedback about their strengths and weaknesses, as teachers’ unions generally acknowledge. But there are better ways of replacing this transparently feckless system.
The research identifies three basic, persistent problems with value-added methods of teacher evaluation. One is that the results tend to be very unstable for all but the very best and very worst teachers. When a teacher’s value-added scores vary significantly from year to year and even from test to test, factors unrelated to the teacher’s performance are likely to be causing the volatility.
Second, the students assigned to each teacher will affect variations in test score performance, and those differences can be both substantial and difficult to quantify. A few especially disruptive students can make a teacher’s job much harder than it would be in their absence. Moreover, teachers with many English-language learners and special-needs students tend to receive lower value-added scores.
Third, students in any given classroom are affected by all sorts of factors outside a teacher’s control. Experiences at home, with peers, and elsewhere in the school and beyond can critically impact how students perform on standardized tests. In an important 2011 study, the authors wrote: “With respect to value-added measures of student achievement tied to individual teachers, current research says that high-stakes, individual level decisions, or comparisons across highly dissimilar schools or student populations, should be avoided.” That conclusion was shared in a report by the American Educational Research Association and the National Academy of Education.
Using test scores to judge teachers isn’t just ineffective. It risks draining the lifeblood of otherwise highly successful, vital schools in which teachers and administrators collaborate to improve teaching and learning. If decisions about compensation and promotions are tied to how each teacher’s students perform, why would Ms. Johnson share ideas that helped her students with her colleague Mr. Jackson? Given a fixed bonus pool, Ms. Johnson’s income will be higher if Mr. Jackson’s students’ scores are lower. The reliance on scores fosters competition—the unhealthy kind—and perpetuates isolation between teachers who are being held accountable for factors they cannot control.
Education reformers such as Michelle Rhee and Joel Klein have fixated on value-added approaches because those methods are easy to explain and incorporate seemingly scientific measurement tools. But sophisticated studies have shown that the most effective schools engage in practices that aren’t so simple. Studies by the Consortium on Chicago School Research, the National Center for Educational Achievement, and McKinsey & Co. have all found that administrators, teachers, parents, and other stakeholders in successful schools share a high degree of trust and work closely together to improve the classroom experience. For example, teachers use video footage to receive feedback from peers and administrators about what’s working and what isn’t, much as athletes review videotape of their swing or jump shot with coaches and teammates.
Skeptics who assume teachers’ unions are inherently inflexible argue that they will never be willing to accept such ongoing scrutiny of their work. But the National Education Association, the American Federation of Teachers, and the Teachers Union Reform Network all support efforts to create evaluation and training systems analogous to medical residency programs, with ongoing mentoring and teamwork. This powerful video about the transformation of Peoria High School in Illinois demonstrates the willingness of one teachers’ union to accept classroom videotaping in response to the trust-building efforts of the school’s principal. In education, health care, and many sectors of the private economy, once-a-year reviews are far less important to improving productivity than sustained training systems. Workers in high-performing organizations hear regular feedback about their strengths and weaknesses from supervisors and peers, including their effectiveness at working in teams. The content of that feedback should be the central basis for gauging each teacher’s compensation.
Enrollment in teacher training colleges has plummeted by between 20 and 50 percent in big states like New York, California, Texas, and North Carolina over the past five years. Teacher attrition rates from urban school districts are ballooning. At a moment when public-school teacher morale seems to be hitting rock bottom, adopting value-added evaluation and compensation schemes that produce arbitrary outcomes seems likely to accelerate the brain drain among teachers and would-be teachers. In great public schools, ones that nurture a culture focused on a shared sense of mission, teachers can’t wait to come to work. Creating that kind of environment is the clearest path to genuinely adding value for students. Evaluation systems like Gov. Cuomo’s will only block that path.