Why Not Get Rid of Student Evaluations?

The answer requires us to think about power.

Everybody has reasons to hate student evaluations. If they represented the judgments of one individual, rather than being a big heterogeneous data set, we would call that individual sexist. Women face double standards and, often, tougher expectations. Students in some experiments give higher scores to teachers who they think are men. Evaluations may also favor the white and the young, or punish the homely (when we find someone attractive, we tend to rate their other qualities more highly, in what psychologists call the halo effect). And better scores don’t mean that students learned more—not if you go by subsequent performance in sequenced classes (e.g., Spanish 1 and Spanish 2) in which you can measure such things.  

Based on such findings, and on her own experience as a language teacher, Rebecca Schuman called evaluations biased and worthless in Slate last year. They are biased. Everyone’s biased. But they’re far from worthless, and now that the end of the school year—the season of student evaluations—is upon us, it’s worth looking again at what’s wrong with them, and what’s irreplaceably right.  

The cases against them are hard to ignore. Student evaluations encourage pandering, given that most people—most teachers—want to be liked. Some teachers have to be liked, and to prove they are liked in order to keep their jobs. Evaluations also encourage a consumer mentality—students are shopping for classes, as it were—and they create a perfect forum for trolling. (Schuman compared them to Yelp ratings.) My university recently decided to stop publishing the number that shows how hard students think a course is because students were using that number to pick easy courses. As with election polls and athletes’ stats, quantitative evaluations can attract enthusiasts who lack statistical literacy, who then build castles from quantitative sand. (“Twentieth-Century Novel got consistently higher ratings than 18th-Century Novel—I guess we need to hire more modernists!”) Careful studies by people soaked in statistical literacy, such as Philip Stark and Richard Freishtat at the University of California–Berkeley, conclude that effective teaching is quite hard to measure and that anonymous student evaluations cannot do that job.

So why not bid goodbye to commencement season by just getting rid of them? Why not replace them with nothing, if they’re worse than nothing, or (as Schuman suggested) replace them with end-of-term comments to which students must sign their names?

The answer requires us to think about power. If you look hard at the structure of academia, you will see a lot of teachers who, in one way or another, lack power: adjuncts and term hires (a large population, and growing); untenured faculty (especially in universities like mine); faculty, even tenured faculty, in schools where budget cuts loom; graduate students, always and everywhere. You might see evaluations as instruments by which students, or administrators, exercise power over those vulnerable employees. But if you are a student, and especially if you are a student who cares what grades you get or who needs recommendations, then teachers, for you—even adjuncts and graduate teaching assistants—hold power.

That means you might not tell them to their faces, nor over your signature, what they did wrong as your teacher. You’re not going to tell them, for example, that you never finished Tennyson’s In Memoriam, or that (as far as you know) nobody in the class finished Red Mars. (Both of them are great, by the way. Please do finish them.) And you’re almost certainly not going to tell your professor that his English class was too easy, that it felt more like a book group than like the demanding seminar it was supposed to be.

All of the above examples come from classes I’ve taught; two of them were problems I discovered only through reading evaluations. The more stake your students have in your opinion of them—even when you’re done grading their work—the less honest they can be when their name is attached. Some of those students have information you need, even if others are trolls.

As Stark and Freishtat also conclude, “students are ideally situated to comment about their experience of a course” as long as they don’t think they’re going to lose out by doing it. Conversely, because evaluations remain anonymous, we can take their praise as sincere. If anonymous student evaluators tell you they recommended your course to their friends, they aren’t just saying so to get an A.

Teachers should not become simply entertainers. But we would like students to view our classes as time well spent; we would like them to regard what we teach as dulce et utile (sweet and useful, as Horace put it), or at least one out of the two. We believe that Goethe and Marianne Moore and Theocritus and Max Weber and W.E.B. Du Bois are worth the time it took students to learn how to read them, and we’d like to know whether they came to agree.

But don’t all those numbers in student evals get misused? They do. So do all numbers in fields where the modern demand for data exceeds the useful supply (see also: K–12 education, where I’d love to see more attention to what students prefer). The worst cases of inappropriate quantification in higher ed these days come from Britain, where government demands for quantifiable research results, and enforced competition for funding, make it look as if U.K. higher ed has been taken over by some combination of the Borg, Friedrich Hayek, and Ultron. There, too, more attention to what students think, or at least to what students say they think, might be a good thing.

But student evaluations are racist! And sexist! And they punish things no instructor can control, such as class size! That’s true, but if we use evals in the right ways, they become evidence that can help us mitigate the unfairness that they reflect. Small courses in a given topic almost always get better ratings than large ones—which shows that students prefer small courses, which is something that people making decisions about class size, section size, and requirements should know. The high satisfaction that comes with small class sizes also gives fields and disciplines that will never attract large numbers of majors a useful argument for their existence. (If a very small department has very dissatisfied students, someone should ask why.)

As for the racism, sexism, ableism: These are problems not with anonymous student evaluations in particular, but with any evaluation of anything performed by human beings, whose implicit bias must be noticed before it can be addressed. (Symphony orchestras alleviated sexism in hiring by conducting auditions behind a curtain, which isn’t a solution those of us who value face-to-face teaching can use.) Yet numbers can show one teacher whether she’s improving, or where a particular course went wrong. Bias that favors men over women, or tall teachers over short ones, won’t affect comparisons among the same teacher’s various courses, given that most teachers don’t change height or gender between semesters. (Midterm evaluations can also help a teacher improve a course while there’s still time.)

Of course, some teachers do change gender; evaluations have helped at least one transgender professor consider how students saw him during and after transition. Evaluations identify both good and bad reactions to visible difference, telling us what to encourage and what should raise an alarm. Chairmen and deans should want to find out whether students respect teachers’ gender transitions and whether teachers show respect for students. These forms are one way students can let us know.

Chairmen and deans also need to know when classroom teaching fails: when a professor makes catastrophically wrong assumptions as to what students already know, for example, or when students find a professor incomprehensible thanks to her thick Scottish accent. Student evaluations are one of just a few venues through which that information can travel, short of a student’s coming to visit a chairman in person or filing a formal complaint. Responsible registrars—my institution has one—also have ways to reduce the impact of trolls. When I got a long, jeering, negative evaluation some years back, stuck in among the positives (and lowering their average), I complained, and the registrar then discovered that this student had done the same thing in every course. (I’ve been told there’s a note in the files to that effect, though I don’t know if the tallied average changed.)

“O wad some Pow’r the giftie gie us / To see oursels as others see us!” Robert Burns wrote, in a poem with a thick Scottish accent. “It wad frae manie a blunder free us.” That power lies in student evaluations. They have obvious flaws, and all college teachers know how they can be misused, but colleges, and instructors, do better with them than without them. They can free teachers from blunders as well as flatter our self-regard. They remind us that if we care what our students learn, we ought to care about what they think; anonymous evaluations are one of the few ways that we can try to find out.