Needs Improvement

Student evaluations of professors aren’t just biased and absurd—they don’t even work.

It’s student evaluation time again—and I should be the last professor in the world to complain. With slight exceptions for “caring too much” and courses that meet “too early” (9:10 a.m.), my evaluations are quite good. And yet the student evaluations of teaching (SETs) I’ve received during my decade-long teaching career have meant absolutely nothing. This is because student evaluations are useless.

Ostensibly, SETs give us valuable feedback on our teaching effectiveness, factor importantly into our career trajectories, and provide accountability to the institution that employs us. None of this, however, is true.

First, evaluations promote sucking up to customers—I’m sorry, students—often at the expense of teaching effectiveness. A recent comprehensive study, for example, showed that professors get good evaluations by teaching to the test and being entertaining. Student learning hardly factors in, because (surprise) students are often poor judges of what will help them learn. (They are, instead, excellent judges of how to get an easy A.) 

Indeed, some of the worst evaluations I ever got were for hands-down the best teaching I’ve ever done—which I measured by the revolutionary metric of “the students were way better at German walking out than they were walking in.” Alas, this took work, and some of the Kinder attempted to stage a mutiny on evaluation day. Little did they know that a “too much work” dig is the #humblebrag of the academy—and, indeed, anything less on evals is seen as pandering at best, and out-and-out grade-bribery at worst.

Speaking of grade bribery: Evaluations impact career trajectories, all right, but only of the most vulnerable faculty in the university—yes, adjuncts, whose semester-long contracts are often renewed (or not) on the basis of student feedback alone. Meanwhile, only in the rarest and most politicized cases do even scathing evaluations harm tenured big shots—who, unsurprisingly, often care about undergraduate teaching the least. In short, asking students to evaluate their professors anonymously is basically like Trader Joe’s soliciting Yelp reviews from a shoplifter.

I’m sorry, a bigoted shoplifter. Because student evaluations aren’t just useless: They’re biased. The other day I put a call out for notable evaluation stories, and the response was both overwhelmingly depressing and depressingly unsurprising.

Indeed, many evaluations, no matter who the professors were, focused on hair (and beards!), clothes, general disdain for the subject matter (“Philosophy sucks!”)—anything but constructive assessment of teaching. Seriously, though, anecdotal data notwithstanding: Bias in evaluations is widely accepted, so much so that some who use evals as assessment tools already control for it:

Just what I wanted, to be “pretty smart, for a girl.” Oh wait:

Because of all this off-topic vitriol, irrelevance, and bias, most tenure-track professors I know (who aren’t hanging on to their evals for dear life) don’t read their evaluations at all.

This isn’t to say that professors should be left solely to their own devices, a million poorly dressed sovereign nations free to declare “constructive” naptime during a Freud seminar, or to emulate Wittgenstein’s turn as a schoolteacher and employ corporal punishment. Egad. Assessment of teaching is vitally important, but how can we actually do it so that it works?

Peer evaluations are a common suggestion (and, indeed, often common practice). But those only work if your peer actually cares about teaching in the first place and doesn’t want to sabotage you. Outside reviewers (from other departments) could solve for this, but only if you underestimate the academic’s propensity toward petty vindictiveness: One bad review from English of a history professor, and we’ve got a permanent schism between two departments that should be clinging to each other for survival.

All right, so what about “effectiveness measures” from the administration? Yes, let’s create even more administrators—what today’s universities need are more people who’ve never taught a day, highly invested in running departments on the cheap. All right, fine, how about we just test the students, and base professor effectiveness on the results? Sure, because that’s worked out so fantastically for K–12.

Or, OK, we could measure performance in subsequent classes—but many of us teach general ed, and our departments will never see those kids again. Measuring “good teaching” is a touchy, complicated subject, and all solutions involve both massive compromises in pedagogical autonomy and substantial amounts of “service work”—two of professors’ very favorite things.

I see two actual resolutions to the evaluation calamity. One of them is massively important and will never happen; the other is fairly trivial and could happen tomorrow.

The first: A complete cultural shift at doctoral-granting institutions about the importance and value of teaching. Damn near everyone with a doctorate learned to teach (or didn’t!) in an environment where undergraduate teaching is, to paraphrase Nietzsche, an affair of the rabble: Graduate TAs and adjuncts.

In grad school, I was actively told “not to care too much” about teaching, advice that is standard practice.

So again, the first, best and most important way to measure teaching effectiveness would be to create a culture at elite research institutions where the instruction of undergraduates actually matters. Fat chance.

So here’s another solution, almost breathtaking in its simplicity. Combine peer evaluative measures (of lesson plans and assignments, not just classroom charisma or test scores) with student evaluations—but make the students leave their names on the evals.

The day the first yahoo on Yahoo wrote a comment was the day we should have stopped anonymous student evaluations dead. The “online disinhibition effect” both enables and encourages unethical, rash behavior, and today’s digital-native students see no difference between evaluations and the abusive nonsense they read (and perhaps create) every day.

Actual constructive criticism can be delivered as it ought to be: to our faces. Any legitimate, substantive complaints can go to the chair or dean. There is no reason for anonymity—after all, we have no way to retaliate against a student for a nasty evaluation, because we can’t even see our evals until students’ grades have been handed in to the registrar (and if you hated us that much, you won’t take our class again). And besides, I hate to tell you this, but professors know handwriting; we recognize patterns of speech; we can glean the sources of grudges. We know who it was anyway.

Sure, this won’t change the culture of academia, where getting a position at a so-called “teaching college”—and thus spending all of your time with undergrads, as I now do—is considered abject failure. But it will certainly de-Yelp-ify the evaluation process, cut down on some of the bigotry, and it might even (gasp) offer us some constructive feedback. That’s a solution I evaluate at 4 out of 5 (“Agree”!)—which isn’t bad at all, for a girl.