Do The Math

Don’t Worry About Grade Inflation

Why it doesn’t matter that professors give out so many A’s.

Last spring, George W. Bush reassured new Yale graduates, “And to you C students, you too can be president of the United States.” He’s out of touch. The gentleman’s C, in Bush’s day a respectably mediocre grade, is now a poor one. One study suggests that the average GPA in research universities and selective liberal arts colleges rose around half a point between the mid-’80s and mid-’90s. That grade inflation is taking place is hard to dispute. The question is: Should anyone care?

The arguments put forward against grade inflation span all genres: psychological (easier grading saps a student’s will to achieve), moral (a weak performance is unworthy of the letter B), Marxist (grade inflation is a symptom of the consumerization of education), and even geopolitical (higher grades in the humanities draw American students away from the sciences, thereby compromising our ability to build better weapons and deflect other people’s).

One of the most powerful and popular arguments against grade inflation is that it makes it difficult to tell one student from another. Harvey Mansfield, a professor of government at Harvard and a vocal grade-inflation foe, puts it this way: “Grade inflation compresses all grades at the top, making it difficult to discriminate the best from the very good, the very good from the good, the good from the mediocre.”

That sounds reasonable. But it’s wrong.

Start with a thought experiment. Suppose colleges offered only one grade, which we’ll charitably call “A.” Then all students would get straight A’s. Indeed, the best and the very good, not to mention the so-so, are all in one box. Mansfield, so far, is vindicated.

Now suppose there are only two grades, which we’ll call “A” and “A-,” and suppose that, in each course, half the students get A’s and half get A-s. Once again, the best, the very good, and the good all look the same. The mediocre, perhaps, are weeded out. Mansfield is right, to a point; it’s hard for him, in his class, with his one grade, to distinguish one student from another. But college students take lots of classes. What happens on average over the four years of a college career?

Let’s take a closer look at this two-grade experiment. Let Eva be a student who, on average, lands in the 70th percentile of her class. That is, 70 percent of students have weaker “natural ability” than Eva does. Of course, Eva will be better at some subjects and worse at others; there’ll be some variation in her grades. We quantify this variability by supposing that the standard deviation of her percentile rank is 20. This means, extremely loosely, that the difference between Eva’s actual percentile in any given class and 70, her “natural percentile,” is “typically” about 20. For her to end up at the 30th percentile, two standard deviations below her average of 70, is an extremely unlikely event—according to the usual bell curve, Eva should score this low less than 3 percent of the time.
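That “less than 3 percent” is easy to check. Using a normal distribution truncated to the 0-to-100 percentile range (the distribution confessed to later in the column), with the mean, deviation, and cutoff taken from the text, a short sketch gives the tail probability directly:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def truncated_tail(mean, sd, cutoff, lo=0.0, hi=100.0):
    """P(X <= cutoff) for a normal(mean, sd) truncated to [lo, hi]."""
    total = phi((hi - mean) / sd) - phi((lo - mean) / sd)
    below = phi((cutoff - mean) / sd) - phi((lo - mean) / sd)
    return below / total

# Eva: 70th percentile on average, SD 20; chance of landing at or below the 30th
print(round(truncated_tail(70, 20, 30), 3))  # → 0.024, i.e., under 3 percent
```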

Now suppose Hans is in the 80th percentile of his class, also with a standard deviation of 20.

Both students are above average, and both will get mostly A’s. But Hans, in order to dip below the 50th percentile and into the A- range, has to perform 1.5 standard deviations below his mean; this ought to happen about 8 percent of the time, or just two or three times in a standard 32-class undergraduate career. Eva, on the other hand, gets an A- whenever she performs 1 standard deviation below average; this ought to happen about 17 percent of the time, which means Eva will rack up about five A-s. A seat-of-the-pants computation suggests that Eva has just an 18 percent chance of ending up with a higher GPA than Hans. Pretty good, for a system with only two grades!
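The seat-of-the-pants computation can be replayed as a simulation. Here I stipulate (the column doesn’t) that an A counts 4.0 and an A- 3.7, so whoever collects fewer A-minuses over 32 classes has the higher GPA:

```python
import random

random.seed(2002)

def percentile(mean, sd):
    """One class performance: a percentile rank, resampled until it lands in [0, 100]."""
    while True:
        x = random.gauss(mean, sd)
        if 0 <= x <= 100:
            return x

def a_minus_count(mean, sd, classes=32):
    """How many classes put the student in the bottom half (and earn an A-)."""
    return sum(percentile(mean, sd) < 50 for _ in range(classes))

trials = 20_000
strict = ties = 0
for _ in range(trials):
    eva, hans = a_minus_count(70, 20), a_minus_count(80, 20)
    strict += eva < hans   # fewer A-minuses means a strictly higher GPA
    ties += eva == hans

print(strict / trials, (strict + ties) / trials)
```

Run this way, Eva strictly beats Hans a little under 10 percent of the time and ties or beats him a little under 20 percent; the 18 percent figure sits in that bracket, depending on how ties are treated.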

Of course, this two-grade system is still helpless to distinguish between students Zoe and William at the 90th and 95th percentiles. Both have a solid chance of getting straight A’s. But now suppose there are five grades, A through B- (A, A-, B+, B, B-), each attained by 20 percent of students. In this inflated regime, even gentlemen can coast to a B+ average! Let’s say Zoe and William have average ranks of 90 and 95, again with a standard deviation of 20. Then one works out that Zoe should obtain, on average, 18 A’s, 11 A-s, and three B+s, while William comes in with 20, 10, and two. Zoe gets a 3.83 GPA, and William a 3.86. Not much of a difference—but it’s there—and admissions decisions have been made on much less. It’s worth pointing out that, for our present purposes, it makes no difference whether the grades are labeled A, A-, B+, B, B- or A, B, C, D, F. Their discriminatory power is the same.
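Those transcript averages can be reproduced exactly. Assuming quintile cutoffs at the 80th, 60th, 40th, and 20th percentiles, and conventional point values of 4.0, 3.7, 3.3, 3.0, and 2.7 for A through B- (the point values are my assumption, not the column’s), the truncated normal gives:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def grade_counts(mean, sd, classes=32, lo=0.0, hi=100.0):
    """Expected classes per grade, under a normal(mean, sd) truncated to [lo, hi]."""
    total = phi((hi - mean) / sd) - phi((lo - mean) / sd)
    counts = []
    for bottom, top in [(80, 100), (60, 80), (40, 60), (20, 40), (0, 20)]:
        p = (phi((top - mean) / sd) - phi((bottom - mean) / sd)) / total
        counts.append(round(p * classes))
    return counts  # classes earning A, A-, B+, B, B-

points = [4.0, 3.7, 3.3, 3.0, 2.7]  # assumed point values for A through B-

for name, mean in [("Zoe", 90), ("William", 95)]:
    counts = grade_counts(mean, 20)
    gpa = sum(c * p for c, p in zip(counts, points)) / 32
    print(name, counts, round(gpa, 2))
```

This prints Zoe’s 18 A’s, 11 A-s, and three B+s (GPA 3.83) and William’s 20, 10, and two (GPA 3.86), matching the figures above.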

At this point, readers possessing technical proficiency and a mistrustful temperament will be piping up: Where do all these assumptions come from? How do you know what the standard deviations are? (I made them up.) Shouldn’t the deviations be smaller at the top of the grade range? (I don’t care.) And what probability distribution, exactly, are you using? (A truncated normal distribution, you fussy creature.)

Trust me: None of these questions matter. However you compute, the point stands. A grading scale much too coarse to separate students’ performances in a single class (for instance, the system with just two grades) can—if it is not too coarse—be perfectly adequate when we have a whole transcript to look at.

How coarse is too coarse? It’s actually easy to tell. In the example above, Zoe and William weren’t separated on the two-grade scale, because both got straight A’s. And, in general, a grading scale is too coarse if there are a lot of students who get the same grade in almost every class they take. In my experience, very few undergraduates make straight A’s all through college, let alone straight B+s. That indicates that the present system, inflated as it is, is good enough to rank our students.

Indeed, a 2000 Department of Education study found that just 14.5 percent of undergraduates nationwide had a GPA above 3.75. And Henry Rosovsky and Matthew Hartley, in their well-reasoned monograph, found “no large body of writings in which, for example, employers or graduate schools complain about lack of information because of inflated grades.” (Certain of their informants did complain to this effect in “informal conversations.”) Anyone who’s read fellowship applications, or graduate school admission folders, knows that the best undergraduates aren’t hard to pick out—they’re the ones who excel in nearly every course, the ones with a healthy sprinkling of A+s, the ones whose recommendation letters read like mash notes.

So, Mansfield is wrong—which doesn’t mean grade inflation is all right. There are still those moral, psychological, Marxist, and geopolitical questions to think of.

And underlying all these questions is a deeper one: Why do we grade? Is the point to give students information? To reward, punish, or encourage them? Or just to hand them over to law-school admissions committees in accurate rank order? Until we answer this question, there’s little hope of making sense of grade inflation. It’s as if we were bankers trying to formulate a monetary policy, but we hadn’t quite decided whether dollar bills were a means of economic transaction or a collection of ritual fetish objects.

“In a healthy university,” Mansfield says, “it would not be necessary to say what is wrong with grade inflation.” There, again, he’s mistaken. In a healthy university, we would talk about every aspect of grading, down to the bottommost questions about why we grade at all. I suspect we’d all find it much less tempting, under those circumstances, to project onto our students’ GPAs our anxieties about moral leadership, honesty, and the rewards to be expected from hard work. That would be an improvement. Grades are—should be—many things. But ritual fetish objects they are not.