Macbeth and Banquo Try an Unorthodox Grading Method

We’re (hopefully) nearing the end of law school grading season. Personally, I take the Macbeth approach: “if it were done when ’tis done, then ’twere well it were done quickly.” In part, this is because I find grading unpleasant. I’m nervous about being unfair and inconsistent (and I also don’t want to get trolled by my students for being late).

There’s no avoiding that the grades we give make a substantial difference in our students’ near-term career prospects. While this adds to the stress to “get it right,” there is relatively little discussion in legal academia about how we grade. And although there are many different ways to grade, cognitive science provides at least two suggestions that seem broadly applicable.

First, grade by question, not by exam.

In his recent book, Thinking, Fast and Slow, Daniel Kahneman discusses grading. He describes how early in his career he would grade exams in the “conventional” way, “pick[ing] up one test booklet at a time and read[ing] all that student’s essays in immediate succession, grading them as I went.”

The problem with grading by exam is that it leaves the professor at the mercy of the “halo effect,” where the “first question . . . scored had a disproportionate effect on the overall grade.” Since Kahneman won a Nobel Prize for his behavioral economics research while I once read a book about it, I’ll just quote him a bit more:

“The mechanism was simple: if I had given a high score to the first essay, I gave the student the benefit of the doubt whenever I encountered a vague or ambiguous statement later on . . . if a student had written two essays, one strong and one weak, I would end up with different final grades depending on which essay I read first.”

Kahneman’s solution: grade by question, not by exam.

Kahneman goes on to note that even knowing how well a student did on earlier questions on that same exam (for instance, by writing the points earned on the front of the exam) can influence the grader, and therefore it’s best to put the point score somewhere not readily visible, like on the inside page. This all dovetails with why we grade exams blind: we don’t want to be influenced by our preconceived notions of student performance. Similarly, we should grade each question “blind,” uninfluenced by the student’s past performance on the exam itself.

Second, randomize the grading order across questions.

While grading by question eliminates the halo effect, it doesn’t eliminate another cognitive bias: the desire for regular distributions. For instance, if you are scoring a question out of five points, and you’ve given fives to the past three exams, you’re more likely to give the fourth exam a lower score, regardless of how good the answer is. (Full disclosure: the author of the prior link, Jacoba Urist, is my sister.)

Robert Shiller, who taught me behavioral economics, provides the solution: randomize exam order across questions. That means that once you’ve graded all of the question ones, shuffle the papers and reorder the exams before grading the question twos.
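For anyone who keeps exam numbers in a spreadsheet rather than a physical stack, here is a minimal sketch of the same idea in Python: grade one question at a time, and draw a fresh random order of exams for each question. The function name `grading_schedule`, the optional `seed` parameter, and the sample exam IDs are all my own illustrative assumptions, not anything from Kahneman or Shiller.

```python
import random


def grading_schedule(exam_ids, num_questions, seed=None):
    """Return one independently shuffled exam order per question.

    Hypothetical helper for illustration: schedule[0] is the order in
    which to grade question 1, schedule[1] the order for question 2,
    and so on.
    """
    rng = random.Random(seed)  # a seed only makes the order reproducible
    schedule = []
    for _ in range(num_questions):
        order = list(exam_ids)  # copy so the caller's list is untouched
        rng.shuffle(order)      # fresh shuffle for every question
        schedule.append(order)
    return schedule


if __name__ == "__main__":
    # Made-up blind-grading numbers, purely for demonstration.
    exams = ["A17", "B04", "C22", "D09"]
    for question, order in enumerate(grading_schedule(exams, 3), start=1):
        print(f"Question {question}: grade in this order -> {order}")
```

Shuffling a paper stack by hand accomplishes the same thing; the only point of the sketch is that each question gets its own ordering, so no exam sits in the same position twice by design.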

These techniques won’t necessarily make grading any less nerve-wracking (or more fun), but they might make it a little more fair.

Anyone else have further grading tips?

  1. Howard Wasserman says:

    Not related to grading, but people on the meat market should keep in mind the advice from your sister’s piece.

  2. Orin Kerr says:

    I always regrade the first few exams I graded for each question, until the regraded scores match the initial scores. At least for me, it often takes a few exams to settle on a fairly-applied standard.

  3. Steph Tai says:

    I feel sort of vindicated, because I’ve done both of these things ever since I’ve been teaching. But I can’t take credit for drawing from the insights of cognitive psychology or anything. It’s just how we used to grade exams when I was in chemistry graduate school–we the TAs would all get together in a room with some pizza, then we’d go through question by question to grade (drawing randomly from a huge pile of exams) so that when ambiguous answers came up, we could share thoughts on how to deal with them consistently. Drawing from a big pile of exams meant that it’d be pretty random what we got, but that we as a group would end up “finishing” grading the question at the same time: when the middle pile ran out. Then we’d put everything back into that middle pile again and start with the next question.

    One other thing I also do (drawing from my earlier experience) is try to mimic that process of discussing ambiguity, which I think helped us reach greater consistency. What I do now (since I don’t have a group to grade with) is write down on a notepad each ambiguous grading issue I come across (as well as how I decided to deal with it) to remind me to do the same with similar issues in subsequent exam answers. Having it down on paper really helps, I think.

  4. Aaron Zelinsky says:

    Howard: a good point. Take the early interview if you can!

    Orin: Thanks. I found myself doing that too, particularly for more complicated questions, but on an ad hoc basis.

    Steph: Glad to vindicate you! My twin brother, a mathematician, had a similar reaction. I wonder if that style of grading is more common in the sciences because multiple people split the work? The ambiguity check sheet is a good idea. I wish I had done it. I would occasionally reach back into the pile to find an old exam, which is a lot less precise.

  5. Steph Tai says:

    Aaron: “I wonder if that style of grading is more common in the sciences because multiple people split the work?” Yeah, it probably is, though I know grad students in non-science departments who are graders for classes where they have to split the work; they seem to just split up the exams beforehand, and then each grader handles their pile of exams however they see fit. So I think it might be more a math/science thing. But I’m not sure where that norm (if there is one) comes from.

    I love the ambiguity checksheet! It’s also helpful for talking to students about the exam later, because it helps you contextualize for them their exam performance with the rest of the exams out there.

  6. Aaron Zelinsky says:


    Interestingly, it doesn’t seem to be the norm in psychology — or at least Kahneman doesn’t think it is — since he calls grading by exam the “conventional” way.

  7. Steph Tai says:

    Oops, this is really embarrassing. I went to college at a school where “science” just referred to the physical/natural sciences (seriously, you could fulfill what we generally referred to as our “humanities” requirement–which technically included social sciences but that wasn’t part of the colloquial reference–using some psychology classes.) As someone who writes about different “sciences” in my own work (and am aware that the categorization of psychology in academia sometimes varies from school to school, actually, mostly between “social sciences” and “biological sciences”), I should and do know better than that. But I slip sometimes, and this is an example. I meant to say “So I think it might be more a math/natural science thing.” Bad me!

  8. Aaron Zelinsky says:

    No problem! I was an econ major. I don’t think anyone considered us part of the “sciences.”

  9. Steph Tai says:

    Heh, in college, we could fulfill parts of our “humanities” requirement by taking econ courses, too. 🙂

  10. Ken Rhodes says:

    Orin: You have addressed the “Olympic scoring problem.” In subjective scoring, the judges tend to withhold high scores from early competitors so they (the judges) will have an upside potential for grading subsequent competitors.

    In the Olympics they try to address that problem by putting the lower rated gymnasts/skaters at the front of the rotation. That way, it is not unreasonable or unexpected when the early competitors “leave some points on the mat (or ice).” The fairness problem, though, is that a lower rated competitor who performs far above his expected level is screwed by his early position. Your approach of regrading the early competitors at the end of the rotation addresses that. (But it makes extra work for you.)

  11. Adam Garcia says:

  12. Marsha Cohen says:

    Kahneman’s book is on my to-read pile — but I wonder why reading-through-each-exam is deemed the “conventional” way to grade? Did he do a survey?

    I started teaching decades ago, and could never imagine grading law essay exams other than question-by-question. And before I grade any, I read a random sample before finalizing my grade sheet. After all, I may think I know what the answers are, but since I was NOT a learner in my own classroom I feel I need to see what got communicated. Oh, and I mark scores on a grade sheet which I tuck into the exam where I won’t see it when grading the subsequent question. Is there a professor who doesn’t feel a sense of surprise/shock/worry (did I grade this correctly?) when recording a question score and seeing it is way out of line with the others? (When this happens, and it does, it underscores the necessity of question-by-question blind grading.)

    I always randomize, so the same person’s first question isn’t read at the same “place” in the stack as his or her second question, etc. This is not because of a concern for score distribution, but rather that one’s view-of-the-best-answer (even having read a sample first) might change over the course of reading dozens and dozens — and whether one gets tougher or easier, and is or is not successful fighting one’s boredom, it just always seemed fair to randomize the potential for any of these subjective human factors to impinge on individual students.

    Grading is, as I’ve always told students, an art not a science. I have never regraded an entire set of law school exams just to see what would happen. I’m confident both that there would be some scoring changes and that they would be minor.