Exam Grading and Standard Deviations
Imagine you give an exam with two questions, each supposedly worth 50% of the final grade. Imagine further that you grade both questions and properly normalize the scores for each one to a 50-point scale. (I’m not so sure all professors normalize properly, but that’s a different problem.)
What do you do if the standard deviations in the two normalized grade populations vary widely? In other words, imagine that question one elicits a long, flat curve: the lowest score is much lower than the highest score, and there is a lot of variation in the scores in between, while question two elicits a compact curve with a very high peak that drops off quickly in both directions.
Is it legitimate (fair, proper) simply to add the normalized scores for questions one and two to derive the final score? Does this cause the first question to exert an unfairly disproportionate effect on the final curve?
First, consider the extreme case. In a class of 50 students, every student gets a different normalized score for question one (from one to fifty points), while every student in the class gets the exact same normalized score (say, 20 points) for question two. Simply adding the scores together means the final curve will have exactly the same shape as the curve for question one, just shifted up by 20 points, and question two will have been written out of the exam.
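The extreme case is easy to check numerically. Here is a minimal Python sketch using made-up scores that match the description above; the variable names and the `ranking` helper are illustrative assumptions, not anything from a real gradebook.

```python
from statistics import pstdev

# Hypothetical class of 50: question one spreads students from 1 to 50 points,
# question two gives every student the same 20 points.
q1 = list(range(1, 51))
q2 = [20] * 50

totals = [a + b for a, b in zip(q1, q2)]

def ranking(scores):
    """Return student indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(pstdev(q1), pstdev(q2))          # spread of each question's scores
print(ranking(totals) == ranking(q1))  # does the final order match question one?
```

Question two has a standard deviation of zero, so the ordering of the totals is the ordering on question one alone.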
This seems to be the fair result. Question two is a bad question. It didn’t differentiate between the students in the class, so it is fair to curve the class based solely on their performance on question one. What is the alternative?
But what if we’re not at the extreme case? Imagine question one’s curve is much flatter than question two’s (that is, its scores have a much higher standard deviation), yet question two’s curve nevertheless differentiates between the students. Is it fair simply to add the two, or are you failing to abide by your promise to your students to have each question be worth 50% of the exam?
If you think that it is not fair simply to add, you can apply a transformation (for example, a linear rescaling) to one set of scores or the other to bring the standard deviations more in line with one another. Is this proper?
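To make the idea of a transformation concrete, here is one possibility, sketched in Python under stated assumptions: a linear rescaling that gives question two the same standard deviation as question one while keeping its own mean. The `rescale` helper and the five-student scores are hypothetical, and this is only one of many transformations you might choose.

```python
from statistics import mean, pstdev

def rescale(scores, target_mean, target_sd):
    """Linearly map scores so they have the given mean and standard deviation."""
    m, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; there is no spread to rescale")
    return [target_mean + (x - m) / sd * target_sd for x in scores]

# Hypothetical normalized scores (out of 50) for five students.
q1 = [5, 15, 25, 35, 45]   # long, flat curve
q2 = [28, 32, 30, 26, 34]  # compact curve

# Give question two the same spread as question one, keeping its own mean.
q2_rescaled = rescale(q2, mean(q2), pstdev(q1))

raw_totals = [a + b for a, b in zip(q1, q2)]
adjusted_totals = [a + b for a, b in zip(q1, q2_rescaled)]

print(raw_totals)
print([round(t, 1) for t in adjusted_totals])
```

With these invented numbers, the student who scored 35 and 26 is second in the raw totals but drops to fourth once question two is rescaled. Note also that rescaled scores can land outside the original 0 to 50 range, which is a side effect you would have to decide how to handle.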
My initial take is that sometimes the transformation is fair and sometimes it is not. It depends on what you think about the objective quality of your grading methods and the uniformity of the difficulty of the questions you wrote. For example, if question one is much more difficult than question two, perhaps the curve should be driven by question one, and the data should not be transformed (though you can make the opposite argument). In contrast, if question one is an issue spotter and question two is a policy question, simply adding the normalized scores may not reflect the greater subjectivity in grading a policy question, and a transformation may be in order.
There are no neutral choices here. Unless the scores for questions one and two are highly correlated, many students’ final grades will vary based on the choice made. At the very least, this is yet more proof of the inherent subjectivity of the entire grading process. Have others thought about this, and if so, which choices have you made?
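The role of correlation can be illustrated with a small Python sketch. It assumes, purely for illustration, that the transformation in question is converting each question's scores to z-scores before adding; the helper names and the data are invented.

```python
from statistics import mean, pstdev

def zscores(scores):
    """Standardize scores to mean 0 and standard deviation 1."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]

def order(scores):
    """Student indices from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def rank_changes(q1, q2):
    """Count rank positions held by a different student under the two methods."""
    raw = order([a + b for a, b in zip(q1, q2)])
    std = order([a + b for a, b in zip(zscores(q1), zscores(q2))])
    return sum(1 for r, s in zip(raw, std) if r != s)

q1 = [5, 15, 25, 35, 45]
q2_aligned = [26, 28, 30, 32, 34]   # orders students exactly as question one does
q2_shuffled = [28, 33, 30, 26, 34]  # orders students differently

print(rank_changes(q1, q2_aligned))
print(rank_changes(q1, q2_shuffled))
```

When the two questions rank the students identically, the choice of method changes nobody's position; when they disagree, some positions change hands. How many students are affected in a real class depends on the actual scores.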