## Adjusting multiple choice examinations

Multiple choice examinations have several benefits. They have no intra- or inter-marker variability; in fact, they can be automated. And I wouldn't be surprised if they are as effective as any other method at evaluating knowledge of the material.

They need to be well written.

- The correct answer needs to be clearly more correct than other options.
- The correct answer should not be able to be guessed by the construction of the question.
- The order of the answer options should be random.
- A reasonable number of options needs to be given.
- The same number of options should be given for every question.
- A significant number of questions needs to be included.
- The problem with multiple choice questions is the chance element. This can be reduced by increasing the number of questions.

If we have 20 questions with 4 options each, then random guessing will lead to people getting 5 correct on average (20 / 4 = 5, an exam mark of 25%). However, the spread of scores will be quite wide. Some will get 1 correct (5%), others 10 (50%). With 200 questions, people will get 50 correct on average (the exam mark is still 25%), but the spread is much narrower. Some may get 40 correct (20%), others 60 (30%).
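To see the spread for yourself, here is a small Monte Carlo sketch. It assumes every candidate guesses uniformly at random on every question; the function name `simulate_guessing`, the 10,000 trials and the fixed seed are illustrative choices, not part of the original argument.

```python
import random
import statistics


def simulate_guessing(num_questions, num_options, trials=10_000, seed=1):
    """Return the raw marks (fraction correct) of `trials` pure guessers.

    Assumes each guess is uniform over the options, so each answer is
    correct with probability 1 / num_options, independently of the others.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    p = 1 / num_options
    marks = []
    for _ in range(trials):
        correct = sum(rng.random() < p for _ in range(num_questions))
        marks.append(correct / num_questions)
    return marks


for n in (20, 200):
    marks = simulate_guessing(n, 4)
    print(f"{n} questions: mean {statistics.mean(marks):.3f}, "
          f"min {min(marks):.0%}, max {max(marks):.0%}, "
          f"stdev {statistics.stdev(marks):.3f}")
```

The exact minimum and maximum depend on the seed and trial count, but the 20-question marks should scatter far more widely around 25% than the 200-question ones.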

Thus both exams, when taken by people ignorant of the topic, will give an average mark of 25%, but the chance of any particular individual getting a high mark is much greater with fewer questions.
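The difference can be put into numbers. Assuming pure guessing, the number correct follows a binomial distribution, so the chance of reaching a given score is a tail sum of binomial probabilities. The helper name `prob_at_least` is hypothetical, used here only for illustration.

```python
from math import comb


def prob_at_least(k, num_questions, num_options):
    """Probability that a pure guesser gets at least k questions right.

    The number correct is Binomial(num_questions, 1/num_options), so we sum
    the binomial probabilities for k, k+1, ..., num_questions correct.
    """
    p = 1 / num_options
    return sum(
        comb(num_questions, i) * p**i * (1 - p) ** (num_questions - i)
        for i in range(k, num_questions + 1)
    )


# Chance of reaching a 50% raw mark by guessing alone, 4 options per question
print(prob_at_least(10, 20, 4))    # 20 questions: need 10 correct
print(prob_at_least(100, 200, 4))  # 200 questions: need 100 correct
```

With 4 options, reaching 50% by guessing on the 20-question exam has a probability of roughly 1.4%; on the 200-question exam it is far below one in a billion.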

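This seems obvious from the examples above. Mathematically, the spread of marks shrinks as the number of questions grows. For pure guessing, the number correct follows a binomial distribution, so the standard deviation of the mark (the fraction correct) is $\sqrt{p(1-p)/N}$, where $p$ is the chance of guessing a single question correctly and $N$ is the number of questions. The standard deviation of the mark is therefore inversely proportional to the square root of the number of questions.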
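The square-root relationship fits in a one-line function. This is a sketch assuming pure guessing with probability 1/R of being right on each question (R being the number of options); `guessing_sd` is an illustrative name.

```python
from math import sqrt


def guessing_sd(num_questions, num_options):
    """Standard deviation of a pure guesser's raw mark (fraction correct).

    With p = 1/R, the number correct is Binomial(N, p) with variance
    N*p*(1-p); dividing the count by N gives a mark with SD sqrt(p*(1-p)/N).
    """
    p = 1 / num_options
    return sqrt(p * (1 - p) / num_questions)


for n in (20, 200, 2000):
    print(n, round(guessing_sd(n, 4), 4))
```

Going from 20 to 200 questions cuts the standard deviation of a guesser's mark from about 9.7% to about 3.1%: a factor of √10 ≈ 3.16, not 10.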

The other issue is standardising the results. Because people are likely to get 25% of the answers correct by chance (with 4 options), one could simply subtract 25% from the final mark. If you get 25% as a raw mark, you probably didn't know the answer to any of the questions; that is, your knowledge is 0%. So we subtract 25% from your mark to get an adjusted mark of 0%.

However, if you get 100%, it is unlikely that you knew 75% and got the other 25% correct by chance, so a flat subtraction of 25% would wrongly give you 75%. Rather, you get the questions you know correct, and about a quarter of the ones you don't know. So if you know 50% of the questions, you will get that 50% plus a quarter of the remaining 50% (12.5%), a total of 50% + 12.5% = 62.5%. So a raw mark of 62.5% needs to be scaled back to 50%, and 100% means you knew all the answers and does not need to be scaled back at all.
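The "you either know it or you guess" model in the paragraph above can be checked numerically, alongside what a flat 25% subtraction would do to it. This is a sketch assuming 4 options and that model; `expected_raw_mark` is a hypothetical helper.

```python
def expected_raw_mark(known_fraction, num_options):
    """Expected raw mark for someone who knows `known_fraction` of the answers.

    Assumes known questions are always answered correctly and the rest are
    guessed uniformly among num_options, so 1/num_options of them are right.
    """
    return known_fraction + (1 - known_fraction) / num_options


for known in (0.0, 0.5, 1.0):
    raw = expected_raw_mark(known, 4)
    naive = max(raw - 0.25, 0.0)  # flat "subtract 25%" adjustment
    print(f"knows {known:.0%}: raw {raw:.1%}, naive adjusted {naive:.1%}")
```

For someone who knows half the answers, the flat subtraction gives 37.5% rather than 50%, and a perfect paper drops to 75%, which is why a linear rescaling is needed instead.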

So we need to adjust the raw marks linearly to get adjusted marks.

- Let N be the number of questions.
- Let R be the number of options.
- Let X be the number of questions correct.
- Let Y be the adjusted number of questions correct.

Then

- X/N is the raw mark.
- Y/N is the adjusted mark.
- N/R is the expected number of correct answers by chance.

When X = N/R, the mark needs to be adjusted to zero, i.e. Y = 0.

When X = N, the mark needs no adjustment, i.e. Y = N and Y/N = 1 (= 100%).

The number of questions correct equals the number of questions known plus the remaining number of questions divided by the number of options:

$$X = Y + \frac{N - Y}{R}$$

Rearranging for Y we get

$$Y = \frac{RX - N}{R - 1}$$

Or as a mark

$$\frac{Y}{N} = \frac{R\,\frac{X}{N} - 1}{R - 1}$$

And any negative values are set to zero.
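A minimal sketch of the adjustment in Python, implementing the rearranged formula $Y = (RX - N)/(R - 1)$ (obtained from $X = Y + (N - Y)/R$) and setting negative results to zero. The function name `adjusted_mark` and the guard against fewer than two options are my own additions.

```python
def adjusted_mark(correct, num_questions, num_options):
    """Adjusted mark Y/N from the raw count X of correct answers.

    Computes Y = (R*X - N) / (R - 1), the inverse of X = Y + (N - Y)/R,
    clamps negative values to zero, and divides by N to give a mark.
    """
    if num_options < 2:
        raise ValueError("need at least two options per question")
    y = (num_options * correct - num_questions) / (num_options - 1)
    return max(y, 0) / num_questions


print(adjusted_mark(25, 40, 4))  # raw 62.5% with 4 options → 0.5
```

A pure guesser's expected score (X = N/R) maps to 0, a perfect paper maps to 1, and anything in between is rescaled linearly.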

If you have a basic set of knowledge in the area being tested you can improve your chances on those that you do not know by ruling out answers that you know are incorrect.

Often your chance would then be 33% or even 50% as you remove the least likely answers.
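The adjustment formula does not erase this kind of partial knowledge. Here is a sketch assuming a candidate who knows nothing outright but can always rule out the same number of wrong options before guessing uniformly among the rest. Both function names are hypothetical, and the adjustment is the mark form $(R\,m - 1)/(R - 1)$, where $m$ is the raw mark and $R$ the number of options.

```python
def expected_raw_with_elimination(known_fraction, num_options, eliminated):
    """Expected raw mark when `eliminated` wrong options can be ruled out on
    every question the candidate does not know (an illustrative assumption),
    followed by a uniform guess among the remaining options."""
    remaining = num_options - eliminated
    return known_fraction + (1 - known_fraction) / remaining


def adjusted_from_raw(raw_mark, num_options):
    """Linear adjustment on marks, (R*m - 1)/(R - 1), clamped at zero."""
    return max((num_options * raw_mark - 1) / (num_options - 1), 0.0)


# Knows nothing outright, but can always rule out 2 of 4 options
raw = expected_raw_with_elimination(0.0, 4, 2)
print(raw, adjusted_from_raw(raw, 4))
```

Here the raw mark is 50% and the adjusted mark about 33%: eliminating options is credited as partial knowledge, while pure guessing (a raw 25%) still scores zero.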

There are a couple of problems with this idea, but they all rest on one basic assumption: that you either know it or you don't.

In reality, if people are honest, most of the questions they don't know, they sort of know. For instance, most people can narrow a question they don't know down to two options, so the probability of guessing the right answer isn't 25% but 50%. Also, most people have some sense of intuition, so if they studied, even after narrowing it down to two options they will still get it right slightly more than 50% of the time (probably only 55% or so, but I bet it would be visible). This number would also vary from person to person, based on how well they studied (so the higher your mark, the more likely this number is above 50%) and how well they have trained their intuition over the years.

Overall, people are not computers, and they don't always work in perfectly predictable ways. Knowledge is not simply true or false for them. Sometimes they will just have an instinct based on the wording of the question (especially in a long test, because they start to pick up on the question style). And sometimes we know something in the back of our heads but can't quite access it.

I think the only true multiple choice test is “Who Wants to Be a Millionaire?” Perhaps we should run our tests like that, where you have to commit to the answer in front of a real person… Lock it in!

I don’t think the result should be modified if people have an educated guess. I want to remove the entirely random component, the component that would give a clueless person ~20% (for 5 options).

If you can narrow it down to 2 each time from knowledge and then get ~50% of them correct, then that seems fair.

I’m not gonna comment on the probabilities or the equations you gave.

But I will try to give some insight as someone who takes tests very well.

If you give me a multiple choice test, I will score at least 20% better than my true knowledge of the subject would justify, and rarely less than that.

Four choices per question is too easy for people like me. Five only comes close. Try seven or more.

Making the wording tricky cuts down on my ability to manipulate the test to my advantage.

Do not put two questions about the same thing in the same test. It’s almost the same as giving me the answer. Yes, I generally read the whole test before I start answering questions.

Bottom line, multiple choice tests are for lazy teachers or teachers with too many students.

Athor Pel, yes, if the questions are poorly considered, but not when they are well written. The most difficult written test I have done was multi-choice.

The benefits of multi-choice are that there is no intra- or inter-marker variability, and that it discriminates well if there are a significant number of questions. Don't let poor use of the method disqualify the method. The questions I have been involved in setting have been extensively debated.