
Analyzing A Test

This Page Includes
  Item Statistics
  Item Difficulty
  Item Discrimination
  Activity
  Tips


Item Statistics

Most of you will be teaching at an institution that will have scoring services available. Especially if you are giving objective tests you might wonder if you should bother with these services; after all, it won’t take that long for you to do scoring yourself.

However, analyzing a test goes beyond counting the number of items right on student papers. The information that follows, when used, will help you give feedback to both yourself and your students. In addition, you’ll be able to use it in the future to improve your tests.


Item Difficulty

Item difficulty is simply the proportion of students who answered an item correctly. If j indicates item number, Nc is the number of students getting the item correct, and N is the total number of students taking the test, then the item difficulty for the jth item is

pj = Nc / N

Table 1 shows the student by item matrix for a six-item test. A “1” indicates that the student got that item correct. To get item difficulties, simply count the number of students who got an item correct and divide by the number of students. For example, the difficulty for item 1 is p1 = Nc / N = 9/10 or .90. Item 1 was an easy item because almost all students answered it correctly. On the other hand, item 6 has a difficulty p6 = Nc / N of 2/10 or .20. Item 6 was difficult for the group. In this example, students 2, 3, and 4 make up the upper 30% of the class and students 6, 7, and 8 make up the lower 30%.

Table 1: Student by Item Matrix Showing Correct Answers

                Student
                1   2   3   4   5   6   7   8   9   10  Item total
Item 1          1   1   1   1   1   0   1   1   1   1   9
Item 2          1   0   1   0   0   0   1   0   1   1   5
Item 3          1   1   1   1   1   1   0   0   1   1   8
Item 4          1   1   1   1   1   1   0   0   0   0   6
Item 5          0   1   1   1   1   1   1   0   0   0   6
Item 6          0   1   1   0   0   0   0   0   0   0   2
Student total   4   5   6   4   4   3   3   1   3   3
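The counting described above can be sketched in a few lines of Python. This is only an illustration, assuming the Table 1 matrix is typed in by hand; the names `responses` and `item_difficulty` are made up for this example and are not part of any scoring service.

```python
# Rows are items 1-6, columns are students 1-10 (copied from Table 1).
responses = [
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],  # item 1
    [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],  # item 2
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],  # item 3
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # item 4
    [0, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # item 5
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # item 6
]


def item_difficulty(item_number):
    """Return pj = Nc / N for the given item (numbered from 1)."""
    row = responses[item_number - 1]
    return sum(row) / len(row)


# Student totals are the column sums of the same matrix.
student_totals = [sum(column) for column in zip(*responses)]

# Only the two items worked in the text are printed here.
for item_number in (1, 6):
    print(f"item {item_number}: p = {item_difficulty(item_number):.2f}")
```

The same function can be called for any other item number from 1 to 6.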
 

 

Table 2, below, shows the answering pattern of the 10 students taking the six items. An asterisk marks the count for the correct (keyed) option. The number in a cell indicates the number of students who chose that option.

Table 2: Item by Option Matrix of Student Responses on a 6-Item Multiple-Choice Test

          Option A   Option B   Option C   Option D
Item 1    9*         0          0          1
Item 2    4          5*         1          0
Item 3    0          8*         1          1
Item 4    4          0          0          6*
Item 5    2          6*         2          0
Item 6    2*         3          2          2

In looking at Table 2 you can see that for the first five items, at least half of the students selected the correct answer. In most classroom achievement tests, as a general rule of thumb, you want an average difficulty of .7 to .8. Why? The reason here is psychological, not mathematical. Even if a student is “given” a B when he has correctly answered only half the items, he still may come away with a feeling of failure. A good B student should feel that he was able to show his knowledge.
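To check the sample test against this rule of thumb, the item difficulties can be averaged. The sketch below is a rough illustration only: it assumes the keyed-option counts are copied by hand from Table 2, and the variable names are invented for this example.

```python
# Number of students who chose the keyed (correct) option on items 1-6,
# copied from the asterisked cells of Table 2.
correct_counts = [9, 5, 8, 6, 6, 2]
n_students = 10

# Average difficulty = total correct answers / total answers possible.
mean_difficulty = sum(correct_counts) / (len(correct_counts) * n_students)

# Rule-of-thumb range quoted in the text.
TARGET_LOW, TARGET_HIGH = 0.7, 0.8

print(f"average difficulty = {mean_difficulty:.2f}")
if mean_difficulty < TARGET_LOW:
    print("harder than the rule-of-thumb range")
elif mean_difficulty > TARGET_HIGH:
    print("easier than the rule-of-thumb range")
else:
    print("within the rule-of-thumb range")
```

For the sample test the average is 36/60, or .60, so this six-item test is harder than the suggested range.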

What should you do in the future with a test that is too difficult? Consider adding items, usually in the topic areas that deserve more weight. Try to make these items somewhat easier than those on the original test.



Item Discrimination

Item discrimination is the ability of the item to differentiate those students with more knowledge from those with less. To calculate item discrimination, the total test score is used as a surrogate for this knowledge, the top scoring students are separated from the bottom scoring students, and you then compare their response patterns. Typically the group is divided into thirds and the middle group is excluded. For our purpose on the sample test let’s call

Students 2, 3, and 4 the top group (Nu)
Students 6, 7, and 8 the bottom group (Nl) (see Table 1).

An item discriminates positively if more students in the upper group than in the lower group got it right. To calculate the discrimination index, subtract the number of students in the lower group who got the item correct from the number in the upper group who did, and divide by the number of students in either group. Where pju = Ncu / Nu is the item difficulty for the upper third and pjl = Ncl / Nl is the item difficulty for the lower third, the discrimination index of item j is

dj = pju - pjl

or, if Nu = Nl, that is, the number of students in the bottom third is the same as that in the upper third,

dj = (Ncu - Ncl) / Nu.

For example, item 1 was answered correctly by all three students in the upper group and by two in the lower group (students 7 and 8). The discrimination index is

d1 = (Ncu - Ncl) / Nu = (3 – 2) / 3 = 1/3 or .33.

The discrimination index for item 6 is (2 – 0)/3 or .67. It is higher than the index for item 1 because no student in the lower group got item 6 correct, so the gap between the two groups is wider. Another way of viewing this item is that if you got it correct, you are apt to be a good student.
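The two worked examples can be reproduced with a short Python sketch. It assumes the Table 1 matrix is entered by hand and takes students 2, 3, and 4 as the upper group and students 6, 7, and 8 as the lower group, the split given in the Item Difficulty section. The groups are listed explicitly rather than derived by sorting, because several students are tied on total score. All names are illustrative.

```python
# Rows are items 1-6, columns are students 1-10 (copied from Table 1).
responses = [
    [1, 1, 1, 1, 1, 0, 1, 1, 1, 1],  # item 1
    [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],  # item 2
    [1, 1, 1, 1, 1, 1, 0, 0, 1, 1],  # item 3
    [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # item 4
    [0, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # item 5
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0],  # item 6
]

# Student numbers (1-based). Assumed split: upper and lower 30% of the class.
upper_group = [2, 3, 4]
lower_group = [6, 7, 8]


def group_difficulty(item_number, students):
    """Proportion of the listed students who answered the item correctly."""
    row = responses[item_number - 1]
    return sum(row[s - 1] for s in students) / len(students)


def discrimination(item_number):
    """Return dj = pju - pjl for the given item."""
    return (group_difficulty(item_number, upper_group)
            - group_difficulty(item_number, lower_group))


for item_number in (1, 6):
    print(f"item {item_number}: d = {discrimination(item_number):.2f}")
```

Writing the index as a difference of two proportions, rather than dividing by a single group size, keeps the function usable when the upper and lower groups are not the same size.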

Item 2, however, produces different results. One student in the upper group got it correct; one in the lower got it correct yielding a discrimination index of 0. Why did as many good as poor students get this item wrong? Going back to Table 2, you can see that many students were drawn to distractor “a”. What is in distractor “a” that pulls students? Have you possibly mis-keyed the item? Whatever it is, it’s something you want to look at before you hand back the exam. Perhaps you can see immediately why students chose this and you say to yourself, “I can see why this might be considered a correct answer.” If so, give credit for both options.
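Going back to Table 2 in this way can also be sketched in code. The example below is a rough illustration, assuming the option counts are typed in from Table 2. The 0.75 cutoff for flagging an item is an arbitrary value chosen for this sketch, not a standard, and the function and variable names are hypothetical.

```python
# For each item: counts of students choosing each option, and the keyed option.
# Values copied from Table 2.
option_counts = {
    1: ({"A": 9, "B": 0, "C": 0, "D": 1}, "A"),
    2: ({"A": 4, "B": 5, "C": 1, "D": 0}, "B"),
    3: ({"A": 0, "B": 8, "C": 1, "D": 1}, "B"),
    4: ({"A": 4, "B": 0, "C": 0, "D": 6}, "D"),
    5: ({"A": 2, "B": 6, "C": 2, "D": 0}, "B"),
    6: ({"A": 2, "B": 3, "C": 2, "D": 2}, "A"),
}

# Assumed cutoff: flag an item when its most popular wrong option drew at
# least this fraction of the number of students who chose the keyed option.
REVIEW_RATIO = 0.75


def strongest_distractor(counts, keyed):
    """Return (option, count) for the most frequently chosen wrong option."""
    wrong = {option: n for option, n in counts.items() if option != keyed}
    option = max(wrong, key=wrong.get)
    return option, wrong[option]


flagged = []
for item_number, (counts, keyed) in option_counts.items():
    option, n = strongest_distractor(counts, keyed)
    if n >= REVIEW_RATIO * counts[keyed]:
        flagged.append(item_number)
        print(f"item {item_number}: option {option} chosen by {n}, "
              f"keyed option {keyed} chosen by {counts[keyed]}")
```

A flag only marks an item worth rereading before the exam is handed back. It does not say whether the item is mis-keyed, ambiguous, or simply hard.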

Item analyses can give you useful information. To illustrate various points, the example given had just 10 students. However, for small classes (fewer than 20 students) remember that there may be some special characteristics of the students that affect the responses. If you were to give this test again to another class, you might get somewhat different results. If you have a large class, however, your results are apt to be more stable.

Activity


Calculate the difficulty and discrimination indices for items 3 and 4.




Tips

  • Easy items will prove to have low discrimination power. Why? Let’s take an example at the extreme. If an item is answered correctly by all students, as many good as poor students will have answered it correctly. Keeping the item, then, should be more a function of whether it addresses an important objective. However, very difficult items are also apt to produce poor discrimination indices. These items also should be examined carefully.
  • Scoring services on campuses typically provide a table similar to Table 2 in this module in addition to difficulty and discrimination indices. A smart instructor will examine the test and this table carefully before returning the test. Forewarned can make you forearmed. Students will fight for every point.
  • If students make a good case for other than the “correct” option, rescore the test and learn from the experience.
  • If you have said that 30 points must be reached to get an “A”, rescoring will produce more A’s. Don’t, however, change (raise) the criterion to preserve a certain number of A’s. This will be viewed by some students as punishment. For example, suppose a student scored 30 and thus earned an A. However, because some students were given credit for alternate answers, the criterion for an A was raised to 31, so now that student gets a B. If you were in that student’s shoes, how would you regard the instructor who changed your grade from an A to a B?

If you give several tests, errors will eventually cancel out. Overall, remember, if you give a point after a reasoned argument, you are a hero; if you take away points or lower grades, you are a cad.



 

© CET, SFSU 2003