
How to Improve Test Reliability and Validity


Implications for Grading


Information about a test’s reliability and validity would be only of academic interest if we were unable to improve a test we are not satisfied with. The first hurdle to get across is one of interpretation. How large or small can a coefficient be and still be useful? The answer depends on the use to which the test will be put. For group decisions, the measure can indeed be rough, but where individual students are concerned, we need more precision. Let’s assume that our principal purpose is to assign letter grades, A through D, say.

The reliability of the data affects the precision and repeatability of our grade assignments. If the test scores are purely random, that is, the reliability is zero, there is no consistency whatsoever from one grade assignment to another. With a reliability of one, grade assignments would repeat perfectly. With a reliability somewhere in between, the consistency of assignment depends in general on the standard error of measurement, and in any specific case on the standard error and how close the student is to a cut point. For example, if a student scores one point above the cut for a “B” on your test and the standard error of measurement is 5 points, that student has a good (42%) chance of being classified as a “C” next time, assuming normally distributed measurement error around the observed score.
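The 42% figure can be checked with a short calculation. The sketch below is only an illustration, not part of the original module: it assumes that the student's true score equals the observed score and that measurement error is normally distributed with a standard deviation equal to the standard error of measurement. The function names are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_below_cut(points_above_cut: float, sem: float) -> float:
    """Chance that a retest score falls below the cut point.

    Assumption (not from the source): the true score equals the observed
    score, and measurement error is normal with standard deviation `sem`.
    """
    return normal_cdf(-points_above_cut / sem)


# One point above the "B" cut, standard error of measurement of 5 points.
print(round(prob_below_cut(1, 5), 2))  # 0.42
```

Under the same assumptions, a student exactly at the cut point has a 50% chance of landing on either side, which is why students near a boundary need more than one source of information.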

If you are not sure of the reliability of a test or its standard error, you must be very careful in assigning grades, especially for those students near the cut scores (the boundaries between grades). For these students, even if you know your test’s statistics, you must have more than one source of information to help you fine-tune your decision.


Improving reliability

The most straightforward ways to improve your test’s reliability follow directly from the above discussion.

First, calculate the item-test correlations and rewrite or reject any items whose correlations are too low. There is no official decision point here and convenience often rules, but it is safe to say that any item whose point-biserial correlation with the total test score is below r = .25 should be studied.
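As a minimal sketch of this first step, the code below computes each item's point-biserial correlation with the total score (the Pearson correlation between a 0/1 item and the total) and flags items below .25. The response matrix is invented for illustration, the function names are hypothetical, and the total score here includes the item itself; some analysts prefer to exclude the item from the total before correlating.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(responses):
    """responses: one row per student, one 0/1 entry per item."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]


def flag_items(responses, threshold=0.25):
    """Indices of items to study: correlation below threshold or undefined."""
    return [j for j, r in enumerate(item_total_correlations(responses))
            if not r >= threshold]


# Made-up data: 5 students, 3 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]
print(flag_items(responses))  # [2]
```

In this toy data the third item is answered correctly only by the lowest scorers, so it correlates negatively with the total and is flagged for review.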

Second, look at the items that did correlate well and write more like them. The longer the test, the higher the reliability up to a point.
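The text does not name a formula for this relationship; one standard way to estimate it is the Spearman-Brown prophecy formula, sketched below. It assumes the added items are parallel to the existing ones (they measure the same thing equally well), and the starting reliability of .60 is an invented example value.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when test length is multiplied by length_factor.

    Assumption: the new items are parallel to the existing ones.
    """
    return (length_factor * reliability
            / (1.0 + (length_factor - 1.0) * reliability))


# Hypothetical test with reliability .60, lengthened 1x to 4x.
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.60, k), 2))
# 1 0.6
# 2 0.75
# 3 0.82
# 4 0.86
```

Each doubling buys less than the one before, which is one way to read "up to a point": the gains flatten out as the test grows.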

Another point to keep in mind is restriction of range: a narrow range of scores limits all statistics, and hence the correlations between test forms. Another way of looking at the issue is that with a homogeneous group of people, you will not have as great a degree of accuracy in grade decisions (say, A–D) as you would with a very heterogeneous group. Heterogeneity here refers only to the total test score.

If the content sampled from the domain is restricted, both validity and reliability suffer. If the test is too long or appears too difficult, the examinees will be tempted to guess, and this will increase error directly.


Improving validity

Increasing validity is an ongoing challenge. Here are some tips to help improve test validity.

  1. Clarify your test construct. Write down what you expect of the students. If you can’t verbalize it, you can’t test it.
  2. Match the table of specifications with the test. Better yet, ask another member of the faculty to do this for you. Do not take any of this personally.
  3. Try rewording some of the items, especially in light of class discussion. Listen to your students!
  4. Run a DIF (differential item functioning) analysis and adjust or remove items which fail.
  5. Compare the test to other data which might be available.

© CET, SFSU 2003