How to Improve Test Reliability and Validity
Implications for Grading
Information about a test’s reliability and validity would be of only academic interest if we could not improve a test we are not satisfied with. The first hurdle is one of interpretation: how large must a reliability or validity coefficient be to be useful? The answer depends on the use to which the test will be put. For decisions about groups, the measure can be rough, but where individual students are concerned, we need more precision. Let’s assume that our principal purpose is to assign letter grades, say A through D.
Reliability of the data affects the precision and repeatability of our grade assignments. If the test is purely random, that is, with a reliability of zero, there is no consistency whatsoever from one grade assignment to another. With a reliability of one, grade assignments would repeat perfectly. With a reliability somewhere in between, the consistency of assignments depends in general on the standard error of measurement, and in any specific case on both the standard error and how close the student is to a cut point. For example, if a student scores one point above the cut score for a “B” on your test and the standard error of measurement is 5 points, that student has a substantial chance (about 42%, assuming normally distributed measurement error) of being classified as a “C” next time.
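The 42% figure can be checked with a short calculation. The following is a minimal Python sketch, not a definitive method: it assumes measurement error is normally distributed with a standard deviation equal to the standard error of measurement, and that the student's true score equals the observed score. The function names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_below_cut(points_above_cut: float, sem: float) -> float:
    """Chance that a retest score falls below the cut score.

    Assumption: the retest score is normally distributed around the
    current score with standard deviation equal to the SEM.
    """
    return normal_cdf(-points_above_cut / sem)

# The example from the text: one point above the "B" cut, SEM of 5 points.
p = prob_below_cut(1.0, 5.0)
print(f"Chance of dropping below the cut: {p:.0%}")
```

Under the same assumptions, a student ten points above the cut would have only about a 2% chance of dropping below it, which is why students near a cut score deserve the most care.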
If you are not sure of the reliability of a test or its standard error, you must be very careful in assigning grades, especially for students near the cut scores (the boundaries between grades). For these students, even if you know your test’s statistics, you need more than one source of information to help you fine-tune your decision.
Another point to keep in mind is restriction of range: when scores fall within a narrow range, the statistics computed from them are constrained, including the correlations between test forms. Put another way, with a homogeneous group of examinees you will not achieve as great a degree of accuracy in grade decisions (say, A through D) as you would with a very heterogeneous group. Heterogeneity here refers only to the total test score.
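One way to see the effect of restricted range is through the classical test theory relation between the standard error of measurement, the score standard deviation, and reliability: SEM = SD × √(1 − reliability). The Python sketch below rearranges that relation and assumes the standard error stays roughly the same from group to group. The numbers are hypothetical, chosen only for illustration, and the function name is not from the original page.

```python
def reliability_from_sem(sem: float, sd: float) -> float:
    """Reliability implied by a given SEM and score standard deviation.

    Rearranges SEM = SD * sqrt(1 - reliability).
    Assumption: the SEM is the same in both groups being compared.
    """
    return 1.0 - (sem / sd) ** 2

SEM = 5.0  # hypothetical standard error of measurement, in points

# Hypothetical heterogeneous group: scores spread widely (SD = 15).
print(round(reliability_from_sem(SEM, 15.0), 2))

# Hypothetical homogeneous group: scores bunched together (SD = 8).
print(round(reliability_from_sem(SEM, 8.0), 2))
```

With these illustrative values the reliability is about 0.89 in the heterogeneous group and about 0.61 in the homogeneous one, even though the test and its standard error are unchanged.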
If the content sampled from the domain is restricted, both validity and reliability suffer. If the test is too long or appears too difficult, examinees will be tempted to guess, and guessing adds measurement error directly.
© CET, SFSU 2003