Purpose of Testing (Uses and Misuses)
Categorizing people
Tests are usually developed to support decisions about people's ability or
knowledge. To make these decisions, we have to know what each decision
category means and what consequences follow for a student placed in it.
It is often convenient to develop a test by working backward from this
point. In the classroom, for example, we assign letter grades of A, B,
C, and so forth. When a student gets a “B” in your class,
what are you telling
- the student, about his knowledge and skills, how bleak or sterling his
future in the field is, or his chances of getting an “A” in
another class?
- his employer, about his skills and your confidence in his ability
to perform?
- other professionals in your area, about him and his skills,
about you and your grading procedures, or about your credibility?
The starting point for designing your classroom test, then, could very
well be the content of your class, the skills and knowledge required to
complete the test successfully, what it takes for the student to communicate
these ideas to other people, and a good sense of the various interest
groups who will read the grade.
Unintended Uses

As discussed above, a classroom test exists essentially to communicate
information about students, and as such its results are effective indicators
of student outcomes. Classroom tests are rarely, if ever, designed with the
express purpose of measuring the professor’s instructional effectiveness.
If that is the main goal of an evaluation instrument, then the instrument
should be designed with instruction in mind.
The purpose of this module is not to discuss measuring instructional
effectiveness. It should be mentioned, however, that instruction involves
knowledge of the material, organization of the material, sensitivity to
the audience, rapport with the audience, structure and clarity of presentation,
the use of illustrative material, and so forth.
Language, Culture, and Social issues

Contemporary educators are well aware of the multicultural nature of
the United States, and there is a good deal of focused attention on test
bias. Much of this is politically generated or derived from other agendas,
but much also is a result of a genuine desire to develop evaluation methods
that are fair to all populations and still maintain the validity of the
technique.
By test bias we mean any inherent characteristic of the instrument itself,
in its design or its delivery. We are not referring to the interpretation
of the scores derived from the test.
We must first be sure that we know the population that will take the
test. It should go without saying, but please do not give an advanced
test to freshmen, a science test written in English to Hispanic immigrants
who do not yet read English well, or an oral exam while the street outside
is being repaired. These are not fictional examples. Do not give an exam
written by one textbook author to a class that used another author's text
unless the terms are the same and the two authors interpret them the same way.
While it is not difficult to write an obviously biased test, it is very
difficult to write one that is surreptitiously slanted toward or against
a given group. A math test using baseball stats is not ipso facto biased
against women, but if that is all the content there is in the test, the
test is certainly sending some sort of message.
One point must be made quite clear: the mere fact that one group does
more poorly on a test than another does not imply that the test is biased
against the lower-performing group. If, on the other hand, a group of
Hispanic students has knowledge equivalent to that of a comparison group,
and the Hispanic students still do more poorly on the exam, then, and
only then, is the test biased. There are mathematical methods, upheld in
court, that can detect such bias.
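One widely used family of such methods looks for differential item functioning (DIF): students from two groups are first matched on their total test score, and an item is flagged only if, at the same overall level of knowledge, one group is less likely to answer it correctly. The sketch below computes the Mantel-Haenszel common odds ratio, one common DIF statistic, for a single item. It is an illustration under stated assumptions, not necessarily the method referred to in the court cases: the input format, the helper `block`, and all counts are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for a single test item.

    `responses` is a list of (group, total_score, correct) tuples: group is
    "reference" or "focal", total_score is the student's score on the whole
    test (used to match students of similar knowledge), and correct is True
    if the student answered the item under study correctly.
    A value near 1.0 means that, at matched total scores, both groups have
    similar odds of answering correctly; values well above 1.0 mean the item
    favors the reference group.
    """
    # For each total-score level: [ref right, ref wrong, focal right, focal wrong]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in responses:
        offset = 0 if group == "reference" else 2
        strata[total_score][offset + (0 if correct else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        return float("inf")
    return numerator / denominator


def block(group, score, n_right, n_wrong):
    """Hypothetical helper: n_right correct and n_wrong incorrect answers."""
    return [(group, score, True)] * n_right + [(group, score, False)] * n_wrong


# Made-up data. The focal group has more low scorers overall, but at each
# total-score level both groups answer this item correctly at the same rate.
fair_item = (block("reference", 10, 3, 2) + block("reference", 20, 18, 2)
             + block("focal", 10, 12, 8) + block("focal", 20, 9, 1))

# Same group sizes, but at matched scores the focal group does worse.
biased_item = (block("reference", 10, 3, 2) + block("reference", 20, 18, 2)
               + block("focal", 10, 6, 14) + block("focal", 20, 5, 5))

print(mantel_haenszel_odds_ratio(fair_item))    # 1.0
print(mantel_haenszel_odds_ratio(biased_item))  # about 5.75
```

On the fair item the focal group's overall success rate is lower (70% against 84% for the reference group), yet the ratio is exactly 1.0: the gap comes from differences in overall knowledge, not from the item. Only the second item, where equally able students fare differently, is flagged.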
The beginning mathematics course at your college is offered at three
levels: basic, intermediate, and advanced. The content of the advanced
course is analytic geometry and calculus, while that of the basic course
is algebra.
To place students in the various courses, someone suggests using
the quantitative scores from the SAT: students with high scores
would be placed in the advanced class, and so forth.
Develop a response to this plan in 100 words or less. Highlight
the most important reason for your views of this plan.
Types of test interpretation

Tests are rarely designed and interpreted outside of a context. Here,
the metaphor of the thermometer is especially useful. When a person is
first introduced to the concept of temperature, he is probably unaware
of the meaning of the numbers. For one thing, a temperature of 32 degrees
does not imply anything unless I tell you that I am using a thermometer
with a Fahrenheit scale. Then you know that water will freeze under standard
conditions. If the scale had been centigrade (Celsius), it would be hot
(about 90 degrees F).
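The arithmetic behind this example is the standard conversion F = C × 9/5 + 32. The short sketch below reads the same number, 32, on both scales and interprets it against the freezing point of pure water; the function name and the "F"/"C" labels are illustrative choices, not part of any particular library.

```python
FREEZING_F = 32.0  # freezing point of pure water at standard pressure, in degrees F


def to_fahrenheit(reading, scale):
    """Convert a thermometer reading to degrees Fahrenheit.

    `scale` is "F" or "C" (illustrative labels).
    """
    if scale == "F":
        return float(reading)
    if scale == "C":
        return reading * 9 / 5 + 32
    raise ValueError(f"unknown scale: {scale!r}")


# The same number means very different things depending on the scale.
for scale in ("F", "C"):
    f = to_fahrenheit(32, scale)
    status = "at or below freezing" if f <= FREEZING_F else "above freezing"
    print(f"32 degrees {scale} = {f:.1f} F ({status})")
```

This prints `32.0 F (at or below freezing)` for the Fahrenheit reading and `89.6 F (above freezing)` for the Celsius one: the raw number alone says nothing until its scale is known.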
Now, water does not always freeze at 32° F. If there is salt or some
other contaminant in the water, it will not freeze at that temperature,
and high in the mountains water boils well below its usual 212° F. Various
other points on the scale have other interpretations that must be learned
or discovered as we study a field. The numbers themselves have no meaning
other than the meaning we give them, and the same is true of tests.
Norm referencing

One fairly easy, and thus popular, method of attaching meaning to test
scores is to interpret each score relative to other scores. In
large-scale testing situations, the publishers clearly define the population
for whom the test is intended and then give the test to a representative
sample from that group. The statistics calculated from these data are
called norms (normal in the sense of typical, not in the sense of healthy
or good). The average internal body temperature, measured orally, is a
norm of this kind: it describes what is typical and implies nothing about
what is healthy. (It has recently been revised from 98.6° F to a lower
figure because the original value was obtained from hospital records and
was considered biased.)
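Two common ways of expressing a score relative to a norm group are the percentile rank (the percentage of the norm group scoring below it) and the z-score (its distance from the norm-group mean in standard deviations). The sketch below computes both; the two norm samples are made up, standing in for a publisher's representative sample, and counting ties as half is one common convention among several.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score` (ties count half)."""
    below = sum(1 for s in norm_scores if s < score)
    tied = sum(1 for s in norm_scores if s == score)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)


def z_score(score, norm_scores):
    """Distance of `score` from the norm-group mean, in standard deviations."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


# Made-up norm samples for two different populations.
introductory_norms = [42, 47, 51, 55, 55, 58, 60, 63, 66, 73]
advanced_norms = [55, 60, 62, 65, 68, 70, 72, 75, 78, 85]

raw = 60
for name, norms in (("introductory", introductory_norms),
                    ("advanced", advanced_norms)):
    print(f"{name}: percentile rank {percentile_rank(raw, norms):.0f}, "
          f"z = {z_score(raw, norms):.2f}")
```

The same raw score of 60 sits at the 65th percentile of the first group but only the 15th of the second, which is why norms are meaningful only for the population they were drawn from.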
Criterion Referencing

When using criterion referencing,
take care in setting the cut points. For example, the body mass
index is still used as an indicator of obesity even though, by
using it, we would classify athletes such as Muhammad Ali and Arnold
Schwarzenegger as overweight or obese.
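The body mass index example can be made concrete. BMI is weight in kilograms divided by the square of height in meters, and 30 is the WHO cut-point for adult obesity. The sketch below applies that cut-point to two hypothetical people with identical height and weight but very different builds; the function names and body measurements are illustrative assumptions.

```python
OBESITY_CUT_POINT = 30.0  # WHO adult cut-point for obesity, BMI in kg/m^2


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2


def criterion_decision(value, cut_point=OBESITY_CUT_POINT):
    """Classify a value only by which side of a fixed cut-point it falls on."""
    distance = value - cut_point
    label = "obese" if value >= cut_point else "not obese"
    return label, distance


# Hypothetical people: same height and weight, very different body composition.
people = {
    "muscular athlete": (110.0, 1.88),
    "sedentary adult": (110.0, 1.88),
}
for name, (kg, m) in people.items():
    value = bmi(kg, m)
    label, distance = criterion_decision(value)
    print(f"{name}: BMI {value:.1f}, {distance:+.1f} from cut-point -> {label}")
```

Both people receive the same label, because the decision sees only the measurement and the cut-point. If the measurement cannot tell muscle from fat, no choice of cut-point will fix the classification.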
Criterion-referenced interpretation of test scores has recently become
central to much thinking about testing. Here, a score is interpreted by
its distance from a pre-set point. Being above the criterion, or cut-point,
implies a consequence different from being below it. Establishing these
cut-points is a science in itself and cannot be taken lightly. Current
“healthy” blood pressure readings and cholesterol cut-points are changing.
But ideas change slowly, especially those that are culturally influenced.
We set cut-points for criterion referencing for many purposes, including
hiring, financial penalties, and retirement. The utility and
meaningfulness of the decisions, however, depend on the credibility
of both the test and the criterion established as the cut-point. You want
the people you hire, that is, those above your cut-point, to be successful
and productive. Addendum: It is often said that Bismarck chose 65 as the
retirement age because very few Germans of his time ever reached it. In
fact, the German old-age pension law of 1889 set the age at 70; it was
lowered to 65 only in 1916, after Bismarck's death.
Implications for the Classroom Test

- Letter grades are a message interpreted differently by different
interest groups.
- You should distinguish between student outcomes and teaching effectiveness.
- Careful wording of items improves the clarity of the information,
reduces misreadings, and lessens item and test bias.
- “Test” bias should be differentiated from evaluation bias,
a slanted interpretation of the test results.
- Test bias cannot be established from differences in raw test scores alone.
- Grading on the curve is norm referencing.
- Criteria for letter grades should be set using information outside the
test, such as professional judgment about the student’s future coursework,
his success in a related occupation, or the responses of some idealized
student groups.
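The difference between grading on the curve (norm referencing) and grading against fixed criteria can be seen by grading one class both ways. In the sketch below, the cut-points 90/80/70/60 and the curve shares (20% A, 30% B, 30% C, 10% D, 10% F) are hypothetical placeholders; in practice they would come from the kind of outside information listed above, not from the test itself.

```python
def criterion_grades(scores, cuts=((90, "A"), (80, "B"), (70, "C"), (60, "D"))):
    """Fixed cut-points: a grade depends only on the student's own score."""
    grades = []
    for s in scores:
        for cut, letter in cuts:
            if s >= cut:
                grades.append(letter)
                break
        else:
            grades.append("F")  # below every cut-point
    return grades


def curve_grades(scores, shares=(("A", 0.2), ("B", 0.3), ("C", 0.3),
                                 ("D", 0.1), ("F", 0.1))):
    """Grading on the curve: a grade depends on rank within this class."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    start = 0
    for letter, share in shares:
        count = round(share * len(scores))
        for i in order[start:start + count]:
            grades[i] = letter
        start += count
    for i in order[start:]:  # any rounding leftovers get the lowest letter
        grades[i] = shares[-1][0]
    return grades


# A made-up, uniformly strong class.
strong_class = [95, 93, 91, 90, 88, 87, 85, 84, 82, 80]
print(criterion_grades(strong_class))
print(curve_grades(strong_class))
```

Under fixed cut-points every student in this strong class earns an A or a B; on the curve the student with 80 fails simply because someone has to be last. This sketch also splits tied scores arbitrarily when they straddle a boundary, another weakness of rank-based grading.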