
Purpose of Testing (Uses and Misuses)

This Page Includes
  Categorizing People
  Unintended Uses
  Language, Culture, Social Issues
  Activity
  Types of Test Interpretation
  Test Interpretation - Norm Referencing
  Test Interpretation - Criterion Referencing
  Implications for Classroom Tests

 


Categorizing people

Tests are usually developed to make decisions about the ability or knowledge of people. In order to make these decisions, we have to know the meaning of each decision category and the consequences of the student being there. It is often convenient to develop a test by working backward from this point. In the classroom, for example, we assign letter grades of A, B, C, and so forth. When a student gets a “B” in your class, what are you telling

  1. Him—about his knowledge and skills, how bleak or sterling his future in the field is, his chances of getting an “A” in another class?
  2. His employer—about his skills and your confidence in his ability to perform?
  3. Other professionals in your area—about him and his skills, you and your grading procedures, your credibility?

A good starting point for designing your classroom test is therefore the content of your classes, the skills and knowledge required to complete the test successfully, what it takes for the student to communicate these ideas to other people, and a clear understanding of the various interest groups.


Unintended Uses

As discussed above, the purpose of any classroom test is essentially to communicate student characteristics and, as such, tests are effective indicators of instructional outcomes. They are rarely designed with the express purpose of determining the professor’s instructional effectiveness. If that is the main goal of an evaluation instrument, then the instrument should be designed with instruction in mind.

The purpose of this module is not to discuss measuring instructional effectiveness. But it should be mentioned here that instruction involves knowledge of the material, organization of the material, sensitivity to the audience, rapport with the audience, structure and clarity of presentation, the use of illustrative material, and so forth.


Language, Culture, and Social Issues

Contemporary educators are well aware of the multicultural nature of the United States, and there is a good deal of focused attention on test bias. Much of this is politically generated or derived from other agendas, but much also is a result of a genuine desire to develop evaluation methods that are fair to all populations and still maintain the validity of the technique.

We are talking here of any inherent characteristic of the instrument, its design and delivery. We are not referring to the interpretation of the scores derived from the tests.

We must first be sure that we know the population that will take the test. While it should be unnecessary to say this, please do not give an advanced test to freshmen, a test of science in English to Hispanic immigrants, or an oral exam while the street outside is being repaired. These are not fictional examples. Do not give an exam written by one text author to a class that used another author's text unless the terms are the same and the authors' interpretations are the same.

While it is not difficult to write an obviously biased test, it is very difficult to write one that is surreptitiously slanted toward or against a given group. A math test using baseball stats is not ipso facto biased against women, but if that is all the content there is in the test, the test is certainly sending some sort of message.

One point has to be made quite clear: the mere fact that one group does more poorly on a test than another does not imply that the test is biased against the lower performing group. If, on the other hand, a group of Hispanic students has knowledge equivalent to that of a comparison group and the Hispanic students still do more poorly on the exam, then and only then is the test biased. There are mathematical methods, upheld in court, which can detect such bias.
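The distinction between a raw score gap and a gap between equally knowledgeable groups can be sketched in a few lines of Python. This is only an illustration of the idea, not one of the court-tested methods mentioned above. The group labels, the knowledge levels, the scores, and the function name `matched_group_gaps` are all hypothetical.

```python
from collections import defaultdict


def matched_group_gaps(records):
    """Compare groups "A" and "B" only within the same knowledge level.

    records: iterable of (group, knowledge_level, exam_score) tuples.
    Returns {knowledge_level: mean score of A minus mean score of B}
    for every level at which both groups are present.
    """
    scores = defaultdict(lambda: defaultdict(list))
    for group, level, score in records:
        scores[level][group].append(score)

    gaps = {}
    for level, by_group in scores.items():
        if "A" in by_group and "B" in by_group:
            mean_a = sum(by_group["A"]) / len(by_group["A"])
            mean_b = sum(by_group["B"]) / len(by_group["B"])
            gaps[level] = mean_a - mean_b
    return gaps


def raw_mean(records, group):
    """Mean exam score of one group, ignoring knowledge level."""
    values = [score for g, _, score in records if g == group]
    return sum(values) / len(values)


# Hypothetical data: group A simply has more high-knowledge students.
records = [
    ("A", "low", 60), ("A", "low", 64),
    ("B", "low", 61), ("B", "low", 63),
    ("A", "high", 85), ("A", "high", 90), ("A", "high", 92),
    ("B", "high", 88), ("B", "high", 90),
]

gaps = matched_group_gaps(records)
print(raw_mean(records, "A") - raw_mean(records, "B"))  # raw gap is positive
print(gaps)  # gaps within each matched knowledge level
```

In this made-up data, group A has the higher raw average, yet students with equivalent knowledge score the same in both groups, so the raw difference by itself says nothing about bias.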

Activity


The beginning mathematics course at your college can be one of three different courses: basic, intermediate, or advanced. The content in the advanced course is analytic geometry and calculus while that of the basic course involves algebra.

To place students in the various courses, someone suggests using the quantitative scores from the SAT: students with high scores would be placed in the advanced class, and so forth.

Develop a response to this plan in 100 words or fewer. Highlight the most important reason for your views of this plan.

 


Types of test interpretation

Tests are rarely designed and interpreted outside of a context. Here, the metaphor of the thermometer is especially useful. When a person is first introduced to the concept of temperature, he is probably unaware of the meaning of the numbers. For one thing, a temperature of 32 degrees does not imply anything unless I tell you that I am using a thermometer with a Fahrenheit scale. You now know that water will freeze under standard conditions. If the scale had been centigrade (Celsius), the same reading would mean a hot day (about 90 degrees F).

Now, water does not always freeze at 32° F. High in the mountains, or if there is salt in the water, or if there are some other contaminants, the water will not freeze. Various other points on the scale have other interpretations that must be learned or discovered as we study a field. The numbers themselves have no meaning other than that which we give them. And the same is true of tests.
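The thermometer example can be checked with the standard conversion formulas. The short Python sketch below shows that the same reading of 32 means freezing on one scale and a hot day on the other. The function names are chosen here for illustration only.

```python
def celsius_to_fahrenheit(deg_c):
    """Convert a Celsius reading to Fahrenheit."""
    return deg_c * 9 / 5 + 32


def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32) * 5 / 9


reading = 32  # the bare number, with no scale attached

# If the reading is in Celsius, this is roughly 90 degrees F: a hot day.
print(round(celsius_to_fahrenheit(reading), 1))

# If the reading is in Fahrenheit, this is 0 degrees C: the freezing point.
print(fahrenheit_to_celsius(reading))
```

The number is identical in both calls; only the scale, that is, the context of interpretation, changes its meaning.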


Norm referencing

One fairly easy and thus popular method of attaching meaning to test scores is to make the meaning of a score relative to other scores. In large scale testing situations, the publishers clearly define the population of people for which the test is intended to be appropriate and then give the test to a representative sample from that group. The statistics calculated from these data are called norms (normal in the sense of typical, not in the sense of healthy or good). The average internal body temperature, measured orally, is an average and has no implied meaning. (It has recently been changed from 98.6° F to a lower point because the original figure was obtained from hospital records and was considered biased.)
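As a minimal sketch of norm referencing, the Python below interprets a score only by its position relative to a norm sample. The sample scores are hypothetical, and the two statistics shown (percentile rank and standard score) are common choices made here for illustration, not ones prescribed by this module.

```python
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of the norm sample scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)


def z_score(score, norm_scores):
    """Distance of `score` from the norm mean, in standard deviations."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)


# Hypothetical scores from a representative sample of the intended population.
norm_sample = [52, 61, 64, 67, 70, 70, 73, 76, 79, 88]

print(percentile_rank(76, norm_sample))  # share of the sample scoring below 76
print(z_score(76, norm_sample))          # positive: above the typical score
```

Note that neither number says whether 76 is a good score in any absolute sense. Each says only where it falls among the people in the norm sample.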


Criterion Referencing

[Picture: Muhammad Ali boxing]

When using Criterion Referencing, take care in setting the cut points. For example, the body mass index is still used as an indicator of obesity even though, by using it, we would categorize Muhammad Ali and Arnold Schwarzenegger as obese.

Recently, criterion referenced interpretation of test scores has become central to much thinking. Here, a score is interpreted by its distance from a pre-set point. Being above the criterion, or cut-point, implies a consequence different from being below it. Establishing these cut-points is a science in itself and cannot be taken lightly. Current “healthy” blood pressure readings and cholesterol cut-points are changing. But ideas change slowly, especially those that are culturally influenced.

We set cut-points for criterion referencing for many purposes. These include hiring, financial punishment, and retirement. The utility and meaningfulness of the decisions, however, are based on the credibility of both the test and the criterion established as the cut-point. You want people you hire, that is, those above your cut-point, to be successful and productive. Addendum: The age of 65 for retirement was decided by Bismarck because he realized that very few Germans of that time ever reached it.
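Criterion referencing can be sketched with the body mass index example mentioned above. The Python below assumes the commonly used adult obesity cut-point of a BMI of 30. The weight and height are those of a hypothetical muscular athlete, not measurements of any real person, and the function names are illustrative.

```python
OBESITY_CUT_POINT = 30.0  # assumed cut-point: commonly used adult BMI threshold


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def meets_criterion(score, cut_point):
    """Criterion referencing: only the position relative to the cut-point matters."""
    return score >= cut_point


# Hypothetical muscular athlete: 110 kg at 1.88 m.
athlete_bmi = bmi(110, 1.88)

print(round(athlete_bmi, 1))
print(meets_criterion(athlete_bmi, OBESITY_CUT_POINT))  # classified as obese
```

The classification depends entirely on where the cut-point was placed, which is why the credibility of both the measure and the criterion matters so much.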


Implications for the Classroom Test

  1. Letter grades are a message interpreted differently by different interest groups.
  2. You should distinguish between student outcomes and teaching effectiveness.
  3. Careful wording of items will determine the clarity of the information, reduce mis-readings, and lessen item and test bias.
  4. “Test” bias should be differentiated from evaluation bias, a slanted interpretation of the test results.
  5. Test bias is not immediately determined from raw test scores.
  6. Grading on the curve is norm referencing.

Criteria for letter grades should be set using information outside the test, such as professional judgment about the student’s future coursework, his success in a related occupation, or the responses of some idealized student groups.

 

 

© CET, SFSU 2003