Historical Background

This Page Includes
 Psychophysics
  Educational Testing
  Activity

 

Because of time and space constraints, our discussion of the background of contemporary testing will be short. This condensation of over one hundred years of psychology should demonstrate that a logical and well-thought-out foundation underlies everything contemporary test constructors attempt to accomplish.


Psychophysics

Psychology began with the development of scientific thinking but did not really split from its parent, philosophy, until the late nineteenth century. At that time German scientists, led by Gustav Fechner (1801 – 1887) with his 1860 Elemente der Psychophysik, began to study the relationship between stimulation and the responses of organisms. Classic graphs of the period and later show purported functional relationships between color intensity, sweetness, sound volume, and so forth and the corresponding psychological perceptions reported by test subjects.

From this origin we still retain the model that a test question, as a stimulus, yields a student’s response, which we record as a number. Admittedly, the connection between sugar on one’s tongue and pleasure (psychophysics) and our educational example might seem stretched, but we are not trying to understand the link between input and response so much as using it as a rationale for our work.

In the early part of the twentieth century, Louis L. Thurstone (d. 1955) developed a series of psychophysical laws that indicated methods of developing scales from stimulus-response data. These methods led, in turn, to a wide variety of more elaborate methods, all of which were based in mathematics, because the psychologists assumed that they were measuring quantitative responses to stimuli that also existed in quantitatively increasing degrees.

Their goal was to create more precise numerical models, or scales, with precise mathematical characteristics. Two of these characteristics are still our goals: to have the numeral zero indicate the absence of the trait being measured, and to have equal intervals, that is, to have the distance between any two adjacent scores on the scale be the same no matter where on the scale they lie. Putting these two characteristics together gives us what is termed a ratio scale, because any two numbers can be compared using numerical ratios and the ratio can be interpreted as a meaningful amount (twice as much or three-fifths as much, for example). Without the zero characteristic, but assuming that the intervals are equal across the scale, we have an interval scale, where the intervals are interpretable: “5” is three intervals from “8” and “200” is three intervals from “197”.
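The difference between the two kinds of scale can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the helper name `shift` is hypothetical. Moving the zero point of a scale leaves the intervals between scores unchanged but alters their ratios, which is why ratios are meaningful only when zero marks the true absence of the trait.

```python
def shift(scores, offset):
    """Relabel a scale by moving its zero point (hypothetical helper)."""
    return [s + offset for s in scores]


# Made-up scores on a scale whose zero point is arbitrary.
scores = [5, 8, 197, 200]
relabeled = shift(scores, 100)

# Intervals (differences) survive the relabeling.
interval_before = scores[1] - scores[0]
interval_after = relabeled[1] - relabeled[0]

# Ratios do not survive it.
ratio_before = scores[1] / scores[0]
ratio_after = relabeled[1] / relabeled[0]

print("interval:", interval_before, "->", interval_after)
print("ratio:   ", ratio_before, "->", ratio_after)
```

On an interval scale only the first comparison can be trusted; on a ratio scale both can.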

Likert Scale Example

Likert Scale questions are often used for personal feedback. A common example would be the questions on classroom evaluations required by many colleges and universities. For example:

This class has contributed to my knowledge.

Strongly agree
Agree
Neither agree nor disagree
Disagree
Strongly disagree

Today, the historical fallout from this is the Hollywood movie “10”, which taught all Americans that we can use the 10-point Likert scale (pronounced lick-ert, please) to evaluate many areas of life. This is one form of what are called rating scales. Ever since Rensis Likert developed the rating scale in 1932 with seven intervals, the questions of how many intervals to use and whether there should be a middle (“undecided”) point have occupied many measurement journals. Needless to say, this scale has a long and proud history.

To bring home the long and hard development of today’s testing efforts, let’s go back in time and look over the shoulders of some early scientists. Suppose you want to measure heat. You know that some days are “warmer” than others, that some ovens will bake bread well, that others won’t come close, and that still others (forges) will char the dough immediately. You know, then, that

  1. temperature differences do exist
  2. they exist in graduated amounts, and
  3. there are physical consequences of these differences.

How do you go about getting a quick, ready, precise, and useful method of indicating these differences? You certainly could teach each baker to feel with his hand what the appropriate temperature for bread is, but the problems with teaching and individual skills would be overwhelming. And certainly there would be no way to communicate this information in writing. We need a system that will be as free of human error and perception as possible and, to be as precise as possible, we want the results to be numbers. The model of alcohol in a glass tube has to be one of the most ingenious measurement insights in history. Imagine recognizing that alcohol, which doesn't freeze in winter, expands in a regular fashion with increasing temperature!

[Image: a thermometer]

Gabriel Fahrenheit (1686-1736) changed the thermometer by putting mercury in the glass tube because he had the intuitive sense that temperatures could go below the zero point of the alcohol instrument. Until then, scientists believed that there was a lowest temperature and labeled it zero. Fahrenheit's experiments indicated that there were temperatures of water below that point: thirty-two intervals, to be specific. He made that lower point zero, which made the old zero point 32. The other extreme, where water changed to another form, steam, was the highest conceivable temperature. What number should that be? Steam is the opposite of ice, and with the circle already divided into 360 degrees thanks to the Babylonians, it made sense to divide this scale into 180 degrees. So, from the new zero point to the freezing of water is 32 degrees, and we have another 180 degrees to boiling at 212 degrees (32 + 180 = 212). Even today, if two concepts are considered to be opposites, we can refer to them as 180 degrees apart.


Now, why this anecdotal aside? For several reasons:

  1. Almost all educational tests fall on an interval scale.
  2. They thus have the same interpretation as the Fahrenheit scale.
  3. Zero does not mean the absence of the trait.
  4. Intervals are assumed to be equal.
  5. The numbers have no inherent meaning in and of themselves.
  6. You can add and subtract them, but you can’t take ratios. If Junior gets 48 on a test with 64 items on it, he got 75% of them correct, but there is no implication that he knows 75 percent of the material. (48° F is not 75% of 64° F.)
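The temperature comparison in the last item can be checked with a short Python sketch. It assumes the Kelvin scale as the ratio-scale reference (its zero is the absence of heat), and the function name `fahrenheit_to_kelvin` is chosen here for illustration only.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins, a scale with a true zero."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Ratio of the raw Fahrenheit labels: the same arithmetic as 48 of 64 items.
naive_ratio = 48 / 64

# Ratio of the same two temperatures on a scale where zero means "no heat".
true_ratio = fahrenheit_to_kelvin(48) / fahrenheit_to_kelvin(64)

print(f"ratio of Fahrenheit labels: {naive_ratio:.2f}")
print(f"ratio on the Kelvin scale:  {true_ratio:.2f}")
```

The labels give 0.75, while the Kelvin values give roughly 0.97. A ratio of test scores deserves the same caution, because a score of zero does not mean zero knowledge.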


Educational Testing

The use of “tests” in education began at the University of Bologna in 1219, but formal written exams were not introduced until 1803 at Oxford University. Similar to the quantification of psychology and the development of group tests of intelligence and aptitude, the history of educational testing in the United States was influenced by the philosophy of behaviorism and the desire of the scientist to be the objective observer. In 1845, the public schools in Boston replaced oral interrogation with written examinations. Horace Mann recognized that these exams were an improvement over the old format because they treated all students equally, allowed more material to be tested, and reduced the degree of examiner bias.

In the 1930s, a good deal of research was undertaken on further reducing this examiner, rater, or grader bias. Borrowing from the psychologists, educators recognized the advantages of the multiple-item objective test over the essay: greater breadth of coverage, rapidity of scoring, scoring without any rater input whatsoever, and greater reliability. With the great influx of immigrants and the large numbers of inadequately schooled men entering the army in WW I, the psychologists and educators also recognized that it is extremely difficult to get a sense of a person’s knowledge or aptitudes if that person’s educational or verbal skills are not up to standard American English usage. Evidence had been collected that immigrants had done poorly on intelligence tests because the tests were presented in English.

Activity


An instructor has only four students in his class. He intends to give an oral exam of the material to each student. Should he allot a portion of the examination to other types of questions? What kind? Why?

A classroom examination is a serious process and one not to be taken lightly. Simply because a test is called a “midterm in Biology 101” does not mean that it is appropriate for any and all classes in biology. While it is always possible to borrow questions from a variety of sources, the most useful and meaningful examination begins with a design based on your intended learning outcomes. These “course objectives” can also be the most efficient way to plan one’s classroom activities. Figure 3 illustrates the articulation among all the parts of our thinking.

 



© CET, SFSU 2003