Historical Background
Because of time and space constraints, our discussion of the background of contemporary testing will be short. This condensation of over one hundred years of psychology should still show that a logical, well-thought-out foundation underlies everything contemporary test constructors attempt to accomplish.
Likert Scale Example
Likert Scale questions are often used for personal feedback. A common example is the set of questions on the classroom evaluations that many colleges and universities require. For example: This class has contributed to my knowledge.
One popular offshoot of such scales is the Hollywood movie “10,” which taught Americans that we can use a 10-point Likert-style scale (pronounced lick-ert, please) to evaluate many areas of life. This is one form of what are called rating scales. Ever since Rensis Likert introduced his rating scale in 1932 with five response intervals, the questions of how many intervals to use and whether to include a middle (“undecided”) point have filled many measurement journals. Needless to say, this scale has a long and proud history.
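To make the idea of a rating scale concrete, here is a minimal sketch of how responses to a single Likert item might be turned into numbers and averaged. The response labels and their 1-to-5 codes are assumptions for illustration (the page does not list the answer options), and the four-point variant simply drops the middle ("undecided") point to show the other side of the debate described above.

```python
from statistics import mean

# ASSUMPTION: illustrative labels and codes; not taken from the original page.
FIVE_POINT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Undecided": 3,        # the middle ("undecided") point
    "Agree": 4,
    "Strongly agree": 5,
}

# Same idea with no middle point: respondents must lean one way or the other.
FOUR_POINT = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Agree": 3,
    "Strongly agree": 4,
}

def code_responses(responses, scale=FIVE_POINT):
    """Convert each response label to its numeric code on the given scale."""
    return [scale[r] for r in responses]

def item_mean(responses, scale=FIVE_POINT):
    """Average numeric rating for one item across all respondents."""
    return mean(code_responses(responses, scale))

# Hypothetical answers to "This class has contributed to my knowledge."
answers = ["Agree", "Strongly agree", "Undecided", "Agree", "Disagree"]
print(code_responses(answers))  # [4, 5, 3, 4, 2]
print(item_mean(answers))       # 3.6
```

Averaging item codes like this treats the intervals as equally spaced, which is exactly the kind of assumption the measurement literature on interval counts and midpoints keeps revisiting.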
To show how long and hard the road to today’s testing efforts has been, let’s go back in time and look over the shoulders of some early scientists. Suppose you want to measure heat. You know that some days are “warmer” than others, and that some ovens bake bread well, others don’t come close, and still others (forges) char the dough immediately. You know, then, that
How do you arrive at a quick, ready, precise, and useful method of indicating these differences? You could teach each baker to judge by hand the right temperature for bread, but the problems of teaching and of individual skill would be overwhelming, and there would be no way to communicate this information in writing. We need a system that is as free of human error and perception as possible and, to be as precise as possible, we want the results to be numbers. The alcohol-in-glass thermometer has to be one of the most ingenious measurement insights in history. Imagine recognizing that alcohol, which doesn't freeze in winter, expands in a regular fashion as temperature increases!
Gabriel Fahrenheit (1686-1736) changed the thermometer by putting mercury in the glass tube because he had the intuitive sense that temperatures could go below the zero point of the alcohol instrument. Until then, scientists believed that there was a lowest temperature and labeled it zero. Fahrenheit's experiments indicated that there were temperatures of water below that point, thirty-two intervals below, to be specific. He made that lower point zero, which made the old zero point 32. The other extreme, where water changes to another form, steam, was taken as the top of the scale. What number should that be? Steam is the opposite of ice, and with the circle already divided into 360 degrees thanks to the Babylonians, it made sense to divide this span into 180 degrees. So the scale runs 32 degrees from the new zero point to the freezing of water, and another 180 degrees from freezing to boiling at 212 degrees (32 + 180 = 212). Even today, if two concepts are considered opposites, we say they are 180 degrees apart.
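The arithmetic in the Fahrenheit story amounts to a linear scale fixed at two reference points: freezing at 32 and boiling at 212, with 180 equal degrees between them. The sketch below is only an illustration under one stated assumption, taken from the text's point about regular expansion: the liquid column rises uniformly, so equal rises correspond to equal degrees. The function name and its `fraction` parameter are hypothetical.

```python
FREEZING_F = 32            # freezing point of water on Fahrenheit's scale
DEGREES_TO_BOILING = 180   # equal intervals from freezing to boiling
BOILING_F = FREEZING_F + DEGREES_TO_BOILING  # 32 + 180 = 212

def reading_to_fahrenheit(fraction):
    """Convert how far the liquid column has risen into degrees Fahrenheit.

    fraction = 0.0 means the column sits at the freezing mark and
    fraction = 1.0 means it sits at the boiling mark. Assumes uniform
    expansion, so each equal rise adds the same number of degrees.
    """
    return FREEZING_F + DEGREES_TO_BOILING * fraction

print(BOILING_F)                   # 212
print(reading_to_fahrenheit(0.0))  # 32.0
print(reading_to_fahrenheit(0.5))  # 122.0
print(reading_to_fahrenheit(1.0))  # 212.0
```

The same two-fixed-points-plus-equal-intervals idea is what lets a test score, like a thermometer reading, be written down and compared by people who never saw the original measurement.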
Now, why this anecdotal aside? For several reasons:
The use of “tests” in education began at the University of Bologna in 1219, but formal written exams were not introduced until 1803 at Oxford University. Like the quantification of psychology and the development of group tests of intelligence and aptitude, educational testing in the United States was shaped by the philosophy of behaviorism and by the scientist's desire to be an objective observer. In 1845, the public schools in Boston replaced oral interrogation with written examinations. Horace Mann recognized that these exams were an improvement over the old format because they treated all students equally, allowed more material to be tested, and reduced examiner bias.
In the 1930s, a good deal of research was undertaken on further reducing this examiner, rater, or grader bias. Borrowing from the psychologists, educators recognized the advantages of the multiple-item objective test over the essay: broader coverage, faster scoring, scoring without any rater input whatsoever, and greater reliability. With the great influx of immigrants and the large numbers of inadequately schooled men who had entered the army in WW I, psychologists and educators also recognized that it is extremely difficult to gauge a person’s knowledge or aptitudes when that person's schooling or command of standard American English is weak. Evidence had been collected showing that immigrants had done poorly on intelligence tests because the tests were presented in English.
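Why an objective test removes rater input can be shown with a minimal sketch: scoring is a mechanical comparison against an answer key, so anyone applying the same key gets the same score. The five-question key, the student responses, and the function name below are hypothetical, chosen only for illustration.

```python
# Hypothetical answer key for a five-question multiple-choice quiz.
ANSWER_KEY = ["B", "D", "A", "C", "B"]

def score_objective_test(responses, key=ANSWER_KEY):
    """Count the responses that match the key, position by position.

    No judgment is involved: every scorer using the same key gets the
    same result, which is the source of the speed and reliability
    advantages over essay grading.
    """
    if len(responses) != len(key):
        raise ValueError("response sheet and key must have the same length")
    return sum(1 for given, correct in zip(responses, key) if given == correct)

student = ["B", "D", "C", "C", "A"]
print(score_objective_test(student))  # 3
```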
A classroom examination is a serious process and not one to be taken lightly. Simply because a test is called a “midterm in Biology 101” does not mean that it is appropriate for any and all biology classes. While you can always borrow questions from a variety of sources, the most useful and meaningful examination begins with a design based on your intended learning outcomes. These “course objectives” can also be the most efficient way to plan your classroom activities. Figure 3 illustrates how all the parts of this thinking fit together.
© CET, SFSU 2003