Introduction to Assessment
On educational tests, we are usually not directly concerned with whether a student knows the answer to any one particular question. What we want to know is whether the pattern of answers on a test tells us, or the student, something that can guide a decision in the future. We thus want any “test” of a student’s knowledge to be grounded in the best thinking and the most sophisticated strategies.
The first attributes to be “measured” were probably distance and weight: the distance from one location to another (so many days’ trek) and the weight of an amount of grain or metal (possibly for trading purposes). As soon as we recognize that two amounts of grain are different, the scientific and entrepreneurial among us become interested in how they differ: in weight, dryness, sweetness, and volume, for example. We can readily devise a system for comparing two weights, but “dryness” and “sweetness” pose other problems. Without getting into the philosophy of it, these two qualities differ from volume and weight in that, at least as the human senses register them, they are more intricately bound up with individual, subjective, and possibly aesthetic perceptions.
Thus, within each community, traders would have to agree on what a “heavy” amount of wheat is and what a lighter one would feel like. That is, they would have to standardize the measuring process, making it the same so that two independent people could, and would, agree on the weight of a given amount.
Their system would be useless outside their culture, of course, unless they could agree with the outside group on a mutual standard.
For close to one hundred years, the goal of educational testing in the United States has been the standardized test: one given under fixed conditions of time and situation. Standardized tests are designed to be given to appropriate examinees in different contexts so that the results can be compared across groups, situations, or times.
Classroom tests almost never meet the criteria for standardized tests, but they have criteria of their own by which they can be judged. A typical classroom test is understood only within the culture of the instructor’s classroom and the environment that has grown up during the semester of interaction with the students. Whether one pound of success (to mix metaphors) in a particular section of biology is equivalent to one pound in another is arguable. Is an “A” in Professor Aero’s class the same as an “A” in the class next door?
When we measure the weight of some wheat, we tacitly assume at least two things. Assumption One: there is nothing in the grain to throw off the measuring (no water, stones, or other foreign matter). This, in turn, implies two conditions. C1. Our “scale” is sensitive only to weight and not, for instance, to density, volume, or some other attribute of the grain. C2. Our scale is not influenced by outside forces, such as the seller’s thumb on the scale or the weight of the basket. This is the assumption of unidimensionality: the implied condition that we are measuring only one underlying variable.
FIGURE 1: A multidimensional model of weighing wheat

Figure 1 illustrates a model used quite frequently by measurement specialists to develop mathematical simulations of the testing process. When measuring wheat, random errors (such as bad scales) and systematic errors (such as the weight of the container holding the wheat) can affect the measured weight. Because of these errors, the reported weight is not the same as the true weight. In an ideal world, there would be no random error and no systematic error; the true weight and reported weight would match exactly.
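Because the Figure 1 model is the kind of thing measurement specialists turn into simulations, a small one may make it concrete. The sketch below is a minimal Python illustration under assumed values, not a model taken from this page: the names `reported_weight`, `true_weight`, `container`, and `noise_sd` are hypothetical, the basket weight stands in for systematic error, and Gaussian noise stands in for an erratic scale.

```python
import random


def reported_weight(true_weight, container_weight, noise_sd, rng):
    """Reported weight under a Figure 1-style model (hypothetical parameters).

    container_weight: systematic error, added identically to every reading
    noise_sd: spread of the random error, e.g. from an erratic scale
    """
    random_error = rng.gauss(0.0, noise_sd)
    return true_weight + container_weight + random_error


rng = random.Random(42)  # fixed seed so the run is repeatable
true_weight = 50.0       # assumed true weight of the wheat, in kg
container = 2.0          # assumed basket weight (systematic error), in kg

# Weigh the same wheat 1000 times on a scale with random error (sd 0.5 kg).
readings = [reported_weight(true_weight, container, 0.5, rng) for _ in range(1000)]
mean_reading = sum(readings) / len(readings)
print(f"mean reported weight: {mean_reading:.2f} kg")
print(f"average error vs. true weight: {mean_reading - true_weight:.2f} kg")

# The ideal case from the text: no random and no systematic error.
ideal = reported_weight(true_weight, 0.0, 0.0, rng)
print(f"ideal reading: {ideal:.2f} kg (true weight {true_weight:.2f} kg)")
```

Averaging many readings pulls the random error toward zero, but the mean still sits near the true weight plus the basket, because the systematic error is the same on every reading and does not average away.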
In classroom tests, unidimensionality is a very tough criterion to meet. First of all, much of the content of college-level courses is fairly heterogeneous. We move from one topic to another in class and then combine questions on that material into one midterm examination. As a result, one student’s total score on the exam, say, 15 correct out of 30, could be made up of 15 entirely different items from another student’s identical score. The first student could have aced the 15 questions on balancing equations and the second student could have aced those covering the make-up of the periodic table.
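The two-student example can be made concrete with made-up response data. The minimal Python sketch below assumes a hypothetical 30-item midterm whose first 15 items cover balancing equations and whose last 15 cover the periodic table; the names `student_a`, `student_b`, `total_score`, and `subscore` are illustrative only. Both students earn the same total while sharing no correct answers, which is exactly the information a single total score hides.

```python
# Hypothetical 30-item midterm: items 0-14 cover balancing equations,
# items 15-29 cover the make-up of the periodic table.
BALANCING = set(range(0, 15))
PERIODIC = set(range(15, 30))

# Made-up response patterns: 1 = correct, 0 = incorrect, one entry per item.
student_a = [1 if i in BALANCING else 0 for i in range(30)]
student_b = [1 if i in PERIODIC else 0 for i in range(30)]


def total_score(responses):
    """Number of items answered correctly on the whole test."""
    return sum(responses)


def subscore(responses, items):
    """Number of items answered correctly within one topic."""
    return sum(responses[i] for i in items)


for name, resp in [("A", student_a), ("B", student_b)]:
    print(name, "total:", total_score(resp),
          "balancing:", subscore(resp, BALANCING),
          "periodic:", subscore(resp, PERIODIC))

# Same total, yet the two students have no correct item in common.
common = {i for i in range(30) if student_a[i] and student_b[i]}
print("items both answered correctly:", len(common))
```

Reporting a subscore per topic, as `subscore` does here, is one way to see what the identical totals conceal.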
You should write the test at a reading level consistent with that of the text and lectures, so that you are not measuring reading skill or extraneous vocabulary. Some professors make tests more “difficult” by making them extraordinarily long or unnecessarily complex, or by including material that is peripheral to the objectives or not implied by them (in the students’ minds). In that case they are also measuring the speed with which the student can read, recall, and react to the stimulus material on the test, or the student’s ability to follow intricate instructions or retain obscure material. We hope they are also measuring the students’ knowledge of the subject matter as indicated by the objectives.
FIGURE 2: A multidimensional model of student performance

Figure 2 suggests what might be happening when a lab instructor tests a student’s knowledge of the common leopard frog by asking him to name parts of the anatomy of a plastic frog. As with the earlier example of weighing wheat, random errors (such as student anxiety and grader bias) and systematic errors (such as a student’s poor spatial ability) can influence the test score. Therefore, the score is not a direct reflection of the student’s knowledge. When we assign a grade for this quiz, we must concentrate on the student’s knowledge and not on the two confounding sources of error that might interfere with our assessment.
The term “assessment” is fairly new in the evaluation literature, apparently arising from a contemporary sensitivity to holistic orientations and perspectives. “Assessment” is used in different ways, but all usages have in common a broadening of the term “testing”: we wish to move from paper-and-pencil formats used alone in making decisions to a more comprehensive (though not necessarily “holistic”) information-gathering strategy with broader purposes. Thus, assessment is the gathering of any kind of information to help the instructor make any kind of educational or curricular decision.
More broadly, assessment can be defined to include measurement as part of a strategy that includes planning, evaluation (of student outcomes or teaching effectiveness), and final use of the results. It thus implies strategic planning and the use of any type of information, only one type of which, the written or oral examination, would be called “measurement.”
Assessment can also include what might be called formative evaluation of teaching, sometimes termed “curriculum assessment” or “teaching assessment.” Here, the instructor collects information that serves as feedback about the clarity and accuracy of his or her teaching process. By all rights this should be done before the instructor is let loose to teach and evaluate students, perhaps in a pedagogy course. But it can also be done, formally or informally, as the teacher presents material. We all do this in informal conversation: if our companion does not appear to understand what we are saying, we broaden the dialogue to clarify our terms, explain ideas, or what-have-you. If you treat a lecture, lab session, or other pedagogic device as a conversation with an intelligent friend, you are well on your way.
In addition, information can be collected more formally to monitor the progress of students. Over the semester, students must learn information that is essential to the development of the course material. You can test for this developing repertoire through a series of small quizzes.
You should be sensitive to the fact that assessing student achievement and assessing teaching are two separate and distinct processes. Student achievement is what the learner has accumulated over the course of instruction and, as we all know, can be influenced by health, social pressures, family troubles, inability to settle down, immaturity, and a host of other factors. Teaching effectiveness describes the instructor’s presentation skill, the structure of the course, the accuracy of the instructor’s knowledge, and the clarity with which the course material is presented. It includes the ability to accommodate different learning styles and to model metacognitive learning strategies. Teaching effectiveness is influenced by time pressures, the verbal fluency and stage presence of the instructor, his or her knowledge of the area, the quality of the text, and the presence of support staff for the instructor, among many other things.
Please do not confuse the issues by believing that if the student hasn’t learned, the teacher hasn’t taught. This is circular reasoning at its finest, not unlike the proverbial tree falling in the forest. The fact that no one hears the tree fall does not imply that it has not fallen (or that it has fallen!). Likewise, the fact that a student does not learn in the class does not imply that the teacher has not taught.
© CET, SFSU 2003