
Introduction to Assessment

This Page Includes
 Introduction
  Issues of Dimensionality
  Assessment - Definition & Overview

 


Introduction

The history of measurement parallels the history of science. Once people know that a phenomenon exists, they want to understand its attributes and properties and the extent of these qualities. If we can assume that attributes exist in quantities, then, to the extent we have available techniques, we can assign numbers to varying amounts of these attributes. This is the process of measurement. When we apply this theory to educational or academic outcomes for the purposes of making decisions, we refer to it as testing.

A test is a formal process for generating some empirical indicator of a broader range of conditions or situations. Thus, you can test a wine cork to see if the wine itself might be good or blend well with the entree. You can sample a soup to see if it needs more herbs or salt. Or you can check an outdoor thermometer to see if you need a coat. Notice that the test can be a part of a larger entity (like the soup) or only remotely related, like how high some liquid is in a tube and your wearing a coat. Note that the test is used to generalize to untried situations.

activity


Identify five “tests” you do on a regular basis, throughout the day or week. What is the “untried situation” for each of these? Suggest several “untried situations” for a midterm examination. Suggest several for a final examination.

On educational tests, much of the time we are not directly concerned with whether a student knows the answer to any one particular question. What we want to know is whether the pattern of answers on a test carries information that helps us, or the student, make some decision in the future. We thus want any “test” of a student’s knowledge to be grounded in the best thinking and the most sophisticated strategies.

The first attributes to be “measured” were probably distance and weight—the distance from one location to another (so many days trek) and the weight of an amount of grain or metal (possibly for trading purposes). As soon as we recognize that two amounts of grain are different, the scientific and entrepreneurial among us are interested in differences in weight, dryness, sweetness, and volume for example. We can readily figure out a system for comparing two weights, but “dryness” and “sweetness” pose other problems. And without getting into the philosophy of it, these two concepts differ from volume and weight in that, in the human senses at least, they are more complexly interwoven with individual, subjective and possibly aesthetic perceptions.

Thus, within each community, traders would have to agree on what a “heavy” amount of wheat is and what a lighter one would feel like. That is, they would have to standardize the measuring process, making it the same so that two independent people could, and would, be able to agree on the weight of a given amount. Their system would be useless outside their culture, of course, unless they could agree with the outside group on a mutual standard.

The goal in educational testing in the United States is, and has been for close to one hundred years, the standardized test, that is, one given under fixed conditions of time and situation. Standardized tests are designed to be given to appropriate examinees in different contexts so that the results can be compared across groups, situations, or times.

Classroom tests almost never meet the criteria of standardized tests, but they do have criteria by which to judge them. A typical classroom test is understood only within the culture of the instructor’s classroom and the environment that has grown up during the semester of interaction with the students. Whether one pound of success—to mix metaphors—in a particular section of biology is equivalent to one pound in another is arguable. Is an “A” in Professor Aero’s class the same as an “A” in the class next door?


Issues of Dimensionality

When we measure the weight of some wheat, we tacitly assume at least two things. Assumption One: there is nothing in the grain to throw off the measuring, such as water, stones, or other foreign matter. This, in turn, implies two conditions. C1. Our “scale” is sensitive only to weight and not, for instance, density or volume or some other attribute of the grain. C2. Our scale is not influenced by outside forces, such as the seller’s thumb on the scale or the weight of the basket. This is the assumption of unidimensionality, the implied condition that we are measuring only one underlying variable.

FIGURE 1: A multidimensional model of weighing wheat


Figure 1 illustrates a model used quite frequently by measurement specialists to develop mathematical simulations of the testing process. When measuring wheat, random errors (such as bad scales) and systematic errors (such as the weight of the container holding the wheat) can affect the measured weight. Because of these errors, the reported weight is not the same as the true weight. In an ideal world, there would be no random error and no systematic error; the true weight and reported weight would match exactly.
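The model described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the model used by any particular measurement specialist: it assumes the errors are additive, that the systematic error is a fixed container weight, and that the random error is normally distributed. The function names, the default values, and the units are hypothetical.

```python
import random


def reported_weight(true_weight, container_weight=0.5, scale_noise_sd=0.2, rng=random):
    """Simulate one reading: true weight + systematic error + random error.

    Assumed for illustration: errors are additive, the systematic error is a
    fixed container weight, and the random error is normally distributed.
    """
    systematic_error = container_weight             # same bias on every reading
    random_error = rng.gauss(0.0, scale_noise_sd)   # differs on every reading
    return true_weight + systematic_error + random_error


def mean_reading(true_weight, n=2000, seed=0, **kwargs):
    """Average many simulated readings of the same true weight."""
    rng = random.Random(seed)
    readings = [reported_weight(true_weight, rng=rng, **kwargs) for _ in range(n)]
    return sum(readings) / n


if __name__ == "__main__":
    # Ideal world: no systematic error and no random error.
    print(reported_weight(10.0, container_weight=0.0, scale_noise_sd=0.0))
    # Averaging removes most of the random error but none of the systematic error.
    print(mean_reading(10.0))
```

In this sketch, repeating the measurement many times and averaging brings the result close to the true weight plus the container weight, not to the true weight itself. Random error shrinks with repetition; systematic error does not.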

In classroom tests, this is a very tough criterion to meet. First of all, much of the content of college level courses is fairly heterogeneous. We move from one topic to another in class and then combine questions from the material into one midterm examination. One student’s total score on the exam, say, 15 correct out of 30, could come from an entirely different set of 15 items than another student’s identical score. The first student could have aced the 15 questions on balancing equations and the second student could have aced those covering the make-up of the periodic table.
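The point about identical totals can be made concrete with a short sketch. It assumes a hypothetical 30-item exam split evenly between the two topics mentioned above; the even split, the function name, and the response data are invented for illustration.

```python
def subscores(responses, item_topics):
    """Count correct answers per topic for one student."""
    totals = {}
    for correct, topic in zip(responses, item_topics):
        totals[topic] = totals.get(topic, 0) + int(correct)
    return totals


# Hypothetical exam: 15 items on each of two topics (an assumed split).
item_topics = ["balancing equations"] * 15 + ["periodic table"] * 15

# Hypothetical response patterns: True means the item was answered correctly.
student_a = [True] * 15 + [False] * 15
student_b = [False] * 15 + [True] * 15

if __name__ == "__main__":
    for name, responses in (("A", student_a), ("B", student_b)):
        print(name, "total:", sum(responses), "by topic:", subscores(responses, item_topics))
```

Both students earn the same total of 15, yet their topic-level scores are opposites. A single total score hides that difference, which is why a heterogeneous exam is hard to treat as a measure of one underlying variable.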

You should write the test so that the reading level is consistent with that of the text and lectures, so that you are not measuring reading skill or extraneous vocabulary. Some professors make tests more “difficult” by making them extraordinarily long or unnecessarily complex or by including material that is peripheral or not implied (in the students’ minds) by the objectives. In this case they are also measuring the speed with which the student can read, recall, and react to the stimulus material on the test, or the ability of the student to follow intricate instructions or retain obscure material. We hope they are also measuring the students’ knowledge of the subject matter as indicated by the objectives.

FIGURE 2: A multidimensional model of student performance


Figure 2 suggests what might be happening when a lab instructor tests a student’s knowledge of the common leopard frog by asking the student to name parts of the anatomy of a plastic frog. As with the earlier example of weighing wheat, random errors (such as student anxiety and grader bias) and systematic errors (such as a student’s poor spatial ability) can influence the test. Therefore, the test is not a direct reflection of the student’s knowledge. When we assign a grade for this quiz, we must concentrate on the student’s knowledge and not the other two confounding variables that might interfere with our assessment.

activity


Identify a situation in which you, as an instructor, would give a test at the end of every chapter. Contrast this with a situation in which it is appropriate to put all the test items into a comprehensive exam and give one total score.


Assessment

The term “assessment” is fairly new in the evaluation literature, apparently arising from contemporary sensitivity derived from holistic orientations and perspectives. “Assessment” is used in different ways, but all usages have in common the broadening of the term “testing”: we wish to go from paper-and-pencil formats used alone in making decisions to a more comprehensive (not necessarily “holistic,” however) information-gathering strategy with broader purposes. Thus, assessment is the gathering of any kind of information to help the instructor make any kind of educational or curricular decision.

More broadly, assessment can be defined to include measurement as part of a strategy that includes planning, evaluation (of student outcomes or teaching effectiveness), and final use of the results. It thus implies strategic planning and the use of many types of information, only one of which (the written or oral examination) would be called “measurement.”

Assessment can also include what might be called formative evaluation of teaching. This might be called “curriculum assessment” or “teaching assessment.” Here, the instructor collects information that serves as feedback about the clarity and accuracy of his or her teaching process. By all rights this should be done before the instructor is let loose to teach and evaluate students, perhaps in a pedagogy course. But it also can be done, either formally or informally, as the teacher presents material. We all do this in informal conversation: if our companion does not appear to understand what we are saying, we broaden the dialogue to clarify our terms, explain ideas, and so on. If you treat a lecture, lab situation, or other pedagogic device as a conversation with an intelligent friend, you are well on your way.

In addition, information can also be collected more formally to monitor the progress of students. During the course of the semester, information that is essential to the development of the course material must be learned. You can test for this developing repertoire through a series of small quizzes.

You should be aware that assessing student achievement and assessing teaching are two separate and distinct processes. Student achievement is what the learner has accumulated over the course of instruction, and, as we all well know, can be influenced by health, social pressures, family troubles, inability to settle down, immaturity, and a host of other factors. Teaching effectiveness describes the presentation skill, structure, accuracy of knowledge, and clarity with which the course material is presented. It includes the ability to accommodate different learning styles and the ability to model metacognitive learning strategies. Teaching effectiveness is influenced by time pressures, the verbal fluency and stage presence of the instructor, his or her knowledge in the area, the quality of the text, and the presence of support staff for the instructor, among many other things.

Please do not confuse the issues by believing that if the student hasn’t learned, the teacher hasn’t taught. This is circular reasoning at its finest, not unlike the proverbial tree falling in the forest. Simply because no one hears the tree fall does not imply that it has not fallen (or has fallen!). Simply because a student does not learn in the class does not imply that the teacher has not taught.



© CET, SFSU 2003