
THE JOURNAL OF THE LEARNING SCIENCES, 10(1&2), 27–61 Copyright © 2001, Lawrence Erlbaum Associates, Inc.

Situating Cognition

Wolff-Michael Roth
University of Victoria

In this article, I describe an epistemological framework and an associated research method. This framework focuses on the structural relations of individual and setting as experienced by the individual. Because individuals may be perceptually attuned to phenomena at different ranges and time scales, researchers who want to understand cognition have to follow this changing field of attention. My analysis consists of zooming to identify patterns that occur simultaneously at multiple levels. Zooming, therefore, embodies a reflexive relation to different figure–ground relations in the attentional field of the individuals I study. To exemplify my approach to research, I use data from a physics classroom where students learn to explain computer-animated microworld events in terms of Newtonian theory.

In the past 15 years there has been a flurry of research in many different domains (all concerned with knowledge and intelligent actions) that clusters around the banner concepts of situated or distributed cognition (e.g., Chaiklin & Lave, 1993; Clancey, 1997; Resnick, Levine, & Teasley, 1991). Following the developments in cognitive science, recent work in (science and math) education attempts to work out whether and how distributed cognition and situated learning are viable theoretical concepts in the context of schooling (e.g., Greeno & Hall, 1997; Pea, 1993). My own work on situated cognition locates itself within this recent tradition of research in educational settings and includes the coordination of methodological procedures associated with ethnography, discourse analysis, microgenetic analysis, gesture research, action research, and design experiments. In short, the purpose is to build a multifaceted and situated description of cognition. As such, and in keeping with the general theme of this special

Correspondence and requests for reprints should be sent to Wolff-Michael Roth, Lansdowne Professor, Applied Cognitive Science, MacLaurin Building A548, University of Victoria, Victoria, BC, Canada V8W 3N4. E-mail:



edition, the intent of this article is to articulate the relation between theory and method that grounds much of my work. Central to my research has been the (ethnomethodological) presupposition that there exists a reflexive relation between an analyst’s understanding of social places (e.g., classrooms) and the inferences that are drawn from the data (Garfinkel, 1967). Thus,

    [t]he sociologist who seeks to reconstruct the order of affairs within a psychiatric clinic from its records must … anticipate the very thing that the study is intended to determine, namely the ways in which “such places” (or even “this place”) work as grounds for deciding what the evidential materials could possibly mean. (Sharrock & Button, 1991, p. 150)

This reflexive relation requires researchers to know the places of interest well enough so that they can reliably interpret any documentary evidence. However, this reflexive relation also harbors the danger that once researchers know a place well, they may simply reify preconceived ideas and concepts (Bourdieu & Wacquant, 1992). One important empirical question is exactly what this place is and what it looks like from the perspective of the individual social actor; here, Gestalt theorists have made important contributions. Gestalt theorists noted that an individual does not perceive some situation at once and in all its detail, but that some entities become salient as figure that stands against a more or less diffuse ground (e.g., Merleau-Ponty, 1945; Wertheimer, 1985). Individual actors perceive and act toward this figure, which in turn shapes their activities and their learning. Therefore, understanding the nature of figure–ground relations from the perspective of the individual has to be the analyst’s central concern. What is currently figure depends on an individual’s past experiences, as well as his or her current goals and intentions. What entities are salient in the foreground is contingent both in nature and extent; consequently, the figure constitutes a continuously changing field of attention. However, if individual social actors are attuned to different attentional fields (i.e., when there are figure–ground shifts), the analyst will (have to) be attuned to these different fields in a similar way. The analyst, therefore, has to find, from the documentary records, what is salient to the individuals he or she studies. I use the metaphor of “zooming” to characterize the shifting figure–ground relations by actors and analysts. 
The purpose of this article is to articulate how, during analysis, I focus on two related questions: What is the nature and content of the figure–ground relation of the social actor (student, teacher), and what is the zoom level that the analyst has to choose to understand the ongoing event? I draw on one classroom research project, in which I was also the teacher, to exemplify my approach to answering these questions.



EPISTEMOLOGICAL AND ANALYTICAL BACKGROUND

Individual–World Relations

For each organism, the world has objectively experienced physical (and social) structures to which it adapts (von Uexküll, 1928/1973). Gestalt theorists introduced the notion of figure and ground to describe these experienced structures (Lyotard, 1991). Gibson (1979) articulated similar issues in terms of affordances that are perceived by the individual; affordances are descriptions of the environment directly relevant to the individual’s actions in the present context. Although perceived and experienced as objective, the structures of the world are not the same across individuals (e.g., Merleau-Ponty, 1945). This requires the cognitive argument to be reversed. That is, the central phenomenon is not how individuals come to act in a stable world, but how different individuals come to a consensus that they live in the same world despite individual differences in perception that only sometimes become obvious (e.g., Garfinkel, 1967). Due to the subjective element of perception (more specifically, the subjective and situational nature of the salient figure that the individual attends to), it has been proposed that we model cognition using relational representations (e.g., Chapman, 1991). Typically, such relational representations take the form of “the cup I am holding,” “the place where I am standing,” “the arrow I am pointing to,” and so on (Agre, 1995). What is currently salient to an individual actor can therefore be inferred from its functional relation (action, talk) to the world. These inferences are conservative in the sense that we do not make inferences about the properties of the cup, place, or arrow unless these properties can themselves be documented as a functional relation. The nature of figure and ground is therefore an empirical matter rather than a matter that can be decided on a priori grounds. As an example, consider the analysis of classroom conversations. 
What is being said cannot be determined by an analysis of talk alone. Rather, it requires an analysis of what was being heard by the listeners, which conversation analysts infer from their next contributions to the conversation (e.g., Sacks, Schegloff, & Jefferson, 1974). Central to any analysis, therefore, is the identification of relations between agent and environment; in Gibson’s terms, it is an identification of the affordances, which are always relational rather than properties of the environment (Clancey, 1997). From these functional relations, the analyst infers the nature of the figure that the individual agent attends to (i.e., the analyst reconstructs the figure–ground relations or affordances that describe the individual–environment relation). The theoretical perspective outlined in this article has epistemological implications. Acknowledging that figure–ground or affordances are relational properties rather than properties that can be assigned to individual or environment also means that researchers actively make cognition a contingent and situated phenomenon. Taking the present theoretical perspective (and the methodology deriving from it)



actively situates cognition (thus, the title of this article). Depending on the scale of the phenomena salient in the individual’s actions, we note different ways in which cognition is situated. Thus, cognition may be distributed across the material setting, lie in group interactions, be embodied in practices, and so on.

Figure

A recent study illustrates the tremendous variations in responses students provide to “structurally identical” lever problems when aspects of the setting are changed (Roth, 1998a). This study showed that students’ responses changed significantly with changes in format or social situation. Thus, significantly different structures are observed in cognitive activity when (a) the lever beam is marked or unmarked, (b) problems are presented in a practical or in verbal form, (c) students answer in interview settings or paper-and-pencil format, and (d) students engage in conversation with an interviewer or with a peer. The study shows that what is salient and therefore an object of cognitive activity changes across the configuration of the assessment. Due to this and similar studies in different contexts (e.g., Roth, McRobbie, Lucas, & Boutonné, 1997), I infer the figure–ground relation salient in students’ actions. To understand unfolding activity, cognition, and actions, analysts need to know what is currently salient as figure in the perception of the individual. Analysts risk misunderstanding problem-solving activity when they assume that what is figure to them is the same for the individual. Thus, Lave (1988) showed that shoppers’ responses to “Which cereal is the better buy?” differ from their consideration of the question statement “the 375-g box of cereal A costs $3.20, whereas the 800-g box of cereal B costs $6.70.” Despite the fact that the problems are held to be the same across the two contexts in traditional cognitive analysis, different elements are salient in the shoppers’ consideration. 
What is salient to them in the supermarket situation differs from the entities used in the cognitive analysis. The importance of getting a handle on what is figure for the individual was recently pointed out in two independent studies of children’s reasoning about the balance beam (Metz, 1993; Roth, 1998b). For many years, researchers assumed that children were acting on and toward weight and distance to determine the fulcrum. However, Metz (1993) and Roth (1998b) pointed out that children did not reason in terms of the concepts weight and distance. Weight and distance are emergent phenomena: Children develop them through their interaction with the materials and in the particular settings. That is, it has been assumed that children gave incorrect responses to weight and distance problems on the balance beam (e.g., Inhelder & Piaget, 1958; Siegler, 1978), whereas, in fact, children were responding to an entirely different question. Metz and Roth showed that young children are not attuned to weight and distance in the way adults are. Rather, children develop an adult understanding of weight and distance through their interactions out of much more primitive precursors. The very structure of the focal phenomena is different from what has been assumed.



In both the Metz (1993) and the Roth (1998a) study, microgenetic analyses allowed the researchers to reconstruct what is figure for the children. Central to microgenetic analysis (Siegler & Crowley, 1991) and to the work reported in this article are intense and repeated data collection at moments leading up to and through occasions in which there was cognitive change. As Metz noted, “microgenetic analysis, particularly using videotaped data, constitutes a powerful tool to get at the meaning the subjects are attributing to the task domain and their actions” (p. 88). What is salient to an individual and what therefore shapes his or her orientation toward an activity may differ in its extent. To understand the ongoing activity, analysts have to change their figure–ground relation to bring relevant structures into focus; for this process, I use the film-related metaphor of zooming. Zooming, therefore, is a reflexive phenomenon because the analyst follows his or her actors in shifting between fields of attention. An important question for the analyst is how he or she decides on where to focus and how far to zoom. A good analogy for these processes is what happens when we look at a drawing to find something without knowing what it looks like or how large it is (e.g., finding the differences between two drawings, finding a figure in what looks at first like random splotches). In this case, we shift focus horizontally to look at different areas and shift focus in the line of sight (i.e., we zoom) to look at smaller and larger aspects of the drawing. At some point, our regard locks in on something, the figure or difference to be detected that stands against everything else (ground).¹ Pertaining to research on cognition, we can shift focus while watching video by playing it at the normal rate, speeding it up, looking at it in a (speaking) turn-by-turn fashion, or studying it one frame at a time. 
In each situation, we will begin to notice patterns, but the nature of these patterns will differ depending on the chosen time scale. To orient my zooming, I draw on a heuristic developed by Hutchins (1995).

¹ Conceptually, this (phase) locking is similar to what happens when we look at a random dot stereogram. Viewed separately, each diagram consists of random dots, but if one looks at both diagrams at the same time, each eye focused on a different diagram, a figure (consisting of the invariant dots) rises against a ground. The perceptual processes bring to the foreground what is the same (i.e., invariant) between the two different inputs. In the process, “the brain rummages through various [planes of fixation] hunting for a match for retinal images that will permit fusion and thus resolve into a coherent scene” (Churchland & Sejnowski, 1992, p. 193).

Orienting the Zooming Process

In classroom research, I assume that each moment of time is associated with three trajectories that require different “lens” openings: ongoing activity, change in individual practices, and development of collective practices (see Figure 1). Consequently, each moment of activity is analyzed in terms of (a) the unfolding activity, its history, contingencies, constraints, and so forth; (b) the trajectory of the individual



FIGURE 1 Dimensions of development (bold arrows) and constraints (broken arrows) on the various dimensions.

agent with reference to other moments featuring this agent; and (c) the practices of the community that envelops the agent. Changes associated with the three trajectories occur at different time scales. Thus, ongoing activity is always fleeting, changes in individuals’ practices are usually tied to recurrent activity in the same setting (but may arise from a particular activity), and changes in the practices of the community parallel the slow developments of other cultural practices (see Figure 1). Transactional processes characterize the developments along the three dimensions. For example, development of the classroom discourse arises from the development of individuals, which arises in turn from the development of activities. Then, the developments of activities and individuals are constrained by developments of the more inclusive dimensions (individual and community, community). Finally, the scientific community (represented by the teacher and textbooks) also constrains the forms of discourse that develop at each of the three levels (e.g., the teacher–student interactions described later).

Central to this framework is an interest in and commitment to understanding cognition and learning in terms of the integration of structure at five zooming levels (see Table 1). Each of these analyses is targeted toward a different aspect of the situation, none by itself providing a picture of cognition that is complete. Selection of one of these aspects, however principled, may automatically exclude other, equally principled selections. In my understanding of situated cognition, we need to account for all of these aspects (and more aspects are possible) to get a sense of what cognition involves and what makes it possible. For example, the social construction described in Level 1 comes about because of the type of constraints described in Level 4, and presupposes a convergence in the participants’ perceptions (which, by default, they take as shared; Level 2). In the unfolding events that lead to students’ sense that their understandings are shared (i.e., socially constructed; Level 1), the gestures make salient particular aspects when read against the background (Level 3). Gestures and ground allow the researcher to infer the salient entities that underlie students’ actions (Level 1). For a complete analysis, I zoom through an entire spectrum of (temporal and spatial) frames, though space limitations in research journals usually require a separate presentation of each analysis (similar to the perceptual processes described in Footnote 1). I consider any one data selection and reduction as limiting our understanding of cognitive processes. Although some may consider this a brute force approach, and although it is time consuming, I personally find it more consistent with the situated perspective to arrive at models that account for cognition at multiple levels.

TABLE 1 Five Levels of Structure Attended to in the Methodological Framework

Level: Description of Importance of Level

1 Extended time scale: By focusing on the development over longer time scales, I show how students develop a common set of signifiers within and across groups. That is, this analysis shows aspects of the individual and collective trajectories (Figure 1).

2 Macro-structure: By focusing on physical arrangement, social configurations, and the nature of focal artifacts, I show how these interact to give rise to different participation and discourse patterns, and therefore to what we understand as macrostructure in cognitive activity.

3 Unfolding activity: By focusing on the unfolding activity (horizontal axis in Figure 1), I show how students co-construct a description in real time and subject to the history and contingencies of the activity.

4 Perceptual ontology: By focusing on the perceptual ontologies of students and teacher, I show (a) how the “same” screen events are perceived differently by students and teacher and (b) how the teacher’s (my) interactions with students constrained their perceptions of the on-screen events. (See Figure 1 and the constraints on the development of practices in the three dimensions.)

5 Physical setting: By focusing on different parts of the physical setting (different layers in Figure 5), I show how gestures interact with the visual display, and how they may forebode understandings that verbal discourse reveals only much later. (Here, knowing is understood as distributed across body and setting.)

Analytical Processes and Presuppositions

Understanding cognition in terms of individual figure–ground relations in the context of the three trajectories requires at once deep familiarity with the setting and an



empirical approach to the nature of what is currently salient to the different actors involved (students, teacher, and analyst). I therefore enact (a) interpretive inquiry involving long periods of stay in the worlds of interest and many interactions with the people who inhabit these worlds—often participating as a teacher—and (b) hermeneutic phenomenological analyses (e.g., Ricœur, 1991) involving long and intensive periods of critically analyzing video and text. As a teacher–researcher, I am afforded the unique opportunity of being on the “inside,” working with students as they build their scientific explanations and I construct interpretations of what is salient (figure) to them (these are different rather than privileged interpretations). This positioning, my role as teacher, allows me to test my interpretations through continual rearrangements of the learning context in ways that allow me to refine my interpretations. I also gain understandings of the school and classroom as a culture, local practices, ways of interacting, and so forth. Out of this long-term engagement develops my understanding of the changes in individual and collective practices (see Figure 1). In addition, I spend extended periods of time with the videotapes (often with colleagues who bring different perspectives) radically questioning my own ways of viewing the events. Here, the attempt is to reconstruct the elements currently salient to each agent during ongoing activity, and thereby to locate structure along the third trajectory. All interactions are videotaped and are transcribed in an ongoing manner—often by myself—so that the text is available in written form during my ongoing analysis. Ultimately, the goal is to integrate my unfolding understanding of cognition across the three trajectories. Texts, photographs, and copies of written artifacts are inventoried and scanned to be quickly available through one and the same computer interface. 
I also play the videotapes through the computer interface and use a stereo system to achieve maximum resolution of the audio channels. Before writing up a study, I spend weeks (even months) watching videotapes and reading texts to the point that the entire database becomes a familiar (multidimensional) environment with multiple sense-making resources (cf. Greeno, 1991) that allow me to situate cognition. During this phase, I write notes, again using the computer so that the notes themselves become part of the data set. My epistemological commitment is to functional individual–environment relations (i.e., to the identification of affordances). My analyses are therefore based on the assumption that reasoning is observable in the form of socially structured and embodied activity (e.g., Suchman & Trigg, 1993). Videotapes, transcripts, and artifacts produced by the participants are natural protocols of their efforts in making sense of, and imposing structure on, their activities. With Garfinkel (1967), I assume that organizational phenomena “are contingent achievements of organizations of common practices, and as contingent achievements they are variously available to members as norms, tasks, troubles” (p. 33). These achievements are in the same way available to the analyst; that is, these protocols constitute the documents that I structure and elaborate by drawing on resources available “from



within the competence systems” (Lynch, 1985, p. 6) that I attempt to describe. When I work with colleagues, we organize our analytic work around the precepts of interaction analysis (Jordan & Henderson, 1995). This involves enacting detailed collaborative analyses of videotapes, which are particularly fruitful in generating multiple perspectives of and hypotheses about a small number of events. Consistent with the constant-comparative method (Corbin & Strauss, 1990), patterns identified at each level (trajectory) are tested in the entire data corpus. In the end, patterns are accepted as characteristic or are modified or discarded when they appear to be singular (or retained as negative cases of some other phenomenon). For example, in the analysis of the videotapes from one group, I realized that students used many different terms for each of the arrows in Figure 2 (see the analysis below), leading to what I termed muddled talk. My analysis of the entire corpus showed that muddled talk was characteristic of all groups; hence, it was selected for discussion in this article. In this manner, I have attempted to select patterns of discussion that are representative of student interactions more broadly and not simply indicative of my own idiosyncratic interests.

FIGURE 2 Interface of Interactive Physics™, a Newtonian microworld superposing phenomenal objects (ball) and conceptual framework (i.e., velocity and force vectors).



The Teacher as Researcher

The examples presented in this article are derived from a research project that was conducted during an 11-week unit on mechanics and kinematics topics. I conceived the study as a “design experiment” (Brown, 1992), intertwining research and instructional practice. In design experiments, the results of ongoing research are fed immediately back into educational practice. The changes that result are themselves documented and analyzed by research, which feeds its results back for another cycle. This interaction of teaching and research is also a salient feature of other forms of research that encourage researchers to “be purposeful in promoting learning for the purpose of studying conceptual change” (Magnusson, Templin, & Boyle, 1997, p. 100). Being a teacher–researcher allows much greater flexibility in making curricular changes that preliminary data analysis suggests. For example, a preliminary analysis may suggest a change in the planned activities to follow up on some misunderstandings, incorporating a formal test of student understanding or adding a particular activity. Furthermore, teacher–researchers often generate additional data not available from regular classroom teachers. For example, we keep reflective journals in which instructional decisions and the relation between research and instruction are articulated (e.g., Hammer, 1996). This additional information allows further a posteriori inferences about the role of teacher knowledge and planning processes in teaching. From a researcher perspective, being immersed in the school culture comes with all the benefits of ethnographic research (e.g., Marcus & Fischer, 1986), especially access to local knowledge (i.e., access to meaning and meaning relations that are not accessible from microanalytic levels alone). 
As the anthropologist Geertz (1983) noted, understanding our subjects also requires closeness to their experience and the language that renders it and which cannot be revealed by exclusive reliance on external “etic” or “experience-distant” accounts. Knowing the school culture from the inside, therefore, gives access to understandings that constitute one half of the earlier discussed hermeneutic phenomenological process of data interpretation. Finally, knowing the setting from the inside allows researchers to appropriate the participants’ competence systems—a necessary prerequisite for ethnomethodological “descriptions of orderly and socially organized inquiries [that] do not present an opposition between the practices described and the practices which make such a description possible” (Lynch, 1985, p. 6). Doing research as a classroom teacher also comes with constraints. Due to the complexity of conducting a productive and cognitively rich lesson, it is virtually impossible to assume both researcher and teacher roles at the same time. Teacher and analyst roles thereby become temporally separated. Lessons become an object of inquiry: The temporal (and spatial) distance leads to an objectification of the events. During the lesson, my responsibility is always to the students. I arrange any



data collection such that it can be planned and set up prior to the lesson. In the situations analyzed later in this article, I was a full-time teacher and had no research assistant: The camera was therefore set up and put into recording mode before the class began. Transcription, data analyses, and other research-related activities (e.g., constructing tests) were completed at night. Conducting research in the teacher-as-researcher mode harbors dangers during the data analysis phase, particularly when the analyses are conducted without recourse to others who do not have stakes in the project. For example, the teacher’s retrospective accounts of what he or she has done in some situation recorded on tape may not accurately render the plans, goals, and beliefs that motivate his or her utterances (e.g., van Zee & Minstrell, 1997). At best, the teacher’s reflections are plausible (rather than to be privileged) accounts of the events experienced earlier (e.g., Hammer, 1996). Being an insider also comes with the danger that one simply reifies one’s preconceptions about how the place works. To minimize entrapment in my own presuppositions and beliefs, I involve colleagues (teachers and professors) as “disinterested peers” (Guba & Lincoln, 1989) in all of the research where I am also the teacher. Disinterested peers interact with the researcher, and therefore understand the research project, but have no direct stakes in it. This allows me to see my project through the eyes of others and thereby to move at a distance from the situation of which I am an integral part (momentarily at least).

CONTEXT AND DATA

To show how these precepts are embodied in my research, I provide details from a study of cognition where high-school students learned physics using modeling software (Interactive Physics™). The following sections provide details about the context that situate the subsequent analyses.
Student Participants

Forty-six Grade 11 students (41 boys, 5 girls) from three sections of a qualitative Grade 12 physics course participated in this study (20, 15, and 11 students, respectively). The students attended a private school in Canada (Grades 4–13), which was in its 1st year of transition from an all-boy to a coeducational institution. For about half of the students, this course was a precursor to the Grade 13 advanced physics course. Most students were not science majors and later pursued careers in business, medicine, law, and politics. I taught all three sections of this physics course.

Physics Course

The course was premised on the assumption that learning means to achieve a certain level of competence in talking physics (e.g., Lemke, 1990; Roschelle, 1992).



Thus, I planned many activities to engage students in physics conversations. These activities included (a) open investigations of motion phenomena chosen by students according to their own interests, (b) explorations of phenomena in a computer-based microworld (Interactive Physics), and (c) collaborative concept mapping with the main concept labels of a unit. Students were asked to read relevant chapters in one of the available textbooks (e.g., Hewitt, 1989) on their own, and to complete six problems per week. The open investigations of natural phenomena constituted the core of the curriculum, the microworld activities were interspersed once every other week, and the collaborative concept mapping took place once a month. Microworld activities and concept mapping were contexts that allowed students to focus more on the conceptual aspects of the physics of motion than on the mechanical aspects of implementing their practical research.

Computer-Based Microworld Activities

Interactive Physics is a computer-based Newtonian microworld in which users conduct experiments related to motion (with or without friction, pendulum, spring oscillators, or collisions). The microworld allows different representations of observable entities (measurable quantities). For example, force, velocity, or acceleration can be represented by means of instruments such as strip chart recorders and digital and analog meters. I planned the computer activities because Interactive Physics (like the “Envisioning Machine”; Roschelle, 1992) superposes conceptual representations of these quantities (vectors) and the objects, creating hybrid objects that bridge phenomenal and conceptual worlds (Roth, Woszczyna, & Smith, 1996). Students, therefore, have concurrent access to phenomenal and conceptual representations, which they do not have with real-world experiments. By alternating between real-world and computer activities, I (as teacher) hoped to assist students in developing a stable discourse about motion phenomena across situations. All student activities in this study included, at a minimum, one circular object (see Figure 2). A force (full arrow) could be attached to this object by highlighting and moving it with the mouse. The object’s velocity was always displayed as a vector and students could modify its initial value by highlighting the object, “grabbing” the tip of the vector, and manipulating its magnitude and direction. Students were instructed to find out more about the microworld, especially the meaning of the arrows (i.e., the vectors representing force and velocity). Although students concurrently conducted real-world experiments on motion in which they analyzed distance–time, velocity–time, and acceleration–time graphs, they were not told the scientific names of the arrows. In part, I did this because as a teacher who believed in inquiry-based learning, I wanted to know whether students would eventually use them without being instructed to do so. As a basic guideline, I followed the design developed in a similar study by Roschelle (1992).



Some of the prepared activities displayed nothing more than the circular object (including its velocity) and a force. Others required students to manipulate the arrows (force and velocity) to hit a small rectangle and knock it off its pedestal. After setting force and initial velocity, students could run the experiment. A tracking feature “froze” the motion as if recorded with flash photography. During the microworld experiment, the cursor takes the form of a stop sign, and a simple mouse click stops the motion. The replay feature allows the inspection of individual states in the motion of the sphere.

Data Collection

At the time of the data collection, I was a high school teacher interested in understanding how my students learned in the different activities I designed. I prepared and started the camera equipment and microphone immediately prior to students’ arrival in the class and stopped recording once students had left. Because of the nearly continuous presence of the camera in my classrooms over the 3 years of teaching in the school, it seemingly became transparent to our activities. On the computer, three groups of students (representative of the entire physics course in terms of achievement and gender) were each videotaped during four 60-min classroom periods separated by 2-week intervals, for a total of 12 hr of analyzable material. The physical configuration of students and recording devices is represented in Figure 3. The descriptions of learning developed in this study are based on the entire data corpus constituted by the tapes and transcripts.

FIGURE 3 Physical arrangement and recording setup for the Interactive Physics™ activities as they would have appeared from above.



For the purpose of illustrating identified themes, I selected episodes from one of these groups: Glen, Elizabeth, and Ryan. The three students were in many ways representative of the students I had taught in various public and private schools throughout Canada. They were not “typical science students,” did not achieve in the top quartile, and did not enroll in science or a science-related field at the university level. As a group, the three had a preference for agreement, and conflict was not part of their interactions. The three worked together rather well and, although they did not know each other initially, they stayed together as a group for the whole school year.

The data for the computer activities exist in a large context of other data collected during the same school year with the same three classes (which allow inferences about the changes in collective practices). These data include video records obtained during students’ experimental work, during semantic networking activities (concept mapping), and during individual interviews about knowing and learning physics. Furthermore, hard copies of the results of laboratory work and student reflections on knowing and learning in diverse physics activities also entered the database. For the group of three students presented, the additional database contextualizing the Interactive Physics study includes 15 reports of independently conducted laboratory investigations, 10 hour-long sessions of semantic networking (concept mapping), 1 exam and 3 tests per trimester, 13 essays on knowing and learning, and a series of semistructured interviews focusing on physics knowledge and epistemology.

Inscriptions and Discourse

Understanding conversation in the presence of inscriptions requires a particular attention to the relation of talk, gesture, and the representation (i.e., to the features of the development along ongoing activity; e.g., Goodwin, 1986; Hall, 1996). In the course of some conversation and by using words and gestures, speakers make salient certain objects and events within a more complex context. These objects and events come to the foreground and become figure, whereas the remainder of the inscription recedes into the background and becomes more diffuse. An important component in the analysis of discourse situations is the relation among talk, inscription (external representation), background, and gesture. In this case, to analyze what is happening as students interact with each other and Interactive Physics, I use another heuristic to conceptualize where I may zoom in to locate structure in student activities (see Figure 4). Figure 4 shows how I understand each conversation as consisting of different layers and being embedded in and distributed across situational particulars. Depending on the specific research question, I zoom to a level where I expect to find patterns. For example, when I am interested in the temporal relation between gestures and the associated verbal expression and how this relation changes over time (e.g., see the section “Gesture and Scientific Talk”), I zoom in on and bring into focus utterances, gesture, and display. When I am interested in understanding how talk and gesture relate to the perceptual structuring of the field, I include graphic display, gesture, and talk but do not need to bring into focus medium, peers, or classroom community (e.g., see the section “Gesture and Scientific Talk”). Finally, when I am interested in understanding the relation of the language used within student groups and the language at the classroom level (i.e., during whole-class conversations), I focus on private talk with peers and on public talk in the classroom community (e.g., see the section “Development of Individual and Collective Practices”).

Due to space limitations, individual research articles often concern only one level of zooming. As part of an overall research project and as a matter of arriving at a more fine-grained picture of cognition in classrooms, I analyze phenomena at different levels and develop interpretations in part through the coordination of these multiple levels.

All episodes selected have been tested in the entire data corpus and are representative descriptions of other video segments collected around the same moment in time in terms of five dimensions. These include (a) the nature of students’ discourse, (b) the integration of gestures and talk, (c) the manipulation of objects on the interface, (d) the nature of entities students perceive, and (e) the nature of student–student and teacher–student interactions. Thus, for example, the episode in Figure 5 could be exchanged with other figures that I have constructed without a change of the argument. To make claims about the interaction of gesture and scientific talk, video offprints could have easily enhanced episodes without video. The particular episodes featured are therefore a matter of pragmatic choice among many possible alternative episodes.

FIGURE 4 Analytical framework for conversation in front of a representational medium (e.g., chalk board, computer). For the analyst, there are eight levels, only some of which are focused on during each zooming process.



The development of individual and collective practices over longer time scales (see Figure 1) can be described as a series of changes in students’ descriptive language for the development of new ways of perceiving microworld objects and events (e.g., Roth, 1996, 1999b). New forms of descriptive language emerge from older forms (vernacular) that students find to be inappropriate in the course of their unfolding activities. In this database, students move from their own everyday language to language that resembles that of scientists talking about the same events. For example, Table 2 shows the temporal sequence of how two groups of students evolved signifiers for the vectors standing for force and velocity. (The two groups are from different classes so that negotiation of different signifiers at the class level has not occurred. Furthermore, even in the same class, different signifiers were used within the small group prior to whole-class conversations.) These signifiers emerge from interactions within small groups that themselves are constrained by interactions with other groups until finally, each class has negotiated a common set of signifiers. Consequently, this development of shared ways of talking emanated from distributed rather than individual achievements (see also the section “Social Construction”). This means that, in terms of the three trajectories (see Figure 1), the unfolding activity allowed the development of taken-as-shared views and a common set of signifiers used by the individual participants within a group. At the class level, these signifiers converged with their use of a Newtonian language.

Going down in each column of Table 2 reveals that students use a substantial number of different signifiers to refer to the same object. Table 2 also provides evidence that students use a signifier that ultimately turns out to be the one scientists use. Yet, as their activity unfolds, students may abandon such a signifier. However, and certainly not uninfluenced by the teacher, computer, and other resources, the two groups, like all others, eventually stabilize a specific signifier for each object.

These observations concerning the development of discursive practices within and across groups support an evolutionary metaphor for learning. Phenomena are perceived in new ways, and new descriptions of phenomena emerge from and adapt to the contingencies of particular learning settings. Within groups, students’ new perceptions and descriptions became increasingly viable in a triple sense. First, as shown in the section “Social Construction,” students’ collaborative work affords convergence between phenomena and descriptive talk to provide an increasing fit between the two. Second, students converge in their respective ways of talking to establish a shared discourse. Finally, this student talk converges with the standard language about these microworld phenomena. At the small-group and whole-class levels, this process of convergence is part of the stabilization work needed for new descriptive language to survive the talk’s immediate context. As subsequent sections show, microworld and teacher talk (see the section “Perceiving Forces”) provide constraints that curtail the interpretive flexibility and constrain the development of students’ talk in specific ways.


TABLE 2
Development of Signifiers Within and Across Groups

Glen, Ryan, Elizabeth
[Force]: Little or big arrow; Time set; Time; Direction; Time and direction; Velocity; Redirection; Gravity; Force; Gravity; Gravity
[Velocity]: Little or big arrow; Initial speed; Velocity; Initial speed; Velocity; Force; Effort; Strength; Speed; Strength; Speed; Direction; Speed and direction; Velocity

Ben, Fred, Joe, Mike
[Force]: Big arrow; Velocity; Force; Transfers kinetic energy; Moves [ball] forward; Force, kinetic energy; Kinetic Energy; Force and energy; Direction; Some kind of force; Pressure; Kind of force; Force
[Velocity]: Skinny arrow; Velocity; Kinetic energy; Motion going on a path; Direction and time; Velocity; Speed; Velocity

In classrooms, developments such as the change in signifiers that students use for describing and explaining focal events do not occur independently of the teacher, textbooks, and other resources available in the classroom. Thus, the teacher who conducts research is also responsible for changes in activities and for the resources available to students. However, this is an advantage from the design experiment perspective, for the intent of research is to improve on and maximize instructional processes. My analysis of the signifiers (type, frequency, and temporal sequence) allows me to better understand the trajectories that take students from one point in the curriculum to another.

Setting Effects

Having followed the literature on interactions and collaborations in the workplace (e.g., Sharrock & Anderson, 1993), I am aware of particular effects that settings can have on ongoing activity. Such effects are still not sufficiently addressed in educational research. However, my own studies reveal mediating effects and interactions of representational artifacts, social configuration, and physical arrangements on student participation during science conversations and on the form and content of these conversations (e.g., Roth et al., 1996; Roth, McGinn, Woszczyna, & Boutonné, 1999). As such, I am interested in searching for structure in cognitive activity by zooming to bring into focus representational artifacts, social configurations, physical arrangements, and student interactions; that is, structure in the activities (and therefore cognition) arises from structures at a more global consideration of setting salient to students and teacher and, reflexively, to me in my role of a posteriori analyst. The questions I attempt to answer include (a) What is the role of computers in the coordination of the groups? (b) How does group size afford and constrain the development of the ongoing activity? and (c) How does the physical arrangement mediate participation?

The interface provides students with a context that facilitates their mutual orientation to each other and the joint problem (Roschelle, 1992). Through such mutual orientation to objects and talk, students coordinate their utterances and gestures with the microworld objects and events, allowing them to make sense and evolve common observation sentences of, and explanations for, the microworld phenomena. The data suggest that the copresence of physical and conceptual aspects on the interface also interferes with student interactions in two important ways. First, Interactive Physics frequently constitutes a tool that is “unready to hand” (Brown & Duguid, 1992). In contrast to a transparent tool that can be used without cognitive effort (which an individual does not attend to at all), a tool that is unready to hand draws the user’s attention to itself, and thus away from the real problem to be solved. The notion of readiness to hand draws attention to the fact that what is salient is different in the two ways of tool use (i.e., the tool does not afford students the same kinds of activities that it affords to the teacher and other “experts”). Although the interface is considered user friendly, students find it to be complex, requiring considerable time to learn. Second, when there are more than two students, the physical arrangement of people and computer organizes interactions in such a way that it curtails the mutual orientation of the students.
(As in most schools, the computers in this study are on tables and against a wall to minimize accidents involving the various forms of wiring.) The interface can be considered as a tool for exploring the microworld, and by means of this activity, to learn physics. However, along with similar studies of human–computer interaction (e.g., Suchman, 1987; Winograd & Flores, 1987), this study shows that although tools constrain actions in some ways, they can also be interpreted in multiple ways and therefore do not embed unambiguous meanings. What and how entities are salient is therefore an empirical matter. For example, students in this classroom often interpret software feedback in ways unintended by the designers. In one situation, Ryan tests a configuration of object and arrows, and, after the object races off the screen, receives the following message, “Object velocities are high for this simulation; reduce time step for greater accuracy.” The three students subsequently denote force² with “time step.” This is a rather surprising off-the-wall interpretation. However, the conversation becomes more understandable when we consider a larger time frame and both humans and the machine; that is, Table 3 allows us to attribute particular aspects of the structure of interaction (and therefore of cognition) to users and software. Within the students’ attentional horizon, immediately following the students’ lengthening of one arrow (force), the panel message is displayed. As a result of these actions, the software makes available a trajectory, followed by the panel message. Thus, from the students’ perspective, the message is an immediate consequence of their previous action of lengthening the arrow. The word reduce, when viewed in the context of the previous lengthening, is used as a resource to relate the subsequent time step to the manipulated arrow. However, the interpretive frame of Interactive Physics is different. It is designed to run and display the experiment given a particular specification of the relevant variables (mass, velocity, position, force). The message, based on the size of velocity somewhere along the trajectory, is designed to indicate an action that would allow greater accuracy given the user specifications. Thus, rather than basing its feedback on the history of the interaction or on the specified size of the variables, the system starts with an aspect of the simulation. This aspect is not directly available on the interface, but the program uses it in a default mode and checks whether the simulation is possible with a modified time step. In the hands of a competent user familiar with the design rationale and simulation practices, however, the message is likely to be interpreted differently (e.g., in my own case).

My analyses reported in the section “Gesture and Scientific Talk” show how Interactive Physics enables students to use deictic and iconic gestures to make salient certain features (as figure) to which they link their utterances. When viewed against the interface as background, gestures help a speaker to make salient those aspects relevant to his or her explanation. However, as can be seen from Figure 3, when there are three or more individuals oriented toward the interface, there are space constraints on possible physical configurations. Whereas immediate members perceive gestures against background, the same affordance does not always exist for other participants. Thus, although the physical setting does not preclude participation in the conversation, it does preclude the linking function of gestures. However, because gestures are central to scientific laboratory talk (e.g., Lemke, 1998; Ochs, Gonzales, & Jacoby, 1996), not having equal access to the representational medium actively interferes with learning. The point is that not being able to handle the computer input appears to be far less important than the exclusion from the ongoing conversation because of limited access to a different mode of communication. This analysis focused on learning in a broader frame, considering how physical arrangements, size of social configurations, and nature of focal artifacts interact and affect conversational and participatory patterns. This broader focus then leads us to construct different aspects of cognition than those that emerge from a microanalysis of unfolding activity.

² For ease of reading, I use velocity and force to denote the respective arrows → and ⇒. However, especially in the excerpts presented, students do not perceive these arrows as denoting velocity or force or any thereby reified natural phenomena.

TABLE 3
Machine and Human Perspectives on the Unfolding Activity

Column groups: The Users (Not Available to Interactive Physics; Available to Interactive Physics) and Interactive Physics™ (Available to the Users; Design Rationale).

1. The Users, not available to Interactive Physics: E: Pull it out G: So pull it now go that way
   The Users, available to Interactive Physics: R INCREASES [force] TURNS [force]
   Design rationale: Object-oriented manipulation of physical variables

2. The Users, not available to Interactive Physics: R: Oh, what did I do? G: Cancel, Oh there we go, leave it, yeah. Alright, now push it back, keep connected to the back, now run it.
   The Users, available to Interactive Physics: R INCREASES [force] TURNS [force]
   Design rationale: Object-oriented manipulation of physical variables

3. The Users, not available to Interactive Physics: G: OK run it, oh baby! yes
   The Users, available to Interactive Physics: R STARTS [experiment]
   Design rationale: Given position, initial velocity, force, mass, calculate and display trajectory

4. Interactive Physics, available to the users: DISPLAY PANEL: Object velocities are high for this simulation. Reduce time step for greater accuracy
   Design rationale: High object velocities along trajectory cause large position changes, cause inaccuracies in trajectory and recalculation of velocity, acceleration

5. The Users, not available to Interactive Physics: E: Which one’s the time step? R: It’s that big arrow G: Oh yeah the big arrow’s time, OK. I comprehend
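
The design rationale behind the panel message in Table 3 can be illustrated with a minimal numerical sketch. The source does not describe how Interactive Physics integrates motion, so the explicit Euler scheme below, the function names, and all numerical values are assumptions chosen only for illustration. Under these assumptions, the error in the computed position grows with both the time step and the size of the force, which is one way to read the advice to reduce the time step.

```python
def euler_position(v0, force, mass, dt, t_end):
    """Final position of a 1-D object under constant force, using explicit Euler steps."""
    steps = round(t_end / dt)          # number of fixed-size time steps
    acceleration = force / mass
    x, v = 0.0, v0
    for _ in range(steps):
        x += v * dt                    # position advances with the velocity at the start of the step
        v += acceleration * dt         # velocity is then updated for the next step
    return x


def exact_position(v0, force, mass, t_end):
    """Closed-form position for constant acceleration, used as the reference."""
    return v0 * t_end + 0.5 * (force / mass) * t_end ** 2


def position_error(dt, force=10.0, mass=1.0, v0=0.0, t_end=1.0):
    """Absolute difference between the stepped and the exact final position."""
    return abs(euler_position(v0, force, mass, dt, t_end)
               - exact_position(v0, force, mass, t_end))


if __name__ == "__main__":
    # Hypothetical values: compare a coarse and a fine time step, then a larger force.
    for dt in (0.1, 0.01):
        print(f"dt={dt}: error={position_error(dt):.4f}")
    print(f"dt=0.1, doubled force: error={position_error(0.1, force=20.0):.4f}")
```

With these hypothetical values, shrinking the time step from 0.1 to 0.01 reduces the position error roughly tenfold, whereas doubling the force roughly doubles it. From the machine's side, lengthening the force arrow and the time step are therefore related only indirectly, through the accuracy of the calculated trajectory.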



Social Construction

Recent research on cognition in joint activity reveals how new meanings arise from the interactions in small groups (e.g., Moschkovich, 1998). The process and products of such interactions are frequently referred to as social constructions. As I am interested in finding out how microworld objects are perceived and how new descriptive and theoretical language about them arises, I bring into focus student–student interactions, their gestures, and the objects that their talk is about. (See Figure 5 and the representation of interactions in the following transcripts.) In this study, students are asked to find out about the relation between the motion of a circular object and the two arrows, and to construct an explanation of how the microworld works. Prior to this episode, the students had already conducted several experiments with different configurations of velocity and force, leading to different curvilinear trajectories. At one point, Ryan accidentally detaches the force arrow from the object, but the three decide to run an experiment in this new configuration. They discuss the resulting screen display in the following excerpt.



Glen provides a first description in terms of velocity as moving “in the same velocity” while his deictic gesture picks out the arrow, which is followed by an iconic gesture that traces and therefore makes salient the trajectory. Ryan first follows up by describing the trajectory as being “in the same direction” and by tracing a straight line in the air. He then links velocity to a feature of the initial state in the experiment, but, overlapped by Elizabeth, does not complete his statement about the final direction. Elizabeth’s statement about something being constant can be read as confirming both Ryan’s and Glen’s earlier utterances “same direction” and “same velocity.” Glen, followed by Ryan, describes the action of force as “forcing” and “changing direction after the start.”

In this episode, the three students produce descriptions commensurable with Newtonian physics. They use gestures and utterances to pick out, and make salient, a limited number of objects (force, velocity) and events (trajectory). These observation descriptions are assembled in a public space, and require both the inscription and the gesture. The gestures allow students to fix the referents of some words, although in this episode, the referent for the deictic term it fluctuates and its referents are never clarified. For example, in Glen’s description, the force acts on it, presumably the object. However, Ryan’s “it changes direction” does not unambiguously pick out whether it is force that causes some change, velocity that changes, or the object that moves on a curvilinear trajectory. It therefore needs to remain open whether students talk about the arrows or the objects. However, we can take the unfolding conversation as an entity that exists in public space (thus, somewhere other than the agent pole), leaving open what memory traces they leave (what students learned), or how this aspect constrains later developments of the conversation.

One may be tempted to infer from this transcript that the three had already evolved mental representations consistent with Newtonian physics. For example, Glen may be interpreted as having a representation of velocity (“this arrow”) as indicating a constant velocity of it, the circular object. As indicated, we need to radically question our own perceptions and how we attribute them to the agent. Later parts of the unfolding interaction show that the discourse is not a stable one. The students are in the middle of the development represented in Table 2; there is still considerable variation in the designations used for velocity and force. The four lists in Table 2 show that the same terms are used to denote different arrows. In this sense, the aforementioned observation sentences are constructed in the context, contingent on the computer configuration, history of the emergent conversation, and students’ perceptions. Existing resemblances between scientific language and vernacular all too easily lead researchers to make assumptions about perception, representations, and conceptions that are not viable representations of students’ knowing.



The episode shows us how students coproduce a description of an event in the sense that all observation sentences highlight some entity as being constant when one arrow (force) is disconnected from the object. They also share that there are changes in the direction when the same arrow is attached to the object. As we can see in part from Table 2, out of these uncertain beginnings the students develop a consistent way of describing and explaining the phenomena at hand (see also Roth, 1996). However, this development does not occur independently of other events in the classroom. Rather, the interactions between myself (teacher) and students bring about changes in the way students perceive and talk about the events.

One might assume that conceptions drive what students say. Their talk is then considered as a medium of externalizing thoughts and conceptions from the computational hardware to the public forum. Such a view is inconsistent with the data presented because of the considerable variations in the discourse, which would have required the assumption that their “conceptions” constantly changed. Based on my epistemological frame, I make the less stringent assumption that students produce situated observation sentences out of their interactions in the setting. This does not necessitate representations, for the relevant elements (image to be described, language, gestures, etc.) can be picked from the setting. These descriptions are ephemeral and may be forgotten in the next instant; subsequent sentences may in fact be incompatible with earlier ones when studied by the researcher. On the other hand, observation sentences can also stabilize within the group and then become conversational results that students remember, and which last beyond the immediate activity.
Perceiving Forces

Lave (1988) critiqued researchers of cognition for assuming that interacting with materials (diagrams, texts, graphical models, tools, instruments, physical phenomena) provides individuals with relatively unambiguous perceptual experiences. Science educators generally make the same assumptions (e.g., Berg & Philips, 1994). On this view, all students really have to do is look and see, or infer, the same patterns available to those with a scientific background. My own studies in a variety of settings and cultures show that this is not the case (e.g., Roth et al., 1997). Even students’ perceptions of carefully staged teacher demonstrations are radically different and a function of prior expectations. I am therefore interested in finding out how students perceive various aspects of the microworld and decide to bring into focus student descriptions, gestures, actions, and the microworld entities (see Figure 4). The episode presented is interesting from another perspective, as it shows what a teacher does when he realizes that students perceive the microworld differently. As such, elicitation of these differences and provision of constraints that allow students to make observations relevant to understanding the scientifically correct framework are crucial elements of teacher–student interaction. (These constraints operate at all three levels of learning as illustrated in Figure 1.) I show that the particular form of teacher–student interaction and the affordances of the (computer-animated) inscription provide constraints that allow students to modify their observation sentences, and reconstruct how they perceive events.³

Because the three students do not consistently describe an event despite a considerable exploration time, I decide to set up an experiment. Due to its up–down orientation, the experiment affords an analogy between students’ lived world and the microworld. I orient the force so that it would push the object downward, but I orient the initial velocity upward.

Ryan and Glen respond to my “what if…?” question by stating the hypothesis that the object would immediately descend (“straight”). I then run the experiment and ask students the question, “But first?” Glen describes the object as going down. The fact that Elizabeth uses “but” suggests that her observation, “it went backwards first,” contradicts Glen.

³ My roles as teacher and analyst are clearly distinct. As teacher, I was attuned to the unfolding events and as prone to understandings and misunderstandings as other conversation participants. As analyst, I am at a remove and analyze in a way that I would analyze another teacher in action (as much as this can be done).



Coincident with my questioning, “But first?” Elizabeth uses a contrast (“though”) to describe the object as going backward first. This contrast, together with “first,” can be read as a contrastive description to that provided by Glen and Ryan. Ryan responds by describing the initial movement in the direction of the “little arrow,” to which Elizabeth reiterates her perception as a contrast (“didn’t it”) to what she understands Ryan as saying. Subsequent to this interaction, use of the slow-motion feature makes the upward motion salient. Later, I ask students to relate this phenomenon to something in their everyday life, to which the three respond with descriptions of a returning hula-hoop, a yo-yo, and an object thrown in the air. Out of this, Elizabeth suggests that force represents something like gravity.

Glen and Ryan describe the object as moving down. In this situation, this observational description (constituting what is salient and currently figure) is different from my own expectation and observation that the object should move upward before it descends. Not perceiving the initial upward motion is significant, for it does not allow an understanding of the relation between velocity and forces in the early part of the trajectory, and therefore a more general theory of forces and the motion of objects. Despite setting up what I considered a crucial experiment, the move is unsuccessful in the very first instance. This changes with my question and Elizabeth’s different observation description. Whereas the students appear to have come to an agreement that the object moved up before it descended, the episode does not make clear whether they actually observe the velocity change: From being pointed upward, it decreased to zero and then increased again in length but pointing downward. In fact, the subsequent episode shows that, during the moment of teaching, I interpret students’ talk as not having made this observation and therefore run the same experiment repeatedly in slow motion until the students’ observation descriptions include not only the object but also the two vectors.

We note that there are moments when, despite the very small number of elements that constitute the microworld, students do not describe it in the same way I (qua physicist and physics teacher) perceive it. The students’ observational descriptions make salient different elements of the microworld: Although students experience their perception as real and objective, it is inconsistent with a Newtonian description. However, the constraints provided by different observations within a student group, and those provided in interactions with the teacher, are crucial for establishing the phenomenal backdrop to any correct understanding of the theory that students are to learn according to the curriculum.
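
The velocity change that I expected students to observe can be sketched with constant-acceleration kinematics. The sketch below is a minimal illustration, not a reconstruction of the classroom experiment: the numerical values for initial velocity, force, and mass are hypothetical, as are the function names, and motion is restricted to the vertical axis with upward taken as positive.

```python
def velocity(v0, force, mass, t):
    """Vertical velocity at time t under a constant force (upward is positive)."""
    return v0 + (force / mass) * t


def height(v0, force, mass, t):
    """Vertical position at time t, measured from the starting point."""
    return v0 * t + 0.5 * (force / mass) * t ** 2


def turning_time(v0, force, mass):
    """Time at which the velocity passes through zero and reverses direction."""
    return -v0 * mass / force


if __name__ == "__main__":
    # Hypothetical setup: initial velocity pointing up, force pointing down.
    v0, force, mass = 2.0, -4.0, 1.0
    t_turn = turning_time(v0, force, mass)
    for t in (0.0, t_turn / 2, t_turn, 2 * t_turn, 3 * t_turn):
        print(f"t={t:.2f}  velocity={velocity(v0, force, mass, t):+.2f}  "
              f"height={height(v0, force, mass, t):+.2f}")
```

In this sketch, the object rises while the velocity shrinks to zero and only then descends with a velocity that grows in the downward direction. A description of the object as simply going down therefore leaves out the first part of the event.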

Gesture and Scientific Talk

Structure in cognition and learning is also found in the relation among gestures, talk, salient objects (figure), and the ground (e.g., Hall, 1999; Kendon, 1997). Gestures obtain significance in two considerably different respects. First, they assist in achieving a mutual alignment of talk and objects because motion makes entities salient against a background and is more easily discriminated than the boundaries of static objects (Allen & Saidel, 1998). Gestures constrain perception and therefore the ways speech can be used (e.g., Goodwin, 1994). Second, there is evidence from other research that gestures communicate scientific concepts prior to and in the absence of speech (e.g., Church & Goldin-Meadow, 1986; Crowder & Newman, 1993; Goldin-Meadow, Wein, & Chang, 1992). My own research shows that, when students engage in practical science activities, their gestures often arise from, and abstract, earlier manipulations of objects (Roth, 1999a, in press). Furthermore, manipulations and gestures precede and are an integral part of the construction of conceptual categories related to simple machines. Table 2 shows considerable variation in the words students used to name or categorize the different elements in the microworld, and in understanding how these elements (object, arrows) interact. In this seeming chaos in which the same words are used to denote different objects, deictic and iconic gestures are crucial to establishing a common ground, finding appropriate observation sentences for the situation at hand, and ultimately, arriving at a theoretical discourse that is consistent with Newtonian physics (Roth & Lawless, in press). Thus, zooming in and focusing on this relation allows us to understand how cognition is distributed. In this section, I provide one analysis of the relation among talk, gesture, and setting. At the moment of the episode, the three students still have no grasp of what the arrows stand for and how they relate to the moving object; we are in the middle of the development represented in Table 2. The students previously affiliated them with time, energy, time step, and many other words.
Glen enacts another attempt at describing and explaining the events, of which traces are still visible in the top left of the first frame. His utterances (see Figure 5) are paralleled by the gestures of both hands, which enact the arrows and their behavior as he saw them previously. Glen holds his right hand with fingers parallel to force for (1.47 + 2.00 =) 3.47 sec prior to specifying its referent in the second frame (Frames 1–3 in Figure 5). He then makes another brief circular gesture, which marks the transition between two iconic gestures and highlights the salience of the hand (e.g., McNeill, 1992), while uttering “that arrow,” which immediately precedes the causal meaning unit “that’s why it is pushing it.” Before he says “the velocity” (Frame 3), his left hand appears, held parallel to velocity. In the next frame, both hands are visible: The right parallel to force, and the left parallel to velocity. Then, the right hand already pushes against the left hand, which is moving to the left. This movement continues to the end of the sentence and out of the video frame. The gesture of the right hand begins substantially (i.e., 0.10 + 0.20 + 0.53 = 0.83 sec) before its verbal correlate “pushing”; that is, the iconic gesture already describes the shape of the object’s trajectory (visible in Frame 1) that Glen attempts to explain.
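The timing figures in this paragraph are sums of consecutive transcript intervals. As an illustrative sketch only (not part of the original analysis), the arithmetic can be reproduced as follows. The function names and the 30 frames-per-second video rate are assumptions introduced here for illustration; the source reports only the interval values themselves.

```python
from fractions import Fraction

# Assumed video frame rate (hypothetical; not stated in this passage).
FRAME_RATE = Fraction(30)


def frame_to_seconds(frame_count, frame_rate=FRAME_RATE):
    """Convert a number of video frames to seconds at the assumed frame rate."""
    return float(Fraction(frame_count) / frame_rate)


def total_duration(intervals):
    """Sum consecutive transcript intervals (in seconds), rounded to 10 msec."""
    return round(sum(intervals), 2)


# Right hand held parallel to force before its referent is specified.
hold_before_referent = total_duration([1.47, 2.00])

# Lead of the right-hand gesture over its verbal correlate "pushing".
gesture_lead = total_duration([0.10, 0.20, 0.53])

# Duration of a single frame, in milliseconds, under the assumed frame rate.
frame_resolution_ms = round(1000 * frame_to_seconds(1), 1)

print(hold_before_referent, gesture_lead, frame_resolution_ms)
```

Under the assumed frame rate, one frame lasts roughly 33 msec, so lags of the order of 0.83 sec span many frames and are well above the resolution of frame-by-frame coding.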



The episode is complex because Glen uses “the arrow” in the presence of two arrows on the monitor, and repeatedly uses the indexical signs “that” and “it,” but each time with a different referent. “That” appears three times. In the first instance, the function of “that” (Frame 2) is deictic as it designates a particular arrow standing in opposition to the speaker (i.e., on the opposite pole in the subject–object relation). Coinciding with the utterance, the right hand, which had moved to the right, came to a sudden stop. Frame 2 shows that the fingers of the right hand stand parallel to force. This finger position, the noticeable (abrupt) stop of motion, and the coincident utterance “that arrow” make it reasonable to assume that the right hand models force. The listener can draw further confirmation for this interpretation from the causal connection between “that arrow” and force: The three students had previously manipulated this arrow whereas the other arrow changed as a function of their action. In the second instance, “that” (Frame 4) introduces the causal consequence (“that’s why”) of the hand arrangement he had set up and described in the previous part of the utterance. The word appears while the gestural trajectory, which iconically re-represents the earlier visible trajectory, is merely beginning. In the third instance, “that” is linked to “way,” the immediately preceding trajectory (way) enacted by the gesture. In the vernacular, “that way” most frequently expresses a specific direction. However, “that’a way” (Frame 6) together with the curved motion of the hand, when read against the ground of the earlier curvilinear motion of the object and the corresponding positioning of the arrows, highlights not only the existence of the trajectory but in particular its curvilinear shape. The verbal sign “it” occurs twice, but we can distinguish two different referents. The student speaks while the right hand follows, fingers pointing to the left.
When this is heard together with “It’s pushing it” (Frames 5 and 6), the right hand can be understood as literally pushing the left hand. The first occurrence of the sign “it” has the hand–arrow as referent. The second occurrence of the sign has some entity that is being pushed as referent. This entity could be the second arrow–left hand or the object. At the time of this episode, Glen (along with his two peers) does not yet describe the arrows in scientific terms (i.e., as force and velocity). He uses the appropriate scientific (verbal) language only 2 weeks later during a subsequent lesson with the microworld. However, his gesture is consistent with scientific practice when understood as a description of the relation between the concepts of velocity and force. He characterizes the action of the outline arrow as “pushing,” which is a vernacular form of describing forces. Glen also associates the longer pushing arrow with a resulting higher velocity. The referent of “velocity” is not completely clear and two readings are possible. Because the utterance coincides with the positioning of the left hand, “velocity” can be heard as referring to the left hand: The longer right arrow (force) pushes more and therefore leads to a longer left arrow (velocity). However, the fragment “Since that arrow’s longer the velocity is higher” can also mean that the longer right arrow is equivalent to a higher velocity. Then, “velocity” (incorrectly so from a scientific perspective) would refer to the right arrow. However, the referents for each of the two hands are clear by their position in space in the course of the motion. The directional orientation of the right hand is constant and parallel to force. The left hand changes its direction in the way velocity previously changed. Although there are some studies related to the interaction of gesture and speech in a variety of nonmotion domains (e.g., Crowder & Newman, 1993; Hall, 1999), the role of gestures in scientific and mathematical discourse largely remains unexplored in educational research (Lemke, 1998). The research presented contributes to this emerging literature. In this episode, gestures, animated diagrams, and words are deeply integrated; that is, the structure in the activities arises from structure in each of the levels so that we can view cognition as distributed across the agent-in-setting unit. Taken as a whole, gestures, words, and diagrams (both topic of talk and background to gesture) make a lot of sense. Because we have to consider these elements together, it makes sense to speak of cognition as being situated. The structure and coordination of the actions make sense if considered in this particular setting.

DISCUSSION

The five analyses of knowing and learning in the physics classrooms feature different takes on the structure of activity, and therefore of intelligent action. The analysis of an individual’s gesture and talk over and about inscriptions shows how deeply integrated these are. Furthermore, the changing relation of gesture and talk over time also suggests that, for the individual, there is a change in the nature of the display. At one end, there are arrows and a circular object. At the other end, there are velocity and force as vectors that have different relations to the object. When we consider the individual and the environment it perceives as one unit, which is continuously transformed through the experience, we can always break out a part (the individual, the environment, or the relation between the two) and see that it has changed. However, I suggest that we maintain the cognitive unit of analysis and always consider the individual and its perceived world in its entirety. By changing focus and by zooming, phenomena pertaining to different fields of attention become visible and are of different grain sizes and time scales; nevertheless, they are always part of the overall picture. For example, in a recent project, we studied the events in an inner-city classroom in Philadelphia (Roth & Tobin, in press; Seiler, Tobin, & Sokolic, 2000). At one level, we could describe the learning processes when these students engaged in learning physics through technological activities. At another level, because these students were also attuned to their social condition (e.g., poverty, looming unemployment), different processes and



events at the classroom level became salient when zooming took larger and more encompassing perspectives. However, what is a salient figure to a person cannot be answered a priori; figure and ground are empirical matters because of the contingencies of perception and attention. With others (e.g., Mandelblit & Zachar, 1998), I am advocating a dynamic unit of analysis. By adopting such an approach to the unit of analysis, researchers actively situate cognition; that is, the decision to make the unit of analysis a function of what is salient to the observed individual or individuals makes cognition a situated and contingent phenomenon. Central to my approach is the use of multiple levels of analysis (i.e., zooming), which reveal different aspects of a more general phenomenon that I call cognition. To locate the nature of cognition, we have to do analyses at multiple levels, which requires zooming. Because zooming follows changes in the individual’s field of attention, it embodies a reflexive relation between observer and observed, which, as ethnomethodologists have suggested for some time, always exists even though many researchers of social phenomena have not been attuned to it. Different foci of analysis and the associated changes in spatial and temporal scales require what are considered different methodologies. The study of gesture–talk–ground coordination requires video records and the possibility of precise timing. At the same time, if we are interested in developmental changes, these video records have to span considerable periods. Furthermore, these developmental changes occur within larger frames, including the particular course students are enrolled in or even larger units such as the out-of-school worlds. Then, anthropological studies that draw on ethnography, participant observation, or apprenticeship as method provide the necessary data for constructing an understanding of culture and groupings.
Most important, because engaging in an activity is different from talking about one’s engagement in an activity, most of my databases are constituted by large amounts of video data showing people in activity rather than by interviews about activity. With a very narrow frame (matching words and video frames with 33-msec accuracy), I focus on an individual and his or her utterances and gestures over and about a computer-animated event. Such an analysis reveals the nature of the relation between words and gestures embodied at that depth of vision (around the chosen level of zooming). Gesture, words, and world coproduce each other. What we recognize as cognition is in fact an assemblage of coincident images, combining iconic gesture and the shape of a trajectory created for the analyst spectator. Words and deictic gesture pick out or leave underdetermined particular ways of cutting the focal area into objects and events, allowing the analyst to make inferences about the nature of the perceptual figure. When the analytic frame is opened up and several individuals are brought into focus as a collectivity, new cognitive phenomena present themselves: multiple beings engaged in constructing a common world, where their respective observation descriptions are recognized as being the same. At this level, the time scale is of the



order of seconds and the focus is on several verbal exchanges at a time. Learning then becomes a social phenomenon, and the question to be dealt with is how, and in what respects, the activity influences individual learners. My analysis shows how students come to construct a taken-as-shared world. When corresponding observation descriptions are viewed as compatible, there appears to be what I have called interactive stabilization (Roth, 1996), a phenomenon that only becomes apparent at a time scale of several seconds or minutes. Due to their common condition and the task to arrive at a collective response, students come to experience (perceive, act on, describe) the focal objects in ways that they recognize as shared. It is often in the conversation as a collective phenomenon that new “conceptions” are worked out before each individual seems to subscribe to them (consistent with a sociocultural view of learning; e.g., Vygotsky, 1978). Thus, in the episode discussed in this article, the three students collectively arrive at a description for situations in which force is not acting on (disconnected from) the object. Only from that point on does each of the three individuals consistently refer to the object on a straight trajectory when force does not act on the circular object. They each appropriate, from the publicly accessible conversational situation, a new way of talking about the phenomena at hand. The episode featuring an interaction between students and myself (teacher) highlights two important elements. First, students’ ways of perceiving objects and events may be significantly different from those of the scientist and may differ even among the students themselves. Glen and Ryan expect and then perceive the object as immediately going downward. Elizabeth seems to perceive an upward motion that precedes the downward motion.
Rather than interpreting such differences as a defect or a cognitive deficiency, I interpret them as a consequence of the interaction of present ways of organizing the world and the stimuli that arrive at the sensory surface of each individual. It is simply one form of patterned activity. However, even the orientation (attention) to the world is a function of the current state of the cognitive system. From the cognitive scientist’s perspective, the issue then is to understand the kind of experiences that allow the cognitive system to change in particular directions (i.e., pursue a particular trajectory), and how these changes come about. As part of the commitment to the individual’s perspective on the world, the setting itself becomes part of the analysis. In my final example, the analysis again keeps agent and setting concurrently in focus rather than letting one slip in favor of the other. This concurrent focus on human actors and computers (as seen by participants) and their interaction is embodied in the way Table 2 was constructed. The analyses of human–computer interaction also make clear why I have little interest in analyzing what a computer can record as having occurred. What is available to the computer is only a small (although important) slice that underdetermines what is salient in the world of the users (cf. Suchman, 1987). Again, I am interested in the larger unit of user–user–computer interactions, in other words, users’ interactions over and with the computer interface. On the other hand, the mapping from



machine states (structures) to a priori assumptions of user intents (structures in mental activity), on which the success of certain interactions such as that in Table 2 depends, would lead to trouble (cf. Suchman, 1987). The levels of zooming do not need to be constrained to groups as I have done in this article for reasons of space limitations. Elsewhere, I describe phenomena at more global levels than any of the examples provided. In one study, we confirm the hypothesis that a different physical placement of the same individuals in the same social configuration (whole-class activity) leads to different forms of participation in discourse and even in the nature of the discourse contributions (Roth et al., 1999). In that study, we also document the interaction of changes in classroom discourse with the development of group activities, and changes in the discourse of individual students. Learning therefore arises from phenomena at the levels of activity, individual, and classroom that mutually influence each other. I therefore hold that the question is not whether we should choose one zooming level over another but how many levels one can feasibly investigate. By now, a decade after the seminal publications of Lave (1988) and Suchman (1987) that laid the groundwork for expanding cognitive units of analysis, a number of investigations in educational settings have explored the usefulness of regarding cognition as situated (e.g., Greeno, 1998). Too often, however, educators seem to be tempted to provide micro-level descriptions without considering more overarching temporal and physical constraints on the activities. I suggest that we need to resist such temptations and ask questions such as, How do individual students change in the course of activity? Which aspects of the cognitive unit are transported to new settings, and what are the long-term effects of individual activities?
As researchers, we may approach these tasks by asking how much overlap we can observe when we conduct investigations of the type individual i in setting j for all pairs (i, j) that are of (theoretical) interest. In the examples provided in this article, different students contributed to stabilizing particular observation sentences. It should also be of interest to answer questions such as, How are such co-constructed sentences eventually appropriated by individuals, and how do individuals arrive at using these observation sentences for their own intentions even in the absence of the other group members?

ACKNOWLEDGMENTS

This work was made possible in part by Grant 410–96–0681 from the Social Sciences and Humanities Research Council of Canada. My thanks go to the following colleagues: Throughout the data collection phase, I interacted with Anita Roychoudhury and G. Michael Bowen regarding instruction and data interpretation. Carolyn Woszczyna and Gillian Smith assisted in the transcription and analysis of the data. Sasha Barab and Daniel Lawless provided considerable feedback and helped with editing earlier versions. Finally, David Kirshner, Jonna Kulikovich, and two anonymous reviewers provided valuable insights that helped me to improve on earlier versions of this article.

REFERENCES

Agre, P. E. (1995). Computational research on interaction and agency. Artificial Intelligence, 72, 1–52.
Allen, C., & Saidel, E. (1998). The evolution of reference. In D. Cummins & C. Allen (Eds.), The evolution of mind (pp. 183–203). Oxford, England: Oxford University Press.
Berg, C. A., & Philips, D. G. (1994). An investigation of the relationship between logical thinking structures and the ability to construct and interpret line graphs. Journal of Research in Science Teaching, 31, 323–344.
Bourdieu, P., & Wacquant, L. J. D. (1992). An invitation to reflexive sociology. Chicago: The University of Chicago Press.
Brown, A. L. (1992). Design experiments: Theoretical and methodological challenges in creating complex interventions in classroom settings. The Journal of the Learning Sciences, 2, 141–178.
Brown, J. S., & Duguid, P. (1992). Enacting design for the workplace. In P. S. Adler & T. A. Winograd (Eds.), Usability: Turning technologies into tools (pp. 164–197). New York: Oxford University Press.
Chaiklin, S., & Lave, J. (Eds.). (1993). Understanding practice: Perspectives on activity and context. Cambridge, England: Cambridge University Press.
Chapman, D. (1991). Vision, instruction, and action. Cambridge, MA: MIT Press.
Church, R. B., & Goldin-Meadow, S. (1986). The mismatch between gesture and speech as an index of transitional knowledge. Cognition, 23, 43–71.
Churchland, P. S., & Sejnowski, T. J. (1992). The computational brain. Cambridge, MA: MIT Press.
Clancey, W. J. (1997). Situated cognition: On human knowledge and computer representation. Cambridge, England: Cambridge University Press.
Corbin, J., & Strauss, A. (1990). Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology, 13, 3–21.
Crowder, E. M., & Newman, D. (1993). Telling what they know: The role of gestures and language in children’s science explanations. Pragmatics & Cognition, 1, 341–376.
Garfinkel, H. (1967). Studies in ethnomethodology. Englewood Cliffs, NJ: Prentice Hall.
Geertz, C. (1983). Local knowledge: Further essays in interpretive anthropology. New York: Basic Books.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Goldin-Meadow, S., Wein, D., & Chang, C. (1992). Assessing knowledge through gesture: Using children’s hands to read their minds. Cognition and Instruction, 9, 201–219.
Goodwin, C. (1986). Gestures as a resource for the organization of mutual orientation. Semiotica, 62, 29–49.
Goodwin, C. (1994). Professional vision. American Anthropologist, 96, 606–633.
Greeno, J. G. (1991). Number sense as situated knowing in a conceptual domain. Journal for Research in Mathematics Teaching, 22, 170–218.
Greeno, J. G. (1998). The situativity of knowing, learning, and research. American Psychologist, 53, 5–26.
Greeno, J. G., & Hall, J. (1997). Practicing representation: Learning with and about representational forms. Phi Delta Kappan, 78, 361–367.
Guba, E., & Lincoln, Y. (1989). Fourth generation evaluation. Beverly Hills, CA: Sage.
Hall, R. (1996). Representation as shared activity: Situated cognition and Dewey’s cartography of experience. The Journal of the Learning Sciences, 5, 209–238.
Hall, R. (1999). The organization and development of discursive practices for “having a theory.” Discourse Processes, 27, 187–218.
Hammer, D. (1996). Misconceptions or p-prims: How may alternative perspectives of cognitive structure influence instructional perceptions and intentions? The Journal of the Learning Sciences, 5, 97–127.
Hewitt, P. G. (1989). Conceptual physics (6th ed.). Glenview, IL: Scott, Foresman.
Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press.
Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence. New York: Basic.
Jordan, B., & Henderson, A. (1995). Interaction analysis: Foundations and practice. The Journal of the Learning Sciences, 4, 39–103.
Kendon, A. (1997). Gesture. Annual Review of Anthropology, 26, 109–128.
Lave, J. (1988). Cognition in practice: Mind, mathematics and culture in everyday life. Cambridge, England: Cambridge University Press.
Lemke, J. L. (1990). Talking science: Language, learning and values. Norwood, NJ: Ablex.
Lemke, J. L. (1998). Multiplying meaning: Visual and verbal semiotics in scientific text. In J. R. Martin & R. Veel (Eds.), Reading science (pp. 87–113). London: Routledge.
Lynch, M. (1985). Art and artifact in laboratory science: A study of shop work and shop talk in a laboratory. London: Routledge & Kegan Paul.
Lyotard, J.-F. (1991). Phenomenology. Albany: State University of New York Press.
Magnusson, S. J., Templin, M., & Boyle, R. A. (1997). Dynamic science assessment: A new approach for investigating conceptual change. The Journal of the Learning Sciences, 6, 91–142.
Mandelblit, N., & Zachar, O. (1998). The notion of dynamic unit: Conceptual developments in cognitive science. Cognitive Science, 22, 229–268.
Marcus, G. E., & Fischer, M. M. J. (1986). Anthropology as a cultural critique: An experimental moment in the human sciences. Chicago: University of Chicago Press.
McNeill, D. (1992). Hand and mind: What gestures reveal about thought. Chicago: University of Chicago Press.
Merleau-Ponty, M. (1945). Phénoménologie de la perception [Phenomenology of perception]. Paris: Gallimard.
Metz, K. E. (1993). Preschoolers’ developing knowledge of the pan balance: From new representation to transformed problem solving. Cognition and Instruction, 11, 31–93.
Moschkovich, J. N. (1998). Resources for refining mathematical conceptions: Case studies in learning about linear functions. The Journal of the Learning Sciences, 7, 209–237.
Ochs, E., Gonzales, P., & Jacoby, S. (1996). “When I come down I’m in the domain state”: Grammar and graphic representation in the interpretive activity of physicists. In E. Ochs, E. A. Schegloff, & S. A. Thompson (Eds.), Interaction and grammar (pp. 328–369). Cambridge, England: Cambridge University Press.
Pea, R. D. (1993). Learning scientific concepts through material and social activities: Conversational analysis meets conceptual change. Educational Psychologist, 28, 265–277.
Resnick, L., Levine, J., & Teasley, S. D. (Eds.). (1991). Perspectives on socially shared cognition. Washington, DC: American Psychological Association.
Ricœur, P. (1991). From text to action: Essays in hermeneutics, II. Evanston, IL: Northwestern University Press.
Roschelle, J. (1992). Learning by collaborating: Convergent conceptual change. The Journal of the Learning Sciences, 2, 235–276.
Roth, W.-M. (1996). The co-evolution of situated language and physics knowing. Journal of Science Education and Technology, 3, 171–191.
Roth, W.-M. (1998a). Situated cognition and assessment of competence in science. Evaluation and Program Planning, 21, 155–169.
Roth, W.-M. (1998b). Starting small and with uncertainty: Toward a neurocomputational account of knowing and learning in science. International Journal of Science Education, 20, 1089–1105.
Roth, W.-M. (1999a). Discourse and agency in school science laboratories. Discourse Processes, 28, 27–60.
Roth, W.-M. (1999b). The evolution of umwelt and communication. Cybernetics & Human Knowing, 6(4), 5–23.
Roth, W.-M. (in press). From gesture to scientific language. Journal of Pragmatics.
Roth, W.-M., & Lawless, D. (in press). Signs, deixis, and the emergence of scientific explanations. Semiotica.
Roth, W.-M., McGinn, M. K., Woszczyna, C., & Boutonné, S. (1999). Differential participation during science conversations: The interaction of focal artifacts, social configuration, and physical arrangements. The Journal of the Learning Sciences, 8, 293–347.
Roth, W.-M., McRobbie, C., Lucas, K. B., & Boutonné, S. (1997). The local production of order in traditional science laboratories: A phenomenological analysis. Learning and Instruction, 7, 107–136.
Roth, W.-M., & Tobin, K. (in press). Learning to teach science as praxis. Teaching and Teacher Education.
Roth, W.-M., Woszczyna, C., & Smith, G. (1996). Affordances and constraints of computers in science education. Journal of Research in Science Teaching, 33, 995–1017.
Sacks, H., Schegloff, E., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking in conversation. Language, 50, 697–735.
Seiler, G., Tobin, K., & Sokolic, J. (2000). Roadblocks on the path to understanding technology and science. Manuscript submitted for publication.
Sharrock, W., & Anderson, B. (1993). Working towards agreement. In G. Button (Ed.), Technology in working order: Studies of work, interaction, and technology (pp. 149–161). London and New York: Routledge.
Sharrock, W., & Button, G. (1991). The social actor: Social action in real time. In G. Button (Ed.), Ethnomethodology and the human sciences (pp. 137–175). Cambridge, England: Cambridge University Press.
Siegler, R. S. (1978). The origins of scientific reasoning. In R. S. Siegler (Ed.), Children’s thinking, what develops? (pp. 109–149). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Siegler, R. S., & Crowley, K. (1991). The microgenetic method: A direct means for studying cognitive development. American Psychologist, 46, 606–620.
Suchman, L. A. (1987). Plans and situated actions: The problem of human–machine communication. Cambridge, England: Cambridge University Press.
Suchman, L. A., & Trigg, R. H. (1993). Artificial intelligence as craftwork. In S. Chaiklin & J. Lave (Eds.), Understanding practice: Perspectives on activity and context (pp. 144–178). Cambridge, England: Cambridge University Press.
van Zee, E., & Minstrell, J. (1997). Using questioning to guide student thinking. The Journal of the Learning Sciences, 6, 227–269.
von Uexküll, J. (1973). Theoretische biologie [Theoretical biology]. Frankfurt: Suhrkamp. (Original work published 1928)
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Wertheimer, M. (1985). A Gestalt perspective on computer simulations of cognitive processes. Computers in Human Behavior, 1, 19–33.
Winograd, T., & Flores, F. (1987). Understanding computers and cognition: A new foundation for design. Norwood, NJ: Ablex.
