

GESPIN – GESTURE & SPEECH IN INTERACTION – Poznań, 24-26 September 2009

Coordinating gesture location in geometric puzzle task

Irene Kimbara
Kushiro Public University
Ashino 4-1-1 Kushiro, Hokkaido, Japan

Abstract

The present study examines coordination in the use of gesture space between two speakers engaged in a joint geometric puzzle task. The experiment required pairs of speakers to determine a way to combine block pieces into a geometric pattern without manipulating the block pieces. To investigate coordination in the use of gesture space, all pairs solved two puzzles under different conditions: high screen and low screen conditions. Results showed that gesture location varied between the two conditions. Comparison of co-timed gesture pairs across speakers revealed there were more gestures at the same location when gestures were visible and thus indicated that there was reciprocation in the use of gesture space and in the way gestures were employed within interaction.



In speech, speakers are known to coordinate their linguistic expressions to avoid ambiguity and achieve efficient understanding with minimal effort. When presented with a tangram figure, which, unlike a triangle or square, does not have a readily available linguistic label, speakers try to find a common term to refer to the object (Clark & Wilkes-Gibbs, 1986). Similar coordination behavior can be found at a higher level of language use. For instance, in a joint task which required participant pairs to specify locations in a maze, the participants became increasingly inclined to use the same descriptive frameworks to navigate through the maze as the task progressed (Garrod & Anderson, 1987). That is, as members of a small community having a common discursive goal, speakers cooperated in building a shared referential framework over time (cf. Garrod & Doherty, 1994).

The present study investigates whether such coordination is observed in gesture with respect to the use of gesture space, using a joint task experiment in which participant pairs were given a set of four blocks and were asked to find a way to combine the pieces into a given geometric pattern. To examine interspeaker influence in the use of gesture space, mutual visibility of speakers was controlled using screens of different heights during the experiment. The high screen condition used a partition that completely obstructed the mutual view of speakers. The low screen condition used a partition that was only high enough to obstruct the view of each speaker’s block pieces. Use of the low screen prevented the participants from identifying block pieces by pointing directly at them instead of using speech and gesture. If there is coordination in the use of gesture space, speakers were expected to be more likely to gesture at the same location in the low screen condition than in the high screen condition.

GESPIN proceedings, vol. I



Fifty-six subjects (28 pairs), all university students, were recruited and participated in the experiment. Upon entering a room, pairs of participants were seated at a desk facing each other. An identical set of 4 wooden blocks of various shapes was randomly placed in front of each participant to vary the blocks’ starting orientation and their relative position. The task was to determine a way to arrange the block pieces into a target geometric pattern, either a fish or an umbrella.

Figure 1. Block pieces for the fish pattern (left) and target geometric patterns (right).

Initially, a sheet with a silhouette of the target geometric pattern was provided and participants were given time to think alone. However, they were prohibited from touching or moving the block pieces and had to rely on the ability to mentally rotate and flip block pieces into the target pattern. Silhouette sheets were returned when at least one participant solved the puzzle and felt confident of the solution, and participants then discussed their solutions until they reached an agreement. Because the target patterns were both symmetric figures, each of them had at least two arrangement patterns as correct answers (i.e., correctly placed blocks could be flipped horizontally without altering their shape). Therefore, participants needed to carefully explain the placement of each block piece to ensure the arrangements were identical and to avoid any discrepancies. During this discussion period, participants often used gestures to visualize the orientation of the blocks, tracing lines with their fingers or molding their hands into the shape of block pieces.

Mutual visibility of speakers was controlled using screens of different heights. In the high screen condition, the partition completely obstructed the mutual view of the speakers. In the low screen condition, the partition only obstructed the view of each other’s block pieces and not each participant’s face and gestures (Fig. 2). Three video cameras were set up to record gestures, with one camera recording both participants from the side and two other cameras recording a closer view of each participant’s hands.

Figure 2. Still photos of the high (left) and low (right) screen conditions.

Each pair solved three puzzles: one for a trial case using the silhouette of a butterfly and two for data coding and analysis. Half of the pairs solved the umbrella puzzle first followed by the fish puzzle, and the order was reversed for the remaining pairs. The order of the high and low screen conditions was also varied. Upon completion of the experiment, pairs demonstrated their solutions by manipulating the block pieces.

Irene Kimbara: Coordinating gesture location in geometric puzzle task



After one minute, all gestures produced during the following two-minute period were analyzed using ELAN (Max Planck Institute), a time-based annotation software for video recordings. Each gesture was coded for its semantic type, referent, and location.

All gestures were categorized into five semantic types: Piece, Action, Pointing, Aborted, and Others. Piece gestures were iconic representations of the block pieces (e.g., a straight line for a rectangular block) and were the most frequent of all types. Action gestures represented the act of fitting a block into a spot. Pointing gestures pointed to the block pieces on the desk. Aborted gestures were those that were abandoned before completion. The last category, Others, included everything else and consisted mostly of beat gestures and interactive pointing gestures directed at the interlocutor.

All of the above semantic types except Others had block pieces as part of their semantic content. A Piece gesture referred to one or a combination of several block pieces, while a Pointing gesture deictically singled out a block piece laid on the desk. An Action gesture was also a description of an action with respect to a certain block piece. A referent, i.e., which block piece was represented by a gesture, was identified for these three semantic types.

These gestures were also coded for their location, which was divided into three broad areas: desk, palm, and air. Gestures were coded as occurring on the desk if the hand(s) made contact with the surface of the desk. Gestures were coded as palm when speakers traced lines on a raised palm using the opposite hand. Use of the palm as a writing board is often observed among Japanese speakers when recalling or showing how Chinese characters are written and was thus familiar to the participants of the present experiment. The remaining location code, air, covers the largest area of gesture space, and was used when no contact was made with either the desk or a palm. This included gestures made slightly above the desk and those made in front of the speaker’s face.
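The location coding rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the coding procedure actually used (coding was done by hand in ELAN). The names `SemanticType`, `LOCATION_CODED`, and `code_location`, as well as the two boolean inputs, are hypothetical. The text does not say how a gesture touching both the desk and a palm would be coded, so checking the desk first is an assumption.

```python
from enum import Enum
from typing import Optional


class SemanticType(Enum):
    PIECE = "Piece"
    ACTION = "Action"
    POINTING = "Pointing"
    ABORTED = "Aborted"
    OTHERS = "Others"


# Only these three types were coded for referent and location.
LOCATION_CODED = {SemanticType.PIECE, SemanticType.ACTION, SemanticType.POINTING}


def code_location(semantic_type: SemanticType,
                  touches_desk: bool,
                  traces_on_palm: bool) -> Optional[str]:
    """Return "desk", "palm", or "air", or None if the type is not location-coded."""
    if semantic_type not in LOCATION_CODED:
        return None
    if touches_desk:        # assumption: desk contact is checked first
        return "desk"
    if traces_on_palm:
        return "palm"
    return "air"            # no contact with either the desk or a palm


print(code_location(SemanticType.PIECE, touches_desk=False, traces_on_palm=False))  # air
```

Treating air as the fallback mirrors the description of air as the code used when no contact was made with either the desk or a palm.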




More gestures were observed in the low screen condition (M=18.55, SD=12.16) than in the high screen condition (M=11.61, SD=9.13) (t(55) = 4.41, p < 0.01). This result is consistent with the previous finding that speakers produce some gestures to be seen by the interlocutor, leading to a lower gesture rate when gestures are not visible (Cohen & Harrison, 1973; Bavelas et al., 1992; Alibali et al., 2001).

The mean frequency of gestures produced at the three locations is shown in Figure 3. The mean number of gestures produced in the air was greater in the low screen condition (M=8.13, SD=8.08) than in the high screen condition (M=2.82, SD=3.65). In the low screen condition, more gestures were made in the air than on the desk (M=5.23, SD=6.81). However, the reverse trend was observed in the high screen condition, with more gestures being made on the desk (M=4.88, SD=5.81) than in the air. In both high and low screen conditions, only a small number of gestures were made using the palm (low screen condition: M=0.11, SD=0.37; high screen condition: M=0.04, SD=0.19). An analysis of variance (ANOVA) showed that the frequency of gesture varied according to location (F(2, 110)=26.009, p < 0.01) as well as mutual visibility of speakers (F(1, 55)=23.046, p < 0.01).

Figure 3. Mean frequency of gestures at desk, air, and palm locations in the high screen and low screen conditions.





To examine coordination in the use of gesture space, co-timed gesture pairs, in which the two speakers gestured simultaneously, were identified. When a gesture by one speaker overlapped two gestures by the other speaker, resulting in two distinct overlapping intervals, the intervals were treated as separate entries. There were 152 co-timed gesture pairs in the low screen condition and 53 in the high screen condition in which the gestures of both participants were coded for location (i.e., Piece, Action, and Pointing gestures). The numbers of matches and mismatches in the use of gesture space are shown in Table 1. A greater percentage of matches in location was observed in the low screen condition (low screen: 76%, high screen: 51%). The two conditions also differed with respect to where matches occurred most. When speakers could see each other over the low screen, most matches occurred in the air (n=78), and when the high screen blocked mutual view of the speakers, most matches occurred on the desk (n=20).

Table 1. Frequency and proportion of matched and mismatched gesture location of co-timed gesture pairs.

               Matches, n (%)   Mismatches, n (%)   Co-timed gesture pairs total
Low screen     116 (76%)        36 (24%)            152
High screen    27 (51%)         26 (49%)            53
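The identification of co-timed gesture pairs and the match tally can be illustrated with a short Python sketch. It is not the procedure used in the study, which relied on ELAN annotations. The `Gesture` record, the function names, and the example intervals are hypothetical. Treating two gestures as co-timed only when their intervals strictly overlap (touching endpoints do not count) is an assumption. The final lines recompute the percentages in Table 1 from the reported counts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gesture:
    start: float    # onset in seconds
    end: float      # offset in seconds
    location: str   # "desk", "palm", or "air"


def co_timed_pairs(speaker_a: List[Gesture],
                   speaker_b: List[Gesture]) -> List[Tuple[Gesture, Gesture]]:
    """Return one entry per overlapping interval between the two speakers."""
    return [(a, b)
            for a in speaker_a
            for b in speaker_b
            if a.start < b.end and b.start < a.end]  # strict overlap (assumption)


def tally_matches(pairs: List[Tuple[Gesture, Gesture]]) -> Counter:
    """Count pairs whose locations match or mismatch."""
    counts = Counter()
    for a, b in pairs:
        counts["match" if a.location == b.location else "mismatch"] += 1
    return counts


def percent(n: int, total: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * n / total)


# Hypothetical annotations: the first gesture of speaker A overlaps two
# gestures of speaker B, so it contributes two separate entries.
speaker_a = [Gesture(0.0, 2.0, "air"), Gesture(3.0, 4.0, "desk")]
speaker_b = [Gesture(0.5, 1.0, "air"), Gesture(1.5, 3.5, "desk")]

counts = tally_matches(co_timed_pairs(speaker_a, speaker_b))
print(counts["match"], counts["mismatch"])        # 2 1

# Reported counts from Table 1.
print(percent(116, 152), percent(36, 152))        # 76 24
print(percent(27, 53), percent(26, 53))           # 51 49
```

The pairwise comparison is quadratic in the number of gestures, which is acceptable for two-minute excerpts; a sweep over sorted intervals would be the usual choice for longer recordings.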




The use of gesture is an integral part of speech, and conveys meaning related to speech and sometimes supplements what is expressed verbally. The use of gesture as part of a communicative package means that interlocutors are expected to attend to the gestures used to fully understand what the speaker is trying to convey. In fact, research on the ability of interlocutors to glean meanings from gestures shows that gesture is combined with speech in comprehension (Kelly et al., 1999; Kelly, 2001; Goldin-Meadow & Singer, 2003). For instance, when the content of speech and gesture are somewhat incompatible, interlocutors try to come up with a feasible interpretation, sometimes at the expense of speech content (Cassell et al., 1999). From this perspective, questions arise as to when gestures become the primary means of communication (de Fornel, 1992; Tabensky, 2001) and whether there is interspeaker influence in gesture. The present study addresses these questions by focusing on the use of gesture space and how it is coordinated between speakers in the joint geometric puzzle task.

Results indicated that speakers assigned different communicative values to different locations. When the high screen prevented speakers from seeing each other’s gestures, speakers made more gestures on the desk than in the air; they drew outlines with their fingers or molded their hands into the shape of block pieces on the desk surface to visualize block orientation and placement in relation to other pieces. The tendency to produce gestures on the desk in the high screen condition indicated that the desk was a default position for producing self-oriented gestures. In contrast, more gestures were made in the air, closer to face level than on the desk, in the low screen condition. Importantly, gestures could be seen regardless of their location in the low screen condition. Therefore, the speakers did not need to produce gestures in the air to ensure they were visible.
Rather, the difference in the use of gesture space in the two conditions suggests that when the gestures were visible, speakers produced more gestures in the air to draw the interlocutor’s attention towards them. By selecting a more noticeable position, speakers in the low screen condition were able to establish gesture as a primary means of communication, and reach a common understanding of how block pieces could be placed more efficiently. According to this account, only self-oriented gestures, not intended for the interlocutors, such as those made for the speaker’s own purpose of visualizing how to combine block pieces, occurred on the desk when the screen was low.

Furthermore, the analysis of the co-timed gesture pairs across speakers showed that producing a gesture in the air invited the opposite speaker to also produce a gesture in the air. This means that gesture location was reciprocated when speakers could see each other. Because gesture locations were associated with different functions, coordination in the use of gesture space can be considered as coordination with respect to the role gestures play within interaction at a given moment: in the case of the present study, whether gestures should be brought to the forefront and receive focus, or should only serve a self-oriented function.

To summarize, the present study showed that, in the joint geometric puzzle task, speakers assigned different communicative values to different areas of space and tended to use the same gesture space to achieve efficient communication. Such coordination can be compared to the way language is used within interaction, where speakers are known to select common descriptive frameworks as well as the same referential labels for objects and actions to establish common understanding.

Bibliography

Alibali, M. W., Heath, D. C., & Myers, H. J. 2001. Effects of visibility between speaker and listener on gesture production: Some gestures are meant to be seen. Journal of Memory and Language, 44: 169-188.

Bavelas, J. B., Chovil, N., Lawrie, D. A., & Wade, A. 1992. Interactive gestures. Discourse Processes, 15: 469-489.

Cassell, J., McNeill, D., & McCullough, K-E. 1999. Speech-gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information. Pragmatics & Cognition, 7(1): 1-33.

Clark, H. H., & Wilkes-Gibbs, D. 1986. Referring as a collaborative process. Cognition, 22: 1-39.

Cohen, A. A., & Harrison, R. P. 1973. Intentionality in the use of hand illustrators in face-to-face communication situations. Journal of Personality and Social Psychology, 28(2): 276-279.

de Fornel, M. 1992. The return gesture: Some remarks on context, inference, and iconic gesture. In: Auer, P. & di Luzio, A. (eds.), The Contextualization of Language. Amsterdam: Benjamins. pp. 159-193.

Garrod, S., & Anderson, A. 1987. Saying what you mean in dialogue: A study in conceptual and semantic co-ordination. Cognition, 27: 181-218.

Garrod, S., & Doherty, G. 1994. Conversation, co-ordination and convention: An empirical investigation of how groups establish linguistic conventions. Cognition, 53: 181-215.

Goldin-Meadow, S., & Singer, M. A. 2003. From children’s hands to adults’ ears: Gesture’s role in the learning process. Developmental Psychology, 39: 509-520.

Kelly, S. D. 2001. Broadening the units of analysis in communication: Speech and nonverbal behaviours in pragmatic comprehension. Journal of Child Language, 28: 325-349.

Kelly, S. D., Barr, D. J., Church, R. B., & Lynch, K. 1999. Offering a hand to pragmatic understanding: The role of speech and gesture in comprehension and memory. Journal of Memory and Language, 40: 577-592.

Tabensky, A. 2001. Gesture and speech rephrasings in conversation. Gesture, 1(2): 213-235.