
Frames of Reference and Direct Manipulation Based Navigation
Representation of Movement in Architectural Space




Abstract

Although highly advanced in visual realism, current virtual reality systems used by designers do not provide the user with the ability to interact effortlessly and move in space as desired, as one does in real space. As a result, these systems offer limited help to designers when they want to test or communicate the spatial quality of their projects. The aim of the research was to develop a design tool for navigation in virtual environments that can offer a sense of ‘immersion’ and vividness similar to that experienced when one moves in real space.

The research examined a number of typical cases of CAD-VR navigation systems and analyzed their performance. All programs use direct manipulation: a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding. The analysis showed that the lack of realism of these systems means that they do not offer a ‘sense of reality’, because of the user’s inability to interact easily with the computer to navigate among represented objects. The user cannot:
1. Plan a course from a given location to a desired one.
2. Shift the direction of their gaze and focus attention on objects as they move along a path.
3. Move around an object while keeping track of this change in relation to the surrounding objects.
4. Turn an object in front of the viewer in order to examine it.
This lack of ‘sense of reality’ cannot be improved simply by adding attributes to the system that are more realistic: details, shadows, and reflections.

Departing from pioneering rigorous studies of ‘cognitive mapping’ developed in environmental design by Kevin Lynch and his followers, and drawing on recent research in cognitive science on spatial thinking, the study identified the cognitive processes through which people perceive their environment as they move through it. In contrast to Lynch’s approach, concerned with the visual quality of urban environments and focusing on visual urban cues for recognition and orientation within a city, the present research, related to movement through the built environment, concentrated on linguistic commands: the ‘frames of reference’ people use to plan their path among objects, shift their attention to them, move around them, and turn them. The frames of reference used are 1. allocentric, 2. egocentric, 3. relative, 4. intrinsic, and 5. absolute.

Following the criteria of realism and vividness in exploring the virtual world through movement, the system uses an agent/avatar, an immersed character that allows the user to direct navigation and the view of objects. It permits both agent-based and object-based navigation. The user can refer to the movement of the avatar as a basis for movement, or to the object as a reference point. To enhance the feeling of engagement, the user’s input to the system requesting a change of viewing position, direction of view, object of view, path, or view is expressed in natural language, while the output remains visual. In other words, the user talks to the avatar and, at the same time, sees on the screen what the avatar views in the virtual world.

The user-centered navigation tool produces “on the fly” navigation, most desirable for professional design applications, as opposed to a tailored presentation. It can be applied in urban environments as well as in architectural interiors, both using the same types of axes and frames of reference. It is targeted to support testing of the quality of designed environments, both interior and exterior, by individual designers, but it can be most effective in architectural presentations and debates where the architect communicates with various parties while examining the various aspects of the three-dimensional project.

Keywords: CAD-VR navigation systems; spatial frames of reference; visual versus language representation of space; design tool development; design methodology; urban design representation


Acknowledgements

I would like to thank Prof. A. Tzonis for giving me the space to formulate my thoughts and his time to exchange ideas. This fruitful exchange also involved Prof. Dr. W. Porter, who followed the whole process. My thanks also go to the committee members Prof. Dr. E. Backer, Prof. Dr. Y. Kalay, Dr. E. Ross, and Prof. H.J. Rosemann for their useful comments. I would also like to thank Prof. Dr. B. Tversky for her help in the initial stages of the dissertation. My thanks go as well to all past DKS associates, Prof. Dr. L. Lefaivre and Prof. R. Serkesma; to past colleagues Dr. P. Bay Joo Hwa, Dr. Ir. B. S. Inanç, Dr. K. Moraes Zarzar, Dr. J. Press, Dr. P. Sidjanin, and T. Beischer; to the student assistants; and to J. Arkesteijn for administrative support. I would also like to thank M. Richardson for editing my work. Finally, I thank Bouwkunde TUD for its financial support, and all the people of Bouwkunde who made it such a pleasant environment.


Table of Contents

Abstract
Acknowledgements
Contents

1. Introduction
   1.1 Navigational interface
   1.2 Descriptive aspects of action
   1.3 Spatial reasoning
   1.4 Route knowledge
   1.5 Navigational goal analysis
   1.6 Examination of movement in virtual space
   1.7 Outline of this study

2. The basic principles of navigation in virtual environments
   2.1 Basic configuration of current navigational systems
   2.2 Description of spaces that people navigate
   2.3 Current performance of navigation in virtual reality
   2.4 Basic assumptions concerning navigation


3. Selected cases of current navigational systems
   3.1 Introduction to the state of the art in computer navigation
   3.2 Exploration Programs - Cave™
   3.3 Exploration Programs - Cosmo Player®
   3.4 Exploration Programs - Myst®
   3.5 Representation programs - 3DS MAX®
   3.6 First person shooters - Tomb Raider®

4. Critical analysis of the state of the art technology
   4.1 Historical origins of the ‘flatness’ of the computer screen
   4.2 Limitations of interaction in virtual space with surrounding objects
   4.3 Navigational criteria
   4.4 Scenarios of interaction
   4.5 Adequacy criteria for an agent directed by the user

5. Conceptual system of agent-based frames & axis navigation
   5.1 Language of the path
   5.2 Elements of the path
   5.3 Method of examination


6. Visual representation - implementation
   6.1 Egocentric and allocentric systems
   6.2 Panoramic communicative view
   6.3 Possible interaction with objects and routes

7. Language based representation - implementation
   7.1 Directing the path
   7.2 How view changes; axes and frames of reference
   7.3 Flat-shooters – Two-dimensional – Amazon’s Sonja

8. A comparison between visual & language-based representation
   8.1 Aligning the visual and linguistic frames
   8.2 Travel Guide – guided walk – Amsterdam
   8.3 Comparison between linguistic and visual frames
   8.4 Comparison between existing simulation programs

9. An agent/object-based system of navigation
   9.1 Conceptual framework
   9.2 Controlling the pedestrian agent behavioral model
   9.3 Operation of the system
   9.4 Process of the system


10. Usability of the system
   10.1 Architectural case
   10.2 Evaluation of the object-centered tool
   10.3 Testing the hypothesis

11. Conclusions; future avenues
   11.1 Review of process
   11.2 Findings
   11.3 Future research

Appendix A
References
Summary
Index
Samenvatting


CHAPTER 1 INTRODUCTION

There has been growing interest in computerized tools in architectural design over the last two decades. These Virtual Reality [VR] and Computer Aided Design/Drawing [CAD] systems employ interactive software and hardware. Although the systems are advanced in visual realism, they still lack the ability to let their users interact intuitively with three-dimensional objects as they navigate among them.

As architects design, the design process can be roughly divided into a creative part that generates the built form, and an observation/exploration part that attempts to comprehend the consequences of that intervention. Thus the design process enables the architect to develop a solution to a required local condition. Currently those tasks are given to digital media production houses, as described in an article by Michel Marriott published in The New York Times on March 4, 2004, entitled “For New Buildings, Digital Models Offer an Advance Walk-Through”. Michael Schuldt, president of AEI Digital, states: “What we're trying to do is to bring this technology that's out there, that's been developed for the video game industry and Hollywood, and bring that to the building industry … It's really just a communications tool.” Indeed, AEI's presentations often resemble video games and special effects in movies, permitting viewers to fly over and around structures, or even to enter them, walking into richly detailed corridors and exhibition halls.
In order to establish and evaluate the design product associated with human action [use and meaning], architects utilize Virtual Reality systems that generate a realistic three-dimensional simulation of an environment. On the other hand, according to an article by Edward Rothstein published in The New York Times on April 6, 2002, entitled “Realism May Be Taking the Fun Out of Games”, “One of the major goals of video game systems has been to simulate the real, to create images so lifelike, and movements so natural that there is no sense of artifice.” Existing CAD-VR systems are highly reductive and abstract, and offer only a limited potential to grasp the spatial quality of new design proposals. This lack of ‘sense of reality’ can be improved not just by adding attributes that are more realistic – details, shadows, and reflections – but also by improving the navigation techniques. The virtual space for the purposes of this research is the built environment: a simulation with human activity, a topological unit with continuous and discontinuous movement.

According to Evans (1980), Passini (1984), Garling et al. (1986), and Peponis et al. (1990), in most situations architects want to design a building so as to reduce wayfinding problems for the people working in or visiting a complex of designed environments. The present research, on the other hand, focuses on the mechanism of navigation. It proceeds from pioneering work developed in urban studies, and follows a tradition of rigorous studies in environmental design by Lynch (1960), Appleyard (1964), and Thiel (1961). These studies analyzed the urban environment in order to examine the visual quality of cities, and to enhance urban quality through people’s recollection of places. This research draws on knowledge from cognitive studies in linguistics by Jackendoff (1983, 1993), Talmy (2001), Levinson (1996), and Levelt (1996), studies of vision by Marr (1982), Ullman (1996), and O'Keefe (1993), and studies of spatial reasoning by Piaget (1948), Campbell (1994), Eilan (1993), Paillard (1996), and Brewer (1993). It also employs current studies in spatial and motion cognition by Taylor (1992), Crawford (2000), and Tversky (1981).
Direct manipulation is used by all current navigational programs: a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding. In these simulated environments one usually interacts with the aid of a visual display and a hand-input data device. At the low end it is a desktop system, which includes a computer and a screen and is controlled with a mouse. At the high end, it is an immersed environment system, which includes a computer controlled by a set of wired gloves or a ‘magic wand’, a position tracker, and a head-mounted stereoscopic display for three-dimensional visual output, immersed in a cube-like environment with a display projected on three to six sides of a box-like structure. Virtual reality in the broader sense is used to describe the sense of reality as opposed to virtual space of any kind, imagined or real. This study uses the limited sense of computer-simulated virtual reality.

The basic limitation of current virtual reality systems lies in their failure to provide users with an effective system for object manipulation and interaction within the virtual environment. As we shall show, current programs designed for the architect are tools for modeling the single architectural object and are thus designed around an object-centered view, allowing users to move around, but they lack the capacity to manipulate objects while moving in a large-scale environment. This is inadequate for a higher-level function of movement, used in our everyday description of places. For example, the use of an input device for direct action manipulation commands – forward and backward, left and right – does not make sense in some instances when users need to travel long distances in virtual reality. Moving the mouse the equivalent distance is simply too laborious compared to an abrupt transportation from place to place. Other systems that allow for object manipulation supply the user with the ability to interact with the environment but lack the control over movement that would allow manipulation of the observer’s point of view.

The question that concerned this study is: what is the interactive experience of objects and places through navigation in such a virtual space? The answer lies in understanding the intuitive action that reflects our spatial reasoning in a virtual environment as opposed to “everyday reality”, based on the observation of what type of action was used and what constrains it. The aim of this research is to help architects move in virtual reality by automating such activity, utilizing a verbal description. Navigation is a means to achieve one’s goal; here I follow an assumption concerning the importance of navigation similar to that of Berthoz: “To localize an object simply means to represent to oneself the movement that would be necessary to reach it.” (Berthoz, 2000 p. 37) The tool is to be a mechanism that facilitates exploration in the virtual world through interaction with objects. The investigation focuses on the role of the cognitive structures utilized as one moves through the environment, and tries to identify ‘the descriptive geometric conventions and cognitive processing involved’. This study aims to develop a theoretical framework through which the applied system can overcome such weaknesses.
To achieve this augmented navigational interaction, the research mobilizes knowledge from cognitive science and design, and more specifically knowledge that deals with spatial reasoning. The goal of this dissertation is to present an augmented linguistic navigational system that is interactive and fulfils the requirements of the architect to explore new horizons in a virtual three-dimensional world. Thus the first part will examine the missing frames that do not allow one to manipulate objects freely in current navigational programs. The objective of the second part of the dissertation is to specify the main characteristics of a navigation system that works “on the fly”, a system that would be generic in its application as opposed to a tailored presentation. The challenge is to construct a tool that explicates recent cognitive findings and thus represents a better model for human-computer interaction. The dissertation concentrates on the process and not the product. This is interdisciplinary research spanning related areas of endeavor, such as perception, cognition, neuroscience, linguistics, information science, and robotics. This is both its strength and its weakness, since it has to borrow many different terminologies and make comparisons between them, thus expanding the number of areas of research covered.

1.1 Navigational Interface

The navigational interface is the response of the computer to our indication of desires and needs to experience environments visually, a command translated through spatial reasoning to perform a required task. ISO 13407 defines an interactive system as a “combination of hardware and software components that receive input from, and communicate output to, a user in order to support his or her performance of the task.” But how does one examine the actions of a person to understand whether the interaction is effective? “An interaction technique defines a consistent mapping between the user and the virtual environment technology, and how the environment will react as a user interacts with the input devices.” (Willans et al. 2001) What this technique is, is what we intend to explore.

Users of Virtual Reality (VR) systems can suffer from severe usability problems, such as disorientation and an inability to explore objects naturally. Therefore there is a need for better-designed VR systems that support exposition, navigation, exploration and engagement (Shneiderman, 1982). Significant usability problems with current VR systems have been reported by Tognazzini (1989), Broll et al. (2001), Azuma et al. (2001), and van Dam (2000).
According to Norman (1986), user activity is analyzed through the stages of performing and evaluating action sequences: “The primary, central stage is the establishment of the goal. Then, to carry it out requires three stages: forming the intention, specifying the action sequence, and executing the action. To assess the effect of the action also requires three stages, each in some sense, interpreting the state, and evaluating the interpreted state with respect to the original goals and intentions.” (Norman, 1986 p. 37) According to Shneiderman (1998), one should consider the syntactic-semantic model, which makes a major distinction between meaningfully acquired semantic concepts and arbitrary syntactic details.

Evaluation methods may be able to discover some usability problems, but no current evaluation method fits the specific problem of navigation in three-dimensional virtual environments. It may be argued that conventional usability evaluation methods, such as layered interaction analysis (Nielsen 1992), co-operative evaluation with users to diagnose problems (Monk et al. 1993), or the cognitive walkthrough (Wharton et al. 1994), could be applied to virtual reality systems. However, neither Shneiderman’s syntactic-semantic model nor Nielsen’s heuristics addresses issues of locating and manipulating objects, or navigating in three-dimensional environments; and neither Norman’s theory of action nor the cognitive walkthrough (Wharton et al. 1994) was designed to address perceptual orientation and navigation in virtual environments.

1.2 Descriptive aspects of action

When one examines the relationship between the representation of a space and ‘movement’, a gap appears – a “mismatch” according to O'Keefe (1990 p. 52) – between what we imagined the space to be and what that space is like. This perception of what one imagined can tint our acquisition of knowledge from an experience. According to de Certeau (1984 p. 93), “Perspective vision and prospective vision constitute the two-fold projection of an opaque past and uncertain future into a surface that can be dealt with.” The definition of perception of concern to this research is an active process that manifests itself in three main areas: vision, language, and body movement (kinesthetics). Perception is where “the observer is constructing a model of what environmental situation might have produced the observed pattern of sensory stimulation” (Palmer 1999 p. 10). That is, the viewer’s desires are the basis for interpretation: what one believes the relation between objects to be, with possible interpretations competing against each other. When attempting to describe human action, the fundamental questions that have to be asked are: What is its nature? What is its essence? And how will we describe it?


“Actions are agent-managed processes” (Rescher, 2000). Action as a philosophical term has a long history. Here I will circumvent the philosophical debate and enter the discussion with an attempt to describe action, believing that in order to label action, one has to describe it. According to Rescher (2000), the conceptual tools for what might be called the canonical description of an action inventory are the following:
1. Agent (Who did it?)
2. Act-type (What did he do?)
3. Modality of action (How did he do it?)
   a. Modality of manner (In what way did he do it?)
   b. Modality of means (By what means did he do it?)
4. Setting of action (In what context did he do it?)
   a. Temporal aspect (When did he do it? How long did it take?)
   b. Spatial aspect (Where did he do it?)
   c. Circumstantial aspect (Under what circumstances did he do it?)
5. Rationale of action (Why did he do it?)
   a. Causality (What caused him to do it?)
   b. Finality (With what aim did he do it?)
   c. Intentionality (From what motives did he do it?)

The explanation of why an agent performs an action has to do with the rationalization of the goal into intention, that is, the decision to act to achieve a goal. However, intention must also include the motivation and belief underpinning the reasons to act. The critical questions that the user asks when moving in space are: “What is of interest?” and “How does one get there?” According to Jackendoff (1972), the qualification of perception of causality can be divided into STATES and EVENTS. STATES are considered non-dynamic instances, while EVENTS are associated with dynamic causation. The distinction between them is a fundamental frame position that determines their representation. The distinction between moving oneself in space and moving objects in space is basically assumed to be a distinction of the awareness of what one sees [recognition] and how one acts. In our case, the distinction between moving oneself and moving objects is the state one is in, a distinction between perception and action. This can be summarized thus:


Desire: I want to see X1 object in relation to X2 object.
Belief: If I change location to L2 then I will see (X1 object in relation to X2 object).
Fact: What X objects are in relation to other X objects.
Backing: I can see no hindrance to view X1 object in relation to X2 object.
Base: I have done similar operations before.

Thus we see that the above argument is about the change of state; regardless of its ‘primary reason’ it is warranted through the backing of spatial reasoning.
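The change-of-state argument can be made concrete. The following fragment is a minimal Python sketch of the desire/belief/backing structure; all names and the visibility table are hypothetical and introduced only for illustration. A requested change of viewing location is warranted when the belief that both objects will be visible from the target location is backed by the absence of hindrance and by past experience.

```python
from dataclasses import dataclass

@dataclass
class ViewRequest:
    """Desire: see object x1 in relation to object x2 from a target location."""
    x1: str
    x2: str
    target_location: str

def change_is_warranted(request: ViewRequest,
                        visible_from: dict,
                        past_successes: int) -> bool:
    """Belief + backing: the move is warranted if, from the target location,
    both objects are expected to be visible (no hindrance) and similar
    operations have succeeded before (base)."""
    expected_view = visible_from.get(request.target_location, set())
    no_hindrance = {request.x1, request.x2} <= expected_view
    return no_hindrance and past_successes > 0

# Example: moving to L2 in order to see the fountain in relation to the palace.
visible_from = {"L2": {"fountain", "palace"}, "L1": {"palace"}}
print(change_is_warranted(ViewRequest("fountain", "palace", "L2"),
                          visible_from, past_successes=3))  # True
```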

1.3 Spatial reasoning

There are two types of reasoning about space: visual reasoning and spatial reasoning. Visual reasoning is how one extracts information from the environment. The process of visual perception is conceptualized as a hierarchy spanning two main levels. Low-level vision is usually associated with the extraction of certain physical properties of the visible environment driven purely by stimulus input, such as depth, three-dimensional (3-D) shape, object boundaries, or surface material properties. High-level visual processing relies on previously stored information about the properties of objects and events that allows us to identify objects, navigate, and reach toward a visible goal. It is beyond the scope of this study to provide a thorough account of what the visual system does to produce a mental representation; the research assumes that object recognition is a solved problem and focuses instead on motion and the relationship between objects and the observer.

However, several aspects of visual reasoning are important for our concern here. According to Goodale, “The ventral stream plays the major role in the perceptual identification of object, while the dorsal stream mediates the required sensorimotor transformation for visually guided actions directed at those objects. Processing within the ventral stream enables us to identify an object, such as a ripe pear in a basket of fruit; but processing within the dorsal stream provides the critical information about location, size, and shape of that pear so that we can accurately reach out and grasp it with our hand.” (1995, p. 177) The centers in the lower (ventral) system seem to be involved in identifying objects, whereas the parietal centers in the upper (dorsal) system seem to be involved in locating objects. These two pathways are often called the “what” system, whose task is object discrimination, and the “where” system, whose task is landmark discrimination (Palmer, 1999 p. 38) (Landau, 1993). We shall show later how those findings connect to our model of movement commands.


The second type of reasoning about space is spatial reasoning. “Spatial reasoning is the relation between an observer and the space that surrounds that observer in order to achieve some defined goal through movement.” (Presson, 1982) Reasoning about space is how one plans to move in such an event/state, or how one concludes where something is in relation to where it was first, assuming certain premises. “Reasoning about space is tied to the ‘how’ system representation. The ‘what’ system and the ‘where’ system both sub-serve spatial guided motor behaviour – the ‘how’ system.” (Palmer 1999 p. 39) Thus, spatial reasoning is always related to a task to be performed, the bases of which are perception and visual reasoning. Spatial reasoning is also related to causality: the knowledge by which one knows how the environment, as well as objects within that environment, will perform. To plan a route, a representation of the environment must be achieved at the level of mental representation devoted to encoding the geometric properties of objects in the world and the spatial relationships among them, i.e. visual and verbal descriptions, expressions of multifaceted origin, of goals and desires (Huttenlocher, 1991; Crawford, 2000).

1.4 Route knowledge

In order to have route knowledge, two processes must be included to bridge the gap between passive observation of the temporal succession of sensory images (State) and active navigation using a description of the fixed environment (Event). The first process constructs descriptions of routes traveled so that they can be followed without guidance, or mentally reviewed in the absence of the environment. The second process constructs descriptions of the fixed features of the environment (places and paths) from the succession of Views and Actions.

According to Kuipers (1983), there are two capabilities we would expect to correspond with knowing a route: being able to travel the route without guidance, and being able to describe or review the route without physically moving along it. These are different capabilities, since we occasionally experience the anomalous state of being able to travel a route but not being able to describe it without traveling it. Traveling the route must include, as a minimum, knowing which action to take at any given moment when faced with a particular View. Thus, knowledge of a particular route must include a set of associations (View - Action) between descriptions of sensory images and the corresponding actions to take. An action like travel would terminate at the next decision point, when the current View should activate an association providing the next Action.
The ability to rehearse the route in the absence of the environment requires, in addition, a different set of associations (View - Next View) from the current View to the View resulting at the next decision point. We can represent both associations in a three-part schema for a path, an element of the route: (Context: <View>; Action: <Action>; Result: <View>). The route description consists of a set of these schemas, each indexed for retrieval under the View in its Context part. As a new route is learned from experience, new schemas are created, and their Action and Result components are filled in. According to Kuipers (1983), a route description consisting of a set of partially filled-in schemas constitutes partial knowledge of the route, and has some interesting properties. When only the Context parts of the schemas are filled, the route description supports recognition of landmarks, but not self-guided travel; when the Action parts are also present, the route description supports self-guided travel but not mental rehearsal of the route apart from the environment; finally, when all three components are present, knowledge of the route is complete. These states of partial knowledge allow the route to be learned incrementally from observations when processing resources are scarce.

The other major process at the foundation of route knowledge is the creation of descriptions of the fixed features of the environment – places and paths – given the temporal sequence of sensory images experienced during travel. We must augment the cognitive map to include a “place” description that has explicit associations with the Views obtainable at that place, and is the object of the topological and metrical relations. We then ask how the many Views available in our agent’s sensory world come to be grouped according to place. As a minimum, we expect that two Views will be associated with the same place description if they are linked by an Action consisting of a rotation with no travel. In fact, this relation is an equivalence relation on the set of Views, and the equivalence classes turn out to correspond exactly with the places in the environment. Precisely the same technique can be used to define path descriptions as equivalence classes of Views joined by travel without rotation. A path, by this definition, is more than an arc in a network; rather, it collects the Views occurring along a street. Since those Views may also correspond to places, a topological connection between a place and a path is made whenever the current View has associations to both a place and a path description. The spatial order of places along a path can be obtained, incrementally, from the temporal order of Views during travel.
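Kuipers' three-part schema and the grouping of Views into places lend themselves to a compact sketch. The Python fragment below is illustrative only; the class, the view names, and the action vocabulary are hypothetical and do not reproduce Kuipers' notation. It shows how partially filled schemas yield landmark recognition, then self-guided travel, then complete route knowledge, and how Views linked by rotation-without-travel fall into one place.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Schema:
    """Route element in the form (Context: <View>; Action: <Action>; Result: <View>)."""
    context: str                   # View under which the schema is retrieved
    action: Optional[str] = None   # e.g. "turn-left", "travel"
    result: Optional[str] = None   # View reached at the next decision point

    def knowledge(self) -> str:
        if self.action is None:
            return "recognition of a landmark only"
        if self.result is None:
            return "self-guided travel, no mental rehearsal"
        return "complete route knowledge"

# A route learned incrementally: schemas are filled in as travel proceeds.
route = [
    Schema("view-of-town-gate"),
    Schema("view-of-market", "turn-left"),
    Schema("view-of-fountain", "travel", "view-of-town-hall"),
]
for schema in route:
    print(schema.context, "->", schema.knowledge())

def group_into_places(links):
    """Two Views share a place description if linked by a rotation with no
    travel; the groups approximate the equivalence classes in the text."""
    places = []
    for view_a, action, view_b in links:
        if not action.startswith("turn"):
            continue  # travel-without-rotation links define paths, not places
        for place in places:
            if view_a in place or view_b in place:
                place.update({view_a, view_b})
                break
        else:
            places.append({view_a, view_b})
    return places

links = [("view-N", "turn-left", "view-W"),
         ("view-W", "turn-left", "view-S"),
         ("view-S", "travel", "view-of-market")]
print(group_into_places(links))  # one place containing the three rotation-linked Views
```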


1.5 Navigational goal analysis

Spatial reasoning related to navigation can be further divided into tactical and strategic navigation. The hierarchy of navigation behaviors can be examined through performance analysis, whereby one compares a short-term task (tactical navigation) with long-term goals (strategic navigation). For example, a task could be to move from one place to another on a particular route, while the goal could be arrival at a desired location at the end of the route. “Strategy can be defined broadly as the individual’s approach to the task” (Rogers, 2000). This type of analysis requires that the intention of the subject be revealed. A navigational goal is for someone to be somewhere, and from there to see something. In the virtual world it is the placement of a person at a specified location with a perspective view. To achieve this goal, to be somewhere, one has to have an intention, a subgoal that breaks the movement into its components. For example, if our goal is to reach the town hall, our intention is the script that takes us through the process of getting there. It might follow this script: Leave the house, turn left, go to the street corner and turn left again… This task identity can then be broken down into high-level and low-level action.

According to Norman (1986), a convenient summary of the analysis of tasks (the process of performing and evaluating an action) can be approximated by seven stages of user activity:

Goals and intentions: A goal is the state the person wishes to achieve; an intention is the decision to act so as to achieve the goal.

Specification of the action sequence: The psychological process of determining the psychological representation of the actions to be executed by the user on the mechanisms of the system.

Mapping from psychological goals and intentions to action sequence: In order to specify the action sequence, the user must translate the psychological goals and intentions into the desired system state, then determine what settings of the control mechanisms will yield that state, and then determine what physical manipulations of the mechanisms are required. The result is the internal, mental specification of the actions that are to be executed.

Physical state of the system: The physical state of the system, determined by the values of all its physical variables.

Control mechanisms: The physical devices that control the physical variables.


Mapping between the physical mechanisms and system state: The relationship between the settings of the mechanisms of the system and the system state.

Interpretation of system state: The relationship between the physical state of the system and the psychological goals of the user can only be determined by first translating the physical state into psychological states (perception), then interpreting the perceived system state in terms of the psychological variables of interest.

Evaluating the outcome: Evaluation of the system state requires comparing the interpretation of the perceived system state with the desired goals. This often leads to a new set of goals and intentions.

However, as a normative theory, it still lacks action analysis; for that I shall follow criteria similar to those set by Ullman (1996 p. 272). The action criteria are:
1. Spatial properties and relations are established by the application of a route to a set of early representations.
2. Routes are assembled from a fixed set of elemental operations.
3. New routes can be assembled to meet newly specified processing goals.
4. Different routes share elemental operations.
5. A route can be applied to different locations. The processes that perform the same route at different locations are not independent.
6. In applying routes, mechanisms are required for sequencing elemental operations and for selecting the locations at which they are applied.

The use of routines to establish shape properties and spatial relations raises fundamental problems at the levels of computational theory and the underlying mechanisms. A general problem on the computational level is to establish which spatial properties and relations are important for different tasks.
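Ullman's criteria can be illustrated by assembling routines from a fixed vocabulary of elemental operations. The sketch below is only a schematic Python rendering of that idea; the operation names and the 'state' dictionary are invented for the example and do not correspond to any implemented visual routine processor.

```python
# A fixed vocabulary of elemental operations (criterion 2).
def shift_attention(state, target):
    return dict(state, focus=target)

def trace_boundary(state):
    return dict(state, traced=state["focus"])

def mark_location(state):
    return dict(state, marked=state.get("traced"))

ELEMENTAL_OPS = {
    "shift": shift_attention,
    "trace": trace_boundary,
    "mark": mark_location,
}

def apply_route(route, location):
    """Apply an assembled route at a chosen location (criteria 3 and 5:
    new routes can be assembled, and one route can be applied anywhere)."""
    state = {"focus": location}
    for op_name, *args in route:
        state = ELEMENTAL_OPS[op_name](state, *args)
    return state

# Two different routes sharing elemental operations (criterion 4).
inside_outside_route = [("trace",), ("mark",)]
find_and_mark_route = [("shift", "doorway"), ("trace",), ("mark",)]

print(apply_route(inside_outside_route, location="courtyard"))
print(apply_route(find_and_mark_route, location="facade"))
```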

1.6 Examination of movement in virtual space

The first task of this study is to explain some of the reasons why existing virtual reality programs seem so detached from the observer. The study will attempt to clarify some of the misdirected technological analysis generated in recent articles about the phenomena of interaction, such as those written by Lauwerreyns (1998), Laurel (1986), Hutchins (1986), Myers (1999), Conway (2000), Pierce (1997), Stoakley (1995), Steuer (1992), McOmber (1999), Biocca (1992), and many others.
The study will attempt to expose the relationship between the observer and objects in the virtual environment, and follows models proposed by Jackendoff (1983) and elaborated since. Since there is no model available for the examination of action that fits our specific task – see for example (Rubin, 1994) and (Hackos, 1998) – the methodology used for the examination of the performance of interaction within virtual environments was the case study. The case studies examined existing computer navigational programs. According to Yin (1994), from the case study methodology one can generate the adequacy criteria of action. From the new criteria, one can build a conceptual system and respond to some of the difficulties encountered in the integration between cognitive and computerized systems. The conceptual system is conceived through an analysis of the spatial reasoning in directing movement through frames of reference, and is constrained by translation and transformation rules.

Once the tool is established, we will use scenarios to examine its effectiveness. According to Carroll (2002), scenario-based design is a family of techniques in which the use of a future system is concretely described at an early point in the development process. Narrative descriptions of envisioned usage episodes are then employed in a variety of ways to guide the development of the system. Scenario-based design changes the focus of design work from defining system operations to describing how people will use a system to accomplish work tasks and other activities. Scenario-based practices now span most of the software development lifecycle. The subject of such analysis is visually guided action and natural language commands (English). There are wide cultural differences in approaching such a task, and therefore the study is confined to a native English-speaking population.

1.7 Outline of this study

The Introduction deals with the general formulation of the problem and the methodological premises, research methods, and theoretical models used in the study. The attempt is to decompose movement consistent with cognitive aspects of spatial reasoning, frames, and axes.

Chapter 2 – Reviews the basic configuration of current navigation systems. The chapter reviews the principles and performance of navigation in desktop computer-based applied virtual environment programs, and ends with basic assumptions concerning navigational capabilities.


Chapter 3 – Presents the case studies, from computer simulations to guided tours, in order to discuss the control of representation of movement in virtual space. The cases include state-of-the-art programs available commercially, from computer games to three-dimensional navigational programs to architects' modelling programs, in order to discuss the controls of representation/interaction as one moves in virtual space.

Chapter 4 – Analyzes the software programs to show the ‘flatness’ of interaction and the performance of movement in different tasks. The adequacy criteria are introduced to answer some of the questions on how to overcome ‘the flatness of the computer screen’ – the inability to interact intuitively with the surroundings. Body input devices and screen output devices are examined to generate criteria for an agent directed by the user.

Chapter 5 – Presents the conceptual system that underlies the endeavour. The conceptual structure is shown to connect to the linguistic and visual faculties. The chapter also presents the method by which the system is examined.

Chapter 6 – Presents the basic framework components for representing spatial knowledge and performing spatial reasoning. The chapter introduces a basic pointing panoramic navigational system for representing an agent moving in an environment. Through the use of a visual panoramic model, we present a mechanism of iconic representation. Thus, the chapter examines the relationship between the basic visual panoramic model and basic language commands.

Chapter 7 – Presents the integration of language commands with action to produce spatial reasoning. Through a spatial semantic model, we are able to introduce paths controlled by frames and axes.

Chapter 8 – In this chapter we align the various frames that are activated when one moves, and introduce one more system of description. A case of a tourist guide then allows us to explore natural language and the visual system. Finally, all the different computer programs are examined.

Chapter 9 – Introduces the proposed computerized tool of navigation and its spatial reasoning mechanism. The tool components are then examined to reveal how the visual and linguistic parsers analyze and generate a route command.

Chapter 10 – Presents a simulation of how the system works in an architectural case. The simulation is based on a scenario given by experts; four commands have been extracted and presented as a simulation.


Chapter 11 – Concludes the study and shows that localization of action is possible in an object-centered system constrained by a panoramic view.



CHAPTER 2 THE BASIC PRINCIPLES OF NAVIGATION IN VIRTUAL ENVIRONMENTS

The general aim of this research is to find ways to make interaction between the computer and the user more vivid while navigating in virtual environments. Interacting with objects is a primary way to learn about an environment, yet interacting with current computer programs is not an intuitive task. The study will show that one of the ways to improve interaction between the computer and user is through voice commands. The specific aim of this research is to build a system that transforms linguistic instructions into an agent's movement in virtual environments. In order to begin, we shall now provide a short introduction to the basic configurations of existing systems.
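Because the stated aim is a system that turns linguistic instructions into an agent's movement, a toy version of that transformation helps fix the idea before the analysis begins. The grammar, command set, and Agent class below are invented for illustration only; the system developed in later chapters relies on frames of reference and axes rather than this naive keyword mapping.

```python
import math
from dataclasses import dataclass

@dataclass
class Agent:
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0  # degrees, 0 = north

    def execute(self, verb: str, amount: float) -> None:
        if verb == "turn":
            self.heading = (self.heading + amount) % 360
        elif verb == "move":
            self.x += amount * math.sin(math.radians(self.heading))
            self.y += amount * math.cos(math.radians(self.heading))

def parse(instruction: str):
    """Map a restricted English instruction onto a (verb, amount) pair."""
    words = instruction.lower().split()
    if "left" in words:
        return ("turn", -90.0)
    if "right" in words:
        return ("turn", 90.0)
    if "forward" in words or "ahead" in words:
        return ("move", 10.0)
    raise ValueError(f"cannot interpret: {instruction!r}")

agent = Agent()
for sentence in ["turn right", "go forward", "turn left", "move ahead"]:
    agent.execute(*parse(sentence))
print(agent)  # roughly x=10, y=10, heading=0: ten units east, then ten north
```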

2.1 Basic configurations of current systems

The navigational discourse between the architect and his client involves representation of the artifact through movement. The architect-client relationship involves communication through which one examines scenarios of usability. In the architectural scenario, designer and client walk through the environment utilizing different scenarios involving user participant roles: resident, employee, customer, manager, janitor, service worker, neighbor, and passer-by. In our case the end-user (architect) seeks to communicate better about the placement of a proposed object within the virtual environment.
That is, the intention of the end-user (designer) is to examine changes or alternatives to the proposed architectural object, and its effect on the environment. Movement is one of the basic tools, a means to an end by which people examine objects, whether by moving objects, moving around them, or moving through them. Movement is primary, yet the knowledge of how one directs movement has been ignored. Movement knowledge includes wayfinding and orientation strategies, and through it one can reason about space. The examination of movement deals with the character of experience, that is, perception and description. Description involves the production of data, how one can talk about what one sees; from an introspective point of view, one can just talk about it, virtually without effort. Perception is one of the sources of information about our world: as one moves in space the process of visual perception converts retinal information into visual information.

There is a distinction between the real world and the projected world (Kant, 1790/1987). Gestalt psychology demonstrated that perception is the result of an interaction between environmental input and active principles in the mind that impose structure on that input (Wertheimer, 1923; Kohler, 1929; Koffka, 1935). The phenomenon of perception is a mental construct, i.e. thought. Perception is the attempt to categorize in a content-rich environment with mutually referring variable sets (Lakoff, 1987). Most categorization is automatic and unconscious, and if we become aware of it at all, it is only in problematic cases. In moving about the world, we automatically categorize objects and abstract entities, like events, actions, emotions and spatial relationships (Rosch, 1978; Jackendoff, 1983).

According to Aloimonos (1997), “A [biological] system that successfully navigates using its perceptual sensors must have a number of capabilities that we can roughly separate into local and global ones.” Visual navigation amounts to the control of sensory-mediated movement and encompasses a wide range of capabilities, ranging from low-level capabilities related to kinetic stabilization, to high-level capabilities related to the ability of a system to acquire a memory of a place or location and recognize it (homing). In the human-computer interface, this system is similar in its representational relationship, but before jumping to conclusions let us examine the components.

According to O'Keefe (1993), navigation in an environment includes two stages in the acquisition of information. In the first stage, the mammal identifies a notional point in its environment, the origin of the polar coordinates. In the second stage, the mammal identifies a landmark, a transitive reference in the environment. The landmark is fixed no matter how one moves around, and one can partially define which way one is going by saying what angle one is making with it.
Once the two stages are identified, the mammal can construct a map of its environment by recording the vector from the landmark to each of its targets, using the slope/permanent landmark to define direction. Assuming that the mammal has done this and now wants to know how to get to a particular target, what it must do is find the vector from itself to the permanent landmark. Once it has the vector from itself to the permanent landmark, it can establish a vector from the permanent landmark to the new target. The mammal can then find the vector from itself directly to the new target.

The other factor that we have to take into account is body movement in space as one moves to a target: aiming at a visual target and arm reaching, the basic function of which is to transport the hand to a given place (see Figure 2.1). In the case of computer interaction it is the manipulation of the input device, and although this is reductive relative to a haptic device, the remaining action still serves an important role in the process of visuo-motor feedback.

Figure 2.1 Basic sensorimotor channels involved in reaching movement. After Paillard, 1996.
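O'Keefe's two-stage scheme amounts to simple vector composition: the stored vector from the permanent landmark to the target, added to the currently perceived vector from the navigator to the landmark, yields the vector straight to the target. A small illustrative computation follows; the coordinates are hypothetical, not taken from the cited study.

```python
# Vectors as (x, y) tuples in an allocentric frame.
def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def subtract(v, w):
    return (v[0] - w[0], v[1] - w[1])

landmark = (40.0, 25.0)      # permanent landmark position
target = (10.0, 60.0)        # goal position
observer = (0.0, 0.0)        # where the navigator currently stands

# Stored during map building: vector from the landmark to the target.
landmark_to_target = subtract(target, landmark)

# At navigation time: the vector from self to the landmark is observed,
# and the vector to the target is recovered by composition.
self_to_landmark = subtract(landmark, observer)
self_to_target = add(self_to_landmark, landmark_to_target)

print(self_to_target)  # (10.0, 60.0), i.e. straight to the target
```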

In virtual reality, movement takes place as a result of competing frames of reference through intentional action by the user. There are six stages in how one operates and controls movement in virtual reality, common to all computer simulations (sketched schematically in the code after the list):

1. The user observes the monitor and interprets the display as an architectural scene. He formulates a strategy of action as a result of what he wants to see occurring on the screen and what kind of action he needs to take.

2. The hand makes contact with the input device: on a flat surface in the case of a mouse, unrestrained in the case of, for example, a hand-held input device, or at a targeted location in the case of a keyboard with alphanumeric commands.

3. The monitor displays a scene plus the registration of the user's movement on the screen. This is the system input aid, and it is usually provided through a pointer in the form of an arrow or hand symbol.

4. The user issues commands by means of hand movements simulating movement location: forwards, backwards, etc.

5. The monitor displays a scene. This is the system output.

6. The user perceives the scene, interprets it as having occurred as a result of his action, and then compares it to the expected output.

7. The process repeats (go to 1).
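The stages above form a perception-action loop. The following sketch makes the loop structure explicit in Python; the function names are placeholders for the user's and the system's roles and do not refer to any particular toolkit.

```python
def interaction_loop(render_scene, read_input_device, max_cycles=100):
    """Schematic version of the six-stage user/display cycle."""
    scene = render_scene(None)                 # stages 3/5: the system's visual output
    for _ in range(max_cycles):
        intention = interpret_and_plan(scene)  # stage 1: user forms a strategy
        if intention is None:                  # user is satisfied, stop
            break
        command = read_input_device(intention) # stages 2 and 4: hand and device
        new_scene = render_scene(command)      # stage 5: display the result
        evaluate(expected=intention, observed=new_scene)  # stage 6: compare
        scene = new_scene                      # stage 7: repeat

# The helpers below are placeholders standing in for the user's side.
def interpret_and_plan(scene):
    return None if scene == "goal reached" else "move closer to the facade"

def evaluate(expected, observed):
    print(f"wanted: {expected!r}, got: {observed!r}")

# A trivial 'system': after one command the goal view is reached.
frames = iter(["street view", "goal reached"])
interaction_loop(render_scene=lambda cmd: next(frames),
                 read_input_device=lambda intent: "push mouse forward")
```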

2.2 Description of the space people navigate

When people describe the movements of a person in space, they represent them in terms of ‘states’ and ‘events’: there is a subject/object in state S1, which is relocated to state S2. Actions are bifocal; they create a topological reality of conceptual relations and meaning. When people verbally relate to the environment, the task of formulating the requisite and sufficiently transitive properties for the use of a spatial expression to describe an event is considerably difficult. Many have argued that it is impossible, in principle, to formulate definitions that are frame-free. The problem arises in part because of the allotropic nature of an event.

According to Jackendoff (1983), a critical linguistic element in spatial cognition is the connection between conceptual structure and spatial representation. Conceptual structure is the encoding of meaning independent of any language, while spatial representation is the encoding of objects and their configurations in space. According to Jackendoff's frame/script theory, “The way of looking at the frame/script theory captures a generalization … the essential connection between the frame selection task (how one categorizes novel things and events) and the use of a frame for its default values. I am claiming here that these tasks use the very same information” (Jackendoff, 1983 p. 141). Tversky (1981) substantiates this claim by observing that people seem to keep track effortlessly of the objects immediately around them as they move and turn in environments. However, the systematic distortion of the relationship to objects depends on the directional reference of those objects from the body and the way the objects are placed in relation to the axial parts of objects.
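The STATE/EVENT distinction can be written down directly. The fragment below is a loose, illustrative encoding; the constructors follow the spirit of Jackendoff's conceptual structure, but the Python rendering and the example are mine. An EVENT relocates a thing from one place to another, and its result is a STATE asserting where the thing now is.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """STATE: non-dynamic, e.g. BE(visitor, at(entrance))."""
    thing: str
    place: str

@dataclass(frozen=True)
class Event:
    """EVENT: dynamic causation, e.g. GO(visitor, from(entrance), to(hall))."""
    thing: str
    source: str
    goal: str

    def result(self) -> State:
        # The event terminates in the state of the thing being at the goal.
        return State(self.thing, self.goal)

s1 = State("visitor", "entrance")
e = Event("visitor", s1.place, "main hall")
print(e.result())  # State(thing='visitor', place='main hall')
```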


The agent has to move from one location to the next along a route to arrive at a destination. Means-ends analysis involves strategies and procedures for reducing the difference between states. Golledge (1999) conducted an experimental comparison of strategies, stated criterion versus revealed criterion in route selection; that is, the stated intention versus the tangible action (see Table 2.1). The results show a gap between the way we think about spatial problems and the way we act on them: a gap between thinking about acting and thinking about space.

Stated criterion                                   | Ranking of stated criteria most frequently used | Ranking of revealed criteria most frequently used
Shortest path                                      | 1  | 1
Least time                                         | 2  | 6
Fewest turns                                       | 3  | 3
Most scenic or aesthetic                           | 4  | 9
First noticed                                      | 5  | 2
Longest leg first                                  | 6  | 7
Many curves                                        | 7  | 10
Most turns                                         | 8  | 5
Different from previous route taken (variability)  | 9  | 8
Shortest leg first                                 | 10 | 4

Table 2.1 Comparison of stated criterion versus revealed criterion in route selection (taken from Golledge, 1999)

Spatial descriptions are composed of elements, typically expressed by nouns, and spatial relations, typically expressed by prepositions and verbs. In order to formulate a linguistic navigational theory, a number of requirements have to be adhered to. Similar to Jackendoff (1983 p. 11), there are four requirements:

Expressiveness: A theory of linguistic navigation must follow the observation of the user's requirement adequately; it must be able to express all the distinctions made by a natural language. In practice, of course, no theory can be tested on all possible sentences, but everyone assumes that some significant fragment of the language must be accounted for.


Universality: In order to account for the linguistic navigational needs, the stock of navigational route structures available to be used by particular languages must be universal. On the other hand, this does not mean that every navigational route structure is necessarily capable of expressing every meaning, for a language may be limited by its lexicon, grammatical structure, or correspondence rules.

Compositionality: A linguistic navigational theory must provide a principled way for the structure of the parts of a sentence to be combined into the meaning of the whole sentence. This requirement may be taken more or less rigidly, depending on whether or not one requires each constituent (as well as each word) of a sentence to be provided with an independent interpretation.

Semiotic properties: A linguistic navigational theory should be able to account formally for the so-called "indexical properties" of an object and signified. In particular, the notion of "valid inference" must be explicated.

2.3 Current performance of navigation in virtual reality

The capabilities of current VR systems are limited to integrating the hand movement of the observer with the action taken by him. Current VR systems are also limited in their ability to integrate the following movements; as a result of limitations at the conceptual level the system cannot:

1. Combine movement of the agent with object examination;

2. Move from one landmark to the next through a trajectory;

3. Move along an object.

It is the belief of this researcher that this lack of ‘reference systems’ causes one to miss vivid concreteness, unlike Lynch (1973 p. 304), who expected to find a shorthand physical symbol for the city, both to organize impressions of it and to conduct daily activity. It is expected that the conceptual reference systems of communication can be used as a tool for improving spatial accessibility to the three-dimensional environment. Expanded knowledge permits us to develop devices for representing knowledge with the competence to do and control the following:

1) A way of referring to an object;

2) A way of referring to a change of view;

3) A way that relays our movement in space.

To avoid confusion, we will adopt the distinction of Franklin, Tversky and Coon (Franklin, 1992), and refer to the point of view as the viewpoint of the character in described scenes, while the perspective is that of the observer. The act of movement results from controlling, i.e. input. The control and input types (unified in the sketch that follows this list) are:

1) Control using a screen aid (pointer device);

2) Control using an input device (mouse, hand-held device);

3) Control using English commands (keyboard, voice command).
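The three control types differ only in how a movement command is produced; once produced, they can feed one and the same command representation. The sketch below is illustrative only (the MoveCommand type and the conversion functions are hypothetical), showing pointer input, device input, and a restricted English command converging on a common structure.

```python
from dataclasses import dataclass

@dataclass
class MoveCommand:
    kind: str      # "translate", "rotate", or "goto"
    argument: str  # axis, direction, or named target

def from_pointer(clicked_object: str) -> MoveCommand:
    # Control type 1: screen aid; clicking an object means "go to it".
    return MoveCommand("goto", clicked_object)

def from_device(axis: str) -> MoveCommand:
    # Control type 2: mouse or hand-held device pushed along an axis.
    return MoveCommand("translate", axis)

def from_english(sentence: str) -> MoveCommand:
    # Control type 3: keyboard or voice; only a toy vocabulary here.
    return MoveCommand("rotate", "left") if "left" in sentence \
        else MoveCommand("translate", "forward")

for cmd in (from_pointer("fountain"),
            from_device("forward"),
            from_english("turn left towards the palace")):
    print(cmd)
```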

2.4 Basic assumptions concerning navigation

The elementary navigation control levels are:

1. Movement instruction level – provides a structure for selecting objects in relation to the background scene, through the use of the information to be supplied by the system.

2. Information representation level – relays our movement in space to virtual space.

The navigational control permits consciousness of the frame which specifies the position of the viewer in relation to the surrounding objects and, at the same time, the position of the viewer's movement in space in relation to the frame of attention, thus allowing for an information update. The navigational control permits the shifting of attention, through which one focuses on the visual scene as one moves through space in relation to specific targets. The device distinguishes between frames of observation and frames of object manipulation and representation.
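The two control levels can be pictured as a thin layered interface: a movement-instruction level that selects an object against the background scene, and an information-representation level that relays the viewer's movement and keeps the viewer/target relation current. The following sketch is schematic and hypothetical; it is not the implemented system described in Chapter 9.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NavigationState:
    viewer_position: tuple = (0.0, 0.0)
    attention_target: Optional[str] = None
    visible_objects: dict = field(default_factory=dict)  # name -> (x, y)

class NavigationControl:
    """Separates the movement instruction level from the information
    representation level (frames of observation versus frames of
    object manipulation)."""

    def __init__(self, state: NavigationState):
        self.state = state

    # Movement instruction level: select an object in relation to the scene.
    def attend_to(self, object_name: str) -> None:
        if object_name in self.state.visible_objects:
            self.state.attention_target = object_name

    # Information representation level: relay movement into virtual space
    # and report the updated viewer-to-target relation.
    def move_viewer(self, dx: float, dy: float) -> tuple:
        x, y = self.state.viewer_position
        self.state.viewer_position = (x + dx, y + dy)
        target = self.state.visible_objects.get(self.state.attention_target)
        if target is None:
            return self.state.viewer_position
        return (target[0] - (x + dx), target[1] - (y + dy))

state = NavigationState(visible_objects={"citadel": (30.0, 40.0)})
control = NavigationControl(state)
control.attend_to("citadel")
print(control.move_viewer(10.0, 10.0))  # vector from the viewer to the citadel
```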

To demonstrate how the system works for architects, we will use the technique of scenarios. Scenarios of human-computer interaction help us to understand and to create computer systems and applications as artifacts of human activity – as things to learn from, as tools to use in one's work, as media for interacting with other people.
Scenario-based design of architecture addresses five technical challenges: scenarios evoke reflection in the content of design work, helping architects coordinate design action. Scenarios are at once concrete and flexible, helping architects manage the fluidity of design situations. Scenarios afford multiple views of an object, and diverse kinds and amounts of detailing, helping the architect manage the many consequences entailed by any given design project (Carroll, 2000; Hertzum, 2003; Rosson, 2002).

A typical scenario between the architect and the client contains many directional references, since it examines relations between objects. The subsequent scenario has been compiled from interviews conducted by the author throughout the research. Although it is an imaginary scenario, it is accurate in its details. A sample of the interviews was recorded in June 2004 with architects in Rotterdam, the Netherlands (see Appendix II). Typical scenes that architects imagine the client might want, or that they themselves might want, are as follows:

A hypothetical scenario between the architect and the client:

Architect: These are the plans for the proposed city.
Client: What you are showing me is plans; can you show me what it might look like when I approach the city?
Architect: (Pulls up the elevations) This is the North elevation.
Client: What you are presenting me with is a projection of space, similar to the plans in that it shows space as an analytical section. But can you show me what it would look like with what you call ‘perspective’?
Architect: (Pulls up a three-dimensional representation) This is what you see. You see the main road that runs through the city and connects the parts divided by three rings of roads; the road runs from South to North. From here you see the statues, fountains, and palaces.
Client: Can you show me the citadel?
Architect: (Pulls up another three-dimensional representation) This is the view of the citadel as seen from the intersection of the main road with the inner ring.


Client: This is just a zoom! Can you show me what the citadel would look like from the back?
Architect: (Pulls out another drawing) This is the view from the other side.
Client: Wait, I am lost. Can you show me what the relation to the fountain is; can you show me a panoramic view of this location?
Architect: (Pulls up a panoramic view) As you turn left you can see the fountain, and as we turn around we see the palace.
Client: Can you take me through the palace?
Architect: Would you like to see a movie?
Client: No, I would like something interactive.
Architect: The latest thing is that you control an agent moving around in a virtual environment. This is the joystick. It works on the analogy of movement. When you move the stick forwards the agent moves forwards, when you move it backwards the agent moves backwards, and so on.
Client: (Looking at the joystick and the perspective generated on the screen) How do I get to the main hall?
Architect: Go through the entrance into the vestibule, and the second door on your right is the main hall.
Client: (Goes through the building, arriving at the hall) Where is the bar?
Architect: You cannot see it from here.
Client: Why not?
Architect: If we rotate this bookcase then you would be able to see it.
Client: How does the deliveryman get to the bar?
Architect: The door on the left leads to the storeroom in the back of the building. We have an access road all the way to the back.
Client: Can you show me the delivery station in the back?
Architect: For this we either have to go through the storeroom or go through the kitchen located to your right.
Client: Fine.
Architect: (Moves the agent to the new location)
Client: Can you show me what happens when we add another floor to the palace?
Architect: This will make the palace look higher than the city hall.

Client: Can you show me?
Architect: This is the view from the left side of the city hall.

The hypothetical scenario presented here demonstrates a many-sided argumentation about the nature of representation, the nature of architectural spatial reasoning, and the technology and desire for control that drives the industry. Architectural representation tools have evolved through the ages, with plans, elevations and sections as the main tools of the trade, but the ability to represent the three-dimensional product was always the highlight that was thought to bring most clients to a holistic comprehension of the proposed design. However, the perspective was 2 1/2 D. Hence, it offered no interactivity and no feeling of immersion; on the contrary, as Marr suggested, it keeps the viewer aware of its flatness.

What is the nature of architectural spatial reasoning? One has to start by describing the design process. The task of the architect is to plan and integrate new design solutions for human habitation needs. The task involves the coordination of different stakeholders to produce a comprehensive design. The architect has to coordinate between different trades, clients and end-user demands. As architects design, the design process can be roughly divided into a creative part that generates the built form, and an observation/exploration part that attempts to comprehend the consequences of that intervention. Thus, the design process enables the architect to develop a solution to a required local condition (Schön, 1983). In order to investigate the consequences of an architectural intervention one must examine the quality of space through images. The architect must inspect the object's ability to contain and be contained.

The presentation is a dialog between the client and the architect. The architectural presentation takes the client through the design process, and, as the scenario shows, the client has his or her own agenda: as he or she moves through the site, different aspects of the project are examined. The architectural presentation takes the client/user through a series of anticipated views, coupled with a scaled model. The "tour", the CAVE, games, and the architectural presentation programs have a lot in common; they all lead the user through a narrative, and they all attempt to present a new environment to the user. The problem is that the number of possible stored images increases exponentially as more objects are added to the virtual environment, and the architectural client would like to have them "on the fly"; hence the desire for controlled dynamic movement in space.

Concluding remarks

This chapter has provided some basic definitions to facilitate an understanding of the cases that will follow. The current performance limitations are: 1) combining movement of the agent with object examination; 2) moving from one landmark to the next through a controlled trajectory; 3) moving along an object. We still require adequacy criteria to examine various computer navigational programs, and these will be extracted following the case-based analysis. In order to build a tool to enhance movement, one needs to understand the observer/observed relationship as well as object description. Movement is a change in object description. The user of computer-based representation via a monitor or other immersive device still has to decide: how do I change the observer/object relation, and how do I provide input into the computer? This is a highly complex question that cannot be answered immediately and is the reason why we need ‘cases’.

CHAPTER 3 SELECTED CASES OF CURRENT NAVIGATIONAL SYSTEMS

This chapter will describe a range of typical cases of navigation systems, a collection of applied computer-based programs of commercial enterprises. The programs vary from desktop systems to fully immersive systems. An initial comparison of their methods of interaction will allow us to explore the different systems.

3.1 Introduction to the state of the art in computer navigation

The computer-based navigation cases were selected from professional literature on architecture and computers, as exemplified in the "International Journal of Design Computing" 2002 Vol. 4, in a special issue on "designing a virtual world". The reason I have chosen to examine these programs is their ability to display architectural virtual reality in a dynamic way. The commercial games are the state of the art of what can be achieved in virtual reality systems as compared to existing architectural systems. In practice, architects use their own initiative and understanding in producing a tailored demonstration. The following case division provides the range needed to examine, from the perspective of the user, what is involved in movement in virtual environments.

Exploration programs: CAVE™, Cosmo Player®, Myst®
Representation programs: 3DS MAX®
'First-person shooters': Tomb Raider®


The selection presents navigational programs that use different computerized systems. Although 3DS MAX is a representation program, it was added to the list to show how architects work. The CAVE is an immersive environment, while the majority of the programs use desktop computers, and Tomb Raider uses a game console. However, there is no reason why the programs could not all be converted to desktop programs, since they share a common mechanism of transition and pointing. Buxton's (1990) model relates the state of an application to the state of the input device. Figure 3.1 presents the state transition diagram for the use of the mouse. Pressing or releasing the button of the mouse transfers the system between states one and two. If the button of the mouse is up, the system is in the tracking state until the mouse button is pressed; then the dragging state is entered. That is 'direct manipulation', where objects are manipulated by the user through physical actions like pointing, clicking, dragging, and sliding.

Figure 3.1. The three-state model of interaction between the user and the visual input device, taken from Buxton (1990)
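The transitions of the mouse between tracking and dragging can be summarised as a small state machine; the sketch below is a minimal illustration, and the state and event names are my own rather than Buxton's.

    # A minimal sketch of Buxton's three-state model of graphical input.
    # State 0 (out of range) applies to devices such as styluses; a mouse
    # alternates only between state 1 (tracking) and state 2 (dragging).
    # The state and event names here are illustrative, not Buxton's own.

    TRACKING, DRAGGING, OUT_OF_RANGE = "tracking", "dragging", "out of range"

    def next_state(state, event):
        """Return the new interaction state after an input event."""
        if state == TRACKING and event == "button_down":
            return DRAGGING          # start direct manipulation (drag)
        if state == DRAGGING and event == "button_up":
            return TRACKING          # release: back to pure pointing
        if event == "lift_device":   # only meaningful for stylus-like devices
            return OUT_OF_RANGE
        if state == OUT_OF_RANGE and event == "touch_device":
            return TRACKING
        return state                 # all other events leave the state unchanged

    state = TRACKING
    for event in ["button_down", "move", "button_up"]:
        state = next_state(state, event)
        print(event, "->", state)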

This study involves a considerable "reduction" from the everyday experience of movement or the experience of computer-simulated virtual reality, i.e. it focuses exclusively on questions regarding the representation and interaction of movement in architectural environments. It is a study of the mechanism of representation, indicating what the new system ought to be able to achieve from the cases of CAD virtual reality systems. It also examines how new theoretical systems could improve existing limitations through the inclusion and integration of the user's movement in a new transitional structure involving communication between the user and the computer, i.e. the interface that facilitates interaction. The following aspects are to be examined:
1. How does the user express his/her desires: what types of actions are involved?
2. What does the user perceive as output on the computer screen?
3. What is involved within the system of representation that generates this output?
4. What are the disappointments, frustrations, and difficulties experienced by the user?
5. What are the constraints of the system that result from the input action?

3.2. Exploration Programs – the CAVE™

The CAVE™ system can be used as an example to introduce what the experience of virtual reality can be. Participants visit the virtual environment alone or in groups. The CAVE, created by the Electronic Visualization Laboratory of the University of Illinois at Chicago, is a cube-shaped space with stereo projections on three walls and on the floor, which are adjusted to the viewpoint of one of the users. In Cruz-Neira (1993), the authors define a VR system as one that provides a real-time, viewer-centered, head-tracking perspective with a large angle of view, interactive control, and binocular (stereo) display. Alternative VR systems, such as Head Mounted Displays, achieve these features by using small display screens that move with the viewer, close to the viewer's eyes.

Figure 3.2 The CAVE setup: a participant with the shutter glasses, a large group in the CAVE, and a participant in a demonstration (taken from Saakes, 2002)

In the CAVE, one has the ability to share the virtual environment with multiple users. The CAVE is an immersive environment, but it does not completely isolate the users from the real world. According to Cruz-Neira (1993), real-world isolation can be highly intrusive and disorienting. The viewer is still aware of the real world and may fear events such as running into a wall. It has been shown in Cruz-Neira (1993) that tracking objects in the CAVE is less distracting than in other systems. This is due to the fact that the projection plane does not move with the viewer's position and angle as it does in an HMD device. The 3D effect of the CAVE comes from the stereo projection, which is achieved by projecting an image for the left eye followed by an image for the right eye. Viewers wear CrystalEyes LCD shutter glasses to view the stereoscopic images. The glasses are synchronized with the computer using infrared emitters mounted around the CAVE. As a result, whenever the computer is rendering the left-eye perspective, the right shutter on the glasses is closed, and vice versa. This 'tricks' the brain and gives the illusion that the left eye is seeing the left perspective and the right eye is seeing the right perspective.
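The frame-sequential stereo described above can be pictured as a render loop that alternates the two eye views in step with the shutters; the sketch below is purely illustrative, and render_view() and signal_shutters() are hypothetical placeholders for the projector and emitter calls of a real CAVE installation.

    # Illustrative sketch of frame-sequential (time-multiplexed) stereo.
    # render_view() and signal_shutters() are hypothetical placeholders for
    # the graphics and infrared-emitter calls of a real CAVE system.

    def render_view(eye, camera_position):
        print(f"render {eye} view from {camera_position}")

    def signal_shutters(open_eye):
        print(f"shutter glasses: open {open_eye} eye, close the other")

    def stereo_frame(head_position, eye_separation=0.065):
        # Offset the virtual camera by half the interocular distance per eye.
        left  = (head_position[0] - eye_separation / 2, *head_position[1:])
        right = (head_position[0] + eye_separation / 2, *head_position[1:])
        signal_shutters("left");  render_view("left",  left)
        signal_shutters("right"); render_view("right", right)

    stereo_frame((0.0, 1.7, 0.0))   # head-tracked position in metres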

Navigation can come in many forms, including flying or walking, and may be controlled by a device called a ‘wand’. The wand is a hardware device that can be thought of as a 3D equivalent of a mouse. In addition to being tracked, this wand has three buttons and a pressure-sensitive joystick. Since the wand is tracked, it facilitates various interaction techniques. The observer in the CAVE also utilizes simple commands to interact with the environment. For instance, by pointing with the wand one can pick up an object in the virtual world. Typically, with such a technique, a virtual beam is emitted from the wand, allowing the user to intersect the beam with the desired object, and then select it with a wand button.

Figure 3.3: The wand
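The beam-based selection described above amounts to casting a ray from the tracked wand and picking the nearest intersected object; the sketch below is a minimal illustration that approximates objects by bounding spheres, an assumption of the example rather than of the CAVE library.

    # A minimal sketch of wand-style ray picking: a beam is cast from the
    # tracked wand and the nearest intersected object is selected.
    import math

    def ray_sphere_distance(origin, direction, centre, radius):
        """Return the distance along the ray to the sphere, or None if missed."""
        oc = [c - o for o, c in zip(origin, centre)]
        proj = sum(a * b for a, b in zip(oc, direction))      # closest approach
        closest = [o + proj * d for o, d in zip(origin, direction)]
        miss2 = sum((c - p) ** 2 for c, p in zip(centre, closest))
        if proj < 0 or miss2 > radius ** 2:
            return None
        return proj - math.sqrt(radius ** 2 - miss2)

    def pick(wand_position, wand_direction, objects):
        """Return the name of the nearest object hit by the wand's beam."""
        hits = []
        for name, centre, radius in objects:
            t = ray_sphere_distance(wand_position, wand_direction, centre, radius)
            if t is not None:
                hits.append((t, name))
        return min(hits)[1] if hits else None

    scene = [("fountain", (0, 0, -5), 1.0), ("statue", (2, 0, -8), 0.5)]
    print(pick((0, 0, 0), (0, 0, -1), scene))   # -> "fountain"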

The illusion of movement in the CAVE, with the help of the wand in the immersive environment, is intuitive. The tracking of the movement of the user in space offers six degrees of freedom.

3.3. The exploration program – Cosmo Player®

Cosmo Player® is a program that plugs into the web browser to enable the user to explore three-dimensional worlds. With Cosmo Player one can visit any three-dimensional world written in the Virtual Reality Modeling Language (VRML). These three-dimensional worlds often include other kinds of media, such as sound and movies, and a brief guide shows the basics of the main controls (see Figure 3.4).

Figure 3.4 Cosmo’s navigation control

Control and movement (from the Cosmo Player manual): The main controls on the Cosmo Player dashboard do two things: move around in 3D worlds and explore objects in 3D worlds.

Moving Around in a World: To move around in a 3D world, click the Go, Slide, or Tilt button and then drag the pointer in the Cosmo Player window. Once one clicks a control, it stays selected until another is clicked.

30


17-Jun-09

Go – Click and then drag to move in any direction of the visual array.
Slide – Click and then drag to slide straight up and down or to slide right or left.
Tilt – Click and then drag up or down or from side to side.

Exploring Objects: To explore objects in a 3D world, click the Rotate, Pan, or Zoom button and then drag the pointer in the Cosmo Player window. Once one clicks a control, it stays selected until another is clicked.
Rotate – Click and then drag to rotate an object.
Pan – Click and then drag to pan right, left, up, or down.
Zoom – Click and then drag up to zoom in or drag down to zoom out.

Interacting with Active Objects: An active object is one that will do something — like play a sound or an animation — when one clicks it or clicks-and-drags it. When one passes the pointer over an active object, the pointer changes form.
Seek – Click Seek and then click an object to move closer to it.

Another Way of Moving Through a World: Authors of 3D worlds can set viewpoints — places of interest — for one to visit. The user can move from one viewpoint to another by choosing from the Viewpoint list or clicking the Next Viewpoint or Previous Viewpoint button.
Viewpoint List – Click the Viewpoint List button and choose a viewpoint from the pop-up list of pre-selected scenes.

Figure 3.5 Cosmo’s navigation in action

Cosmo Player is an exploration program in which attention and direction are integrated with dragging (see Figure 3.5). The dragging mechanism makes the user feel as if he or she is constantly dragging some object behind. Cognitively, the dragging feeling stems from anticipation: one points to where one wants to go and then experiences a delay in arriving at that point. The monitor is treated as a physical entity with flat horizontal and vertical dimensions. Cosmo Player was designed with the visitor in mind, to create places where the user explores the designer's creation. The belief of the Cosmo Player designers was that the basic navigation movement should also change to achieve the same status of wonder and surprise as the objects themselves.

Movement in a Cosmo Player world is fragmented; although it makes sense to define each move in cinematic terms, such as pan and tilt, when one makes a movie, doing so restricts movement in virtual reality to a scene coordinate system. When a person moves in the real world, the actions described above do not readily occur as individual actions. For example: should moving around in a world and zooming be separate concepts? Examining objects in Cosmo Player is done through array rotation; at first it is not clear what kind of rotation is occurring, since the visual indicator of movement is constrained to the two-dimensional screen and hand movement is constrained relative to the surface of the user's pointing device. Combining dragging and pointing into one action results in screen confusion. Although Cosmo Player comes closest to architectural movement, it fails to overcome the sense of delay and confusion when one interacts with objects.

3.4 The exploration program – Myst®

In Myst®, an adventure game, the mouse acts as the input device. Control is achieved through pointing and requires almost no skill to be learned. In Myst, the hand pointer integrates action by indicating possible directions and the manipulation of an object. The mouse works well in combination with the flat surface on which it moves together with the movement of the pointer device. To move forward, the user must click in the middle of the screen. If one wants to turn right or left, one clicks on the right or left side of the screen. By pointing and clicking from one node to the next, one executes movement. Each node is a 360° panoramic view (cylindrical panoramic images); the sense of movement from one node to the next is abrupt. The user does not have any option for a different location other than the one provided by the designer of the game. There are two possible ways to view a scene: one is a panoramic view, the other is an object view. The user must stop at every new node to update his position. When one passes the pointer over an active object, the pointer changes form. In Myst, engagement is signalled through predetermined hotspots in a scene; that is to say, the user must constantly search for visual clues with the pointer device on the screen. The pointer acts both as a navigational device and as an interaction device with objects at the same time (see Figure 3.6).
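The node structure behind this kind of navigation can be pictured as a small graph; the sketch below is a minimal illustration, and the node names, image files, and hotspot labels are invented for the example rather than taken from Myst.

    # A minimal sketch of Myst-style node navigation: each node is a
    # pre-rendered panorama connected to other nodes through hotspots.

    nodes = {
        "courtyard": {"panorama": "courtyard.png",
                      "hotspots": {"gate": "garden", "door": "library"}},
        "garden":    {"panorama": "garden.png",
                      "hotspots": {"gate": "courtyard"}},
        "library":   {"panorama": "library.png",
                      "hotspots": {"door": "courtyard"}},
    }

    def click_hotspot(current, hotspot):
        """Jump abruptly to the node behind a hotspot; there is no continuous path."""
        return nodes[current]["hotspots"].get(hotspot, current)

    location = "courtyard"
    location = click_hotspot(location, "gate")     # -> "garden"
    print(location, nodes[location]["panorama"])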

Pointer – This is the standard navigation pointer for exploration.
Open Hand – This pointer indicates an object that you can use or manipulate, or an object that you can pick up. Click and see what happens!
Zoom In / Zoom Out – This pointer indicates something that you can zoom in on (+) or away from (-). Click once to see what you are examining in more detail and then click again to zoom out.
Lightning Bolt – When Zip Mode is turned on, this indicates an area that you can zip to instantly.

Figure 3.6 Cursor icons from the Myst manual

3.5 Representation programs – 3DS MAX®

3DS MAX® is a representation tool aimed at animators of architecture and multimedia producers. 3DS MAX moves objects in order to meet the user's demands for object manipulation. The history of MAX is of some interest in understanding the development of CAD. As a company, it started on the Omega platform. Omega took the first steps towards visual display as opposed to the alphanumeric commands of the first computers. MAX started as a representation program for designing objects. It was bought by AutoDesk to complement its drafting program, AutoCAD. Recently it was spun off as a new company called Discreet, which attempts to corner the games market by building agents to be installed in games like Tomb Raider. In order to construct or describe an object through descriptive geometry, orthographic projections are used. Orthographic means that the projection lines are parallel to one another and meet the projection plane at right angles. The object coordinates are plotted on a two-dimensional plane that defines the limits of the user's 'sight'. In a sense, the viewing plane is like the frame of peripheral vision. To see what is behind, you either have to turn your head (rotate the viewing plane), or step backward until the object is in front (move the viewing plane). In other words, the user can only see things that are in front of the viewing plane, and everything else is 'outside the field of view'. When an object is in alignment within geometrical co-ordinates it can be explored predictably in three ways: 1) array rotation, 2) object rotation, and 3) scene rotation.

An alternative to using the local view co-ordinates is to define an agent axis, often associated with character animation of the represented agent in virtual reality. In 3DS MAX, the windows that allow the user to view the 3D space are called viewports (see Figure 3.7). The monitor screen itself is akin to the viewing plane, because the user can only see what is 'beyond' the monitor in cyberspace. In MAX, three of the four default views are orthographic, where objects are shown as orthographic projections. The fourth default viewport in MAX, the perspective viewport, represents a more realistic view of 3D space where lines converge to vanishing points, as they do in real life.

Figure 3.7. The viewpoint represents the current vantage point of the user. The viewport - viewing plane indicates the limits of the user's view because only objects in front of that plane are visible.
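The difference between the orthographic and the perspective viewports can be illustrated by projecting a single point onto the viewing plane; the sketch below is a minimal illustration with an assumed focal length, not 3DS MAX's own projection code.

    # A minimal sketch of the two viewport projections described above:
    # an orthographic projection simply drops the depth co-ordinate, while
    # a perspective projection scales points by their distance from the eye,
    # which is what makes parallel lines converge to a vanishing point.

    def orthographic(point):
        x, y, z = point
        return (x, y)                      # depth is ignored

    def perspective(point, focal_length=1.0):
        x, y, z = point
        if z <= 0:
            return None                    # behind the viewing plane: not visible
        return (focal_length * x / z, focal_length * y / z)

    column_tops = [(1.0, 3.0, 2.0), (1.0, 3.0, 4.0), (1.0, 3.0, 8.0)]
    for p in column_tops:
        print(orthographic(p), perspective(p))
    # The orthographic images coincide; the perspective images shrink with distance.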

There are four standard references for an object's geometrical Cartesian co-ordinate system: 1) a reference location that defines the object's relation to the point of origin; 2) a reference orientation that defines the object's axes; 3) a reference distance that defines the unit of measurement; 4) a reference sense along the orientation that defines the positive direction in relation to the human axis. 3DS MAX has two representational systems: one that gives the user the appearance that all objects are seen and all have equal visual access, and a second, a representational camera, through which one explores the virtual space with the limited point of view of the perspective (see Figure 3.8). That is, in order to enhance object manipulation, one can toggle between having one single viewport and splitting the viewport into four pre-defined world co-ordinates (top view, front view, side view and a perspective). The 3DS MAX viewport has combined the idea of world co-ordinates with the axis of array rotation, and thus one can see all objects and all positions according to the world and view co-ordinates. With view co-ordinates one can rotate the array, place an object within viewports, and rotate the array of objects. (See Figure 3.8 for an overall view of the system, and Figure 3.9 for the specific viewport located at the bottom right.)

Figure 3.8 User interface for 3DS MAX

Figure 3.9. MAX viewport navigation

Using Standard View Navigation – Button Operation (from the MAX manual): Clicking standard view navigation buttons produces one of two results:
• It executes the command and returns to your previous action.
• It activates a view navigation mode.

While in a navigation mode, one can activate other viewports of the same type, without exiting the mode, by clicking in any viewport.

Zooming, Panning, and Rotating Views (from the MAX manual): Click Zoom or Zoom All and drag in a viewport to change view magnification. Zoom changes only the active view, while Zoom All simultaneously changes all non-camera views. If a perspective view is active, you can also click Field of View. The effect of changing FOV is similar to changing the lens on a camera. As the FOV gets larger you see more of your scene and the perspective becomes distorted, similar to using a wide-angle lens. As the FOV gets smaller you see less of your scene and the perspective flattens, similar to using a telephoto lens. Click Region Zoom to drag a rectangular region within the active viewport and magnify that region to fill the viewport. Region Zoom is available for all standard views except the Perspective view; in a perspective view, Field of View replaces Region Zoom. Click the Zoom Extents or Zoom Extents All fly-out buttons to change the magnification and position of your view to display the extents of objects in your scene. Your view is centered on the objects and the magnification is changed so that the objects fill the viewport. Click Pan and drag in a viewport to move your view parallel to the viewport plane. You can also pan a viewport by dragging with the middle mouse button held down while any tool is active. Click Arc Rotate, Arc Rotate Selected, or Arc Rotate Sub-Object to rotate your view around the view center, the selection, or the current sub-object selection respectively. The latter option is a new feature in 3DS MAX. When you rotate an orthogonal view, such as a Top view, it is converted to a User view. With Arc Rotate, if objects are near the edges of the viewport they may rotate out of view. With Arc Rotate Selected, selected objects remain at the same position in the viewport while the view rotates around them. If no objects are selected, the function reverts to the standard Arc Rotate. With Arc Rotate Sub-Object, selected sub-objects or objects remain at the same position in the viewport while the view rotates around them.

Cameras are non-rendering objects that you can position in the 3D scene. They work like real cameras in that they provide a viewpoint on the scene that can be adjusted in space and animated over time. Just as with real cameras, MAX cameras have different settings, such as lens lengths and focal lengths, that one can use to control the view of the scene. Cameras can move anywhere and through objects. Contrary to real-world cameras, MAX does not create depth-of-field effects, so everything is in focus. In 3DS MAX there are two types of action camera: a target camera and a free camera. A target camera makes use of a target, which is a point in 3D space at which the camera is aimed. The target camera can track an object moving in the scene. A free camera is a camera without a target that can easily be animated along a path or simply pointed by rotating the camera. The camera can be manipulated by dragging, by direct manipulation of the camera in the world co-ordinates, or by camera viewport controls through pan, tilt and zoom. It can also be further manipulated by time lines and key frames.

Figure 3.10 Camera navigational directions

You move a camera view by clicking one of the following buttons and dragging in the camera viewport:
• Dolly moves the camera along its line of sight.
• Truck moves the camera and its target parallel to the view plane.
• Pan moves the target in a circle around the camera. For a target camera, it rotates the target about the camera. (For a free camera, it rotates the camera about its local axes.)
• Orbit moves the camera in a circle around the target. The effect is similar to Arc Rotate for non-camera viewports. It rotates the camera about its target. (Free cameras use the invisible target, set to the target distance specified in the camera Parameters rollout.)

Figure 3.11 Possible camera movements
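The camera movements listed above can be expressed as simple vector operations on the camera and target positions; the sketch below is illustrative only and does not reproduce 3DS MAX's implementation.

    # A minimal sketch of the target-camera movements listed above (dolly,
    # truck, orbit).  A camera is reduced to a position and a target point.
    import math

    def dolly(position, target, distance):
        """Move the camera along its line of sight towards the target."""
        d = [t - p for p, t in zip(position, target)]
        length = math.sqrt(sum(c * c for c in d))
        return tuple(p + distance * c / length for p, c in zip(position, d))

    def truck(position, target, offset):
        """Move camera and target together, parallel to the view plane."""
        return (tuple(p + o for p, o in zip(position, offset)),
                tuple(t + o for t, o in zip(target, offset)))

    def orbit(position, target, angle):
        """Rotate the camera about its target in the horizontal plane."""
        x, y, z = (p - t for p, t in zip(position, target))
        c, s = math.cos(angle), math.sin(angle)
        return (target[0] + c * x - s * z, position[1], target[2] + s * x + c * z)

    cam, tgt = (0.0, 2.0, 10.0), (0.0, 2.0, 0.0)
    print(dolly(cam, tgt, 3.0))            # closer to the building
    print(orbit(cam, tgt, math.pi))        # view the building from the back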

The camera view in MAX is slightly different from the conventional perspective view; the difference lies in the treatment of object rotation. In the conventional views there is a switch between array rotation and view rotation in relation to the view co-ordinates. That is, when view co-ordinates are visible the rotation is an array rotation, and when the view co-ordinates disappear from view the system switches to panoramic rotation. In the camera views one can be engaged with an object, allowing array rotation in both conditions. The 3DS MAX approach to immersive navigation is to distribute views among the various viewports. The camera routine mimics the act of directing movies with elements of array rotation. One can always direct action by physically moving the camera to the required position, but then one is at the level of orthogonal projection, with no direct experience of space.

3.6. First-person shooters – Tomb Raider®

Early 'shoot-up gallery' programs had an agent immersed in a two-dimensional environment with four basic navigation commands: turn left, turn right, move forward, and move back. The navigational commands expressed the attempt to integrate different views with the user's demand to experience space. It soon became obvious that those commands were insufficient and that more elaborate systems were required. Tomb Raider is an example of first-person action games, where action is integrated by means of an agent. In the old arcade games, a person had a gun and shot at everything that moved; in the first-person shooting gallery, 'action' is controlled through direct control of the agent via input from the hand-held device. The Tomb Raider agent is Lara Croft – the representation of a character that moves as instructed by the user through computer keys: as one presses a key or combination of keys, a certain action follows. The control is divided according to the type of action one wants to perform. In the game, there are three levels of control over the agent: the first is basic movement of the agent (run forward, jump back, side-step left, side-step right), with some control over motion and speed (run, jump, walk), and some rotation (roll, turn left, turn right); the second level of control pertains to actions of the agent (draw weapons, get/throw flare, fire); and the third level of control is action in relation to the environment (grab ledge, pull object). A sketch of how such a control scheme might be organized follows.
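The three levels of control can be pictured as a mapping from key combinations to agent actions; the sketch below is purely illustrative, and the key names and action labels are mine, not the game's actual input scheme.

    # A minimal sketch of the three levels of agent control described above,
    # as a mapping from key combinations to actions.  Key names and action
    # labels are illustrative, not the game's actual input code.

    CONTROLS = {
        # level 1: basic movement of the agent
        frozenset({"up"}): "run forward",
        frozenset({"down"}): "jump back",
        frozenset({"left"}): "turn left",
        frozenset({"walk", "up"}): "walk forward",
        # level 2: actions of the agent
        frozenset({"draw"}): "draw weapons",
        # level 3: actions in relation to the environment
        frozenset({"up", "action"}): "vault / climb",
        frozenset({"action"}): "grab ledge / pick up object",
    }

    def resolve(pressed_keys):
        """Return the agent action for the currently pressed key combination."""
        return CONTROLS.get(frozenset(pressed_keys), "idle")

    print(resolve({"up", "action"}))   # -> "vault / climb"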

Figure 3.12 Lara turning and shooting

Below is a list of some of the actions of Lara Croft taken from the manual:

Running – Pressing Up moves Lara forward at a running pace, while pressing Down makes Lara jump back a short distance. Pressing Left or Right turns Lara left or right.
Walking – By pressing the Walk button in conjunction with the Cursor Keys, Lara can carefully walk forwards or backwards. Whilst the Walk button is held down, Lara will not fall off any edge; if one makes her walk up to an edge she will automatically stop.
Side Steps – Side-step right and left do exactly as one might imagine.
Roll – Selecting Roll will make Lara roll forward, and finish up facing the opposite direction. This also works when Lara is underwater. Roll may also be activated by pressing the Up and Down Cursor Keys simultaneously.
Jumping – Lara can jump in any direction, to evade her enemies. Press the Jump Key and Lara will jump straight up into the air. If you press a Cursor Key immediately after pressing Jump, Lara will jump in that direction. In addition, pressing Down or Roll straight after starting a forward jump makes Lara somersault in the air and land facing the opposite direction. This also works when jumping backwards by pressing Up or Roll immediately after takeoff.
Vaulting – If Lara is faced with an obstacle that she can climb over, pressing Up and Action will make her vault onto it.
Climbing – Some walls are climbable. If Lara comes across such a surface, pressing Up and Action will make her jump up (if there is room) and catch handholds on the wall. She will only hang on whilst Action is held down. She can then be made to climb up, down, left and right by pressing the Cursor Keys. Pressing Jump will make Lara jump backwards away from the wall.
Grabbing hold – If Lara is near a ledge while she is jumping, pressing and holding the Action Key will allow her to grab the ledge in front of her and hang there. If a wall is climbable, Lara can catch onto it anywhere (not just ledges). Press Left or Right, and Lara will shimmy sideways. Pressing Up will make Lara climb up to the level above. Let go of Action and Lara will drop.
Picking objects up – Lara can retrieve objects and store them in her inventory. Position Lara so that the object you want to retrieve is in front of her feet. Press the Action Key and she will pick it up.
Using puzzle items – Position Lara so that the object receptor is in front of her. Press the Action Key and the Inventory Ring will appear. Left and Right will allow you to select the object you want to try, and pressing Action again will use it.
Pushing/pulling objects – Lara can push certain blocks around and use them to climb up to greater heights. Stand in front of the block and hold down Action, and Lara will get into her ready stance. Once she is ready, press Down to pull the block, and Up to push it; if you decide you no longer wish to carry on with this task, simply release the Action Key.
Looking around – Pressing the Look Key will make the camera go directly behind Lara, whatever the camera is currently doing. With the Look button held down, the Cursor Keys allow Lara to look around her. Once you let go of the key, the view returns to normal. (TIP: if you are trying to line Lara up for a jump, and the camera is in an awkward position, pressing just the Look Key on its own will show you exactly what direction she is facing.)

First-person action games have combined the agent and the directing into one space. The challenges of the game Tomb Raider are about exploration and timing, how fast one can move without mistakes in guiding Lara to her target. Directing movement is continuous and requires a learning curve and good hand-eye coordination. The user is encouraged to be constantly on the move. Playing Lara is non-intuitive; one presses alphanumeric keys, a process which takes time to learn. The player must learn how to align or position the agent with virtual objects, and how to make the agent run while constantly adjusting the agent’s position. The discrete commands make it difficult to move Lara diagonally.

Concluding remarks

This chapter has described a range of navigational programs and their interface control mechanisms. The cases range from games to exploration programs to modeling programs, all of which are available commercially. In the next chapter, we will analyze these cases critically and compare the various programs in order to produce adequacy criteria.


CHAPTER 4 CRITICAL ANALYSIS OF THE STATE OF THE ART TECHNOLOGY

This chapter will examine the overall performance of the different cases presented in the previous chapter. From this analysis the general criteria of the systems will be exposed. We have examined professional architectural programs, navigational programs and games. Unlike the games, the other computer programs were not intuitive to learn, and movement of the hand did not necessarily correspond with the action on the screen. We call this phenomenon the flatness of interaction.

4.1 Historical origins of the 'flatness' of the computer screen

Movement in virtual reality as represented on the computer screen has a long history, with its roots in the Renaissance. In order to understand the process of interaction between the user and objects of desire, we must start with Alberti's attempt in "On Painting" (Alberti, 1956) to describe the process of representation: "I beg you to consider me not as a mathematician but as a painter writing of these things. Mathematicians measure with their minds alone the forms of things separated from all matter. Since we wish the object to be seen, we will use a more sensate wisdom. […] The painter is concerned solely with representing what can be seen. The plane is measured by rays that serve the sight called visual rays which carry the form of the thing seen. We can imagine those rays to be like the finest hairs […] tightly bound within the eye where the sense of sight has its seat. The rays, gathered together within the eye, are like a stalk; the eye is like a bud, which
extends its shoots rapidly, and in a straight line to the plane opposite. […] The extrinsic rays, thus encircling the plane […] like the willow wands of a basket cage, and make, as is said, this visual pyramid. […] The base of this pyramid is a plane that is seen. The sides of the pyramid are those rays that I have called extrinsic. The cuspid, that is the point of the pyramid, is located within the eye" (Book I). Alberti's system of representation for perspective, as demonstrated in his book "On Painting", was a new way of seeing the world (Gadol, 1969; Edgerton, 1966; White, 1957; Panofsky, 1924). According to Panofsky (1924), Alberti developed a mathematical perspective. The basic visual forms were mediated through an optical projection on a planar surface in a Euclidean space. Alberti's substitution of the cone of vision with a pyramid makes the representation of a one-point perspective possible. In Alberti's perspective, the size of the object observed varies with the height of the observer's eye and the distance to the object; the construction has the lines converge to a single point, called the vanishing point. Alberti's concern was with an active representation of the physical world, by means of the 'window' described through the 'flatness' of the canvas. One's observation is fixed on the intersection of the picture plane and the physical world. It employs perspective to relate the viewer to a world represented through the visual rays and the visual pyramid. The instance of observation is juxtaposed upon the intersection of the picture plane – the canvas – and the physical world. "First of all about where I draw. I inscribe a quadrangle of right angles, as large as I wish, which is considered to be an open window through which I see what I want to paint" (Alberti, 1956). To construct such a window frame one needs an object of an intended view, and to project it onto a surface perpendicular to the observer (see Figure 4.1). According to Gadol (1969, p. 29), the window view or the centric-point scheme, as defined by Alberti, is represented through the visual rays – the observer's perception of the picture – and on the other side of the picture plane is the pyramid of 'rays', a purely imaginary construct. The information displayed is dual: the picture is both a scene and a surface, and the scene is paradoxically behind the surface.

Figure 4.1 In perspective, all lines of sight converge on the viewer’s eye which is positioned in a stationary privileged location. This creates the illusion of a vanishing point.

"He who looks at a picture, done as I have described [above], will see a certain cross-section of a visual pyramid, artificially represented with lines and colors on a certain plane according to a given distance, center and lights" (Alberti, 1956). In the perspective as defined by Alberti, the visual lines (radiant lines) emanate from the object, or from a non-object on the horizon of one's visual field, and the point where they intersect is the vanishing point that constructs the perspective (see Figure 4.2). According to Kubovy, "The geometric evidence for this point can be found in the size of the depictions of known objects. The geometry of perspective implies that the painting of an object which is in front of the picture plane will be larger-than-life; since Renaissance painters very rarely painted larger-than-life figures, most figures must be behind the picture plane" (1986, p. 23).

Figure 4.2 Construction of perspective representation of pavement consisting of square tiles. (Kubovy, 1986)

According to Panofsky (1924 p. 29), “In order to guarantee a fully rational - that is, infinite, unchanging and homogeneous - space, this "central perspective" makes two tacit but essential assumptions: first, that we see with a single and immobile eye, and second, that the planar cross section of the visual pyramid can pass for an adequate reproduction of our optical image. In fact these two premises are rather bold abstractions from reality, if by “reality” we mean the actual subjective optical impression.” That is, Alberti attempted to set the condition for realism. According to Tzonis, “The tripartition that characterized the organization of perspective-based pictures corresponded to the tripartite cognitive framework of front-middle-back, up-middle-down, right-middle-left - categorical structures internal to the mind. Consequently, what the viewer recognized in such paintings was nature categorized, humanized. Perspective paintings were not only naturalistic images, but also mental images” (Tzonis, 1993). Current mechanisms controlling movements in virtual space are predominantly based on such perspective systems. The analogy between the observer of a painting and the observer of computer interaction also has its limitations. As Kubovy (1986) shows, the interaction between the observer and the picture plane does not change, and as a result one experiences a double dilemma that is resolved differently in humancomputer interface. The dilemma of the picture plane is explained as follows: the experience of the picture turning stems from two perceptions – on the one hand, even though we are walking around the picture, we perceive the spatial layout of

the represented scene as if it remains unchanged. On the other hand, even though the spatial layout of the scene remains unchanged, we perceive our own motion in space as we walk past the picture. The modern reader may find a similarity between Alberti's "On Painting" and a phenomenon one encounters in the human-computer interface, where the screen can either be transparent to the observer or opaque. "When painters fill the circumscribed places with colors, they should only seek to present the forms of things seen on this plane as if it were of transparent glass. Thus the visual pyramid could pass through it, placed at a definite distance with definite lights and a definite position of center in space and in a definite place in respect to the observer" (Alberti, p. 51). When the screen is transparent one is working with an agent moving within a three-dimensional world; when the screen is opaque one is working with a representation, a flat pointer that moves up and down. In cases where the input device is a mouse, one's input movement is restricted to a two-dimensional space (forward and backward, left and right), as is the screen feedback (up and down, left and right).

The attempt to break the Albertian window is characteristic of the modern movement in art. Two approaches emerged from the 'modern movements' that concern the re-construction and super-imposition of the views of an object over time. One examined tracking movement in relation to objects, recording different views as combined viewpoints frozen in an imaginary instance, or what Henderson (1983) calls the fourth dimension; this characterized part of the aims of the Cubist painters. The Futurist movement, in contrast, examined an object as it moved in space, registering the instances of its trajectory, taking its guide from the first experiments with a fixed camera and human movement (animation). The two approaches displayed a cognitive representation of the act of tracing an object in short- and long-term memory.

So what are the characteristics of flatness, and how do we define it? Flatness works on two levels: one physical, the other conceptual. The physical flatness is the disembodied, unattached movement of the hand relative to the display of the computer screen, when it does not correspond to the representation of movement in space. The conceptual flatness is our inability to manipulate objects intuitively and immediately, since the scene is paradoxically behind the surface. This paradigm has prevented us from exploring ways of working and navigating with objects in an intuitive way. It is now our task to expose the limits of such interaction.

4.2 Limitation of interaction with surrounding objects

Architects utilize several means to examine an object through design criteria. Architects first employ descriptive geometry to construct the object and perspective views to combine the views of the projected object, so as to be able to examine the relationships. Those relations can be summarized as follows: 1) the relation between the object and its detail, or the relationship between the object and another object; 2) the relation between what is seen and occluded parts; 3) the relationship between objects and people. Those views are then examined in a cyclical manner through the design process. Much of the work of the locative prepositions involves the identification of two variables: action description (EVENT) and location description (STATE). When modeling a computer program of transformation of states into events, one has to abstract different levels of representation, from object recognition to language interpretation – fields of developing expertise. The program derives its knowledge from object analysis, to spatial propositions, to reference systems. On the conceptual level the system works on the notion of the path as a vector. A vector consists of a direction and a distance from a known location to a new location: the agent's existing location (departure point) and the agent's new location (arrival point). Accessibility is a task for a navigator to perform. Physical accessibility answers the question: can one move between objects, or can one move between other agents? Visual accessibility asks the questions: what can be seen? Which point allows the most unhindered visible range? Locations are insular or they have access to other locations; this physical accessibility is secondary to visual accessibility, since one has to know where objects are in order to access them. Accessibility, or what Gibson (1979) calls affordance, relies on the information that the various senses gather about an object. Visual accessibility is the unhindered visible range, sometimes referred to as spatial openness (Fisher-Gewirtzman, 2003) or isovist (Batty, 2001). The space that can be seen from any vantage point is called an isovist, and the set of such spaces forms a visual field whose extent defines different isovist fields based on different geometric properties. This is part of a growing field that attempts to build our understanding of how our spatial knowledge is acquired.
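The path-as-vector notion described above can be sketched as a small data structure holding a departure point and an arrival point, from which direction and distance are derived; the field names in the sketch are illustrative, not part of any existing system.

    # A minimal sketch of a path as a vector: a move is recorded as a
    # departure point (the existing STATE) and an arrival point (the new
    # STATE after the EVENT), from which direction and distance follow.
    import math
    from dataclasses import dataclass

    @dataclass
    class PathVector:
        departure: tuple      # the agent's existing location
        arrival: tuple        # the agent's new location after the move

        @property
        def distance(self):
            return math.dist(self.departure, self.arrival)

        @property
        def direction(self):
            d = self.distance or 1.0
            return tuple((a - b) / d for a, b in zip(self.arrival, self.departure))

    move = PathVector(departure=(0.0, 0.0), arrival=(30.0, 40.0))
    print(move.distance, move.direction)   # 50.0 (0.6, 0.8)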

There are different reasons to examine pedestrian movement, from security to services (Kerridge, 2001), to understanding pedestrian flow (Helbing, 2001; Haklay, 2001), and also the importance of viewpoints for wayfinding (Ramloll, 2001; Mallot, 1999; Elvins, 1997). Those systems use the observer space to record views triggered by requests for object handling and route planning. User type requires the identification of the type of navigation group one belongs to. For example, tourists would have different needs than architects or shoppers in examining the environment. Every type of group requires different operations in relation to the objects used. The difference between the different kinds of spatial knowledge lies in the way spatial data is captured and organized. On the one hand, there is the geometric description of the environment; on the other, the description of the user. For example, a shop of one kind or another may be given importance over other environmental landmarks, or object features might be emphasized for architects. The environment may also impose restrictions on the users. For example, in a grid city the user might prefer to use the streets to navigate rather than use landmarks. Thiel (1961) attempted to create a notational system from the participant's point of view, by dividing his system into two parts: the individual traveler's viewpoint and the taxonomy of the environment. As opposed to 'user-centered' (Norman, 1986), Thiel (1970) coined the term 'user participant'. The user participant is a viewer who acts out different roles, i.e. different scripts. The information is filtered through three components: 1) Transducers – transfer information from one physical system to another; 2) Coders – a set of conditioning or learned responses; 3) Attenders – monitor the set conditions from the end-user's point of view. By introducing this division, Thiel is able to escape the attempt to map the user's goals and intentions onto the resulting action. That is, Thiel eliminates judgment and decision from the process. The determinant factors of behavior include both motivation and intention, exemplified by theories of reasoned action and planned behavior (Ajzen, 1985) – theories in which intention mediates the relation between motivation (attitude) and behavior. In order to understand what one can expect from a computer navigation program, one must go back to the reason why architects adopted the computer, or, to be exact, to the computer software functions and features that were offered. One of the things the representational programs presented is the ability to change an object and instantly be able to re-examine it: a reproducible perspective without the construction involved in redrawing. What architects expected was that they would be able to combine the way they work with the way the perspective was represented.

The architectural expectation of the representational programs was the ability to change views instantly in a complex environment. What the user received was a turn-wheel procedure, a chain of actions that he or she had to follow in order to examine object relationships. Those procedures differ from program to program, yet all of them require action to switch from one global coordinate system to another, a process that is dependent on computational power, i.e. the time to execute the commands. Those actions have profound implications for the production of an artifact, which is manufactured through collaborative communication. The question is not just how much control one has over the procedures, but also how much flexibility there is in the design process. What is the most effective way to move between those established frames as one walks in the virtual environment? How far can the visual co-ordinates be extended in relation to the user in order to facilitate movement of the view? This implies that the controls of movement operate through spatial relations. The way objects relate to other objects, to the viewer, or to an object or part of it in those programs is through the determination of what is active and what is passive in a scene, the manifestations of which are object and array rotation. There are two instances of object rotation in relation to the observer. One is object rotation – the user selects an object and can then rotate the object. This is an observer control that refers to strategic or goal-oriented information processing, where the individual intentionally directs attention towards relevant stimuli (Wang, 1999). The other is array rotation – the user selects the object and then moves around the object at the same constant distance from the object, thus changing his point of view. This is directing attention elicited by characteristics of the object, and implies automatic or mandatory information processing. Rotation is one of the most intuitive actions and the most controversial in terms of which category it belongs to. In cognitive skill acquisition it precedes the categories of object and route. Rotation is an instance where object manipulation is exposed. Through rotating the object and the array one engages with the object. The programs examined all have different approaches to how to resolve the need for movement. The different approaches generate different procedures for hand-eye co-ordination. All those programs use the interactive devices of a physical pointer (the mouse) and a screen cursor. The use of the hand to control the mouse and screen transfers our attention from the pointer to objects on the screen, thus captivating us.
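The distinction between object rotation and array rotation can be illustrated with a few lines of plan geometry; in the sketch below the coordinates are invented, and the same rotation routine is applied either to the object itself (object rotation) or to the viewer's position about the object (array rotation).

    # A minimal sketch contrasting the two rotations described above.
    # Object rotation spins the object about its own centre; array rotation
    # moves the viewpoint around the object at a constant distance, changing
    # the observer's point of view instead of the object.
    import math

    def rotate_about(point, centre, angle):
        """Rotate a 2D point about a centre (plan view) by the given angle."""
        x, y = point[0] - centre[0], point[1] - centre[1]
        c, s = math.cos(angle), math.sin(angle)
        return (centre[0] + c * x - s * y, centre[1] + s * x + c * y)

    building_corner = (12.0, 2.0)
    building_centre = (10.0, 0.0)
    viewer = (10.0, -20.0)

    # Object rotation: the viewer stays put, the building turns.
    print(rotate_about(building_corner, building_centre, math.pi / 2))

    # Array rotation: the building stays put, the viewer orbits it.
    print(rotate_about(viewer, building_centre, math.pi / 2))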

The 3DS MAX solution is to split the viewport fields; in practice, when one views the perspective there is no update to the viewport fields. For MAX, as a manufacturer of virtual objects, this set-up worked well, since for the professional architect manufacturing means the ability to record an object projected geometrically. The basic components of MAX navigation are pan, array rotate, and zoom; through them one can circumnavigate an object. MAX also introduced a camera, similar to Cosmo Player. The Cosmo Player solution is to assume that directing a movie and navigation are the same problem; it takes the film director as a model. It divides the navigational tools into route and object manipulation. Myst's solution is what they term panoramic nodes – scene rotations which are constructed before a preview and can then be connected through hotspots. The use of the mouse and a pointer allows for greater freedom by working as a spotlight, but it is the most restrictive in terms of movement flexibility and feedback control. The Tomb Raider solution is to immerse an actor in the scene to direct movement. This powerful idea allows the user to forgo the pointer for a puppet. Tomb Raider is the most immersive of the programs: one's attention is on the object's movement in three-dimensional space, and it allows the visualization of the experience of space through the action of the agent, but it allows no architectural investigation. There are two ways in which interaction is represented in the programs examined in the previous chapter: the first is a cursor sign on the screen indicating the current position of the pointer in relation to the screen, the second is the use of an agent (see Table 4.1). The question of the normative values of immersion does not necessarily have to be settled by experiment; suffice to say that a three-dimensional action is better than a two-dimensional action. The question one has to ask is: what about the performance of different tasks? What can those programs achieve once all possible moves are accounted for? A comparison table examines the viewer's ability to move.

Viewer's ability…                          | 3DS MAX                    | Tomb Raider | Myst | Cosmo
To move forward and backwards              | Zoom                       | Yes         | Yes  | Zoom
To turn right and left                     | Pan                        | Yes         | Yes  | Pan
To move up and down                        | Tilt                       | No          | No   | Tilt
To rotate object                           | Yes                        | Yes         | No   | No
To rotate array according to desired axis  | No (only established axis) | No          | No   | No (only proximity axis)

Table 4.1 Viewer's ability to perform simple tasks

In Table 4.1 a list of possible actions is generated and contrasted with the different programs. From it, it is clear that one cannot rotate the array arbitrarily; also missing is the ability to move an agent through a trajectory.

4.3 Navigational Criteria

The way that humans think about physical situations contains four aspects of analysis that appear to be particularly important in solving simple qualitative spatial reasoning problems. These aspects are:
1. Representation of detail at multiple levels. People are able to store a large amount of detailed information about a complex object, yet also consider that object in terms of gross shape alone when this is necessary. They are also able to focus on a particular detail of the overall shape, while retaining a record of its context. An example of this ability is the way that an architect views a house. He knows a huge amount about its detailed shape, but is able to think, when necessary, simply in terms of its overall shape.
2. Independent reasoning in local contexts. Where the overall shape is very complex, people are able to reason about one part of the overall shape, treating it as an independent context. The architect, for example, when designing a facade, is able to work purely with that local context, abstracted from the overall shape of the building. This is the conflict between the frame and the object's intrinsic properties.
3. Assignment of properties to groups of features. People are able to assign an abstract description to a whole set of shape features, and then make statements about the new abstraction, rather than simply about a single instance of it.
4. Qualitative size description and judgment. In many spatial reasoning situations, the absolute size of a given shape feature is not important. Its size relative to other shapes may be more important, as in the question "will this cargo fit through the door?" Alternatively, its size may be altogether irrelevant, as in the question "is that a door?" If qualitative reasoning methods are available, it is possible to discuss relative size, or size-independent questions, without numerical information.
An architectural presentation can be seen as a visual lecture on the potential of selected architectural objects, through plans, elevations, sections and perspectives that help to visualize the building. The transition of movement from one place to another in virtual environments needs to make sense to the audience. So what kind of characteristic output will give us the spatial experience in movement? The transition between the different nodes in every VR program is what gives it its character. In cinema, a shot must end in a cut. Thus the architectural presentation is an exposition of the object's different points of view, and the architectural transition has to respond to the architectural description for it to make sense. We have an observer in an existing location in relation to the world and the view. The observer desires a new view of the world, and thus moves from the existing location to a new location through a channel or path. For a pointer/visual aid one has an observer who believes: if I move through this channel and find myself in this new location, I will have a new view of the world (this is spatial knowledge using spatial knowledge strategy, tactics, and so forth). For an agent we have an observer who believes that if he or she moves the agent, they will have a new view of the object.

According to Peacoke (1993), "We experience objects specifically as material objects. Part of what is involved in having such experiences is that perceptual representations serve as input to an intuitive mechanics, which employs the notion of force. This involvement is in turn a consequence of a general principle governing the ascription of content to mental representations, together with very general philosophical considerations about what it is for an object to be a material". This is intuitive mechanics: an important aspect of movement between objects is the discontinuous space based on the 'naive physics' notions of substantiality and continuity. The substantiality constraint states that solid objects cannot pass through one another. The continuity constraint states that if an object first appears in one location and later appears at a different location then it must have moved along a continuous path between those two locations. In other words, objects do not disappear and then later re-appear elsewhere. Substantiality and continuity are major points of decision; they determine the strategy of movement, and in a virtual built world they do not necessarily need to be applied. In short, one must have at least a simple theory of perception and action.
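The two constraints can be written down as simple checks on a proposed path; the sketch below is a minimal illustration, with obstacles approximated by axis-aligned boxes and an arbitrary step threshold standing in for continuity.

    # A minimal sketch of the two 'naive physics' constraints named above.
    # Substantiality: a path may not pass through a solid object.
    # Continuity: an agent may not re-appear elsewhere without a connecting path.

    def crosses_box(start, end, box, steps=100):
        """Approximate test: does the straight path enter the box?"""
        (x0, y0), (x1, y1) = start, end
        (bx0, by0), (bx1, by1) = box
        for i in range(steps + 1):
            t = i / steps
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            if bx0 <= x <= bx1 and by0 <= y <= by1:
                return True
        return False

    def valid_move(path, obstacles, max_step=1.0):
        """Check continuity (no teleporting) and substantiality (no clipping)."""
        for start, end in zip(path, path[1:]):
            if ((end[0] - start[0]) ** 2 + (end[1] - start[1]) ** 2) ** 0.5 > max_step:
                return False                  # discontinuous jump
            if any(crosses_box(start, end, box) for box in obstacles):
                return False                  # passes through a solid
        return True

    walls = [((4.0, -1.0), (5.0, 1.0))]
    print(valid_move([(0, 0), (0.9, 0), (1.8, 0)], walls))   # True
    print(valid_move([(0, 0), (10, 0)], walls))              # False (jump, and crosses the wall)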

Grasping an object employs a representation of the procedure of movement towards that object, overcoming obstacles in its way. When comparing the task of turning around a building in programs like Tomb Raider and 3DS MAX, several distinctions arise. Tomb Raider has an agent moving forward, backward, right, and left, and lacks any further elements of interaction as the agent moves in continuous three-dimensional space. In 3DS MAX, on the other hand, one can rotate the object as well as rotate the array. The ability to go from agent-action to directed-action is critical if one wants to be able to augment reality. As 3DS MAX proves, it is more effective to rotate the array in order to see the rear of a building than to walk around the building. It is also more efficient in an architectural office to prepare a presentation of a building or an entire environment in this manner. A program that can do those things will improve the efficiency of the office in examining the proposed design as well as in architectural presentations to the client.

Tomb Raider: what is represented | Tomb Raider: how one inputs things | 3DS MAX: what is represented | 3DS MAX: how one inputs things
Initial condition | Identify movement: pointing | Initial condition | Identify axis: pointing and labeling
Turn towards the path | Using right/left coordinates | -- | --
Move from existing location to new location | Use front/back/left/right coordinates | Array rotation | Use left/right coordinates
Turn towards the building | Using right/left coordinates | -- | --
Exit | -- | Exit at new/old location? | --

Table 4.2 Difference between movements for the task of “moving to the back of the object”
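The contrast in Table 4.2 can be made concrete in code. Below is a minimal Python sketch, assuming an invented Camera class and invented helper names, of the two strategies for viewing the back of an object: walking the agent around in many bounded steps versus rotating the array (equivalently, orbiting the viewpoint) in a single operation. It is illustrative only and is not the interface of either program.

import math
from dataclasses import dataclass

@dataclass
class Camera:
    x: float
    y: float
    heading_deg: float  # direction the camera is looking

def agent_walk_to_back(cam: Camera, building_xy: tuple, steps: int = 18) -> list:
    """Tomb Raider style: many small bounded-agent moves around the building,
    turning at each step to keep the building in view."""
    bx, by = building_xy
    radius = math.hypot(cam.x - bx, cam.y - by)
    start = math.atan2(cam.y - by, cam.x - bx)
    path = []
    for i in range(1, steps + 1):  # half-circle in small increments
        a = start + math.radians(180 / steps * i)
        path.append(Camera(bx + radius * math.cos(a),
                           by + radius * math.sin(a),
                           (math.degrees(a) + 180) % 360))  # face the building
    return path

def array_rotation(cam: Camera, building_xy: tuple) -> Camera:
    """3DS MAX style: rotate the array 180 degrees about the building
    (equivalently, mirror the viewpoint) in one operation."""
    bx, by = building_xy
    return Camera(2 * bx - cam.x, 2 * by - cam.y, (cam.heading_deg + 180) % 360)

start = Camera(0.0, -10.0, 90.0)
print(len(agent_walk_to_back(start, (0.0, 0.0))))  # 18 incremental moves
print(array_rotation(start, (0.0, 0.0)))           # a single operation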

4.4 Scenarios of interaction

For visual navigation one has two modes of thinking.

Bounded agent: directing an agent to move, for example “go forward”, “turn left”, etc. (see Chapter 5 for the equivalent linguistic device).
Directed agent: moving the agent in relation to a cursor on the screen, for example “go towards an object” (see Chapter 5 for the equivalent linguistic device).
For navigation one has two instances of encounter: the ‘opaque screen analogy’ (up and down movement of the screen pointer) and the ‘transparent screen analogy’ (forward and backward movement of the screen pointer) (see Table 4.3).

 | Opaque screen analogy (up and down movement of screen pointer) | Transparent screen analogy (forward and backward movement of screen pointer)
Bounded agent | Animated being: early video games (PacMan) | Augmented agent (Tomb Raider®)
Directed agent | Animated scene (Myst®) | Augmented environment (Cosmo Player®, 3DS MAX®)

Table 4.3 Analysis of current systems

The four possibilities represent the distinct situations one encounters in a navigational strategy:
Animated scene: pointing at the desired location in order to move. It is the basic attention to an object, i.e. zooming in relation to the center of projection.
Animated agent: moving the agent, with a correspondence of hand movement to the agent image; forward (hand) maps to up (screen), backward (hand) to down (screen). It has a topological, side, or axonometric view in relation to the observer.
Augmented environment: pointing at the desired location in order to move. It is the basic attention to an object, but with flexibility of the projecting axis of observer/object.
Augmented agent: movement of the agent with a correspondence to hand movement, forward and backward, in immersed environments; that is, an analogical relation of the projecting axis of observer/object.


4.5 Adequacy criteria for an agent directed by the user

The arrival of animation capabilities at the desktop has provoked interest in the use of known animation techniques for computer-human communication. A common thread in the proposal and inclusion of animation capabilities in user interfaces is a strong intuition that motion, and making information objects move, should make the interface environment more credible, more “real”, and less cognitively foreign to users. Baecker (1990) discussed the potential of user interface animation to reveal process and structure (by moving the viewpoint) and introduced the following taxonomy of eight uses of animating function to make the interface more engaging and comprehensible:
• Identification associates the symbol with its function (“What is this?”);
• Transition carries the user smoothly between states (“Where did I come from and where have I gone?”);
• Choice shows possible actions (“What can I do now?”);
• Demonstration illustrates the capabilities of the tool or service (“What can I do with this?”);
• Explanation shows how to employ it (“How can I do this?”);
• Feedback provides information on process dynamics and state (“What is happening?”);
• History replays previous actions and effects (“What have I done?”); and
• Guidance suggests suitable next steps (“What should I do now?”).

Stasko (1993) adds three design guidelines drawn from the principles of traditional animation:
• Appropriateness dictates that the operation or process should be represented according to the user’s mental model and system entities.
• Smoothness is essential, since jerky, wildly varying animations are difficult to follow.
• Duration and control vary with the type of animation. Demonstrations of unit operations such as selection should be short (not more than a few seconds), while animations of continuous processes with a clock-time correspondence should be kept faithful to clock time.
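As an illustration of the smoothness and duration guidelines above, here is a minimal Python sketch of a viewpoint transition that eases between two camera poses over a fixed duration. The Camera class, the ease-in/ease-out curve, and the frame rate are assumptions made for the example, not part of any system described here.

import math
from dataclasses import dataclass

@dataclass
class Camera:
    x: float
    y: float
    heading: float  # degrees

def ease(t: float) -> float:
    # Ease-in/ease-out curve: avoids the jerky starts and stops
    # that the smoothness guideline warns against.
    return 0.5 - 0.5 * math.cos(math.pi * t)

def transition(start: Camera, end: Camera, duration_s: float = 2.0, fps: int = 30):
    """Yield intermediate camera states for a smooth, short transition."""
    frames = max(1, int(duration_s * fps))
    for i in range(frames + 1):
        t = ease(i / frames)
        yield Camera(
            x=start.x + t * (end.x - start.x),
            y=start.y + t * (end.y - start.y),
            heading=start.heading + t * (end.heading - start.heading),
        )

# Example: a two-second move that keeps clock-time correspondence.
for cam in transition(Camera(0, 0, 0), Camera(10, 5, 90)):
    pass  # a real viewer would render each intermediate camera here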

According to Sloman (1978), verbs of motion all seem to involve a subset of the following ideas:
1. An agent (which may or may not also change position, and may or may not change the position of other objects).
2. There is a route for the motion of each object, with a starting and a finishing location.
3. An agent may use an instrument, possibly to move an object.
4. Moving things have absolute and relative speeds.
5. If A causes B to move, A may be on the side away from which B is moving or on the side to which B is moving.
6. The movement of B may be merely initiated by A (pushing something over the edge of a table) or may be entirely due to A (throwing something, pushing it along).
7. The agent may have a purpose in moving the object.
8. There may be a previous history of movements or locations referred to (e.g. if A retrieves B).
9. There may be more than one stage in the motion (e.g. A fetches B).
10. A may do something to B which tends to produce motion, but the motion may be resisted (e.g. pushing an object which is too heavy, pulling an object with a string which stretches or breaks).
11. The agent's movement may be supported by an object (e.g. in riding it).

What kinds of needs does the tool have to satisfy? To satisfy the need to be there and see, one must be able to control movement through a movement-instruction level. The structure of object display can be divided as follows:
1. Plan a course from a given location to a desired one.
2. Shift the direction of their gaze and focus attention on objects as they move across a path.
3. Move around an object keeping track of this change in relation to the surrounding objects.
4. Turn an object in front of the viewer in order to examine it.

Concluding remarks

In this chapter we have analyzed these programs to show the flatness of their interaction and to compare the performance of movement across different tasks with different tools.


When one examines the current computer programs for architecture, one discovers that they are based either on the work-desk metaphor or on exploration programs that use a pointing metaphor (agent-based). The work-desk metaphor uses the universal coordinate system, allowing the user to rotate an array. The system works well for an object carefully placed at the point of origin, thus transferring it to an object-centered model. The exploration programs, on the other hand, use agent-centered models. Yet when one navigates in an environment, an important gap remains: the need for a system that is able to choose and manipulate the agent/object relationship, that is, an object-centered system as opposed to agent-based navigation. In the next chapter we examine the overall relation of vision and language and the methodology by which the interaction will be examined.


CHAPTER 5
CONCEPTUAL SYSTEM OF AGENT-BASED FRAMES & AXIS NAVIGATION

Up to now we have examined existing computer programs; we now turn our attention to a phenomenological approach: the visually and linguistically encoded information. We will examine the conceptual theory that allows a full range of action in directing an agent/avatar, and the resulting elements of the path. We will also present the methodology by which we examine the interaction.

5.1 Language of the path

According to Jackendoff (1992), there is a single level of mental representation, a conceptual structure, at which linguistic, sensory, and motor information are compatible. Word meaning is instantiated in large part in the brain’s combinatorial organization. The full class of humanly possible concepts (or conceptual structures) is determined by the combinatorial principles of the conceptual well-formedness rules. That is, the conceptual well-formedness rules characterize the space of possible conceptual states, the resources available in the brain for forming concepts. The conceptual well-formedness rules are the foundation on which learning is based. Inference rules are those relations among conceptual structures specifying how to pass from one concept to another, including rules of inference, principles of pragmatics, and heuristics. For example, what makes the verb ‘approach’ an object different from the verb ‘reach’ an object? If you approach an object, you are going towards it, but you do not necessarily arrive
there. By contrast, if you reach an object, you have been going towards it, and you have arrived there.

Spatial representation is a format or level of mental representation devoted to encoding the geometric properties of objects in the world and the relationships among them in space. We assume that spatial representation must be translatable into a form of representation specific to the motor system used to initiate and guide behavior. Here we want to equate spatial representation with Marr’s 2½-D sketch, where objects have the same properties as in the spatial representation, and not with 2D topological relationships.

The semiotic rules express the spatial relation of an object (the figure) to a region in which the other object (the reference object) is located. While the semiotic rules are factual, the syntactic rules are asymmetrical. In our case the asymmetrical description does not apply in cases of relative size: the relative size rule states that objects are considered relative to their size. For example, one can say “the agent is next to the house” but not “the house is next to the agent”. The last element is the semiotic well-formedness rule, which abstracts the geometric properties of an object according to linguistic criteria of spatial representation. The semiotic well-formedness rules apply to the notion of ‘affordance’ as defined by Gibson (1979) and also examine the route to such an object.

Figure 5.1 Spatial Semantic Model according to Jackendoff
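As an illustration of the relative size rule discussed above, the following Python sketch decides whether an object may serve as the reference object of a “next to” phrase on the basis of relative size. The object representation and the size threshold are assumptions made for the example.

from dataclasses import dataclass

@dataclass
class SpatialObject:
    name: str
    footprint: float  # rough ground area in square meters

def acceptable_next_to(figure: SpatialObject, ground: SpatialObject,
                       max_ratio: float = 10.0) -> bool:
    """The figure may be located 'next to' the ground only if the ground
    is not drastically smaller than the figure (relative size rule)."""
    return figure.footprint <= ground.footprint * max_ratio

agent = SpatialObject("agent", 0.5)
house = SpatialObject("house", 120.0)

print(acceptable_next_to(agent, house))  # True:  "the agent is next to the house"
print(acceptable_next_to(house, agent))  # False: "the house is next to the agent"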

The principle of phrase structure is a homomorphic relationship between what is said and what is there. A homomorphism connects every point of system A to system B without, however, connecting every point of structure B to structure A. An isomorphism is a symmetrical relation; it connects every point of system A to every point of system B and vice versa (Frey, 1969). Consequently the ‘theory of conceptual structures’ has to be linked by a different set of correspondence rules to the representations for perception and action. In addition, conceptual structures of course have to be linked by a principled set of correspondence rules to the mental representations that serve language: conceptual structures are, by hypothesis, the form in which the meaning of a linguistic expression must be couched internally. Therefore there must be a correspondence between the syntax and conceptual
structure. For every major phrasal constituent in the syntax of a sentence there must be a corresponding conceptual constituent that belongs to one of the major ontological categories. Hence the head of the tree structure is a major phrasal constituent corresponding to a function in conceptual structure. In order to structure a sentence grammatically, a primary distinction is customarily made between the lexical categories (or parts of speech), e.g. noun (N), verb (V), adjective (A), and preposition (P), and the sentence (S). The cases that describe spatial location and motion are of the form NP VP PP. Within this restricted class the correspondence of syntax and semantics is transparent: the PP refers to a place or path, the subject NP refers to a thing, and the sentence as a whole refers to a situation or event in which an object and agent are located in the virtual environment. According to Herskovits (1998), the lexicon of English spatial prepositions has a limited number of relations. The full list can be seen in Figure 5.2.

Primarily location: at/on/in, upon, against, inside/outside, within/without, near/(far from), next, beside, by, between, beyond, opposite, amid, among, throughout, above/below, under/over, beneath, underneath, alongside, on top/bottom of, on the top/bottom of, behind, in front/back of, left/right of, at/on/to the left/right front/back of, at/on/to the left/right side, north/east/west/south of, to the east/north/south/west of, on the east/north/south/west side of.

Primarily motion: across, along, to/from, around, away from, toward, up/down to, up/down, into/out of, onto/off, out, through, via, about, ahead of, past.

Figure 5.2 The English spatial prepositions, taken from Herskovits (1998)
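A minimal Python sketch of how the Herskovits division could be used to decide whether a user’s phrase names a location or a motion; the word lists are abridged from Figure 5.2 and the matching is deliberately naive.

# Abridged from the Herskovits (1998) lists in Figure 5.2.
LOCATION_PREPS = {"at", "on", "in", "near", "beside", "between", "behind",
                  "in front of", "above", "below", "next to", "left of", "right of"}
MOTION_PREPS = {"to", "from", "into", "out of", "onto", "through", "along",
                "around", "toward", "towards", "across", "past", "via"}

def classify_phrase(phrase: str) -> str:
    """Return 'motion', 'location', or 'unknown' for a spatial phrase."""
    words = phrase.lower()
    # Longer (multi-word) prepositions are checked first.
    for prep in sorted(MOTION_PREPS | LOCATION_PREPS, key=len, reverse=True):
        if f" {prep} " in f" {words} ":
            return "motion" if prep in MOTION_PREPS else "location"
    return "unknown"

print(classify_phrase("go along the river"))            # motion
print(classify_phrase("the gate is behind the house"))  # location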

5.2 Elements of the path

On the phenomenal level, the event that one is looking at is depicted as a sequential stream of action; one is in a state and commands a change to a new state. What we are interested in is the relationship between the agent’s previous position and subsequent position relative to either the object or the agent; that is, what directions are equivalent to the relationship of the subject’s transformed position? The task of formulating the necessary and sufficient truth conditions to describe an event is immensely difficult in vision as well as in linguistic commands. Many have argued that it is in principle impossible to formulate definitions that clearly delineate the subject of occurrences of an event. The problem arises in part because of the fuzzy nature of event classes. For any event there will be subjects that are clear instances of that event type, those that clearly are not instances, and those whose membership in that class of events is unclear. For examples, see Levinson (2003), Levelt (1996), and Tversky (1998).

Visual and linguistic descriptions have the ability to convey information about the path through explicit and implicit knowledge; for example, “go left” is a description where the start point is implicit. The path can also have the end point suspended, like “go towards the house” or “go into the house”; this is the equivalent of pointing visually. The converse path can have an arrival point, like “go to the left of the house”. Lastly, the transverse path can have an explicit start and end point, giving us the ability to determine the path’s relation to an object. These four types of path are represented below (see Figures 5.3-5.6). The verb actions used in this analysis are “Go”, “See”, and “Turn”.


Figure 5.3 Bounded agent

Figure 5.4 Directed agent

Figure 5.5 Converse Path

Figure 5.6 Transverse Path

Bounded agent: the agent can move in any direction desired (six degrees of freedom). It is operated through the correspondence of screen actions to the observer’s purpose, combining the agent reference systems of input and output; this is an agent-centered reference system. To move the agent one uses the command GO: go → forward, backwards, left, right, up, and down. The agent can also turn (look) sideways: turn to → the left/right.

Directed agent: identifies the new position of an agent by directing it to an object; it uses the object-centered reference system. The “go towards” command differentiates between movement and end goal. The “go towards an object” directed command has no ability to discern between different regions of space relative to the object reference system.

Converse path: defines a spatial path of the observer in relation to an object (object-centered). Two-point relations: this includes the evaluation of topological, angular, and distance-dependent relations, which share the common characteristic that, in their most basic form, they relate two objects to each other. It uses the commands: in front of, on the left/right side, behind, with preference given to the agent’s role (architect, tourist, etc.). It is operated by identifying the object and the new position relative to it. The “go to” command differentiates between movement and end goal; “go to an object” has the ability to discern between different regions of space relative to the object reference system.

Transverse path: defines the relation of objects to the path of the observer. N-point relations: relations that cannot be reduced to a two-point problem, such as path relations (through/along/across). The user operates it by identifying a target (object) and the object axis along which movement is to
be performed. “Go along” has the ability to discern between different path movements relative to the object. It utilizes the commands: go along → (a path) and go around → (an object).

The elements presented here are the full range of interaction in an immersed environment, ranked from low to high knowledge function. Agent movement already exists in the visual systems reviewed so far; the converse and transverse paths still await implementation. The new features will be examined as part of the enlarged system.
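As a summary of the four path types, the following Python sketch shows a small command vocabulary that distinguishes bounded, directed, converse, and transverse instructions. The function names and the way targets are represented are assumptions for illustration, not the system’s actual interface.

from dataclasses import dataclass

@dataclass
class Command:
    kind: str      # 'bounded', 'directed', 'converse', 'transverse'
    verb: str      # 'go', 'turn', 'see'
    argument: str  # direction, object, or region relative to an object

def parse(instruction: str) -> Command:
    """Very small dispatcher over the four path types discussed above."""
    words = instruction.lower().split()
    if words[:2] == ["go", "along"] or words[:2] == ["go", "around"]:
        return Command("transverse", "go", " ".join(words[2:]))    # path relative to object
    if words[:3] == ["go", "to", "the"]:
        return Command("converse", "go", " ".join(words[3:]))      # region of an object
    if words[:2] == ["go", "towards"]:
        return Command("directed", "go", " ".join(words[2:]))      # object as end goal
    return Command("bounded", words[0], " ".join(words[1:]))       # agent-centered move

print(parse("go forward"))                   # bounded
print(parse("go towards the house"))         # directed
print(parse("go to the left of the house"))  # converse
print(parse("go around the building"))       # transverse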

5.3 Method of examination

The method of examination is usually a compromise between task analysis and information analysis (Sutcliffe, 1997). The process starts with a requirement analysis (see Figure 5.7). The first part of the method concentrates on user context analysis, eliciting information to classify information requirements in their task context. The task requires interviews with trained users who have knowledge about the process. The information/knowledge is used in a task walkthrough with the architect and client. This method will be demonstrated in Chapter 10.1. Information analysis builds on the task model, which in our case is the possibility of action of any given spatial preposition task. In the case of information analysis, the question that one asks at every stage of the process is:
• What input information is required for this action?
• What output information is produced by this action?
• Is any other information required to help the user complete this action?

Figure 5.7 Method diagrammatic overview

The model is then analyzed in terms of the specific demands of the user and categories of description. The information analysis of the user when performing a task depends on information categories and information declarations. The information categories are examined in terms of verb action, and information declarations are examined in terms of syntax and semiotics. To investigate the usability of the system one asks the following questions:
• Were there any tasks that the system was unable to represent?
• Were there any relations between intention and commands that the system was unable to represent?
• Is there any evidence that the use of the system would have saved the actors any effort?
• Would the use of the system have created substantial new work for the actors?

Concluding remarks

In this chapter we have presented the theoretical basis for conducting the research and explained the relationship between the visual and the linguistic aspects of communication. We have also shown that the method used is the analysis of the information demands of the user. In the next chapter we will examine the visual representation of movement through the historical review of Piaget (1948) and then examine the cognitive visual location analysis.


CHAPTER 6
VISUAL REPRESENTATION IMPLEMENTATION

There are three systems of movement representation: the agent-centered model, the object-centered model, and the environment-centered model, which allow movement to be represented and verbally communicated in virtual environments. The object-centered model is of importance to architecture since it relates object/landmark to the user, yet there is no exploration program that mimics those features. This chapter will examine a way of representing movement and performing spatial reasoning with an object-centered model based on panoramic views, as opposed to topological maps. We will examine the visual representation of movement through the historical review of Piaget and then examine the way cognitive visual navigation translates to topological vector analysis.

6.1 Egocentric and allocentric systems

Let us start with the very simple premise that people interact with objects and places in real or virtual environments. The way people represent and arrange objects, locations, and paths is a mental construction, in the built ‘reality’ and in the computer interface. Visual navigation represents one possible way to move about in space. Visual navigation is a sequence of arrangements of objects in a location and the paths of object action. The representation of the manipulation of objects is a choice of objects transferred through attention and directed by gesture.

The term “spatial frames of reference” has been used by researchers in several different but related areas of endeavor, for example in perception, cognition, neuroscience, linguistics, and information science. Across these various areas, a consensus emerges (Jackendoff, 1983; Levinson, 1996; Campbell, 1994). Fundamentally, a spatial frame of reference is a conceptual basis for determining spatial relations. This description is applicable across situations in which person-to-object and object-to-object spatial relations are represented or described.

When Piaget and Inhelder first published their book “The Child’s Conception of Space” in 1948, Piaget was striving to understand the development of spatial reasoning. They reasoned as follows: “As the straight line leads to an understanding of projection, and the three dimensions of projective space lead to the idea of a bundle of lines intersected by a plane, so both these fundamental operations of projection and section become familiar enough to enable the child to give the kind of explanation seen in the examples quoted. But the concept of the straight line itself, together with the various relationships resulting from its synthesis with the original topological relations, ultimately presumes the discovery of the part played by points of view, that is, their combined co-ordination and differentiation. How is this discovery to be accounted for? To ascribe the origin and development of projective geometry to the influence of visual perception … is to overlook the fact that the purely perceptual point of view is always completely egocentric. This means that it is both unaware of itself and incomplete, distorting reality to the extent that it remains so. As against this, to discover one's own viewpoint is to relate it to other viewpoints, to distinguish it from and co-ordinate it with them. Now perception is quite unsuited to this task, for to become conscious of one's own viewpoint is really to liberate oneself from it. To do this requires a system of true mental operations, that is, operations which are reversible and capable of being linked together” (Piaget, 1960).

This division between egocentric and allocentric still remains with us; it divides the structure within which the positions of objects and events are specified into frames. In the allocentric view (a many-to-many relationship), all objects are related to all objects: the points which represent object locations in Cartesian space relate to X and Y co-ordinates and to other objects in that space. By contrast, in the egocentric view (a one-to-many relationship), all objects relate to a single object. The allocentric view is a mathematical construction; when spatial reasoning is introduced into the allocentric construction, one is dealing with representation, visual and verbal description. Traditionally, on the visual side, ‘plan’ and ‘axonometric’ are associated with an allocentric view, while ‘perspective’ is
associated with an egocentric view. The plan, or schema, is an analytical section, a reduction of space to 2D. It is capable of transmitting accurate distances, whereas in the egocentric view the distance viewed is judgmental.

Piaget (1960) experimented with the limits of children's abilities to transform spatial information. Piaget and Inhelder attempted to discover the age at which children can switch from an egocentric to an allocentric frame; that is, the egocentric frame was considered to be innate while the allocentric was considered to be acquired. They presented “perspective” problems in which children were shown a model of three colored mountains and were asked to indicate how it would look to an observer who viewed it from a different position. Until 9-10 years of age, children tended to make what were thought to be egocentric errors when shown a representation of the array, which depicted a variety of elevations. According to Huttenlocher, Piaget had shown that viewer-rotation problems are difficult when children must choose among pictures or models of an array from differing perspectives. In parallel tasks, array-rotation problems are much easier to solve than viewer-rotation problems (Huttenlocher, 1979). Huttenlocher proposed that in solving these problems, subjects interpret the instructions literally, recoding the position of the viewer vis-à-vis the array for viewer-rotation problems and recoding the array with respect to its spatial framework for array-rotation problems. The results show that the viewer is fixed vis-à-vis the spatial context rather than vis-à-vis the array.

Campbell (1993) adds some distinctions between ways of thinking that involve an explicit or implicit dependence upon an observer and those that have no such dependence. Campbell’s suggestion is that the resultant system is egocentric only if its significance can be given solely by reference to the subject’s own capacities for perception and action, in what he calls causally indexical terms. The causal significance, the judgment made about objects standing in various spatial relations, is essentially given in terms of its consequences for the subject’s perception or action: causally indexical. It will be allocentric if, and only if, this significance can be given without appeal to the subject’s perceptual and active abilities, causally non-indexical, in terms that give no single object or person a privileged position, which treats all the world’s objects (of a given kind) as on a par with respect to their physical interaction.
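The array-rotation and viewer-rotation problems are geometrically equivalent, which is what makes the difference in difficulty reported by Huttenlocher interesting. A minimal numeric sketch of that equivalence, with invented coordinates, is given below.

import numpy as np

def rot(theta_deg: float) -> np.ndarray:
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

array_points = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])  # a toy array of objects
viewer = np.array([0.0, -5.0])
theta = 30.0

# Array rotation: the array turns by theta; the viewer and its heading stay put.
seen_a = array_points @ rot(theta).T - viewer

# Viewer rotation: the viewer orbits by -theta and its heading turns by -theta;
# expressing the scene in that turned frame gives the same relative geometry.
moved_viewer = rot(-theta) @ viewer
seen_b = (array_points - moved_viewer) @ rot(theta).T

print(np.allclose(seen_a, seen_b))  # True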

6.2 Panoramic communicative view

Spatial reasoning is the engagement in representation of motion with geometric properties of points in space. The most elementary properties are prepositions, which specify the relation between objects in space; I call such prepositions spatial prepositions. Examples of spatial prepositions are in – city(x) or next to – house(x). Typically, the set of points where a predicate is true forms a single compact region of space, and spatial reasoning amounts to detecting intersection relations among combinations of regions, called environments. Spatial reasoning in our case is the representation of this reasoning in order to communicate.

To understand the relationship between an observer and the relatum one must understand the communication relationship. It is a representation which makes only as many distinctions as are necessary and sufficient for communicating the spatial relations of an object to the observer. The representation for ordering information that restricts the location of a point with respect to some reference points is given by the panorama, defined as a continuous image of the front and sideways view while on the move, from which features are extracted. The examination of how one describes what one sees while moving is an expression of the speaker’s underlying cognitive states. In order to achieve spatial cognition, i.e. the ability to represent the environment and act upon this representation to form a decision about where to go and what to see, one has to be able to distinguish between the object in a scene and its background.

The English spatial predicate marks a location, an operation that designates it as one to be remembered. The spatial predicate also marks the referent and relatum positions in space and arranges its parts to be accessed. The English spatial predicate takes the form of a predicate, a referent, and a relatum: (1) Referent – the object; (2) Relatum – the reference object in the background; (3) Predicate – the spatial relationship between the referent and the relatum. This distinction was first defined by Fillmore’s (1968) “Case Grammar”, and later by Talmy (1983).

Orientation information locates a point object in any position of the semi-straight line from the origin of the Cartesian co-ordinates with a given angle. Orientation information can be given by polar co-ordinates: the orientation is given by a vector (an angle), and the exact position on the straight line of orientation by a distance, both measured from the origin of the Cartesian co-ordinates. Three spatial point objects are involved in the definition of orientation
relationships by the orientation model: ‘a’ and ‘b’, which define the reference system, and ‘c’, the object whose orientation is provided with respect to the reference system.

In investigating the hippocampus (the area of the brain thought to contain the encoding of spatial relationships), O'Keefe (1990, 1991) proposed the slope-centroid model as the way in which animals successfully navigate. This model represents the basic relations of frames by always having a reference which lies outside the simple Euclidean metric relation of a trajectory vector between the current location and the desired location. The model contains two stages in an animal’s construction of a map of its environment. In the first stage, the animal identifies a notional point in its environment, the centroid, which is notional in the sense in which the South Pole or the Equator are notional: there may be no distinctive physical feature at that place. It is a fixed point, in that it does not move with the animal. In the second stage, the animal also identifies a gradient for its environment, a way of giving compass directions. This is the slope of the environment; it functions like the direction east-west. The direction is fixed no matter how one moves around, and one can partially define which way one is going by saying what angle one is making with it. As in almost all models of mapping, we take it that the animal is constructing a two-dimensional map of its environment; the third dimension is not mapped.

Once the animal has completed the two stages, it can construct a map of its environment by recording the vector from the centroid to each of its targets, using the slope to define direction. Assuming that the animal has done this and now wants to know how to get to a particular target, what it must do is find the vector from itself to the centroid. Once the animal has the vector from itself to the centroid and the vector from the centroid to the target, it can find the vector from itself directly to the target. According to O’Keefe (1990), at any point in an environment, an animal’s location and direction are given by a vector to the centroid whose length is the distance to the centroid and whose angle is the deviation from the slope (360-γ), as in Figure 6.1. Other places (A and B) are similarly represented. This dichotomy has its roots in egocentric and allocentric frames of reference and in subsequent attempts by O’Keefe (1993) to define the possibility of navigation without allocentric thinking. For people, “This is done by enhancing landmarks which permit the use of object reference system” (Hazen, 1980, p. 14).

Figure 6.1 Use of the movement translation vector (T). (Taken from O'Keefe 1990; 1991)
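A minimal numeric sketch of the vector bookkeeping in the slope-centroid model described above: the animal stores centroid-to-target vectors and, at navigation time, adds its own vector to the centroid. The coordinates and target names are invented for illustration.

import numpy as np

# Vectors are 2-D, expressed in the environment's slope-aligned coordinates.
centroid = np.array([0.0, 0.0])
targets = {"nest": np.array([4.0, 1.0]), "food": np.array([-2.0, 3.0])}

# Stage 1 (map building): store the vector from the centroid to each target.
centroid_to_target = {name: pos - centroid for name, pos in targets.items()}

def vector_to(target: str, current_position: np.ndarray) -> np.ndarray:
    """self -> target = (self -> centroid) + (centroid -> target)."""
    self_to_centroid = centroid - current_position
    return self_to_centroid + centroid_to_target[target]

print(vector_to("food", np.array([5.0, 5.0])))  # direction to walk: [-7. -2.]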

The co-ordinate system centered on the viewer seems to be based generally on the planes through the human body, giving us an up/down, back/front, and left/right set of half lines. Such a system of co-ordinates can be thought of as centered on the main axis of the body and anchored by one of the body parts. Although the position of the viewer’s body may be one criterion for anchoring the co-ordinates, the direction of gaze may be another, and there is no doubt that relative systems are closely hooked into visual criteria. An axis is a locus with respect to which spatial position is defined. Landau and Jackendoff distinguish three types of axes, which are required to account for linguistic terms describing aspects of an object’s orientation. According to Jackendoff, “The generating axis is the object's principal axis as described by Marr (1982). In the case of a human, this axis is vertical. The orienting axes are secondary and orthogonal to the generating axis and to each other (e.g., corresponding to the front/back and side/side axes). The directed axes differentiate between the two ends of each axis, marking top vs. bottom or front vs. back.” (Landau, 1993)

Figure 6.2 Three axes - object to parts construction

In the TOUR model (Kuipers, 1978), the simulated robot performs two types of actions: TURN and GO-TO. The purpose of the procedural behavior is to represent a description of sensorimotor experience sufficient to allow the traveler to follow a previously experienced route despite incompletely sensed information. It is stored as a sensorimotor schema of the form <goal, situation, action, result>. The “you are here” pointer describes the current position of the robot by determining its place and orientation. The topological map is constructed when there are enough
sensorimotor schemas. The topological map consists of a topological network of places (points), paths (curves), regions (areas), and topological relationships among them (connectivity, order, and containment). A place consists of an orientation reference frame, a set of paths intersecting at the place together with the angles of the paths relative to the orientation reference frame, and the distances and directions of other places which are visible from this place. A path consists of a partial ordering of places on the path, and regions bounded by the path on the left and the right. The orientation reference frame is described in terms of its orientation relative to other frames. A district consists of edges and paths.

According to Escrig (1998), there are four different types of inference rules defined to manipulate the knowledge embedded in this representation: (1) rules which compare the “you are here” pointer with the topological description of the environment; (2) rules for maintaining the current orientation with respect to the current coordinate frame; (3) rules which detect special structural features; and (4) rules which solve route-finding and relative-position problems.

The approach to pointing is to define a path from a to b with the position of the observer c; thus, in the panoramic model, one can point to an object and locate a new perspective relative to it (see Figure 6.3). The basic knowledge represented in Freksa and Zimmermann’s approach is the orientation of an object, c, with respect to the reference system defined by two points, a and b, that is, c with respect to ab. The vector from a to b and the perpendicular line through b define the coarse reference system (Figure 6.3 a), which divides the space into nine qualitative regions (straight-front (sf), right-front (rf), right (r), right-back-coarse (rbc), straight-back-coarse (sbc), left-back-coarse (lbc), left (l), left-front (lf), identical-front (idf)). The vector from a to b and the two perpendicular lines through a and b define the fine reference system (Figure 6.3 b), which divides the space into 15 qualitative regions (straight-front (sf), right-front (rf), right (r), right-middle (rm), identical-back-right (ibr), back-right (br), straight-back (sb), identical-back (ib), straight-middle (sm), identical-front (idf), left-front (lf), left (l), left-middle (lm), identical-back-left (ibl), and back-left (bl)).

Figure 6.3 a) The coarse reference system and b) the fine reference system

Given the original relationship c with respect to ab, five more relationships can be directly obtained (Freksa and Zimmermann) by permutation of the three objects a, b, and c. The number of permutations of three elements is 3! = 3·2·1 = 6, which gives the following five operations plus the original relationship:
• c with respect to ab is the original relationship.
• c with respect to ba is defined as the inverse operation. It answers the question: “What would the spatial orientation of c be if I were to walk back from b to a?”
• a with respect to bc is the homing operation. It answers the question: “Where is start point a if I continue my way by walking from b to c?”
• a with respect to cb is the homing-inverse operation.
• b with respect to ac is the shortcut operation. It allows us to specify the relative position of objects that are not on our path but to the side of it.
• b with respect to ca is the shortcut-inverse operation.

These observer instructions are examined through basic left, right, below, and above, aimed at the target of acquisition in a three-dimensional space. A table with iconic representations is shown in Figure 6.4. One should note that the shortcut is the only relationship that refers to the object; we will come back to this point as we progress in our analysis.

Figure 6.4 Iconic representation of the relationship c with respect to ab and the result of applying the five operations to the original relationship (adapted from [Freksa and Zimmermann]).
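The following Python sketch illustrates the reference system and the permutation operations: a point c is classified with respect to the oriented segment ab into left/right and front/middle/back regions, and the five operations then reduce to argument permutations of the same classifier. The simplified region labels and function names are assumptions made for the example, not Freksa and Zimmermann’s exact calculus.

import numpy as np

def coarse_relation(a, b, c):
    """Qualitative position of c with respect to the oriented segment ab
    (a simplified version of Freksa and Zimmermann's reference system)."""
    a, b, c = map(np.asarray, (a, b, c))
    ab = b - a
    bc = c - b
    cross = ab[0] * bc[1] - ab[1] * bc[0]   # >0: left of ab, <0: right of ab
    forward = np.dot(ab, c - b)             # >0: in front of b
    backward = np.dot(ab, c - a)            # <0: behind a
    side = "left" if cross > 0 else "right" if cross < 0 else "straight"
    if forward > 0:
        depth = "front"
    elif backward < 0:
        depth = "back"
    else:
        depth = "middle"
    return f"{side}-{depth}"

a, b, c = (0, 0), (0, 1), (1, 2)
print("original:", coarse_relation(a, b, c))   # c with respect to ab
print("inverse: ", coarse_relation(b, a, c))   # c with respect to ba
print("homing:  ", coarse_relation(b, c, a))   # a with respect to bc
print("shortcut:", coarse_relation(a, c, b))   # b with respect to ac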

In our case we can redefine the following:
Visual accessibility: What can be seen? Which point allows the most unhindered visible range for the agent? Accessibility, or what Gibson (1979) calls affordance, relies on information the various senses gather about an object.
Object context: How does an object look from a different angle? (Go to the front/left side of the building.)
Topological relations: What is the relation to other objects not necessarily seen? (Am I left of the church?)

• a with respect to bc is visual accessibility
• b with respect to ac is topological relation
• c with respect to ab is object context


6.3 Possible interaction with objects and routes

The aim of the tool is to enhance the user’s interaction with virtual environments and to make navigation a natural, directed experience. Thus, the proposed system can enhance interaction with the user by utilizing an object-centered frame of reference. By pointing at an object with visual feedback, one can indicate where one desires to be relative to the object, employing the topological map to convey information through the cursor keys. This is the way computer games allow us to play football: we indicate to which player we want to pass the ball and the direction in which the ball should go relative to the indicated player. If you replace the ball with a camera and the player with a building, you have a system that functions as object-centered with visual feedback.

Reaching for a nearby object requires pre-existing knowledge, knowledge of what properties define and delimit objects. According to Berthoz, movement and the representation of movement, the ‘body’ and ‘thought’, go hand in hand. There is no apparent command structure of ‘thought’ then ‘body’; on the contrary, the body already anticipates our next action. When navigating in built environments, one employs a different strategy of grasping: that of approach and position (look) and that of reach (interaction). According to Merleau-Ponty (1962), “Concrete movement is centripetal whereas abstract movement is centrifugal.” Or in the words of Berthoz, “The brain is capable of recognizing movements of body segments, selecting and modulating information supplied by the receptors at the source. But proprioceptive receptors can themselves only detect relative movement of body masses. They are inadequate for complex movement-locomotion, running, jumping-where the brain must recognize absolute movements of head and body in space.” (Berthoz, 2000, p. 32) Thus, the higher functions of movement have a conceptual linguistic component that existing direct manipulation cannot provide.

The grasping frames relate to objects in two ways:
Manipulation mode: an intention that conveys an impulse toward the object, as in the case of object use.
Observational mode: an intention conducted away from an object, as in the case of object observation.

To design the architectural object, the architect works mostly with canonical views such as sections, plans, and elevations. The perspectives that architects use enhance the design by allowing an observational comparison of at least two sides of an object. Thus the canonical extension of axes that are most commonly used is divided into eight qualitative regions: straight-front, right-front, right, right-back,
straight-back, left-back, left, and left-front of the reference system. The canonical architectural reference system representation (see Figure 6.8) avoids the granularity of choice among users by setting a constraint that all relative distances to the agent-object shall remain constant when the viewpoint changes, unless this contradicts the substantiality constraint, which states that solid objects cannot pass through one another.

Figure 6.8 The architectural reference systems, incorporating the visible area from a generating location (or convergence location of the optic rays)

In the city there is the added distinction between observation and movement. The city form dictates physical limitations on where one can view the city from. For simplification, interaction in the built environment is divided into sixteen canonical points of view from which to approach an object (see Figure 6.6). The eight points of view (of Cases 2 and 3) take on the nature of urban-bound movement interaction, with the agent approaching the side of an object from a road, where only one side of the object is visible. This view is often ignored by architects when designing, and its effect on architecture, as Venturi (1977) shows, is critical to the modern architect. As yet it has not become part of the architect’s canonical language.

Case 1    Case 2    Case 3    Case 4

Figure 6.6 Sixteen different cases depending on the relative orientation of the point “a” with respect to the extended object “b”, grouped into four cases due to symmetries. Taken from Escrig (1998).

The notion of a reference system can be viewed as a conceptual neighborhood with topological and linear relations (see Figure 6.5). Thus one can walk around to the back of the building or transport to the new position at the back of the building. There are two possibilities for manipulating an object: the first (Figure 6.5 a) is a topological conceptual system; the other (Figure 6.5 b) is a linear route. What is needed is a system that allows the user to choose between the different rotations of the object and array system according to the adequacy criteria.


Figure 6.5 Topological and linear view of the conceptual neighborhood (taken from Escrig, 1998)

Concluding remarks

This chapter examined a way of representing movement and performing spatial reasoning with an object-centered model based on panoramic views. We have examined the visual representation of movement through the historical review of Piaget and examined the way cognitive visual navigation translates to topological vector analysis. We have presented a visual system for object-centered interaction. The system works well as a module but lacks any competing attention devices; thus the user cannot switch between different frames of reference. We now turn our thoughts to the linguistic model, which has an attention mechanism built into the frames.


CHAPTER 7
LANGUAGE BASED REPRESENTATION IMPLEMENTATION

In this chapter we present the way in which language performs the task of object manipulation through a command/description. We examine the mechanism of attention and the divisions that language creates. The use of frames of reference will be introduced, and a case of a basic linguistic navigational program will be brought in to demonstrate some of the conceptual difficulties encountered in the early development of such programs.

7.1 Directing the path

Language has a different set of frames than the visual axes; in fact, language has more axes of attention. A frame of reference is a set of axes with respect to which spatial position is defined. To draw attention to an object, one must correlate the sharing of attention with the desired location. To represent the relationship one has to refer to visual and linguistic aids for pointing and labeling. Confirming attention is an engagement to form attention. When interacting, pointing is the token action that identifies joint engagement. When interacting, labeling is the matching of the object with the signified; labeling is imperative pointing, a declarative pointing using a referential language. When interacting, directing is an explicit instruction of how to proceed from one place to another via a route. Directing involves an explicit declarative set of instructions, in our case using the English language. In English, a preposition is a function word that typically combines with a noun to form a phrase that relates one object to another. According to Jackendoff (1987), the structure of spatial prepositional phrases in
English consists of two notions: the first is place, and the other is path. This is a reference as projection in the sense of a “conceptual structure” of mental information, including both “propositional” and “diagrammatic” information (Johnson-Laird, 1999). Prepositional phrases in English explicitly mention a reference object as the object of the preposition, as in “on the table”, “under the counter”, or “in the can”. The path is an ordered sequence of places and the translation vectors between them. Paths can be identified by their end places or by a distinct name. On the other hand, places along the path can be identified and associated with the path. A path may be marked by continuous features such as a trail or road, but need not be. The internal structure of a path often consists of a path-function and a reference object, as expressed by phrases like ‘towards the house’, ‘around the house’, and ‘to the house’. Alternatively, the path-function may take a reference place as its argument (Jackendoff, 1983). This possibility occurs in phrases like “from under the table”, where “from” expresses the path-function and “under the table” expresses the reference place. Prepositions such as “into” and “onto” express both a path-function and the place-function of the reference place, for example, “The man ran into the shelter”. Many prepositions in English, such as “over”, “under”, “on”, “in”, “above”, and “between”, are ambiguous between a pure place-function and a path-function, for example, “the man is under the shelter” and “the man ran under the shelter”.

One of the ways to view an architectural building is to travel along a route. A route is a sequence of procedures between two nodes requiring decision-making. The path is then the course of action or conduct taken between two nodes. One can express this conceptual possibility formally in terms of a phrase-structure-like rule for the functional composition of a conceptual structure:
[Place X] → [Place PLACE-FUNCTION ([Thing Y])]
[PLACE] projects into a point or region, but within the structure of an event or state, a [PLACE] is normally occupied by a [THING]. The internal structure of [PATH] often consists of a path-function and a reference object, as expressed by phrases like “towards the mountain”, “around the tree”, and “to the floor”. Alternatively, the argument of the path-function may be a reference place, as in “from under the table”, where “from” expresses the path-function and “under the table” expresses the reference place; prepositions such as “into” and “onto” express both a path-function and the place-function of the reference place.


The mouse ran from under the table.
[Path FROM ([Place UNDER ([Thing TABLE])])]
The mouse ran into the room.
[Path TO ([Place IN ([Thing ROOM])])]

Language also makes use of different frames of reference for spatial description; they are used to identify places in directing our actions, in deciding where to move. The vocabulary used in the description of spaces in such situations is what Jackendoff (1993) calls “directions”:
Vertical: over, above, under, below, beneath.
Horizontal: side to side, beside, by, alongside, next to.
Front to back: in front of, ahead of, in the back of, behind, beyond.
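A minimal sketch of how such bracketed conceptual structures could be encoded as nested data; the constructors and the rendering function are illustrative assumptions, not Jackendoff’s own notation.

# Conceptual constituents as (CATEGORY, HEAD, argument) tuples.
def thing(name):       return ("Thing", name, None)
def place(func, arg):  return ("Place", func, arg)
def path(func, arg):   return ("Path", func, arg)

# "The mouse ran from under the table."
from_under_table = path("FROM", place("UNDER", thing("TABLE")))
# "The mouse ran into the room."
into_room = path("TO", place("IN", thing("ROOM")))

def render(node) -> str:
    """Pretty-print a constituent back into bracket notation."""
    category, head, arg = node
    if arg is None:
        return f"[{category} {head}]"
    return f"[{category} {head} ({render(arg)})]"

print(render(from_under_table))  # [Path FROM ([Place UNDER ([Thing TABLE])])]
print(render(into_room))         # [Path TO ([Place IN ([Thing ROOM])])]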

“Two factors affected task difficulty. The first factor was whether the problem was described as a rotation of the array or of the viewer. The second was the type of question. The effect of these two factors interacted: with appearance questions, array-rotation tasks were easy and viewer-rotation tasks were difficult; with item questions, viewer-rotation tasks were easy and array-rotation tasks were difficult.” (Huttenlocher, 1979)

Two principles are involved in how people treat these problems. First, arrays are coded item by item in relation to an outside framework. Second, transformation instructions are interpreted literally as involving movement of the viewer or array. For array rotation this entails recoding the array with respect to the framework; for viewer rotation it entails recoding the viewer’s position with respect to both the array and its framework.

The extension of an axis, such as above, below, next to, in front of, behind, alongside, left of, and right of, is used to pick out a region determined by extending the reference object's axes out into the surrounding space. For instance, “in front of X” denotes a region of space in proximity to the projection of X’s front-back axis beyond the boundary of X in the frontward direction (Johnson-Laird, 1983; Landau and Jackendoff, 1993). By contrast, “inside X” makes reference only to the region subtended by X, not to any of its axes; “near X” denotes a region in proximity to X in any direction. Note that “many of the ‘axial prepositions’ are morphologically related to nouns that denote axial parts” (Jackendoff, 1999). For example, “Go to the front of the building”:
[event GO [thing AGENT] [preposition FRONT [thing BUILDING]]]

At this point one notices the disparity between representing and acting, the difference between the ‘bounded agent’ and the ‘directed agent’, which are low-level commands, together with two additional high-level commands. The first is the ‘converse path’, which is parallel to sight: one indicates the target and can plot a course to the target. The second high-level function is the ‘transverse path’, which is perpendicular to sight: one also needs to define the start and end point of the path, as shown in prepositions like ‘through’, ‘around’, and ‘along’. The route-bound agent is transverse where the agent turns in relation to a building, while the object-bound agent is converse when it refers to an independent path in relation to the architectural object. Thus the conceptual linguistic model employs two basic operations in the modeling of an agent’s movement: three-point interaction and N-point interaction (see Figure 7.1).

Converse    Transverse

Figure 7.1 Symbol inequalities for a) the parallel and b) the perpendicular lines to the South-North straight line. Taken from Escrig (1998).

7.2 How views change; axes and frames of reference

With the introduction of an agent into the environment, a situation is created where the observer frame may be projected onto the object from a real or hypothetical observer. “This frame establishes the front of the object as the side facing the observer. We might call this the ‘orientation mirroring observer frame’. Alternatively, the front of the object is the side facing the same way as the observer's front. We might call this the ‘orientation-preserving observer frame’” (Jackendoff, 1999, p. 17). According to Levinson, “To describe where something (let us dub it the ‘figure’) is with respect to something else (let us call it the ‘ground’) we need some way of specifying angles on the horizontal. In English we achieve this either by utilizing features or axes of the ground or by utilizing angles derived from the viewer’s body coordinates… The notion ‘frame of reference’ … can be thought of as labeling distinct kinds of coordinate systems” (Levinson, 1996, p. 110). Linguistic literature usually invokes three frames of reference: an intrinsic or object-centered frame, a deictic or observer-centered frame, and an absolute frame (see Figure 7.2).


The frames of reference presuppose a ‘viewpoint’, and a figure and ground distinct from it, thus offering a triangulation of three points and utilizing coordinates fixed on the viewer to assign directions to a desired location.
Intrinsic frames of reference: the position-defining loci are external to the person in question. This involves taking the inherent object-centered reference system to guide our attention, and uses an allocentric frame (Jackendoff, 1999).
Relative frames of reference (deictic): those that define a spatial position in relation to loci of the body, i.e. agent-centered. The relative frame of reference is used to identify objects’ directions; this involves imposing our egocentric frame on objects (Jackendoff, 1999).
Absolute frames of reference: defining the position in absolute terms, such as North, South, or polar co-ordinates. Absolute frames are environment-centered and use either Cartesian or polar co-ordinates (Jackendoff, 1999).

Figure 7.2 Three linguistic frames of reference
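To make the three frames concrete, the following Python sketch resolves the phrase “to the left of the house” into a world-space direction under each frame. The house’s intrinsic front direction, the viewer’s position, and the choice of west for the absolute reading are all assumptions of the example.

import numpy as np

def left_of(front: np.ndarray) -> np.ndarray:
    """Rotate a facing direction 90 degrees counter-clockwise to get its left."""
    x, y = front
    return np.array([-y, x])

house_pos = np.array([10.0, 10.0])
house_front = np.array([0.0, -1.0])   # intrinsic frame: the house faces south
viewer_pos = np.array([10.0, 0.0])    # the viewer stands south of the house
north = np.array([0.0, 1.0])          # absolute frame anchor

# Intrinsic: use the house's own left.
intrinsic_dir = left_of(house_front)

# Relative (deictic): impose the viewer's egocentric frame on the house;
# "left" is the left of a viewer facing the house.
viewer_front = house_pos - viewer_pos
viewer_front = viewer_front / np.linalg.norm(viewer_front)
relative_dir = left_of(viewer_front)

# Absolute: fix the direction to a compass bearing, here west.
absolute_dir = left_of(north)

for name, d in [("intrinsic", intrinsic_dir), ("relative", relative_dir),
                ("absolute", absolute_dir)]:
    print(name, house_pos + 5.0 * d)  # a point 5 m "to the left of the house"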

Let us consider an example, “the gate is in front of the house”. For a manufactured artifact, the way we access or interface with the object determines its front, anchored to an already-made system of opposites: front/back, sides, and so forth. This would also be the case with any centralized symmetrical building but not for a cylindrical building. In fact, the situation is more complex. The sentence “the gate is to the left of the house” can sometimes employ a relative frame of reference that depends on knowledge of the viewer location. This entails that the gate is between the viewer and the house, because the primary co-ordinates on the viewer have been rotated in the mapping onto the ground object, so that the ground object has a “left” before which the gate is situated. “Viewing a frame of reference as a way of determining the axes of an object, it is possible to distinguish at least eight different available frames of reference” (for further details see Jackendoff 1996 p. 15; many of these appear as special cases in Miller and Johnson-Laird 1976, which, in turn, cites Bierwisch 1967, Teller 1969, and Fillmore 1971, among others). Despite extensive interest in the role of frames of reference in spatial representation, there is little consensus regarding the cognitive effort associated with various reference systems and the cognitive costs (if any) involved in

81


switching from one frame of reference to another. An experiment was conducted by Allen (2001) with regard to these issues, in which accuracy and response latency data were collected in a task in which observers verified the direction of turns made by a model car in a mock city in terms of four different spatial frames of reference: fixed-observer (relative-egocentric), fixed-environmental object (intrinsic-fixed), mobile object (intrinsic-mobile), and cardinal directions (absolute-global). The results showed that frames of reference could be differentiated on the basis of response accuracy and latency. In addition, no cognitive costs were observed in terms of accuracy or latency when the frames of reference switched between fixed-observer versus global frames of reference or between mobile object and fixed environmental object frames of reference. Instead, a distinct performance advantage was observed when frames of reference were changed (Allen, 2001). When comparing the frames of reference, a few conclusions can be drawn: 1. Frames of reference cannot freely “translate” into one another. 2. There is common ground between visual axes and linguistic frames of reference that allows them to converge on an object, and allows one to talk about what one sees. 3. Language is most adaptive to directing, and therefore other modalities should follow.

7.3 Flat-shooters – Two-dimensional – Amazon’s Sonja

Sonja (Chapman, 1991) is a computer program that recognizes simple commands from a user while playing the two-dimensional video game Amazon. We introduce this case of a basic linguistic navigational program to demonstrate some of the conceptual difficulties encountered in the early development of such programs (see Appendix I). As the player views the environment, their vision is limited, and so is the portion of the built environment that is exposed. The program/game is designed to search the represented space and recognize objects viewed by the user. In order to recognize the occurrence of events in the world, we need some way of representing the transitive properties of occurrences of those events. The command can be said to describe the relationship between objects and agent. Similar attempts include those of Schank (1973), Jackendoff (1983, 1990), and Pinker (1995). These prior efforts attempted to ground spatial expressions in perceptual input. However, they did not offer a procedure for


determining whether a particular perceived event meets the transitive properties specified by a particular spatial expression. The alignment of the agent’s view of a place with the analysis of control through English commands is exemplified by the system of commands in Sonja (Chapman, 1991). Sonja uses English instructions in the course of visually guided activity to play a first-generation video action game, specifically one called Amazon. The game features an agent – an Amazon warrior – whose goal is to find an amulet and kill a ghost in two-dimensional space. According to Chapman, giving advice to the Amazon warrior requires that the computer interpret the instructions. Interpretation can in general require unbounded types and amounts of work. Sonja’s interpretation is largely perceptual; it understands instructions by relating them to the current Amazon-playing situation. When Sonja is given an instruction, it registers the entities the instruction refers to and uses the instruction to choose between courses of action that themselves make sense in the current situation. An instruction can fail to make sense if it refers to entities that are not present in the situation in which it is given, or if the activity it recommends is implausible in its own right. Some instructions have variant forms. Pick-up-the-goody and use-a-potion are chained together when the instruction ‘Get the potion and set it off’ is given.

Instruction | Instruction buffer(s) set | Field
Get the monster/ghost/demon | kill-the-monster |
Don't bother with that guy | don’t-kill-the-monster |
Head down those stairs | go-down-the-stairwell |
Don't go down yet | don’t-go-down-the-stairwell |
Get the bones | kill-the-bones |
Ignore the bones for now | don’t-kill-the-bones |
Get the goody | pick-up-the-goody register-the-goody |
Don't pick up the goody | don’t-pick-up-the-goody register-the-goody |
Head direction | suggested-go | direction type
Don't go direction | suggested-not-go | direction
Go around to the left/right | go-around | direction
Go around the top/bottom | go-around | direction
Go on in | go-in |
OK, head out now | go-out |
Go on in and down the stairs | in-the-room go-down-the-stairwell |
Go on in and get the bones | in-the-room kill-the-bones |
Go in and get the goody | register-the-goody in-the-room pick-up-the-goody |
Get the potion and set it off (chained) | register-the-potion pick-up-the-goody use-a-potion |
Scroll's ready | scroll-is-ready |
On your left! and similar | look-out-relative | rotation
On the left! and similar | look-out-absolute | direction
Use a knife | use-a-knife |
Hit it with a knife when it goes light | hit-it-with-a-knife-when-it-goes-light |
Use a potion | use-a-potion |
No, the other one | no-the-other-one |

Table 7.1 Amazon’s natural and formal instructions.
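The registration step behind Table 7.1 can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical illustration of the instruction-to-buffer mapping; the dictionary keys, buffer names, and the entity check mirror entries from the table above, but Chapman's Sonja performs this registration perceptually rather than by table lookup.

# A minimal, hypothetical sketch of the instruction-to-buffer mapping of
# Table 7.1. Sonja itself registers entities perceptually; this lookup table
# is only an illustration of the idea.
INSTRUCTION_BUFFERS = {
    "get the monster": ["kill-the-monster"],
    "don't bother with that guy": ["don't-kill-the-monster"],
    "head down those stairs": ["go-down-the-stairwell"],
    "get the goody": ["pick-up-the-goody", "register-the-goody"],
    "get the potion and set it off": ["register-the-potion",
                                      "pick-up-the-goody", "use-a-potion"],
}

def interpret(instruction, visible_entities):
    """Return the instruction buffers an utterance sets, or a complaint when
    the entity it refers to is not present in the current situation."""
    buffers = INSTRUCTION_BUFFERS.get(instruction.lower())
    if buffers is None:
        return ["no-the-other-one"]                 # unrecognized form
    if "goody" in instruction and "goody" not in visible_entities:
        return ["complaint: where is the goody?"]   # referent absent
    return buffers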

When one examines ‘Sonja’, one is constrained in terms of language and terrain; the two-dimensional space allows for easier reference to action, but the visual and linguistic extended reference systems operate differently (Jackendoff, 1983). When a command is given, Sonja carries out the demand to move; for example, ‘go around’ still has the pattern a —> b —> c and utilizes a reference system. When navigating in three-dimensional space, people generally consider up and down, left and right, forward and backward as possible movements of the vantage point. The terrain in Amazon consists of barriers that require decisions to move up or down, and of interlocking spaces that restrain the use of the ‘channel’ between elements to left and right, up and down.

Instruction | Instruction buffer(s) set | Field
Head direction | suggested-go | direction
Don't go direction | suggested-not-go | direction
Go around to the left/right | go-around | direction
Go around the top/bottom | go-around | direction
Go on in | go-in |
OK, head out now | go-out |
Go on in and down the stairs | in-the-room go-down-the-stairwell |


Table 7.2 Amazon’s Sonja, possible moves in the terrain

Placement of an agent in a scene does not necessarily enhance the feeling of immersion. Sonja’s icon is a two-dimensional attempt (see Figure 7.3) at the transition between frames according to movement: the geographical up–down of the screen icon relates to the front–back direction of the user, while the side view of the agent relates to the left–right direction. The instructions to the agent show some of the difficulties in translation when deriving route knowledge.

On your left! and similar | look-out-reference – agent
On the left! and similar | look-out-reference – topological map

Figure 7.3 Amazon icons showing Sonja’s avatar

This confusion is avoided in third-person games such as Tomb Raider, where the agent (e.g. Lara Croft) mostly shows its back, establishing an isomorphic correlation between the user and the agent through a common reference system. The following commands constitute what needs to change or be added in the system examined (see Table 7.3):

Go around (object) to the left/right
Go around (object) under/over
Go above/below (object)
Go through
Go back
Go left and similar
Go along the (object)
Go to the left of (object)! and similar
Go near (object)
Go between (objects)
Go to east/west/north/south
Go towards
Go in front/back of (object)
Look left/right
Go ahead of
Go past

Table 7.3 The most commonly used prepositions

Concluding remarks

We have examined the mechanism of attention, the divisions that language creates, and the use of frames of reference. The three frames of reference – relative, intrinsic, and absolute – were introduced, and a case of a basic linguistic navigational program was presented to demonstrate some of the conceptual difficulties encountered in the early development of such programs. In this chapter we have established the basic ways of handling objects as shown in language. We have shown how, with the different frames of reference, one handles objects regardless of prior position. We now need to examine the differences and similarities between the visual command and the linguistic command, and what the introduction of linguistic commands contributes to the navigational system.


CHAPTER 8 A COMPARISON BETWEEN VISUAL & LANGUAGE-BASED REPRESENTATION

In this chapter we align the various frames involved in spatial reasoning. We compare the different frames of reference, as well as egocentric and allocentric reasoning, in relation to the task to be performed. A case taken from a tourist guide that describes a guided walk will allow us to explore natural-language and visual systems and compare the two. Finally, all the computer simulation programs introduced in previous chapters are examined through common frames and axes.

8.1 Aligning the visual and linguistic frames

The linguistic command involves the desire to be somewhere, and the mental representation of an object already assumes a position. Studies of the updating mechanism of pointing and labeling have shown a clear preference for objects rather than locations (de Vega, 2001; Warga, 2000). Frames of reference have a visual component and a linguistic component. The visual components are the generative axis, orienting axes, and directed axes, while the linguistic frames of reference are absolute, intrinsic, and relative. This idea of frames of reference is further developed by Levinson (1996) and Campbell (1994), according to whom the frames of reference already involve



egocentric and allocentric thinking. According to Levinson, the three linguistic frames can be summed up as in Table 8.1.

Intrinsic | Absolute | Relative
Origin ≠ ego | Origin ≠ ego | Origin = ego
Object-centered | Environment-centered | Viewer-centered
3-D model | – | 2½-D sketch
Allocentric | Allocentric | Egocentric
Orientation-free | Orientation-bound | Orientation-bound

Table 8.1 Aligning classifications of frames of reference (S. Levinson)

8.2 Travel Guide – guided walk – Amsterdam

The case presents one of the recommended guided walks through the Jordaan in Amsterdam, a visual depiction and written description taken from a popular travel guide book on Amsterdam (Pascoe, 1996). It demonstrates the role of the different frames involved in the proposed system. The walk is accompanied by a written description, a map, and images of various highlights of buildings and streets that one encounters along the way. The two printed pages integrate image and text into a multimedia presentation of a Dutch quarter. Page one – On the first side of the two-page spread there is an aerial-view photograph of the Jordaan, a three to five story row of houses. On the second side there is text and a map of Amsterdam, giving the scale and the direction of north, with two photographs of locations beneath the map and a wall plaque inserted between the graphics (see Figure 8.1).

Figure 8.1 First page of the Jordaan tour

The text reads as follows: Guided Walk MANY OF AMSTERDAM'S most important historical landmarks, and several fine examples of 16th- and 17th- century architecture, can be enjoyed on both of these walks. The first takes the visitor through the streets of the Jordaan, a peaceful quarter known for its narrow, pretty canals, houseboats and traditional architecture. The route winds through to the man-made Western Islands of Bickerseiland, Realeneiland and Prinseneiland, built in the 17th century to accommodate the


expansion in Amsterdam's overseas trade. The area, with its rows of warehouses and wharves, is a reminder of the city's erstwhile supremacy at sea.

The first two-page spread is an architectural-historical description; here we shall examine one of the photographs whose caption tells of a tour. The photographic perspective is the view of the Drieharingenbrug across Prinsengracht, shown on the next page (see Figure 8.2). Second page – This two-page spread has a different layout: the map is in the middle of the page and the text is to the sides, in between the graphic and photographic display of an architectural frontal elevation and two more viewer perspectives. The first paragraph encourages people to take part in the proposed walk.

Figure 8.2 Second page of the Jordaan tour

The text reads as follows: A Walk around the Jordaan and Western Islands The Jordaan is a tranquil part of the city, crammed with canal houses, old and new galleries, restaurants, craft shops and pavement cafes. The walk route meanders through narrow streets and along enchanting canals. It starts from the Westerkerk and continues past Brouwersgracht, up to the IJ river and on to the Western Islands. These islands have now been adopted by the bohemian artistic community as a fashionable area to live and work.

The actual tour starts with this description. Prinsengracht to Westerstraat Outside Hendrick de Keyser's Westerkerk [page and illustration referral in the text] turn left up Prinsengracht, past the Anne Frankhuis [page and illustration referral in the text], and cross over the canal. Turn left down the opposite side of Prinsengracht and walk along Bloemgracht - the prettiest, most peaceful canal in the Jordaan. Crossing the second bridge, look out for the three identical mid-17th-century canal houses called the Drie Hendricken (the three Henrys) [page and illustration referral in the text]. Continue up [illustration referral in the text] Leliedwarsstraat, with its cafes and old shops, turn right and walk past the St Andrieshofje [illustration referral in the text], one of the numerous well preserved almshouses in the city. It is worth pausing to take a look across Egelantiersgracht at No. 360, a rare example of an Art Nouveau canal house.



Guided walk: a case

The command description can be seen as a verb of action specifying speed, direction, and attention; the verb presupposes the position of the viewer in relation to an object, a predicate. In our demonstration the most common command is the verb ‘turn’, followed by the verb ‘walk’, which is used half as many times. The verb already relates the surrounding objects to the viewer’s script and ties the specific place to the rotation of the viewer’s body. The verbal and visual description takes the viewer through: 1) the Amsterdam city frame, 2) the Jordaan quarter frame, 3) the street frame, and 4) building object frames. We will analyze the first three sentences of the guided walk (see Figure 8.3).

Figure 8.3. The first three sentence segments enlarged for clarity.

The first sentence in the tour starts (all references are part of the guide book):
a) Outside Hendrick de Keyser's Westerkerk (1) ‘turn’ left up Prinsengracht, past the Anne Frankhuis (2) (see p. 90), and cross over the canal.
Outside – a reference location — object extended axis; the agent is facing the front.
‘turn’ left – reference orientation — agent extended axis.
up – reference orientation between the map and the location through absolute coordinates: c with respect to route ab (c —> ab).
past – reference in relation to the route (confirmation): b with respect to ac (b —> ac).
cross over – also a rotation/relative position; one is turning left. There is no equivalent egocentric reference.

b) ‘Turn’ left down the opposite side of Prinsengracht and walk along Bloemgracht.
‘turn’ left – rotation/orientation — agent extended axis, but also reflection/location – opposite side.
down – reference orientation between the map and the location through absolute coordinates: a with respect to bc (a —> bc).


It is interesting to note that this sentence needs both visual and linguistic understanding. ‘Gracht’ is the Dutch word for canal, and in the Netherlands the same street name is given to both sides of a canal. In our situation, when the tourist approaches the canal and looks around for Bloemgracht without referring to the map, the tourist may fail to cross the canal to the opposite side and go in the opposite direction, since the next command – turn right – is relative (to the tourist); and since Bloemgracht (the street) has two sides, the crossing can be performed both ways.
Along – object extended axis: b with respect to ac (b —> ac).

c) Crossing the second bridge, look out for the three…
This is a rotation in terms of an object’s relative position and map reading; thus there is no further information for orientation, and no equivalent egocentric pointing. The analysis of the accessibility of the site produced a map of possible branching and a script, shown in Table 8.2 and Figure 8.4.

Access description of a | Access description of b | Access description of c
First right | First right | Fourth right (First bridge)
Second right | Second right | Third right, First left
Third right | Third right | Fifth right (Second bridge)
First left (First bridge) | Or First right after canal | Fourth right, First left, First left

Table 8.2 Script of first three sentences

Figure 8.4 Branching graph for the first sentence segments with one degree branching to target

From the examples above we observe that the analysis depends on the strategy and on contextual parameters which specify performance. The morphological characteristics and performance are mediated through visual and linguistic frames. That is, performance and heuristic rules govern the strategy by which one frames exploration.



8.3 Comparison between linguistic and visual frames

In order to compare the different frames, two different rotations are examined: object and array. The literature examining this phenomenon is large and goes back to Marr (1982). As Levinson demonstrates (in Table 8.3), there is a significant difference between the various frames, but the frames can be compatible across modalities (1996, p. 153).
F = figure or referent, with center point at volumetric center Fc
G = ground or relatum, with volumetric center Gc and a surrounding region R
V = viewpoint
A = anchor point, to fix labeled co-ordinates
“Slope” = fixed-bearing system, yielding parallel lines across the environment in each direction

 | Intrinsic | Absolute | Relative
Relation is | binary | binary | ternary
Origin on | ground | ground | viewpoint V
Anchored by | A within G | "slope" | A within V
Transitive | No | Yes | Yes if V constant
Constant under rotation of:
whole array? | Yes | No | No
viewer? | Yes | Yes | No
ground? | No | Yes | Yes

Table 8.3 Summary of properties of different frames of reference (S. Levinson)

When comparing the visual and linguistic systems to the allocentric and egocentric systems, one has four elements:
Visual egocentric: is perceived by the internal senses and is, in principle, located within the limits of a person’s own body. In visual egocentric space one can perceive one’s own body – what Paillard defines as the sensorimotor mode. The visual egocentric mode is the alignment of objects in relation to the environment and the alignment of object categories in relation to the environment. Berthoz (2000) differentiates between ‘personal space’ and grasping space, which allows one to point.
Linguistic egocentric: is the representation of the location of objects in space through a relative frame of reference. It is what Berthoz (2000) defines as the ‘egocentric frame of reference’ and Paillard defines as the representational mode.
Visual allocentric: encodes the spatial relations between objects and their relation to a frame of reference external to one’s own body. Thus, in visual allocentric space the alignment with one’s own body and the represented reality is already given. Visual allocentric can be a frozen visual egocentric point of view, referred to as the pictorial perspective or Marr’s 2½-D sketch.
Linguistic allocentric: is the direction of relations between objects and their relation to a frame of reference external to one’s own body. What Berthoz defines as an allocentric frame of reference is a direction through absolute and intrinsic frames of reference.
When a comparison table is drawn (see Table 8.4), one notices that if the visual frames of reference are equated with the linguistic frames of reference, so that the difference between them is only the number of axes or a semiotic distinction, then egocentric and allocentric cannot be the distinction between the frames of reference. This is a question of one-to-one relations between the observer and agent, object, and polar coordinates. When using the intrinsic and relative frames of reference the observer is using an analogous process relative to himself, while in the absolute frame of reference he is using an analogous process external to himself. Thus in this act of communication one can use egocentric frames, since it is a one-to-one relation. This conclusion coincides well with findings in neurophysiology (Dodwell, 1982; O'Keefe, 1993).

Constant under rotation | Object rotation | Array rotation | Agent rotation
Visual egocentric | Pointing | Possible through window frame | Indirectly through window frame
Linguistic egocentric | Relative | Intrinsic | Relative
Visual allocentric | Display dependent | Possible through window frame | Indirectly through pointing
Linguistic allocentric | Intrinsic and absolute | Intrinsic and absolute | Intrinsic and absolute



Table 8.4 Aligning Classifications of Frames

8.4 Comparison between existing simulation programs

According to Amarel (1968), “a problem of reasoning about actions is given in terms of an initial situation, a terminal situation, a set of feasible actions, and a set of constraints that restrict the applicability of action.” Visual and linguistic descriptions can convey information about a path through explicit and implicit knowledge; for example, “go left” is a description in which the start point is implicit. A path can also have an explicit end point, as in “go towards the house” or “go into the house”, which is the equivalent of pointing visually. The converse path can have an arrival point, as in “go to the left of the house”. Lastly, the transverse path can have an explicit start and end point, giving us the ability to determine the path’s relation to an object. Vision has three types of movement: bounded agent, directed agent, and converse, while language also has three types of movement: bounded agent, converse path, and transverse path. When comparing the various navigational programs against the operational criteria of qualifying movement via a route, one has to take into account the discrepancy between pointing and directing, as exemplified through axes and frames. In the existing navigational programs not all three modules subsist (see Figure 7.9). Each of the programs has attempted to respond differently to the requirement of enhancing immersion. 3DS MAX uses a global coordinate system to rotate an object, while Cosmo utilizes a viewer-centered frame with elements of converse mobility, which makes that system more effective and efficient. Comparing the overall tasks involved in visual and linguistic commands: in visual systems the observer faces attentional competition between the various tasks he or she has to perform. For example, when one identifies an object, one has a choice between array rotation and object rotation, as well as between an object pointer and a moving pointer. The linguistic command system is adaptive; it can perform the various tasks with one interface. On the other hand, the visual system has visual feedback/confirmation that provides the user with a more interactive experience, while the language-based system has no visual feedback. Probably the best immersive solution is a mixture of the two. In order to examine the limits of each system, the choice fell on the systems with the best task performance, i.e. descriptive systems.


 | 3DS MAX | Tomb Raider | Myst | Cosmo | Sonja
Manipulation mode | Yes | No | No | No | No
Observational mode | Yes | No | Yes | No | No
Converse | No | Yes | No | Yes | Yes
Transverse | No | No | No | No | Yes
Intrinsic frame | No | No | No | No | Yes
Relative frame | Yes | Yes | Yes | Yes | Yes

Table 8.5 Comparison between the different navigational programs

Concluding remarks

In this chapter we have compared the basic ways of handling objects in both visual and linguistic systems. The two systems’ approaches to objects have shown that one handles objects with the same frames and axes, although they use different methods to convey movement, and both have the added ability to refer to the path’s intrinsic value. When one equates the visual frames of reference with the linguistic frames of reference, so that the difference between them is only the number of axes, then egocentric and allocentric cannot be the distinction between the frames of reference, only a broad description. This is a question of one-to-one relations between the observer and agent, object, and polar coordinates. When using the intrinsic and relative frames of reference the observer is using an analogous process relative to himself, while in the absolute frame of reference he is using an analogous process external to himself. In the case of intrinsic frames of reference, opinions tend mostly towards the use of an allocentric frame of reference. I prefer to think of this act of communication, where one employs an egocentric frame of reference, as giving the preference to vision, in a one-to-one relationship. This conclusion coincides well with findings in neurophysiology (Dodwell, 1982; O'Keefe, 1993). We have also shown that the existing simulation programs still lack a comprehensive modulation. The new proposed system will work better because, in contrast to previous products, it will utilize the intrinsic frame of reference in a wider variety of possibilities (see Chapter 7.3). In the next chapter, we will demonstrate how the proposed ideal system might work.


CHAPTER 9 AN AGENT/OBJECT-BASED SYSTEM OF NAVIGATION

The aim of this object-based navigation tool is to enhance the user’s exploration of a virtual environment and to make navigation a natural, directed experience. Following the goals of realism and immersion in exploring the virtual world through movement, the system uses an agent that allows the user to control “on the fly” navigation directly while viewing objects, moving towards a destination that is non-distal, in other words towards objects that are visible to the person who moves. The objective of this chapter is to illustrate some aspects of how the proposed system works and what makes it different from existing navigational programs. The chapter concentrates on the cognitive aspects that underlie the proposed system and on the linguistic and visual expressions of the system’s input and output. The navigation is controlled by the user employing spatial frames of reference. It offers linguistic control through a non-metric topological environment, as used in object-based navigation, while the visual output is metrical. When one examines the English spatial prepositions in conjunction with the visual representation, one realizes that there are a few basic elements of movement in the toolkit: agent-centered – there is an agent that can move in any direction using the relative frame of reference; object-centered – one can move in relation to an object using the intrinsic frame of reference; and environment-centered – one moves according to invariance within the environment using the



absolute frame of reference. In fact, the situation is more complex: for agent-centered movement one has four choices – ‘to view the agent’ or not while moving, and ‘to view the agent’ or not while standing still. In our case, while moving one sees the agent, and when one is standing still the agent’s point of view takes priority.
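These four choices can be summarized in a short sketch. The following Python fragment is only an illustration of the policy stated above (agent visible while moving, first-person view at rest); the class and method names are hypothetical, not part of the implemented system.

# Illustrative sketch of the four agent-centred viewing choices. The names
# are hypothetical; the policy encodes the choice described in the text.
from dataclasses import dataclass

@dataclass
class ViewPolicy:
    moving: bool       # is the agent currently moving?
    show_agent: bool   # does the user ask to see the avatar?

    def camera(self):
        if self.moving and self.show_agent:
            return "third-person: follow the avatar along the path"
        if self.moving:
            return "path view with the avatar hidden"
        # at rest the agent's own point of view takes priority
        return "first-person: agent point of view"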

9.1 Conceptual framework

The proposed system takes a language command as input and produces a visual display as output. The system comprises a screen in front of an observer, who sees a generated perspective – a location. The Albertian window separates the lines of sight (visual rays) from the viewer, positioned in a stationary location, and the transparent picture plane on which all lines of construction converge towards the vanishing point. The distance from the constructed objects to the observer is an imaginary virtual space. The observer thus surveys the location of objects in front of him. The observer’s position can now be calculated, and we can call this calculated point the agent. This is sometimes referred to in the literature as perspective-taking, a common core of representation and transformation processes for visualization, mental rotation, and spatial orientation abilities. The aim of this tool is to move from one location to another along a route by specifying either the start position, for example “go left”, the end location, for example “go towards the house”, or a path, for example “go along the street”. In order to recognize the change in position, we need some way of representing those directional commands. Through a limited set of basic elements, we can generate a large computational variance that is meaningful to the observer. In order to command an agent based on the belief, context, and desire of a user, the computer program must go through two steps to interpret such a request. Participants engage in dialogues and sub-dialogues for a reason: their intentions guide their behavior, and their conversational partner’s recognition of those intentions aids the latter’s understanding of their utterances. A sub-dialogue is a command segment concerned with a subtask of the overall act underlying a dialogue. For example, the user wants to examine the effect of a proposed building on its environment; the sub-dialogue is a linguistic command to move agent A from P0 to P1. Thus, the discourse includes three components: a linguistic structure, an intentional structure, and an attentional state. The linguistic structure consists of command


segments and an embedded relation. In the linguistic command, an agent performs an act in relation to objects. The intentional structure consists of command segment purposes and their interrelationships. A command segment purpose is an intention that leads to the initiation of a command segment (and is intended to be recognized). The attentional state is an abstraction of the participant’s focus of attention during the action. The system does not need to recognize an object, since everything is labeled in the virtual reality and indexicality is assumed. The field of view helps to narrow down the object search. When the system encounters an ambiguous command, it clarifies it through a linguistic query. All replies are complaints, generated when the system cannot make sense of an instruction. Examples are “Where is the X (object)?” and “Go where?” The system only complains when an instruction does not make sense: specifically, when it determines that one of the entities the instruction refers to does not exist. Accordingly, the system clears the instruction buffer, thereby rejecting the instruction. The system does not allow for a reply. In this respect the system is similar to Chapman (1991); the complaint mechanism follows the one implemented in Sonja. In order to engage in more extended negotiation of reference, one would in many cases require a mechanism for storing and using more of the linguistic context. The knowledge preconditions of the agents are axiomatized as follows:
1. Agents need to know the scripts for the acts they perform.
2. Agents have some primitive acts in their repertoire.
3. Agents must be able to identify the parameters of the acts they perform.
4. Agents may only know some descriptions of the acts.
5. Agents know that the knowledge necessary for complex acts is derived from their component acts.
To determine the command scripts, the proposed system uses a parser to answer four questions about the nature of movement in virtual environments. The parser’s model of analysis consists of four different types of information.
It answers the question Why? – What is the intent of the user? That is, how does one want to view the image: ‘Observation mode’ or ‘Manipulation mode’? The grasping frames use a user profile to establish the intent of the user, according to the user type, in determining Manipulation-mode and Observational-mode actions.
It answers the question How? – What kind of action should be taken? The Mobility module distinguishes between converse and transverse movement.



Converse – two-point relations: this includes the evaluation of topological, angular, and distance-dependent relations, which share the common characteristic that, in their most basic form, they relate two objects to each other.
Transverse – n-point relations: relations that cannot be reduced to a two-point problem, such as path relations, special cases like ‘in between’, or constellational relations, are qualitatively different from two-point relations. They require more than two objects and/or additional arguments such as shape or outline information. The system relies on object width and length alignment, empowering the user with the ability to decide, within a channel, which way he or she wants to look.
It answers the question Where? – What kind of spatial frame of reference is used? The Orientation module deals with several relations. Angular relations in particular depend on the establishment of a frame of reference in order to be unambiguous; in our case these are the object-centered and viewer-centered frames of reference. Intrinsic frames of reference are relations that take two or more objects as arguments and specify the position of a located object with respect to the reference object. Relative (deictic) frames of reference are relations that take two or more objects as arguments and specify the position of one object with respect to the reference frame of the viewer.
It answers the question What? – Which object? That is, the way one categorizes the data, as opposed to the retrieval classification of the data. The thematic sentence analysis processes both objects and routes (see Figure 9.2). An agent’s thematic role specifies the agent’s relation to an action. In linguistic terms, verbs specify actions, nouns identify the objects, and prepositions identify the relation to the object, either route or location. The conceptual structure expression helps us to establish the modules of language parsing, or the thematic roles through which verbs, nouns, and prepositions cooperate towards a semantic understanding.

Figure 9.2. The thematic sentence analysis
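As an illustration of this thematic analysis, the following Python sketch splits a command into action, relation, and object slots. The word lists and the function name are assumptions made for the example, not the vocabulary of the implemented parser.

# Minimal sketch of the thematic sentence analysis of Figure 9.2: the verb
# supplies the action, the noun the object, and the preposition the
# route/location relation. The word lists are illustrative only.
VERBS = {"go", "turn", "walk", "look", "rotate", "zoom"}
PREPOSITIONS = {"to", "towards", "along", "around", "through", "past",
                "left", "right", "front", "back", "in", "of", "near"}

def thematic_roles(command):
    tokens = command.lower().strip(".!").split()
    verb = next((t for t in tokens if t in VERBS), None)
    relation = [t for t in tokens if t in PREPOSITIONS]
    objects = [t for t in tokens
               if t not in VERBS and t not in PREPOSITIONS and t != "the"]
    return {"action": verb, "relation": relation, "object": objects}

# e.g. thematic_roles("Go along the street")
# -> {'action': 'go', 'relation': ['along'], 'object': ['street']}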

The specification of location requires something more than identifying the object: what is needed is a location description. This can take one of two forms: (1) state and (2) process. A state description of location tells where something is located in terms of a well-known and commonly understood system of coordinates. A process description is a set of instructions telling how to get to a particular location. The


linguistic command does not necessarily carry that much information – most of the process is assumed to be contextual. The computational tool is based on a digital information retrieval system of objects and locations in space. The tool attempts to match different meta-tags within the database in order to infer the command of the user. The database system consists of database retrieval strategies for forward propagation. A standard assumption in computationally oriented semantics is that knowledge of the meaning of a sentence can be equated with knowledge of its truth conditions: that is, knowledge of what the world would be like if the sentence were true. This is not the same as knowing whether a sentence is true, which is (usually) an empirical matter, but knowledge of truth conditions is a prerequisite for such verification to be possible (Davidson, 1969). Meaning as truth conditions needs to be generalized somewhat for the case of imperatives. The cognitive linguistics system draws its sources from an older tradition of functionalist linguistics. This theory holds that constraints on the form of language (where form means the range of allowable grammatical rules) are derived from the function of language. Many linguistic phenomena remain unaccounted for in our grammar, among them agreement, tense, aspect, adverbs, negation, coordination, quantifiers, pronouns, reference, and demonstratives. Frame semantics is considered fundamental to an adequate understanding of linguistic entities and, as such, is integrated with traditional definitional characterizations.

9.2 Controlling the pedestrian agent behavioral model

People are used to the situations they normally interact with; their behavior is determined by their experience, which reacts to a certain stimulus (situation) and is then evaluated. Their reactions are usually rather ‘habitual’ and well predictable – a micro-movement (Haklay, 2001). The micro-movement fluctuates, taking into account random variations of behavior that arise. In the following paragraphs, we specify the micro-movement of agent motion:
1. A user wants to walk in a desired direction ‘g’ (the direction of his/her next destination) with a certain desired speed ‘v’. In our case the desired speed of an agent is equally distributed.



2. Agents keep a certain distance from borders (of buildings, walls, streets, obstacles, etc.). This effect can be described by a repulsive, monotonically decreasing potential. At the same time, people are attracted to certain streets and objects.
3. Agents need to simulate pedestrian preference. These interactions have either high-walkability or low-walkability effects on the agent, such as walking on different objects or materials and performing the maneuvers needed to achieve such a preference, for example walking up a staircase or walking down a corridor. In effect, one can walk anywhere – for example, the imperative “Walk in the shade” or “walk in the middle of the road”.
4. The user needs to see the agent moving in order to grasp the environment, and since no haptic or immersive device was integrated into the system to provide feedback, vision is the primary source of information.
Micro-movement characteristics contribute to the detailed behavior of agents and to the immersion of the observer. Factors include progress, visual range, and fixation. Progress is simply the desired walking speed at which an agent moves. Visual range relates to an observer’s visual acuity and determines which buildings and other elements in the environment the agent will ‘see’ and potentially respond to. In order to enable an agent to receive commands from the user, search the nearby area, and match an object to its label, the agent must control the following elements:
(1) Route – This includes the whole route, the agent’s position on that route, its current location in the world, its fixation on its route, its thresholds for evaluating possible new destinations (derived from its behavioral profile), and a threshold for deciding that a target has been reached.
(2) Progress – The preferred speeds at which one moves in the environment as it relates to object size; in other words, the progress made towards the next target and the minimum acceptable progress per unit time.
(3) Direction – Encompasses the agent’s current directional heading and the direction to the next waypoint.
(4) Location – The coordinates of the agent’s center point and the agent’s size, expressed in terms of the effect of the agent's presence on the walkable surface of a grid square.
(5) Visual range – Describes the agent’s visual capabilities, expressed as a visual cone with breadth and range into the surroundings. Potential destinations inside the


visual cone are considered as potential deviations from the current plan as directed by the observer, or as obstacles by the wayfinding task.
(7) Control – Reflects the agent's movement state. For example, is the agent active or waiting, moving normally or stuck?

The subroutine structure
These different levels enable the agents to compute separately local movement (the process of moving to the next grid square on the walkable surface), medium-range movement (maintaining a proper direction), and longer-range movement (trying to move to the next point while avoiding obstacles). The system is able to examine where one is located through the shortest path between two points. The modules as currently used are presented in Figure 9.1. The following sections describe their operation, starting with the low-level operation of the helmsman and proceeding up to the highest level of the chooser.

Figure 9.1. The interactions between the agent control modules and the agent state variables stored on its ‘agent state’. There is a general ‘zig-zag’ of interaction whereby high-level modules control the state variables to which lower level modules respond.
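The agent state variables listed above can be gathered into a single structure. The following Python sketch is a hypothetical illustration of such an ‘agent state’; the field names follow the elements described in the text, while the types and default values are assumptions.

# Hypothetical sketch of the shared 'agent state' read and written by the
# control modules of Figure 9.1. Types and defaults are assumptions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class AgentState:
    route: List[Tuple[float, float]] = field(default_factory=list)  # waypoints
    position: Tuple[float, float] = (0.0, 0.0)  # location: agent centre point
    heading: float = 0.0                        # direction, in degrees
    speed: float = 1.4                          # progress: preferred walking speed
    fixation: float = 0.5                       # commitment to the current route
    visual_range: float = 30.0                  # depth of the visual cone
    visual_breadth: float = 120.0               # breadth of the visual cone, degrees
    control: str = "active"                     # e.g. active, waiting, stuck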

The helmsman module
This module moves the agent ‘physically’ through the environment. It reserves space for the agent in the environment and checks for obstacles such as other agents and buildings. Each grid square or ‘cell’ in the ‘world’ has a certain capacity, its walkability value, ranging from low to high, where a high value refers to a non-penetrable object such as a building. Values indicate the proportional occupation of that grid square, so that low values are preferred by pedestrians and high values indicate that a cell is unsuitable for pedestrians, such as the center of a busy road. Each agent is also assigned a value representing its own occupancy of a grid cell. In a tour, the mover module looks in up to five directions, starting from the current heading direction, to determine where the most space is available. It sets the heading to this direction and places the agent at the new location, according to its heading and speed.

The navigator module
Supporting the helmsman at the next level is the navigator, which also maintains the agent’s heading so that it does not deviate too far from the target direction. However, the agent must be allowed to deviate somewhat from the heading



towards a target so that it can negotiate corners and get out of dead ends. Therefore, the controlled variable is the agent’s progress towards its current target. In operation, the navigator module checks whether it is possible to walk in the target direction. If so, the navigator sets a new heading; if not, the heading remains unchanged. Together these modules deal with the ‘tactical’ movement of getting to the next point in the route. The last behavioral module attends to more strategic movement and planning.

The chooser module
The user-choice or chooser module receives a command from the user that identifies the next target on the agent’s route. This module enables an agent to receive commands from the user, search the nearby area, and match an object to its label. The target can be a point such as a junction in the street network, or a building or location the agent wants to enter. The chooser uses the agent’s visual field to detect candidate objects in its immediate surroundings. In motion, the visual field’s extent is defined by the agent’s speed and its fixation on its task: the higher the speed and fixation, the narrower the fan of rays which the vision module sends into the environment. Building objects in the field of view are considered by the chooser module as potential new destinations which may be added to the current planned route. Building attributes such as type and general attractiveness are compared with the user’s commands, and if a match is found then the location may be pushed onto the route as the new next destination.
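The helmsman’s direction choice described above can be illustrated with a brief sketch. The fragment below reuses the hypothetical AgentState structure introduced earlier; the grid interface, the 30-degree sampling step, and the function names are assumptions made for the illustration, not the implemented algorithm.

# Illustrative sketch of the helmsman step: sample up to five candidate
# headings around the current one and keep the most walkable cell.
# Assumes the AgentState sketch above; grid maps (x, y) cells to walkability,
# where low values are preferred and high values block movement.
import math

def walkability(grid, x, y):
    return grid.get((int(x), int(y)), 0.0)   # unknown cells treated as open space

def helmsman_step(state, grid, step=1.0, fan=(0, 30, -30, 60, -60)):
    best_offset, best_cost = 0, float("inf")
    for offset in fan:                        # up to five directions
        h = math.radians(state.heading + offset)
        nx = state.position[0] + step * math.cos(h)
        ny = state.position[1] + step * math.sin(h)
        cost = walkability(grid, nx, ny)
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    state.heading += best_offset              # turn towards the freest direction
    h = math.radians(state.heading)
    state.position = (state.position[0] + state.speed * math.cos(h),
                      state.position[1] + state.speed * math.sin(h))
    return state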

9.3 Operation of the system

The linguistic interpretation system is based on work already done, both theoretically and empirically, in computer science and robotics. The system is viable because the domain is constrained, by limiting the vocabulary of objects in the virtual environment to the architectural domain. There are numerous systems using the domain-constrained hypothesis, to mention just a few: Chapman (1991), Tollmar (1994), Smith (1994), Pollock (1995), Bunt (1998), Cremers (1998), and Strippgen (1999). According to Varile (1997), there are several approaches to language analysis and understanding. Some of these approaches are geared more towards conformity with formal linguistic theories, others are designed to facilitate certain processing models or specialized applications. Language understanding divides into constraint-based grammar formalisms and lexicons for constraint-based grammars. According to Allen (1990), the original task was defined in the


1980s, in which positional change was defined in terms of the object, the source location, and the goal location. In order to respond to a verbal command, the system must interpret the verbal command and at the same time translate the visual scene into a verbal description, so that there is a correlation between the two components. Each component, the visual parser and the linguistic parser, has its own informational restraints (see previous chapters). The computational model shows the basic processing modules and assumes that most modules already exist; for example, modules such as voice recognition have been worked out, as has the rendering engine. The system makes use of partial matches between incoming event sequences and stored categories to help solve this command and its implementation. The system imposes a user constraint that comprises four principles. First, each sentence must describe some action to be performed in the scenes; that is, the agent must be instructed on the action and the location in order to perform. Second, the user is constrained to making only true statements about the visual context: everything the user says must be true in the current visual context of the system, and the user cannot say something which is either false or unrelated to the visual context. Third, the order of the linguistic description must match the order of occurrence of the EVENTS. This is necessary because the language fragment handled by the system does not support tense and aspect. Finally, the user is also restricted to homomorphic relations, i.e. language is time/effort restricted in the amount of information it contains about reference objects. These constraints help reduce the space of possible lexicons and support search-pruning heuristics that make computation faster. The above four learning principles make use of the notion of a sentence “describing” a sequence of scenes. The notion of description is expressed via a set of correspondence rules. Each rule enables the inference of an [EVENT] or [STATE] description from a sequence of [STATE] descriptions which matches the pattern. For example, Rule 1 states that if a sequence of scenes can be divided into two concatenated sub-sequences, each with its own scene interpretation, such that in every scene of the first sub-sequence x is at P0 and not at P1, while in every scene of the second sub-sequence x is at P1 and not at P0, then the entire sequence of scenes can be described by saying that x went on a path from P0 to P1 – for example, “go to the left of the house”. The navigational command program consists of two parsers: one visual, the other linguistic. A parser is a computer program that breaks down text or visual object information into recognized strings for further analysis. The organization of these



modules is illustrated in Figure 9.2. The visual parser examines the visual geometric relations between the agent, the object, and the object’s background. The objects within this unhindered visual space are divided into access trees in which objects and nodes are connected through string clusters. The visual parser then produces a linguistic script of all the possible relations between the agent and the objects in view. The language parser examines the syntactic properties of the sentence, and the geometric relations between the agent, the object, and the object’s background as described in the linguistic command. The Linker receives information from the object parser and the predicate parser. The data received by the Linker are already restricted by the observer’s point of view. The Linker compares the linguistic object relation and matches it with the appropriate visual object/node relations. The user profile module establishes the subroutine behavior of the agent. The knowledge rules further restrict the movement specified by the user.

Figure 9.2. The process of analyzing a command

The visual object parser
The object parser examines the relationship between objects and nodes, i.e. object location and the path one has to travel to reach one’s goal. The more nodes one has to pass through, the higher the circulation cost; it is a question of distance versus energy. The hindered visual space is used to eliminate from the search objects that cannot be seen. Physical accessibility examines whether one can move between objects or between other agents, and eliminates all inefficient nodes. The object parser process is as follows:
1. Locate the agent position within the scene
1.1 Locate the visual environment boundary
2. Detect object types and count objects
2.1 Locate object positions
2.2 Identify the attributes of objects
3. Determine the object’s reference frame system through geometrical analysis of objects in view
3.1 Create a reference frame system for each of the adjacent objects


3.2 Detect the adjacency relationship to other objects
4. Detect all paths available to the user to reach the objects
4.1 Create a physical access path to each object
Topological analyses of objects consist of a semantic graph and a geometry graph of the object. The geometry graph of the object analyzes faces, vertices, and edges, while the semantic graph analyzes top and bottom, front and back, left and right. The semantic graph also analyzes the zones of object viewing, the adjacent zones. The visual parser prepares the basis for comparison with the linguistic command. The operation of the attentional mechanism in vision is based on three principles concerning the allocation of spatial attention (Mozer, 2002). These abstract principles concerning the direction of attention can be incorporated into the computational model by translating them into rules of activation, such as the following:

(1) Locations containing objects in the visual field of the agent should be activated.
(2) Locations adjacent to objects’ regions should also be activated.
The directed regions use an architectural reference frame system (see Figure 9.3) where the exact location is established through a view that encompasses the object. Eight regions were used to increase identification of the target area by the users. The architectural reference frame system not only refers to adjacent points in relation to an object but links them into a conceptual neighborhood; for example, the preposition ‘along’ will use three points from a side of the object.

Figure 9.3 architectural reference frame system; front, back, left, right, front left, front right, back left, and back right, incorporating the visible area from a generating location (or convergence location of the optic rays)
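A short sketch can make the eight-region scheme of Figure 9.3 concrete. The fragment below classifies a location relative to an object’s front axis; the 45-degree octant boundaries, the y-up coordinate convention, and the function name are assumptions made for the illustration.

# Illustrative sketch of the eight-region architectural reference frame:
# classify a location relative to an object's facing direction. Assumes a
# y-up coordinate system and equal 45-degree octants.
import math

REGIONS = ["front", "front left", "left", "back left",
           "back", "back right", "right", "front right"]

def region_of(point, obj_centre, obj_facing_deg):
    dx, dy = point[0] - obj_centre[0], point[1] - obj_centre[1]
    bearing = math.degrees(math.atan2(dy, dx)) - obj_facing_deg
    octant = int(((bearing + 22.5) % 360) // 45)
    return REGIONS[octant]

# e.g. region_of((10, 0), (0, 0), obj_facing_deg=0) -> 'front'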

The visual parser links the topological and visual accessibility graphs (distances and orientation) to produce accessibility tables, and adjacency graphs (path and direction) for reaching an object. Pedestrian activity can be considered to be an outcome of two distinct components: the configuration of the street network or urban space, and the location of particular attractions (shops, offices, public



buildings, and so on) on that network. The visual parser then ranks and describes in tables the route to each object.

The Linguistic parser
The language parser has to enforce a relation between a time-ordered sequence of words in a sentence and a corresponding time-ordered sequence of syntactic structures or parse trees, which are tagged by lexical category (for the definition of prepositions, see Figure 9.4). The language parser’s use of noun, verb, preposition, and adjective slots requires the sentence to be linked by a simple syntax. The proposed function of the language parser is to produce a corresponding parse-tree/semantic-structure pair. The language parser imposes a compositional syntax on the semantic tree, relating the individual words to grammatical English and the lexicon. The parser directly represents the clauses of the grammatical syntactic functions and grammatical constituents. The language parser is able to distinguish a command given in relation to the position of an object, and thus differentiates commands by the object’s positional reference system, for example: “Go along the street.”

The sentence is constructed thus: Verb – go, Preposition – along, Noun – street. “Go to the left of the house.”

The sentence is constructed thus: Verb [Preposition, Noun], where the object adjacent location is the target. “Outside the church, go left.”

The sentence is constructed thus: [Preposition, Noun] Verb, Noun, where the object reference frame is the starting point and the agent reference frame is the end point.

The list of prepositional definitions is as follows:

“left/right/back/front side” – The chosen object needs to be analyzed for its orienting axis.
Process: Establish the reference system at a distance that shows the whole facade.

“Rotate” – Rotating the object requires the retrieval of the object’s generating axis.
Process: Switch to an object panorama at the same distance from the object as the observer.

“Through” – ‘Through’ depends on the opening/passageway within an object or between objects.
Process: Move through the opening and continue in the same direction, while relinquishing part of the control to the observer (left and right in relation to the observer).

“Around” – ‘Around’ is defined as moving in relation to the circumference of an object; one directs the observer through the directed axes.
Process: Move the agent on a path neighboring the object, while relinquishing part of the control to the observer (left and right in relation to the object), and stop at the point furthest from the agent.

“Along” – ‘Along’ requires the retrieval of the object’s directed axes.
Process: Move the agent on a path in relation to the object, while relinquishing part of the control to the observer (left and right in relation to the observer).

“Towards” – ‘Towards’ moves the observer to the desired object.
Process: Move the agent to the object and wait for a further command.

“Zoom” – ‘Zoom’ places the observer in front of the desired element within an object.
Process: If the agent is in front of the object, then elements of that object can be chosen.

Figure 9.4 Definition of preposition
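These definitions can be read as a dispatch table from preposition to movement procedure, as in the hypothetical Python sketch below. The handler names and stub bodies are assumptions; each stub stands for the process described in Figure 9.4.

# Hypothetical sketch of Figure 9.4 as a dispatch table from preposition to
# movement procedure. Handler names are illustrative; bodies are stubs.
def move_through(agent, obj):
    """Pass the opening in the object and continue in the same direction."""
    ...

def move_around(agent, obj):
    """Follow the object's circumference and stop at the point furthest from the start."""
    ...

def move_along(agent, obj):
    """Follow a path in relation to the object's directed axis."""
    ...

def move_towards(agent, obj):
    """Approach the object and wait for a further command."""
    ...

PREPOSITION_PROCESS = {
    "through": move_through,
    "around": move_around,
    "along": move_along,
    "towards": move_towards,
}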

The Linker
The Linker component relates a time-ordered sequence of linguistic input to a sequence of cognitive structures that direct the visual output. The input is given in terms of a virtual visual location as well as a sentence that describes part of the environment and the process the agent has to perform. The Linker then connects all possible sequences of the STATE with an EVENT to form connections with the objects in the scene in front of the user, and produces as an output a new STATE, a location. The Linker has a list of prepositions that link objects to actions. The Linker relies on the object parser to determine the access graph and object-adjacent



relations, as well as topological examination. When other information is given, such as “Go to the front of the second building”, the Linker then compares the description produced by the system with the description produced by the user. The Linker divides the sentence into the four categories of the path, the conceptual constraints. The syntactic analysis is constrained by the four modules of movement (see Chapter 5.2). The Linker also relies on user profiles and knowledge rules. These can be thought of as regulations for preposition actions; when the system examines a sentence, it must compare the new sentence with those rules. The pattern that emerges from those commands is as follows:

“Go left” – Agent-centered (Bounded)
I) GO(x – current position + Preposition)

“Go towards the house” – Object-centered (Directed)
II) GO(x – current position, TO(z – an object))

“Go to the left of the house” – Object-centered (Converse)
III) GO(x – current position, TO[(Preposition) + (z – an object)])

“Go along the street” – Object-centered (Transverse)
IV) GO([FROM(y – an object) TO(x – current position, minimal distance) + (Preposition), TO(y – an object)(x – current position, maximal distance)])

Chains: “Go from the house to the left of the town hall”
V) GO(x – current position, [FROM(y – an object) + (Preposition), TO(z – an object)])

“In front of the house go left”
VI) (Preposition) + (y – an object) + GO(x – current position + (Preposition))

The referring expressions of natural language will be just those expressions that map into the slots of the conceptual structure expressions.
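As a rough illustration of how the system might recognize these patterns, the following self-contained Python sketch classifies a command into the categories I–VI above. The keyword cues are assumptions chosen for the example sentences in the text, not the rules of the implemented Linker.

# Rough, hypothetical sketch classifying a command into patterns I-VI.
# The keyword cues match only the example sentences given in the text.
def classify(command):
    t = command.lower().strip(".").split()
    names_object = "the" in t            # crude cue that a reference object is named
    if not names_object:
        return "I   agent-centered (bounded)"        # e.g. "go left"
    if "towards" in t:
        return "II  object-centered (directed)"      # e.g. "go towards the house"
    if "along" in t:
        return "IV  object-centered (transverse)"    # e.g. "go along the street"
    if "from" in t:
        return "V   chained path"                    # e.g. "go from the house to ..."
    if t[0] in ("in", "outside", "at"):
        return "VI  object frame as starting point"  # e.g. "in front of the house go left"
    return "III object-centered (converse)"          # e.g. "go to the left of the house"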

User profile
The computer system queries the end user about his or her preference for object examination, or for grasping frames, whichever suits the user type. The types of questions the system asks are:
What type of action would you like to perform?


A. Visit locations?
B. Architectural tour?
C. Maintenance inspection?
Would you like to be close to the object, or would you like to see the object’s outline?
Would you like to see the object from a frontal position or from the side?
Would you like to go directly or through a path?

Knowledge/production rules
In order to draw attention to an object, the user must specify the location of the object, as well as the new position. Once the information from the Linker and the parser correlates, the command statement should be executed.

IF THE LOCATION OF THE OBJECT IS KNOWN
AND THE OBJECT WITHIN THE POINT OF VIEW IS "DEFINED"
AND THE OBJECT AND DESCRIPTION MATCH
AND THE OBJECT HAS A REFERENCE SYSTEM
AND NO OTHER RELATIONS ARE SPECIFIED
THEN THE USER IS REFERRING TO AN INTRINSIC FRAME

In order to draw attention to an object, the user must specify the location of the object, as well as the new position. Once the information from the linker and the parser correlates, the following movements should be executed:
Go left – Agent-centered – Relative frame of reference
Go towards the house – Object-centered – Intrinsic frame of reference
Go to the front of the house – Object-centered – Intrinsic frame of reference
Go left of the tree – Agent-centered – Relative frame of reference
Go south of the house – Environment-centered – Absolute frame of reference
The semiotic distinction is at the heart of the theoretical system. It makes the distinction between the label (the signifier) and the object image itself. The system differentiates between the semantic object and the signified, thus disambiguating some of the sentence’s confusion between the intrinsic and relative frames (see Figure 9.5). In the case of a converse path, the system matches the topology of the object to the linguistic preposition. If the object has no reference system, the attention shifts to the agent; when the object has a geometric axis, the attention shifts to the object (a sketch of this disambiguation follows). To this statement, we add the following rules:
2)
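Read as a decision procedure, the disambiguation just described might look like the following sketch; the names are hypothetical, and the presence of geometric axes stands in for the object’s reference system.

```python
# A sketch of the disambiguation described above: if the referenced object
# has its own geometric axes, the intrinsic frame applies; otherwise the
# frame falls back to the agent (relative). Names are illustrative.
def frame_of_reference(obj, environment_term: bool = False) -> str:
    if environment_term:                                # "south of the house"
        return "absolute"
    if obj is not None and obj.get("has_axes"):         # "front of the house"
        return "intrinsic"
    return "relative"                                   # "go left", "left of the tree"

print(frame_of_reference({"name": "house", "has_axes": True}))    # intrinsic
print(frame_of_reference({"name": "tree", "has_axes": False}))    # relative
print(frame_of_reference(None))                                   # relative
print(frame_of_reference({"name": "house", "has_axes": True}, environment_term=True))  # absolute
```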

IF THE IDENTITY OF THE OBJECT IS KNOWN
AND THE OBJECT HAS NO REFERENCE SYSTEM
AND THE SURFACE OF THE OBJECT IS "UNDEFINED"
AND THE OBJECT IS AN ARTEFACT
THEN THE USER IS REFERRING TO A RELATIVE FRAME

Figure 9.5. An example of the inference rule used in the sentence “go to the left of the house”

Also 3)

IF THE IDENTITY OF THE PATH IS KNOWN
AND THE PATH HAS AN ACCESS GRAPH
AND THE SURFACE OF THE PATH IS "DEFINED"
AND THE PATH IS AN ARTEFACT
THEN THE USER IS REFERRING TO A PRIMARY PATH

DEMONS

The following two rules examine the possible connection between the type of command and the preferred strategic means to achieve the desired route. 4)

IF THE PREPOSITION THROUGH IS USED IN CONJUNCTION WITH A PATH
AND THE LOCATION OF THE PATH IS KNOWN
AND THE PATH WITHIN THE POINT OF VIEW IS "DEFINED"
AND THE PATH HAS AN ACCESS GRAPH
AND NO OTHER RELATIONS ARE SPECIFIED
THEN THE USER IS REFERRING TO A SCENIC PATH

5)

IF THE PREPOSITION THROUGH IS USED IN CONJUNCTION WITH AN OBJECT
AND THE LOCATION OF THE PATH IS KNOWN
AND THE PATH WITHIN THE POINT OF VIEW IS "DEFINED"
AND THE PATH HAS AN ACCESS GRAPH
AND NO OTHER RELATIONS ARE SPECIFIED
THEN THE USER IS REFERRING TO THE SHORTEST PATH

The system will be able to distinguish a command given in relation to the position and direction of the object, and thus will differentiate between the object’s directional reference system and the egocentric position, which creates a linguistic mirror effect. When we discussed the robotic system in Chapter 6.2, the frames of reference always assumed a relative frame, but people use an intrinsic frame of reference as well, and herein lies the confusion. For example, front right and back left depend on the viewer’s position. In order to resolve this, the system will adopt the viewer’s position. This does not mean that all human error will be handled by the system, but that the system will respond to corrections made by the user. The system now has the intended object and the preposition, and in this case also an adjective.
6)

IF THE IDENTITY OF THE OBJECT IS KNOWN
AND THE OBJECT HAS A REFERENCE SYSTEM
AND THE OBJECT DIRECTIONAL AXIS IS NOT THE SAME AS THE AGENT’S
AND THE OBJECT IS AN ARTEFACT
THEN THE USER IS REFERRING TO A REFLECTIVE RELATIVE FRAME

The system will be able to distinguish a command given in relation to the position of the object, and thus will differentiate a transverse action.
7)

IF THE IDENTITY OF THE OBJECT IS KNOWN
AND THE OBJECT HAS A REFERENCE SYSTEM
AND THE OBJECT DIRECTIONAL AXIS IS NOT THE SAME AS THE AGENT’S
AND THE OBJECT IS AN ARTEFACT
AND THE PREPOSITION ALONG IS USED
AND THE AGENT’S POINT OF VIEW IS THE DOMINATING VIEW TO THE LEFT OR RIGHT OF THE OBJECT
THEN THE USER IS REFERRING TO THE LEFT OR RIGHT OF THE OBJECT
IF THE DISTANCES ARE EQUAL THEN PROMPT THE AGENT FOR LEFT OR RIGHT DIRECTIONS

8)

IF THE PREPOSITION PAST IS USED
AND THE IDENTITY OF THE OBJECT IS KNOWN
AND THE OBJECT HAS A REFERENCE SYSTEM
THEN GO TO THE BACK OF THE OBJECT
THEN LOOK AWAY FROM THE OBJECT, FROM THE DIRECTION OF THE POINT OF DEPARTURE
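A condensed, purely illustrative encoding of some of the rules above as guard clauses; the predicate names are assumptions, standing for facts supplied by the visual and linguistic parsers rather than the dissertation’s implementation.

```python
# Hypothetical encoding of a few of the knowledge/production rules above.
# Each key is assumed to be supplied by the visual and linguistic parsers.
def classify(cmd: dict) -> str:
    # Rule 1: intrinsic frame
    if (cmd["object_location_known"] and cmd["object_in_view_defined"]
            and cmd["description_matches"] and cmd["object_has_reference_system"]
            and not cmd["other_relations"]):
        return "intrinsic frame"
    # Rule 2: relative frame
    if (cmd["object_identity_known"] and not cmd["object_has_reference_system"]
            and not cmd["surface_defined"] and cmd["is_artefact"]):
        return "relative frame"
    # Rules 4-5 (demons): 'through' with a path vs. 'through' with an object
    if cmd.get("preposition") == "through":
        return "scenic path" if cmd.get("with_path") else "shortest path"
    # Rule 8: 'past' sends the agent behind the object, looking away from it
    if cmd.get("preposition") == "past" and cmd["object_has_reference_system"]:
        return "go to the back of the object, look away from the point of departure"
    return "unclassified"

example = {
    "object_location_known": True, "object_in_view_defined": True,
    "description_matches": True, "object_has_reference_system": True,
    "other_relations": False, "object_identity_known": True,
    "surface_defined": True, "is_artefact": True, "preposition": None,
}
print(classify(example))   # -> intrinsic frame (Rule 1 fires)
```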



9.3 Process of the system
Let us go through the process one more time and see the different phases of this multi-processing. In the first phase, the user, looking at the screen, utters a command. It is then processed through technologies for the spoken language interface. According to Zue (1997), the spoken language interface can be broken down into ‘speech recognition’, which receives information from ‘speaker recognition’, which in turn receives information from ‘language recognition’. The command is then translated into text, and that is when our system starts to operate. This information flow can be displayed in basic ways through three elements: the perception of objects, the object in the virtual world, and lastly the language expression. The system gathers information into the following slots: Why, How, Where, What. Each slot is a semantic tree that answers or informs the linker about a specific task.
Why – User preference
Type of user
Manipulation mode preference
Viewer preference

In this module, the system displays a dialogue box which attempts to categorize the user into a type of potential navigator. In responding to the user’s input, the system incorporates demons to help the user navigate.
How – Verbs – Mobility module
What is the objective of the task? What is the EVENT? What kind of path type is it? Is it a path or is it a location?

In this module the system searches the database for verbs that describe the action to be taken. It has to decide which type of path is to be taken; thus it borrows from the orientation module and together they determine the path type: bounded agent, directed agent, converse path, or transverse path.
Where – Preposition – Orientation module
What preposition is used? What attachment-to-objects action is required? Does it use an object-centered command or an agent-centered command?

The where module establishes the center of attention of the linguistic command. In order to accomplish such a task it must use the identification module to determine the object’s geometrical properties.
What – Nouns – Identification module – the world’s objects


Identify objects in the agent’s field of view. What is the STATE? How many objects are there? Identify each object’s position relative to the agent. Identify objects and attributes relative to the agent. What are the axes of the targeted objects? Identify paths in the agent’s field of view. Identify each path’s relation to objects.

The identification module determines the location and direction of the agent relative to the objects in the agent’s visual field. It parses the visual information to be retrieved by the other modules. Through accessibility graphs and adjacency graphs, basic linguistic relationships of the objects in the agent’s visual field are generated in the form of a table (see Chapter 8.2). The semantic trees seek information to fulfill their objective. Once certain information nodes have been fired and the semantic net gives a reply, the system can execute the command. As one can see, the knowledge that is required has its source in both the visual and the linguistic parser. Depending on what is activated, the system produces a different output. The linker applies a simple rule: if no object is activated, or the object has no reference frame, then the navigation is agent-centered; otherwise it is object-centered (a sketch of this rule follows). The linker also matches the user’s command description with the system’s table of object locations relative to the agent. There are practical difficulties in adding more features at this stage of development, within the confined time of a dissertation. Adding more features requires bringing in more modules, as is exemplified by the statement “Go to the left side of the house with a tree in front.” This is an instruction which contains a subset – a particular affirmative – some of a is b, or, if we were to alter the statement slightly, “Go to the left side of the house without a tree in front” – some of a is not b. All these types of reasoning within the sentence require the system to reason over subject and predicate, which in turn requires ever-growing sentence complexity for the predicate parser, and ever-growing reasoning means and representations. The system also needs a topological visual Gestalt grouping mechanism so that contextually related objects can be referred to. In addition, many linguistic phenomena remain unaccounted for in our grammar. In the theoretical world this is not a limit, but in the real world it does make a difference. The navigation system also has difficulties in extremely heterogeneous and homogeneous spaces, and in sparse and dense environments. The system is vulnerable to navigation in the wild and to deficient or twisted language from the user in commanding the agent within virtual environments.
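The linker’s simple rule can be written out directly; a sketch, assuming the visual parser reports the activated object and whether it carries a reference frame (names are illustrative).

```python
# Sketch of the activation rule: no activated object, or an object without a
# reference frame, means agent-centered navigation; otherwise object-centered.
def navigation_mode(activated_object, has_reference_frame: bool) -> str:
    if activated_object is None or not has_reference_frame:
        return "agent-centered"
    return "object-centered"

print(navigation_mode(None, False))       # agent-centered ("go left")
print(navigation_mode("house", True))     # object-centered ("go to the front of the house")
```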



Concluding remarks

In this chapter, we have shown how the theoretical model might operate and perform across a broad range of movements. The bounded agent and the directed agent use a different method of calculation, based on a primitive spatial entity, while the converse and transverse paths use a region-based theory. Environment-centered directions are not included; these impose an independent coordinate system on the object (centroid). That part, which is not within the scope of the dissertation, was left aside, although a simple polar coordinate system is implemented in existing GIS.


CHAPTER 10 USABILITY OF THE SYSTEM
10.1 Architectural case
Typical scenes that architects imagine the client, or they themselves, might want are as follows:
“Go to the back of the building”
“Rotate the building”
“Go through the building”
“Go around the building”
Those sentences contain all the ingredients of the two higher-function commands, converse and transverse. The particular commands were chosen for their graphical effect; any other combination, as described, can be generated. The following is a simulation of the system’s performance for a user. In the initial condition the viewer sees the front of a building (see Figure 10.1). The agent/avatar is not seen at this point; only when the agent moves does the camera fall back and show the agent/avatar moving. (We will show the movement of the agent/avatar with the command “Go around the building”; see Figure 10.6.)

Figure 10.1. Initial condition

The user then commands the system to “Go to the back right of the building”.



To understand what is meant by this command we must break down the sentence into its components. Given the agent’s position, we need to identify the building in question and analyze its geometric properties. We need to establish the prepositional relationship, i.e. the back of the building, at a distance that shows the whole facade. The system then examines the object’s reference system, matches it to “back right”, and establishes this as the new position (a sketch of this computation follows). This is a converse command and follows Rule III and knowledge rules 1 & 3. In the demonstration the camera is in the new position (see Figure 10.2).
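One way to picture the computation of “back right” is as an offset along the building’s intrinsic axes. The following sketch uses invented geometry (footprint, facing direction and viewing distance); the dissertation’s parser would supply the axes.

```python
# Illustrative resolution of "back right of the building" using the object's
# intrinsic axes (front vector and right vector); the building footprint and
# viewing distance are invented for the example.
def back_right_position(centre, front_vec, half_depth, half_width, offset):
    fx, fy = front_vec
    rx, ry = fy, -fx                          # right vector: front rotated by -90 degrees
    x = centre[0] - fx * (half_depth + offset) + rx * (half_width + offset)
    y = centre[1] - fy * (half_depth + offset) + ry * (half_width + offset)
    return (x, y)

# Building centred at the origin, facing north (+y), 10 m deep, 8 m wide,
# camera held 15 m away so the whole facade stays in view.
print(back_right_position((0.0, 0.0), (0.0, 1.0), 5.0, 4.0, 15.0))
```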

Figure 10.2. Back of the building

“Rotate the building” ‘Rotate the building’ requires the retrieval of the building’s generative axis. In ‘rotate the building’, the system prompts the user for which way to rotate the building and at what angle. In the case of array rotation, the system switches to a fixed-distance rotation around the building, and the viewer controls the rotation through left and right commands. This is a demon activation (see Figure 10.3).

Figure 10.3. Rotate the building

“Go through the building” ‘Go through the building’ is a term for the system to perform movement between two objects, or through an opening in a building; that is, the node has a two- or three-level structure. If the building has a complex layout, there are systems that recognize this and find the most efficient route. This is possible through a search such as the A* algorithm (sketched below), but it is not the purpose of this system to solve such problems. This is a transverse command using Rule IV, knowledge rules 1 & 3, and demon 7. In our case the system shows an intermediary image at the point that requires a decision (Figure 10.4), to which the user replies “Go left”.
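For completeness, a minimal A* search over an access graph, of the kind such route finding could be delegated to; the graph of rooms and openings is invented for the example and is not part of the dissertation’s system.

```python
import heapq
import math

# Minimal A* over an access graph; node positions and edges are invented.
def a_star(graph, positions, start, goal):
    def h(n):  # straight-line distance heuristic to the goal
        (x1, y1), (x2, y2) = positions[n], positions[goal]
        return math.hypot(x2 - x1, y2 - y1)

    frontier = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt, path + [nxt]))
    return None

positions = {"entrance": (0, 0), "hall": (5, 0), "stairs": (5, 5), "back door": (10, 5)}
graph = {"entrance": [("hall", 5)], "hall": [("stairs", 5)], "stairs": [("back door", 5)]}
print(a_star(graph, positions, "entrance", "back door"))
```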

Figure 10.4. Through the building


“Go around the building” The term ‘around’ is defined as moving in relation to the circumference of an object. In addition, one can direct the observer through segmentation of the object’s reference system. The segmentation can be done either by referring to another goal, as in ‘circumvent the object’, or to an even smaller section, as in ‘turn around the corner’. In either case of segmentation, movement is away from the object. In our case one can direct movement while traveling with the commands ‘look left’, ‘look right’ and ‘stop’. In the case of prepositions like ‘around’, the system starts to move the agent around the house (see Figures 10.5-10.6; a sketch of such a circumferential path follows). The camera pulls back to reveal the agent moving along a path around the building. The user then commands the system to stop at the back of the building (see Figure 10.7).
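A sketch of the ‘around’ movement as a parametric walk along a circle offset from the object’s centroid; the radius and number of waypoints are invented, and the ‘look left’, ‘look right’ and ‘stop’ commands would simply re-orient or interrupt this loop.

```python
import math

# Illustrative waypoints for "go around the building": a circle offset from
# the building's centroid. Radius and sampling are invented for the example.
def around_waypoints(centroid, radius, steps=16):
    cx, cy = centroid
    return [(cx + radius * math.cos(2 * math.pi * i / steps),
             cy + radius * math.sin(2 * math.pi * i / steps))
            for i in range(steps)]

# The agent follows the waypoints, facing the building, until told to stop.
for x, y in around_waypoints((0.0, 0.0), 20.0, steps=8):
    print(f"move to ({x:6.1f}, {y:6.1f}), look at the building")
```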

Figure 10.5. Initial phase – The user viewing the buildings

Figure 10.6. Around the building – Turning with agent in view

Figure 10.7. Final phase – Turning endpoint seeing the building as the agent sees it

This simulation shows the need for such an architectural tool even in the most basic form of a single-building presentation. The system does not deal with rules of composition in an attempt to frame “the perfect view”.

10.2 Evaluation of the object-centered tool
The present research has attempted to provide insights into the process of navigation in order to improve the design process, offering the architect the ability to interact with objects as one moves in ‘virtual’ environments. The model augments elements of a route that work with an intrinsic frame of reference to move the observer. The proposed conceptual system is modular in the sense that presenting all the modules gives the observer the possibility of forming the required system. The model represents the cognitive process as one moves in virtual environments. The interpretation of the model into a tool for the simulation of movement required the development of semiotics in order to construct the language command tool. As opposed to direct manipulation, the linguistic command of movement in the virtual environment is without gesture, since movement is interpreted as a directional vector, an analogical spatial representation. The following aspects of movement have been achieved: The user-centered tool performs both agent-based and object-based navigation. That is, the user can refer to the movement of the agent as a basis for movement, or the user can refer to an object as a reference point in a triangulation scheme using frames of reference. We have shown that, of the four elements of the path (see Chapter 5.2), the transverse path with phrases like ‘through’ has a limit: the act requires a decision to be taken along the path, as for example in the command ‘walk through the building’. The user-centered tool is generic and exposes all possible types of movement as presented through logic and evident in language. The object-centered tool enhances movement in virtual environments by introducing an agent, making it more vivid, and by introducing language, making it more immersive. As was shown in the analysis of Tomb Raider, introducing an agent into the virtual environment causes the viewer to see the context of his environment from a point of view external to himself. For example, when the agent walks one can see the agent in contact with the ground, enhancing the haptic perception of that environment. The user-centered tool is also more immersive in the sense that it allows the user to be more engaged in the subject matter, and navigation becomes second nature. Using the user-centered tool ‘on the fly’, performance is enhanced by a more efficient and effective system. The user-centered tool will function better in an architectural office environment where the architect communicates with various parties while examining the various aspects of the three-dimensional project. The user-centered tool is effective in moving an agent through virtual reality, as shown in Table 4.2. The object-centered language tool has integrated all modules of the visual interface into a unified whole. As was shown in the examination of existing visual programs (see Chapter 4), one module cannot perform all the tasks that are required, while the linguistic interface does not need any such modules: the interpretation of the verbal command is sufficient to direct the navigation (see Chapter 7). The object-centered tool is easy to use and learn since it uses everyday language. Talking to an agent represented on the screen does not require any transference of hand movement direction to an agent in the virtual environment. As with directing people in the ‘real’ world, movement in a virtual world opens almost the same channels of communication.


The ability of most people to communicate properly was never in question, despite mounting evidence of people who are dyslexic and cannot perform this task. The primacy of vision, although recognized, should not interfere with the establishment of an alternative communication route. The object-centered tool is limited in its ability to select objects in the field of view – as the objects increase, so do the reference strings, and so does the demand on computing power. The user-centered tool is also limited in the number of inferences that the system can make with the positional reference system. The system is bound to make misjudgments when the user switches from a command given in front of the object to a command given at the back of the object, using the intrinsic and relative frames of reference (see Chapter 9.3). The object-centered tool would be difficult for one person to build, since it requires a combination of many components, such as voice recognition, language understanding, and database knowledge, combined with a three-dimensional rendering engine. Thus a full demonstration of the tool is not possible at this stage, with very limited resources, and only a thought experiment and a demonstration of user needs are possible.

10.3 Initial testing of the hypothesis
During the study, a test was conducted in order to determine the initial student reaction to which type of instruction, direct manipulation or linguistic commands, is preferred by the user. The test did not simulate an architectural investigation but a wayfinding task. The test simulated the proposed interaction using a human mediator to perform the task of the system on users. The test compared two groups using a direct manipulation program (VRML – Cosmo Player) and the linguistic approach. The test entailed training eight students, in two groups of four, to be proficient users of VRML. Both groups performed the same set of tasks but in opposite order, on two different navigational courses. The users were then asked to fill in a questionnaire asking them to rank their preferences. If both groups had more or less the same preferences, then the data was not contaminated by the recent experimental experience.



The experiment took place in a simulated environment of the Jordaan, Amsterdam, with the same points as the tour guide (see the guided walk in Chapter 8.2). Course (1) ends at point [c] and course (2) ends at point [f]. The students were able to observe both points [c] and [f] from the existing location, point [x] (see Figure 10.8). The student instruction level was simple: the students used bounded and directed agent commands for the task. The location goal did not place any demands on the student to observe the location’s features. The students in the first group (A) learned VRML first, and were then told to navigate course (1) from point [x] to point [c]. Then group (B) verbally instructed group (A) to navigate course (2) from point [x] to point [f]. The second group (B), which first gave instructions to group (A), then learned how to use VRML and navigated course (2) from point [x] to point [f]. Group (A) then gave verbal instructions to group (B) to navigate course (1) from point [x] to point [c]. The students received an initial observation point (Figure 10.8) and then proceeded to navigate at eye level. The group of students selected comes from across the European Union. They are in their fourth year of architectural education, aged 20-23. Out of a class of 12 students, eight volunteered. The questions they were asked were: 1) Which system do you prefer? 2) Why did you prefer one over the other? 3) Rank the difficulty of each task. We did not pose to the students the question of whether they would need higher-function language to accomplish an architectural task. At the moment, direct manipulation programs do not have any higher-function tools to compare with language. The results from the eight users show that all students preferred direct manipulation to voice commands. 1) They expressed a preference for being able to control movement. 2) They found it moderately difficult to express their intentions through language. 3) Ranking did not yield a consensus; three students from group A ranked the difficulty of giving instructions as moderate, while four students from group B and one from group A ranked the difficulty of giving instructions as considerable.


Figure 10.8. The initial observation point of the test

Discussion
The results do not support the hypothesis of language as the preferred navigational tool. The following is a discussion of why the students intuitively preferred direct manipulation. To begin with, the group of architectural students that was tested is biased towards direct manipulation, and also falls within the target group of computer players (aged 15-35). Hand-eye coordination presents the user with the pleasure of learning a new skill and controlling another medium with direct commands. The idea of the one-to-one relationship between one’s action and the computer’s reaction, i.e. the idea of the action of the user being analogical to the action on the screen, is captivating. The students’ direction instructions were in simple language, using expressions like “go forward” or “turn right”, and thus the students used bounded and directed agent commands. The students used language intuitively, with no clear definition of the difference between linguistic command-based manipulation and direct manipulation of body movement. This result is not surprising, but why it is so has a more complex answer. The ranking shows how the students who practiced language usage had an easier time. Perhaps if the students had practiced natural language for navigation they might have chosen language navigation over direct manipulation navigation, but as it stands the students have a bias towards visual language. This is the first part of the answer to why the students did not use linguistic higher functions. The other part has to do with the students’ experience of language prepositions. During their studies, students have difficulties in trying to determine the frame of reference as they move around a building. There is a difference between existing direct manipulation with ‘bounded agent’ and ‘directed agent’, which are low functions, and the linguistic conceptual commands with ‘converse path’ and ‘transverse path’, which are high functions. There may not be an advantage to verbal commands if they are at the same level as direct manipulation commands, e.g. “go straight”, “turn right” – which can easily be done with direct manipulation. But they could have an advantage had they been at a higher level, e.g. “go around the building”, which direct manipulation cannot do.



The results do not support the hypothesis of language as a navigational tool, but they show the way a ‘real’ test could be conducted. Much more sophisticated testing would have to be developed, and users of many age groups and professions would be needed to establish user preferences.


CHAPTER 11 CONCLUSIONS; FUTURE AVENUES
The research has investigated the capability of computer navigational systems to allow the user intuitive interaction and movement in space along a desired trajectory. The tool uses a linguistic interface for navigating and manipulation in virtual three-dimensional environments. Existing visual systems can only point, whereas the new linguistic system can do more: it allows for converse and transverse paths. Because of this, the linguistic system was chosen, as a theoretical stance as well, to expose the visual instruction. The mechanism of language understanding was examined and a way to address the imprecise and ambiguous nature of language was found. The mechanism of frames of reference helped to disambiguate linguistic spatial expressions. The tool contains a visual parser and a linguistic parser, based on a constrained domain. This linguistic system demonstrated that it could work more effectively than previous systems. The problem that this dissertation tackled was how to overcome the existing software shortcoming, what I termed the ‘flatness’ of the computer screen: the inability to grasp and manipulate objects naturally. The aim was to improve the immersive vividness of navigational interaction based on the knowledge people have when they move through the built environment. A comparison between the visual and the linguistic system revealed the linguistic system’s superiority over the visual system in its ability to differentiate between different types of movement. The investigation focused on the linguistic mechanism of directing the movement of an agent immersed in a virtual environment, that is, the facilitation of an observer’s interaction through agent-based and object-based frames similar to real-world experience.



11.1 Review of process
The study revealed that there are three basic frames identified with navigation in virtual environments: agent-centered, object-centered and environment-centered. Thus one can navigate in a virtual environment not just through agent-centered commands but also through reference to arbitrary objects in one’s path, and this can be done verbally in virtual environments. In the first phase of the dissertation, we critically examined various programs that facilitate movement in the virtual environment. The results of the examination show that there is a phenomenon of ‘flatness’ of interaction: the gap between intuitive hand movement and visual response, the gap between what we see and what we expect to see. Finally, a matrix of all possible variations of visual interaction was introduced. In the second phase, we discussed some of the limits of communication, as defined by the visual and linguistic commands. We proposed a new linguistic model of communication. A navigational system of verbal commands and visual output was chosen, and some of the problems associated with it were addressed – mainly the different parameters of object grasping: fundamentally, point of origin, goal and direction. The third phase examined the performance of such a system in the context of architectural office use. It examined an imaginary scenario of an architect’s presentation of a project to his clients. The scenario developed is a composite of interviews conducted with architects. The scenario demonstrates the importance of such a system for architects.

11.2 Findings
The originality of the project lies in developing an explanation and representation of movement in environments that goes beyond the conceptual confines of current CAD-VR systems, leading to a system of integrated multiple frames of spatial reference for interacting in the virtual environment. Towards this goal, the research initially examined a number of typical cases of CAD-VR navigation systems and analyzed their performance. All those programs use direct manipulation: a virtual representation of reality is created, which can be manipulated by the user through physical actions like pointing, clicking, dragging, and sliding.


The analysis showed the lack of realism of these systems, owing to the user’s inability to interact naturally and effectively with the computer-based system to navigate and manipulate represented objects. In addition, the proposed system uses an agent to enhance the feeling of vividness and immediacy, that is, an avatar moving around the built environment that allows the user to directly control navigation while viewing objects as they are viewed by the avatar. This permits both agent-based and object-based navigation with a destination that is non-distal. The user can define the navigational movement of the agent as a basis of movement, or the user can refer to objects. In the new system, the user inputs his desire to the system, requesting a change of viewing position, expressed in terms of natural language, while the output remains visual. In other words, the user talks to the agent and sees on the screen what the agent views in the virtual world. The mechanism of language understanding was examined and a way to address the imprecise and ambiguous nature of language was found. The mechanism of frames of reference helped us to disambiguate linguistic spatial expressions. In order to address the problem of the imprecise nature of language, the tool uses a user profile to predict possible intentions. We have stated that intention can be divided into an observational mode and a manipulative mode. The results were to be incorporated in the new system using four types of command: bounded agent, directed agent, converse path and transverse path.

Tool development
The goal of this dissertation was to build a navigation system that is interactive and fulfils the requirements of the architect to explore new horizons in the virtual three-dimensional world. The system was built without predicate reasoning, due to time constraints. This means that there is a limit to how complex a sentence formation the system can handle. The linguistic tool has its limits in navigation in extremely heterogeneous and homogeneous spaces, and in sparse and dense environments. The system is vulnerable to navigation in the wild (the natural environment) and to deficient or twisted language from the user in describing the environment.



Model development
This study developed a theoretical framework for a system that utilizes agent-centered and object-centered navigation to allow the user maximum flexibility. Since the tool is a linguistic tool, it can differentiate between agent-centered and object-centered commands, allowing it to differentiate between different types of path. This research has presented an explanation of the specific experience of a person’s movement in virtual space. For the purpose of the explanation, a model was developed to direct the movement of a person with a ‘point of view’ and with the ability to direct attention, through linguistic commands. The model is effective because it achieves a better matching of expectation to performance than previous tools, since it combines the different levels of representation. The tool extends the observer’s ability to manipulate objects through frames of reference. It presents the possibility of traveling and engaging objects in virtual environments. The tool had to overcome some difficulties inherent in language understanding that make language imprecise. The mechanism of frames of reference helped to disambiguate linguistic spatial expressions. Two mechanisms were employed to correct the problem: one is the user profile, through which the intention of the user is revealed, and the other is the use of a reference system to establish a direct location. Add to that the constrained domain, and one has a system that could be used to navigate in virtual built environments.

11.3 Future research
The approach that this research has taken has yet to be fully implemented in the virtual world. Once the tool has been developed, future research might expand our understanding of the way people navigate in the virtual environment. The tool allows for broad types of investigation of wayfinding (recall and inference) and can have a direct influence on our understanding of the physical structure of urban environments. Future research could also establish qualitative research methods in urban studies and media studies. The investigation can also concentrate, for example, on the translation of the linguistic system’s functions back to the visual instruction level. This can be examined by simulating the proposed interaction of the visual manipulation engine compared to a control group using a human mediator (much like in the second set of scenarios described in Chapter 10.3).


Conclusion

We applied an interdisciplinary research method that employs knowledge from cognitive science, computer science, vision and linguistic theory. We also utilized case studies to facilitate this interaction and examine its consequences. We examined how one can improve navigation through the examination of existing tools. Through this examination, we demonstrated the inability of existing tools to interact with objects naturally. With further examination of visual and linguistic systems, we arrived at a linguistic system for navigation that is superior to existing systems. Within the limits of a scenario, we established the importance of such a system for architects as a presentation tool.

. . .


Appendix I A scene from Sonja (see Chapter 7.3)

Figure 1.1 The Amazon is in the center of the picture; an amulet is directly to her right. The frog-like creature with ears is a demon; a ghost is directly below it. In the room there are two more amulets.

Figure 1.2 Sonja kills the ghost by shooting shuriken at it. A shuriken is visible as a small three-pointed star between the Amazon and the ghost.

Figure 1.3 Sonja shoots at the demon.

Figure 1.4 The Amazon throws a knife at the demon based on the instruction "Use a knife". The knife appears as a small arrow directly above the Amazon.

Figure 1.5 Sonja passes through the doorway of the room.



Figure 1.6 Sonja turns downward to get the bottom amulet.

Figure 1.7 Sonja has been told "No, the other one!" and heads for the top amulet instead.

Figure 1.8 Sonja has just picked up the amulet, causing it to disappear from the screen.


Appendix II

This is a compilation of interviews conducted in 2004 with architects in Rotterdam, the Netherlands.

Andrew Tang (MVRDV) Flight Forum, Eindhoven, Business Park

Andrew: We were concerned with the grouping of several towers and also the relation to the main road with its services, i.e. a bus line and car transport.
From what points of view did you choose to examine the site?
Andrew: We looked at the project from the airport (since it is seen from the air). We also looked at the project from the point of view of the passenger in a car as they approach the site.
What were the objects you wanted to examine?
Andrew: I was interested in viewing the relationship between the towers. For that we generated an elevation along the line of the buildings. The map helped us to establish the distances between the buildings, but this is highly abstract. We also analyzed the movement of cars through the site. We did not generate a sequence of perspectives of the chronological movement of the passengers, which would have helped people understand and experience the site. There are many parameters for one to examine in this kind of project, and one of the dimensions missing from the plans and elevations is the three-dimensional view. There are many planes that you have to visualize three-dimensionally. As architects we are trained to work in two dimensions and think three-dimensionally, but the general public is not.
Is there some other point of view which you think was important and you did not generate?



Andrew: We needed some perspectives from different points of the site which would have shown the relation between the different elements of the site in three dimensions. The model can only generate views from the parameters of the site; you are still not embedded in the site.

Eva Pfannes OOZE Architects Vandijk clothing store

Eva: I was concerned with how my client inhabited the space. I wanted to develop a structure to support the activity that takes place in the confined space, to use the 20-meter depth of the space with maximum flexibility.
From what points of view did you choose to examine the store?
Eva: We examined three points of view: the entrance, from where the client looks into the store; the way the client would look at the merchandise, looking at the left side of the store; and from the back of the store looking out.
Eva: I then realized that we needed an overall view so we could examine particular areas of concern, so I generated a wide perspective view from the top to show the layout of the store. The client had problems understanding how particular situations would work, and I had to explain how the place would work. In order to overcome the limiting view of the plan, we ran several scenarios of usability. So with my client I went through scenarios of shoppers, suppliers and attendants. For example, with the shopper, we examined how to attract him to the store and, once inside, what he will see and do next. Where are the dressing rooms, and can the shopper see them without hindrance? How would the merchandise be displayed, and how inviting is it?


Paul van de Voort DAF A house in Den Haag

The house design was part of a prearranged package for the client to choose from. This is a very restricted site, with pitched roofs, volume, gutter, etc. The reason why such restrictions were imposed was an attempt to preserve the old nostalgic style. We were asked to generate a typical house. When the client came to us, he needed a bigger house, and that is how we met. The client wanted gables at the front and a veranda at the back. When you look at the section of the house, it is divided into two parts: the front with a garage and bedrooms on top, and the living room at the back. The front was more in keeping with the old nostalgic houses while the back was more open.
From what points of view did you choose to examine the house?
The way we work is with models. We presented the client with a model and talked to him about the house. We always put in the human scale so the client can imagine himself in the house. This is the closest experience one can have to an actual walkthrough. We do not generate walkthroughs or single perspectives because they do not convey enough information, and a video walkthrough is too constrained for the client. There is always something that the client wants to see that is not included in the video.


References …

Index
Axial Parts: Directed, Generating, Orienting
Observational mode / Manipulation mode
Converse / Transverse
Allocentric / Egocentric
Centroid
Cognitive maps
User activity Model
Telepresence Model
Augmented
Computer programs:
3DS MAX® – CAD
Amazon’s Sonja
CAVE™
Cosmo Player® – exploration program
Myst® – exploration quest games
Tomb Raider® – action games
Continuity constraint
Direct Manipulation
Spatial frames of reference
Absolute frames of reference
Intrinsic frames of reference
Relative frames of reference (deictic)
Gesture
Correspondence rules
Intention
Visual reasoning / spatial reasoning
Visual criteria
Sensorimotor information
Object rotation / array rotation
Orthographic
Panning / Zooming
Path structure
Perspective / Point of view
Pointing and labeling task
Orientation information
Polar co-ordinates
Object reference system
Prepositions of English
Features of an object
Reference system
Scene recognition
Event and State
Route knowledge
Semantic / Syntactic
Lexical Conceptual Semantics
Spatial cognition / Spatial language
Spatial depiction
Spatial description
Spatial representation
Visual information
Haptic information
Substantiality constraint / continuity constraint
Naive physics
Intuitive physics
Panoramic navigation
User activity
Visual criteria
Visual perception
Visual navigation
Intentional structure
Attentional state



Samenvatting (Summary in Dutch)

ABOUT THE AUTHOR

Asaf Friedman was born in Tel Aviv, Israel, in 1958. Prior to his doctoral studies, he was engaged in teaching architecture at several architectural and design schools, primarily at the Technion: teaching basic design courses, second-year design, and multimedia courses. In addition, he co-founded “Archimedia Ltd.”, which provided architectural services to clients as well as visual and multimedia presentations. The firm produced 3D models, animation, and multimedia interactive presentations for major architectural and industrial design studios in Israel. Between 1991 and 1993, he studied for his Master’s degree at Pennsylvania State University with a thesis entitled “The Architecture of the Face, Levinas’ Theory of the Other”. In this project, he showed the importance of non-representation and explored (what he termed) the ‘architectural Other’ – that which evades representation. Emmanuel Levinas’s notion of the face was examined with respect to the façade of buildings, specifically buildings presented in the works and writing of Leon Battista Alberti. In the years 1988 - 1991, he worked in New York as an architect, and during that time he constructed a conceptual ideographic tool and, together with the philosopher Dr. I. Idalovichi, published it as a book with Macmillan Maxwell Keter Publishing under the title “Eidostectonic: The Architectonic of Conceptual Ideographic Organon”. In this text, he attempted to combine conceptual and visual elements of semiotics. The relationship between representation and signification was examined, and a synthesis of ideograms and concepts was represented in a singular comprehensive tectonic system. The book concentrated on the notion of space and its historical conception from the ancient Greeks to the modern sciences. In 1986 he gained his Bachelor of Architecture degree from the Technion – Israel Institute of Technology in Haifa, Israel.


