Rebugging Mixed Reality: Moodroom


Coelho Debacco, Arthur
Fülöp, Botond
Chen, Xuanlin

Chair of Architectural Informatics, Technical University of Munich

The Moodroom
Rebugging Reality
Chair of Architectural Informatics
Prof. Dr.-Ing. Frank Petzold
Supervisors: Nick Förster, Gerhard Schubert
Authors: Coelho Debacco, Arthur (03726469); Fülöp, Botond (03734435); Chen, Xuanlin (03742310)

Table of Contents

Introduction
Research
Ideation
Concept
Prototyping
Final Implementation
Discussion
Outlook
Collaboration
References
Contact

Introduction

Virtual reality technology has been present for decades, offering the possibility of immersive experiences that may or may not simulate the real world and its properties. VR is a continuous topic of research and has been used in different fields of application, such as medicine, video games, education, and architecture. However, when we observe what virtual reality is capable of at present, we notice a major drawback that prevents it from being an organic experience: the lack of consideration of our emotions. The inputs of a VR setup are largely limited to sensors that help mimic spatial changes, such as relocation and certain movements. Thus, the feedback feels mostly synthetic and barely reacts to someone's inner mood, unlike outside, in the real world. In fact, people use gestures, consciously or unconsciously, as reactions to similar acts of others. The time we spend together moving, changing, and evolving becomes a self-sustained, reaction-based system. The human factor marks living itself, where constant adjustments are required for the common well-being. A person reacts to the reference experienced, and afterward, the reference reacts back slightly.

To consider emotions in a virtual environment is a non-trivial task. Emotions are abstract, subjective, dependent on many factors, and complicated to decode. Nonetheless, changing a space has an effect on one's visual perception and, even if slightly, influences a person's mood. Based on this statement, The Moodroom project addresses the following question: How can we turn visually static surroundings in an immersive VR experience into something that can reflect or alter the feelings of a human being?

Image 1: An abstract representation of The Moodroom. Source: Own illustration

Research

1. Influence of lifeless elements

"A series of studies in a number of Western cultures (e.g., Ekman, 1971; Izard, 1971) demonstrated that human beings display their emotional states through similar facial configurations." (Aronoff, 2006, p. 84) Given the cultural and geographical diversity of people, it may seem hard to believe that such principles hold with approximately the same magnitude everywhere. Deciding whether a setup is dangerous or not depends on the outcome of a recognition process, and the shorter this process lasts, the sooner a possible physical reaction can start. This means that knowing instantly whether a person's or animal's face is threatening can increase the survival rate. "The helplessness of babies, the looks of fear or anger, and the ravages of disease require fast and sure recognition if our species is to survive." (McArthur & M. Baron, 1983, p. 220) Thus, there should be a collective set of definitions for this determination, one that refers to biological mechanisms that evolved over time. "Overall geometric configuration provided by the facial features, rather than individual features, was how a culture defined the emotional representation." (Aronoff, 2006, p. 85) Since the valence of the recognition process comes mostly from geometrically defined outlines, which speeds up processing, it is possible to use basic shapes as an imitation to elicit emotional responses. In an experiment (Aronoff, 2006, p. 85), students were asked to draw masks that were meant to frighten and masks that would be worn to win the heart of their beloved. Out of 19 mask characteristics, 18 discriminated significantly between the two types of masks. The classification was done as per a former study (Aronoff, M. Barclay, & A. Stevenson, 1988).

Image 2: Some high and low threatening stimulus displays (more threatening on the left). Source: Aronoff, Joel; M. Barclay, Andrew; A. Stevenson, Linda (1988): The Recognition of Threatening Facial Stimuli. Journal of Personality and Social Psychology, p. 651.

Thus, we resolved to separate the possible outcomes of The Moodroom into two contradictory sides and to give the user the opportunity to also experience the transitional states, just as in the case of those masks.


2. Shapes affecting humans

"We suggest that aesthetic experience is a function of the perceiver's processing dynamics: The more fluently the perceiver can process an object, the more positive is his or her aesthetic response." (Reber, Schwarz, & Winkielman, 2004, p. 365) This means that to evoke tranquillity in the subject of The Moodroom, we must influence the recognition process itself: the more complicated the processing, the higher the anxiety level. Many factors affect the emotion's valence: symmetry, repetitiveness, angularity, randomness, intensity, complexity, etc. Providing the most influential setup rests on a recognition process that unequivocally sifts these factors into two groups: one group of factors forces the subject to leave or change the situation, while the other suggests staying or settling down in the room. The reason is that these instincts help us rapidly decide whether a shape's outline refers to a dangerous threat or to a harmless event. "Beauty is grounded in the processing experiences of the perceiver that emerge from the interaction of stimulus properties and perceivers' cognitive and affective processes." (Reber, Schwarz, & Winkielman, 2004, p. 365) Fluency, therefore, is an attribute that makes a stimulus easier and faster to process, and fast and easy recognition leads to an aesthetically satisfying experience. "High fluency may also feel good because it signals that an external stimulus is familiar, and thus unlikely to be harmful." (Reber, Schwarz, & Winkielman, 2004, p. 366) Forms and shape compounds that contain less information, by way of symmetry, redundancy, or by being built up from a simple or small number of elements, can be memorized and recognized more easily. An experiment that investigated the capability of apprehending a given amount of information showed that an arrangement is more likely to be memorized if it has been placed symmetrically (Attneave, 1955). In our context, this also implies a higher probability of being harmless. We concluded that the two endpoints of our virtual room already have clear attributes:

CALM SET: easy to comprehend, round-shaped, driven by curved contours, fluent, arranged, repetitive, patterned, and with the most symmetrical axes, yet without a single flat surface, since that would remove any chance of an orientation point for the perceiver.

HECTIC SET: random points of the surface are lifted to random heights (within a given range), thus angular, mostly incomprehensible, diverse, and not arranged at all.

Image 3: Representation of the two sets (calm vs. hectic, clear vs. dense, simple vs. complex). Source: Own illustration
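To make the two sets concrete, the following toy sketch contrasts them as height fields over a grid. The functions and parameters are our own illustration of the attributes above, not code from the project.

```csharp
using UnityEngine;

// A toy sketch contrasting the CALM and HECTIC sets as surface height fields
// (our own illustration; amplitude and frequency values are arbitrary).
public static class SurfaceSets
{
    // CALM: a smooth, symmetric, repetitive pattern of curved waves.
    public static float CalmHeight(int x, int z, float amplitude)
    {
        return amplitude * Mathf.Sin(x * 0.6f) * Mathf.Sin(z * 0.6f);
    }

    // HECTIC: random points of the surface lifted to random heights
    // within a given range, producing an angular, unarranged look.
    public static float HecticHeight(float range)
    {
        return Random.Range(0f, range);
    }
}
```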

Ideation

The recognition


During brainstorming sessions and discussions, we asked ourselves which limitations impact us the most in a virtual environment. What makes us uncomfortable inside a VR space? We quickly concluded that the synthetic, static architectural design of VR spaces often bothered us. Most of us, after a while, started to feel anxious and lonely inside VR rooms, even when there were many participants in the very same room, close to each other. We missed the possibility to recognize facial expressions, the slight changes of the surroundings, more interactivity, more emotions. We missed being able to feel the mood of the environment!

The idea

After recognizing what was missing in VR, we decided that we wanted to allow a room to have more activity, more dynamism, more influence-based reactions.

In the real world, every action is followed by some kind of counteraction, which in the emotional field often correlates with valence and arousal. Since changing static elements in the real world, governed by the laws of physics, is rather difficult, we spotted the perfect opportunity to do this in the virtual world: exploit the huge potential of a VR setup and, with it, redefine static spatial relations so they can be used to create mutations of the room's dimensions. That is how the idea of The Moodroom came into existence.

The scenario of a real application

A scenario in which The Moodroom could be used as a helpful tool is a therapy session. In the context of psychological counseling, emotions are an important reference for the psychologist to judge the patient's state. Sometimes patients cannot truly express emotions through language, so trying to decode the patient's emotions could lead to positive results in the field of psychotherapy. The storyboard (Image 4) illustrates this situation.


In a therapy session, a man talks to the therapist about some emotional issues in his life. He seeks support to understand his feelings. After understanding the man's issues, the therapist suggests he try a new virtual reality application, The Moodroom: an immersive experience in which he can try to visualize his emotions. The man puts on his VR headset and starts the application. He finds the interface very intuitive and begins to experience it.

Image 4: Storyboard. Source: Own illustration

By saying negative words, he stimulates the surfaces of the room to morph into random sharp and aggressive shapes. Then he tries to express himself in a positive way, and the surfaces morph into more curved and smooth shapes. He also realizes that the quicker he moves, the faster the surfaces move. After the therapy session, the man tells the therapist how much he loved being inside The Moodroom. He can't wait to share the experience with his friends!

Concept

In the concept of The Moodroom, we approach and investigate the synergy of architectural design, emotions, and virtual reality, trying to decode human emotions and use them for design.

The approach is to capture as many of the signs a human emits as possible and then convert them into spatial qualities of a room. After a short period of interaction, the room should be able to assimilate the probable emotion the person is feeling. This iterative process can result in a self-driven, endless, and supportive functioning: an algorithm could represent with precision what the subject is feeling, and the person can visualize these feelings.

The workflow of The Moodroom

The workflow is as follows: first, there is a current emotional state. This emotional state is depicted by analyzing the user's movement, voice pitch, the meaning of what they say, the video content, and physiological features. Using these data, the valence and arousal values of the emotional state can be extracted. Once a proper picture of the emotional state is available, the algorithm can calculate the corresponding spatial features. These calculated features form a set of stimuli, whose creation is based on former psychological research. The spatial features are presented on VR glasses, and through an iterative process, a new emotional state is generated. A schematic sketch of this loop follows.
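The sketch below uses a single placeholder input signal; EstimateEmotion and ApplySpatialFeatures are hypothetical stand-ins for the analysis and rendering steps described above, not the project's implementation.

```csharp
using UnityEngine;

// A schematic sketch of the iterative Moodroom loop (our own illustration).
public class MoodroomLoop : MonoBehaviour
{
    void Update()
    {
        // 1. Capture signs the user emits (movement, voice, physiology);
        //    here reduced to a single placeholder signal.
        float movement = Input.GetAxis("Horizontal");

        // 2. Extract valence and arousal from the captured signals.
        var (valence, arousal) = EstimateEmotion(movement);

        // 3. Translate the emotional state into spatial features and show them;
        //    the user's reaction to the changed room closes the feedback loop.
        ApplySpatialFeatures(valence, arousal);
    }

    (float, float) EstimateEmotion(float signal)
    {
        // Placeholder: a real estimator would fuse several signal sources.
        return (Mathf.Clamp(signal, -1f, 1f), Mathf.Abs(signal));
    }

    void ApplySpatialFeatures(float valence, float arousal)
    {
        // Placeholder: drive surface morphing, light, and color from the state.
    }
}
```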

At present, there is a lot of literature trying to establish credible emotion models, which can be considered the basis for identifying emotions. However, these data charts and mathematical formulas are complex and require much time to apply. Therefore, we propose a hypothesis based on unconscious movements, tone, and peripheral physiological features.



Image 5: Concept. Source: Own illustration


Prototyping

Low-fidelity Prototype

The prototyping part of The Moodroom project was executed mainly in two steps. The first was a low-fidelity prototype, which helped us narrow down our goals, sculpt our concept, and identify possible challenges and limitations. The second prototype focused more on the real implementation, so that users could already test and evaluate it.

To build the first prototype, the free and open-source 3D creation suite Blender was utilized. This tool allowed us to construct and visualize what our final prototype could look like. A simple cubic room was modeled and animated by changing its size, color, and the form of its surfaces. The animation was rendered as a video, and through video editing, features were added to simulate the interface, which the user could intuitively understand and, if desired, interact with. Icons positioned in the upper-right corner of the screen would help the user keep track of physiological signals (heart rate, voice, motion intensity). In the bottom-left corner, icons would indicate which room parameters (size alteration, surface morphing, color) are currently active; these could be enabled or disabled by positioning a controller and pressing a button. In the upper-left corner, the menu icons were placed, giving the user the possibility to adjust settings, customize certain parameters, and add objects to the scene.

Image 6: Low-fidelity Prototype. Source: Own illustration

Improving the Prototype

In a further stage of prototyping, the cross-platform game engine Unity (version 2019.4.16f1) was used to get closer to the results we wanted to achieve. Inside Unity, two scenes were created. The first contains the start menu, with background music and a "Start" button that leads to the next scene. The second scene is where the user can interact with the room: it is possible to walk, run, jump, look around, and, of course, express feelings.

Image 7: Unity Development Environment. Source: Own illustration
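As a minimal sketch of the scene switching described above (the scene name "Moodroom" is our own placeholder, not necessarily the project's):

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Minimal sketch of the start-menu behaviour: pressing "Start" loads the
// interactive room scene.
public class StartMenu : MonoBehaviour
{
    // Wired to the "Start" button's OnClick event in the Unity inspector.
    public void OnStartPressed()
    {
        SceneManager.LoadScene("Moodroom"); // placeholder scene name
    }
}
```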

Image 8: Kinect tracking the right- and left-hand movements. Source: Own illustration

In the current state, the user can express themselves in two distinct ways: through hand movements and through speech. These signals are captured using two different procedures, processed through scripting, and directly assigned to room parameters.

Motion detection using the Kinect

The Kinect, a motion-sensing input device, is capable of tracking the motion of several joints of a human body. Of these joints, two were tracked in this prototype: the right- and the left-hand joint types, which we assumed to be the two most expressive parts of the body when trying to convey feelings. An algorithm calculates the velocity of the hands, which is used as input for the speed at which the surfaces of The Moodroom move; a minimal sketch follows.
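A minimal sketch of the velocity computation, assuming the hand positions already arrive as Unity world-space vectors from the Kinect tracking (the actual joint access goes through the Kinect SDK):

```csharp
using UnityEngine;

// Sketch of deriving a surface speed from hand motion (our own illustration).
public class HandVelocityTracker : MonoBehaviour
{
    Vector3 lastLeft, lastRight;

    // Consumed by the surface deformers as their movement speed.
    public float SurfaceSpeed { get; private set; }

    // Called once per frame with the current hand joint positions.
    public void UpdateHands(Vector3 left, Vector3 right)
    {
        // Velocity = distance travelled since the last frame / frame time.
        float vLeft = (left - lastLeft).magnitude / Time.deltaTime;
        float vRight = (right - lastRight).magnitude / Time.deltaTime;

        // The average hand speed drives how fast the room's surfaces move.
        SurfaceSpeed = 0.5f * (vLeft + vRight);

        lastLeft = left;
        lastRight = right;
    }
}
```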

Speech recognition and analysis using IBM Watson Speech to Text and Tone Analyzer

Using a simple microphone, the user inputs their voice signal into The Moodroom. This signal is first transcribed by the IBM service Watson Speech to Text. The text then serves as input to the IBM service Watson Tone Analyzer, which uses linguistic analysis and artificial intelligence to detect emotional and language tones in sentences. To work, both services must be activated through API keys and service URLs from an IBM Cloud account.
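A hedged sketch of calling the Tone Analyzer REST endpoint directly with UnityWebRequest; the prototype itself goes through the IBM Watson SDK services, and serviceUrl and apiKey come from the IBM Cloud account mentioned above:

```csharp
using System;
using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.Networking;

// Sketch of a raw REST call to the Watson Tone Analyzer (not the SDK path
// used in the project). The sentence is assumed to contain no characters
// that need JSON escaping; a real client would serialize it properly.
public class ToneAnalyzerClient : MonoBehaviour
{
    public string serviceUrl; // instance URL from the IBM Cloud account
    public string apiKey;     // API key from the IBM Cloud account

    public IEnumerator AnalyzeTone(string sentence, Action<string> onJson)
    {
        string url = serviceUrl + "/v3/tone?version=2017-09-21";
        byte[] body = Encoding.UTF8.GetBytes("{\"text\": \"" + sentence + "\"}");

        using (var req = new UnityWebRequest(url, "POST"))
        {
            req.uploadHandler = new UploadHandlerRaw(body);
            req.downloadHandler = new DownloadHandlerBuffer();
            req.SetRequestHeader("Content-Type", "application/json");
            // Tone Analyzer uses basic auth with the literal user "apikey".
            string auth = Convert.ToBase64String(
                Encoding.UTF8.GetBytes("apikey:" + apiKey));
            req.SetRequestHeader("Authorization", "Basic " + auth);

            yield return req.SendWebRequest();

            // The response JSON contains document_tone.tones[] entries with
            // a tone_id (e.g. "joy", "anger", "sadness", "fear") and a
            // score between 0 and 1.
            if (!req.isNetworkError && !req.isHttpError)
                onJson(req.downloadHandler.text);
        }
    }
}
```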

As a next step, the probable emotion output by the Tone Analyzer is mapped to room parameters. In total, four emotions were chosen to influence the room: joy, anger, sadness, and fear. How exactly these emotions modify the room is explained later, in the section "Final Implementation" of this booklet.


Extra features of the interface

On the screen, two different menus were implemented as extra features with which the user can interact: a main menu, where the user can adjust settings and exit the application, and another menu for customizing the texture the user desires. However, the menus were not fully developed and partly serve as mere placeholders for future functions. The current predominant emotion is visualized in the middle-upper part of the screen, and the real-time speech transcription appears in the center of the screen.

Final Implementation

Since implementing the concept with all its elements would require significant resources, a team of specialists from different fields (psychologists, programmers, architects), and a considerable amount of time, we decided to simplify it and implement only a part of the concept in the final version.

Image 9: Simplified version of the concept for the final implementation. Source: Own illustration


As mentioned earlier in the section "Prototyping" of this booklet, two main signals are captured and used as triggers in the current version of The Moodroom: the velocity of the user's hand movements and the tone of the sentences the person speaks. A total of four emotional tones were used to change the parameters of the room. For each analyzed sentence, the IBM Watson Tone Analyzer returns the most likely emotion based on a score from 0 to 1 for every possible tone. These scores determine how much the room parameters change, allowing the room to alter in real time after each transcription analysis.

Constructing the Scenario

The walls and the ceiling of the cubic room were created by positioning several small cubes next to each other, more precisely a matrix of 10 x 10 elements per wall/ceiling; a minimal sketch follows. Each small cube was given properties from the Unity asset Deform, which made it possible to realize the effect of the surfaces morphing. The ground was added as a simple plane the user can walk on. Furthermore, a first-person controller, simple sound effects for walking/running, and a background were added.
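A minimal sketch of building one wall as such a matrix; cube size and parenting are our own choices, and the Deform components are added on top of this in the project:

```csharp
using UnityEngine;

// Sketch: one wall assembled from a 10 x 10 matrix of small cubes.
public class WallBuilder : MonoBehaviour
{
    public float cubeSize = 0.5f; // placeholder dimension

    void Start()
    {
        for (int x = 0; x < 10; x++)
        {
            for (int y = 0; y < 10; y++)
            {
                var cube = GameObject.CreatePrimitive(PrimitiveType.Cube);
                cube.transform.parent = transform;
                cube.transform.localScale = Vector3.one * cubeSize;
                // Place the cubes next to each other to form the wall surface.
                cube.transform.localPosition =
                    new Vector3(x * cubeSize, y * cubeSize, 0f);
            }
        }
    }
}
```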

Assigning emotions to room parameters

The assignment of the emotions to the room parameters was done as follows (a minimal sketch follows the list):

• Both joy and sadness scores were assigned to a wave deformer component. The higher the joy score, the higher the amplitude of the waves. The higher the sadness score, the lower the amplitude of the waves;

• The score from the emotion fear was connected to the brightness level of the diffuse environment light of the room;

• The score of the emotion anger was set to the magnitude of the Perlin noise deformer component. The higher the score for anger, the higher the overall strength of the noise, resulting in a pseudo-random overall appearance of the surface.
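A minimal sketch of this mapping; the ToneScores struct and the output fields are our own illustration, with the actual deformer wiring done through the Deform asset:

```csharp
using UnityEngine;

// Sketch of mapping tone scores (0..1) to the room parameters listed above.
public class EmotionRoomMapper : MonoBehaviour
{
    public struct ToneScores { public float joy, sadness, fear, anger; }

    [Range(0f, 2f)] public float maxWaveAmplitude = 1f;
    [Range(0f, 2f)] public float maxNoiseMagnitude = 1f;

    // Outputs read by the surface deformers each frame (assumed wiring).
    public float WaveAmplitude { get; private set; }
    public float NoiseMagnitude { get; private set; }

    public void ApplyTones(ToneScores t)
    {
        // Joy raises the wave amplitude, sadness lowers it.
        WaveAmplitude = Mathf.Clamp01(t.joy - t.sadness) * maxWaveAmplitude;

        // Anger drives the magnitude of the Perlin-noise deformation.
        NoiseMagnitude = t.anger * maxNoiseMagnitude;

        // Fear dims the diffuse ambient light of the room.
        RenderSettings.ambientIntensity = Mathf.Lerp(1f, 0.2f, t.fear);
    }
}
```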

To obtain the desired results, weighting the values of the parameters quoted above was necessary. After several iterations and adjustments by trial and error, the outcome can be seen in Image 10.


Image 10: Response of the surfaces to the emotions: a) fear, b) joy, c) sadness, and d) anger. Source: Own illustration

The final implementation of a simplified concept of The Moodroom turned out to be an interesting and successful experiment: a kind of game capable of reflecting what the user is feeling, even if in a quite simple manner.

An emergency exit for an emergency

Assuming that a confined room in virtual reality may cause discomfort to certain people (for example, someone who suffers from an anxiety disorder or a neurological disorder), an "Emergency Exit" was added as a way of escaping the environment of the room. It can be accessed simply by moving towards it and leads to a positive, low-poly nature environment, which might help to calm the user down.

Image 11: Emergency exit of The Moodroom. Source: Own illustration
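A minimal sketch of such an exit as a Unity trigger volume; the scene name "NatureExit" and the "Player" tag are our own placeholders:

```csharp
using UnityEngine;
using UnityEngine.SceneManagement;

// Sketch of the exit behaviour: walking into a trigger volume loads the
// calm nature scene. Assumes this object has a collider with isTrigger set.
public class EmergencyExit : MonoBehaviour
{
    void OnTriggerEnter(Collider other)
    {
        // Only react to the player's first-person controller.
        if (other.CompareTag("Player"))
            SceneManager.LoadScene("NatureExit"); // placeholder scene name
    }
}
```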

Discussion

Due to time and equipment constraints, The Moodroom currently uses only two parameters, movement and voice recognition, to recognize emotions. In fact, in addition to those two parameters, others should be included, such as video images and physiological characteristics, to map emotions with high accuracy using the valence and arousal model.

How should the room react to the perceiver's emotion, and what are those reactions based on? Since we know that affecting people's mood with consciously designed surroundings is achievable, a correspondence between the subject's emotional state and the surroundings must be constructed. This would probably "cure" people's disinterest.

In the Concept chapter, we described what the final use case would look like. To produce an application that is genuinely useful or helpful, many more resources are needed. There are numerous aspects of human behavior that should be taken into consideration. Previously, we described only one of the methods that can be used to obtain emotional parameters. After properly determining all the emotional parameters coming from the signs a human emits, we must find the right domain for the changes of the spatial parameters and assign them to the human factors. These processes require extensive experiments and data gathering. Therefore, we do not yet know the precise relation between certain shape compounds and the extent of their effect on people, only the rough directions.

Image 12: Illustration of the 3D emotion space. Source: B. Dietz & Lang, 1999

1. Emotional dimensions

To connect such an affective attribute to a number-controlled system, the emotion must first be translated by assigning a value to every state. Those values represent the Valence, Arousal, and Control (Dominance) parameters of the emotional state (M. Bradley, 1994; J. Lang; C. Osgood, G. Suci, & P. Tannenbaum, 1957; Russel & Mehrabian, 1977).

To map certain emotions, a first approach would be to map the facial expression of a person inside the room, using a method based only on the video and audio content of the stay inside. This method was worked out by Alan Hanjalic and Li-Qun Xu in Affective Video Content Representation and Modeling (2005). A short explanation of this method follows.

2. Model for Arousal (emotion intensity)

It is possible to extract the arousal level of video content, which in our case is the user's view and the sound the user produces. The arousal level consists of three main components:

• The rhythm component: obtained by investigating the changes in shot lengths along the video.

• The sound energy component: obtained in synchronization with the video frame interval by computing the total energy in the soundtrack of the video.

• The motion component: obtained from the overall motion activity measured between consecutive video frames.

The method and the calculations can be found in the study (Hanjalic & Xu, 2005, p. 148); a schematic sketch follows the figure captions.

Image 13: Arousal components' curves. Source: Hanjalic & Xu, 2005, p. 151

Image 14: Arousal level of a football match. Source: Hanjalic & Xu, 2005, p. 151
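The sketch below combines the three component curves into a single arousal curve as a weighted sum followed by smoothing; this is our simplification, not the exact filtering used in the study:

```csharp
// Schematic sketch of an arousal curve: weighted sum of the three
// normalized component curves, smoothed with a moving average so the
// curve changes gradually (our own simplification of Hanjalic & Xu).
public static class ArousalModel
{
    // All three arrays are assumed to have the same length and 0..1 range.
    public static float[] Combine(float[] rhythm, float[] energy, float[] motion,
                                  float wR = 1f, float wE = 1f, float wM = 1f,
                                  int window = 5)
    {
        int n = rhythm.Length;
        var arousal = new float[n];
        for (int k = 0; k < n; k++)
            arousal[k] = (wR * rhythm[k] + wE * energy[k] + wM * motion[k])
                         / (wR + wE + wM);

        // Simple moving-average smoothing over a symmetric window.
        var smooth = new float[n];
        for (int k = 0; k < n; k++)
        {
            float sum = 0f; int count = 0;
            for (int j = k - window; j <= k + window; j++)
                if (j >= 0 && j < n) { sum += arousal[j]; count++; }
            smooth[k] = sum / count;
        }
        return smooth;
    }
}
```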


3. Model for Valence (emotion type)

Valence describes the tone of an aroused situation, and the values of arousal and valence are related to each other: the range of arousal values determines the range of absolute valence values (Hanjalic & Xu, 2005, p. 150). To determine it, the following components must be computed:

• Pitch-Average Component: the average of the pitch signal values. The average pitch is useful to distinguish positive and negative affective states (e.g., happiness [high pitch average], sadness [low pitch average]).

• Evaluation: the measurement of the emotion type (described in the study).

Image 15: An arousal and a valence curve in comparison. Source: Hanjalic & Xu, 2005, p. 152

4. Affect curve

As can be seen from the previous figure, the effect of the control dimension only matters when we are facing a distinctly high arousal level. This effect is also quite small, due to the narrow range of values belonging to this dimension.

Numerous studies of human emotional responses to media have shown that the elicited emotions can be mapped onto a space created by arousal and valence axes (Hanjalic & Xu, 2005, p. 146). Once the affect curve is evaluated, every period can be assigned to a rough emotional map.

Image 16: 2D emotion space. Source: B. Dietz & Lang, 1999

Image 17: Illustration of the Arousal, Valence and Affect curve. Source: Hanjalic & Xu, 2005, p. 146
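As a crude illustration of such a rough emotional map, the valence-arousal plane can be split into quadrants; the labels and thresholds below are our own, not taken from the study:

```csharp
// Sketch: read a rough emotion label off the 2D valence-arousal space.
// Quadrant labels follow the common circumplex layout; thresholds are ours.
public static class EmotionMap
{
    public static string Classify(float valence, float arousal)
    {
        if (arousal >= 0f)
            return valence >= 0f ? "excited / joyful" : "angry / afraid";
        return valence >= 0f ? "calm / relaxed" : "sad / bored";
    }
}
```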

Outlook

The Moodroom is currently an experiment in decoding and visualizing emotions. However, the project can be optimized and enhanced by testing more parameters, thereby improving its accuracy. It can also be expanded to a wider range of usage scenarios, for instance exhibitions, conferences, architectural design, games, and medical treatment. In addition, the user interface should continue to be optimized to provide more setting possibilities for different scenarios.

Therapy mode

In the case of medical use, experts could study the Moodroom setups produced from someone's emotional profile. The visualized mood, which the subject can also experience, could strengthen or mark out psychological issues. Studies have shown that it is possible to describe well the way someone feels based on his/her body signs, but unifying all of these requires more experiments and the knowledge of medical experts. Every emotional state has multiple signs in the form of communication attitudes, movement features, or physiological signs.

Being the base of other VR spaces

If we imagine this tool as a possible addition to any VR space, or as an ingredient of the future VR world, what would it be? Since VR has no limits, it is hard to shape a space inside it with reasonable intentions. One anchor can be nature itself, because we find it pleasant. The other can be the process itself, through which we find something pleasant.


Exhibition

All this can start with an exhibition in a museum, for everyone who wants to experience how these forms react to his/her own personal feelings. This would also be a good opportunity to gather data and research the relation between a crowd and a person: how a group of people reacts to changes based on another person's emotional state.

Image 18: Poster for the exhibition. Source: Own illustration

Multiplayer

After investigating the role of a single person, the effect of multiple participants can be next. How do people react to others' spatial setups? What if a big space has multiple, open emotional responses, and what if everyone sees his/her own space? What kind of communication would happen if everyone had their own world, and what if we shared ours with others?

Collaboration

In this project, we contributed our ideas and energy together, but each person had a different focus.

Coelho Debacco, Arthur: Mainly responsible for the overall organization and programming tasks, as well as the final implementation of the prototype, video and audio editing, and the brochure.

Fülöp, Botond: Mainly responsible for the research part, the low-fidelity prototype, and the brochure.

Chen, Xuanlin: Mainly responsible for visualization and graphic tasks, including the storyboard and diagrams, video and audio editing, research work, and the brochure.

Due to pandemic restrictions, the project was conducted online. Every week, in Zoom meetings, we discussed the current state and progress of the project and planned the tasks for the following weeks. Information sharing was centralized in the real-time collaboration platform Miro, which served as a very convenient sharing platform and discussion tool. The iteration process of the concept phase and the display of progress were also carried out in Miro.


Image 19: Research part in Miro. Source: Own illustration

References

Aronoff, J. (2006). How We Recognize Angry and Happy Emotion in People, Places, and Things. Michigan: Michigan State University.

Aronoff, J., M. Barclay, A., & A. Stevenson, L. (1988). The Recognition of Threatening Facial Stimuli. Journal of Personality and Social Psychology, 648.

Attneave, F. (1955). Symmetry, Information, and Memory for Patterns. The American Journal of Psychology.

B. Dietz, R., & Lang, A. (1999). Affective Agents: Effects of Agent Affect on Arousal, Attention, Liking & Learning. Proc. Cognitive Technology Conf., San Francisco.

C. Osgood, G. Suci, & P. Tannenbaum. (1957). The Measurement of Meaning. Urbana, IL: Univ. Illinois Press.

Hanjalic, A., & Xu, L.-Q. (2005, February). Affective Video Content Representation and Modeling. IEEE Transactions on Multimedia, vol. 7, no. 1.

J. Lang, P. (n.d.). The Network Model of Emotion: Motivational Connections. In R. S. Wyer & T. K. Srull. Hillsdale, NJ: Lawrence Erlbaum.

M. Bradley, M. (1994). Emotional memory: A dimensional analysis. In T. H. M. Van Goozen, N. E. Van de Poll, & J. A. Sergeant, Emotions: Essays on Emotion Theory. New York, East Sussex: Lawrence Erlbaum Associates, Inc.

McArthur, L. Z., & M. Baron, R. (1983). Toward an Ecological Theory of Social Perception. American Psychological Association, Inc.

Reber, R., Schwarz, N., & Winkielman, P. (2004). Processing Fluency and Aesthetic Pleasure: Is Beauty in the Perceiver's Processing Experience? Personality and Social Psychology Review.

Russel, J., & Mehrabian, A. (1977). Evidence for a three-factor theory of emotions. Journal of Research in Personality, vol. 11, pp. 273-294.

Contact

Coelho Debacco, Arthur
03726469
3rd Semester, Resource Efficient and Sustainable Building

Fülöp, Botond
03734435
Erasmus Semester, Architectural Engineering

Chen, Xuanlin
03742310
3rd Semester, Architecture
