
Multimodal Interaction
Giorgo Campo, Hugo Christiaans, Jelle Dekker
Industrial Design TU/e, May 2011

Module DB303

Foreword

Dear reader,

In front of you lies the report on the interactive word-cloud system created by three Master students of Eindhoven University of Technology for the module Multimodal Interaction. The aim of this report is to inform you about the process towards the final product, as well as about the functions of the concept. As requested, the personal reflections have also been added to the report; they can be found in the appendix at the back.

With kind regards,
Team word-cloud


Table of Contents

Foreword ..3
Idea Generation & Selection ..6
Concept ..8
Why Multimodal? ..9
Scenario ..10
Prototype ..16

Personal Reflections
Giorgo Campo ..18
Hugo Christiaans ..19
Jelle Dekker ..20

Idea Generation & Selection

The aim of the second assignment in the module was to design a novel multimodal interaction for a team of people, e.g. a power trio. During the brainstorm we started to explore areas where people work together in a difficult environment that puts constraints on the possible communication methods. This led, for example, to underwater (welding) work but also to roadworks: situations that need clear coordination which can't be achieved by normal speech. Another area we found interesting to explore was teaching, e.g. a musical instrument. Here the teacher can show things and try to explain them, but the actual feeling that can be so important when playing an instrument isn't a modality we can communicate. In the end we selected an idea where the computer would make "post-its" of your conversation, translating the spoken conversation into a simple visual form (a word-cloud). We chose this because it could be used in several ways in a design team, for new inspiration or for more documentation. Compared to the other ideas we explored, this one was much closer to our own experience, which made it more accessible in such a short time.

Constraints in the current situation

We found three scenarios where we wanted to use our selected idea, along with constraints in the current situation that could be interesting to try and solve. The three scenarios are: a design brainstorm, a focus group and an interview. For the first, the word-cloud could serve as live inspiration; for the other two it can aid in documentation. A list of constraints where the word-cloud could help:
• If someone is taking notes, it's much harder for him to follow the conversation.
• It's hard for people to follow the conversation if several people are talking at the same time.
• Post-its are ideal for small notes during a brainstorm, but they can easily get lost and don't serve well as documentation for the whole group.
• During an interview or focus group you need an audio recording to check later whether your notes were correct or a reinterpretation.
• Things that are said can be important but may go unnoticed or not be written down.
• It's hard to summarize and reflect on things during the conversation; earlier important points may be forgotten due to other things said since.

Concept

Cloud of Colourful Ideas (COCI) is a system that supports communication inside design teams and tries to stimulate as well as optimise their teamwork by analysing several individual data streams and visualising the information in a synoptic way, using a word-cloud. Coci is a multifunctional system that is able to capture and filter the most important and relevant information and recurring topics gathered during interviews or live during brainstorm sessions. Visualising information helps the team to get a good inside view of the topic and helps them to generate solutions during brainstorm sessions. Through different text sizes, Coci communicates the occurrence of the words, while through random colours Coci tries to trigger and stimulate the team's creativity.

How does Coci work?
Coci consists of a software program and a physical device that has to be worn by the user. The physical device records the voice of the person and transmits the data to the central computer on which Coci is installed. Each member who participates in a session has to wear a physical device, as the devices use voice recognition to optimise the recording settings.

The physical device
The physical device is a small recorder/transmitter that is lightweight and easy to wear. The design of the device is kept simple, as it only has to record and transmit data. Therefore the device is only equipped with an on/off button, an indication light, inlets for the microphone and a USB connection to connect the device with the computer. Each member who takes part in a session turns his or her device on to let it communicate with Coci. When the device is turned on, a blue indication LED informs the person that a connection with Coci has been established.

The software
The software is an intelligent algorithm able to grasp and visualise keywords from an audio track: it recognises voice and filters it. The feasibility of the idea is supported by two main examples. Google Translate is able to catch English sentences and write them down (and give a translation, of course); COCI goes beyond this by supporting several languages. On the other side, existing word-cloud tools are able to filter text and clean it of the most common words in several languages; they also have an editable list of words that have to be removed from the text. COCI is useful when meeting in order to share insights, and for the whole output after an interview or a focus group, as it easily gives access to keywords summarising the discussion. Keywords said most often are bigger and, as a general rule, keywords related to each other are visualised close together. On a second level, the software also gives access to the whole discussion. COCI is also useful after or during a brainstorm session. Besides recording the session in real time, it enhances creativity by encouraging connections among random words, which are visualised in the same colour. Because Coci works with digital files, it's easy to share insights and the whole interview/discussion with others.
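The core software loop described here can be summarised as: filter out common and user-blacklisted words, count occurrences, scale the text size with frequency, and colour words at random. A minimal Python sketch of that step; the transcript string, stop-word list and size range are illustrative assumptions, and the real system would be fed by the speech-recognition stage:

```python
import random
from collections import Counter

# Illustrative filters: a small built-in stop-word set plus the
# user-editable removal list mentioned in the concept text.
STOPWORDS = {"the", "a", "an", "i", "it", "is", "to", "and", "of", "in", "be"}
EDITABLE_REMOVE_LIST = {"uhm", "like"}

def build_cloud(transcript, min_size=12, max_size=48):
    """Turn a transcript string into word-cloud entries (word, count, size, colour)."""
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    words = [w for w in words
             if w and w not in STOPWORDS and w not in EDITABLE_REMOVE_LIST]
    counts = Counter(words)
    top = counts.most_common()
    most = top[0][1]
    cloud = []
    for word, n in top:
        # Font size grows with occurrence; colour is random, as in the
        # concept, to stimulate unexpected connections between words.
        size = min_size + (max_size - min_size) * (n - 1) / max(most - 1, 1)
        colour = "#%06x" % random.randrange(0x1000000)
        cloud.append({"word": word, "count": n,
                      "size": round(size), "colour": colour})
    return cloud

cloud = build_cloud("packaging is great and the packaging should stay edible")
print(cloud[0]["word"], cloud[0]["count"])  # prints "packaging 2"
```

A real implementation would update the counts incrementally while the session runs, but the sizing and filtering logic would stay the same.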

Why multimodal?

Coci aims to give a simple visual translation of much more complex audio speech input. The system in itself may not be directly multimodal: if you look at it from the perspective of human-computer interaction, it only takes speech input and only gives visual feedback. This means that it is operating with two modalities, but each in only one direction; it's making a translation between the two. This is also apparent because Coci adds a data stream to a discussion but isn't directly influenceable by the user. The real multimodality in Coci comes from its use: it provides an extra modality in a discussion that the users didn't have between them before. Especially in the scenario of the design brainstorm, where the word cloud is generated live, you directly use this visual modality in a situation where before you would only have the audio speech. The visual output of the word cloud is a direct input into the (audio) discussion of the users. Of course people can also write down ideas during a brainstorm, but Coci's visual output is a much more direct translation of the speech and words that occur in the discussion. In the scenario of the focus group and the interview it may be less clear. Here the word cloud is a similar visual representation of speech, only the users probably interact with it more after the discussion, not during it. This means the system does add a modality to the discussion, but not at the same time as the discussion is happening.



Scenario 1 - Design Brainstorm

All team members turn their recorders on and set them up in the initial screen of the program; after that they can start discussing as normal. While the team members have their brainstorm, the system records and displays the most occurring words on the screen in random colours. During the discussion the team pauses the system; afterwards the word-cloud serves as a new source of inspiration, using the words that are (unconsciously) used frequently, where you can link words and search for new connections. For this scenario the video can be viewed through the following link:

01 - Initial screen for setup of the conversation

02 - Reoccurring words start to appear on the screen

03 - The word cloud grows and changes during the conversation

04 - The team pauses the system to go for a break

05 - The word cloud after most of the brainstorm

06 - The team uses the word cloud for new inspiration



Scenario 2 - Focus group

01 - The facilitator conducts the focus group

02 - Each participant has a different device recording his/her voice

03 - The devices can be connected to a laptop to download the different speech recordings

04 - The software gives a visual representation of the whole discussion; the bigger the word, the more often it has been said

(the screen)
"What if you could eat the packaging!?" Kim 11.58
"...I don't eat the whole, I want to store it in the packaging again" Jhon 11.30

05 - If someone wants to track keywords back to the discussion, s/he only has to click on a word...

06 - ...and s/he will be able to read the exact words, who said them, and exactly what they refer to.

(the screen)
"...the packaging should be biologically degradable so I can throw it away while running..." Sara 11.57
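Steps 05 and 06 describe clicking a keyword to get back the exact utterances, who said them, and when. A small sketch of such a lookup; the utterance records (speaker, time, text) are invented example data in the spirit of the screen mock-ups, not real output of the system:

```python
# Hypothetical per-device transcript log: (speaker, time, utterance).
utterances = [
    ("Jhon", "11.30", "...I don't eat the whole, I want to store it in the packaging again"),
    ("Sara", "11.57", "...the packaging should be biologically degradable..."),
    ("Kim",  "11.58", "What if you could eat the packaging!?"),
]

def track_keyword(keyword, log):
    """Return every utterance containing the keyword, with speaker and time."""
    kw = keyword.lower()
    return [(speaker, time, text) for speaker, time, text in log
            if kw in text.lower()]

hits = track_keyword("packaging", utterances)
for speaker, time, text in hits:
    print(f"{speaker} {time}: {text}")
```

Because each participant wears a separate device, the speaker attribution comes for free from which recorder the utterance was captured on; only the time-alignment between devices would need extra care.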



Scenario 3 - Interviews

(the screen)
"after chocolate I need water to clean my tongue"
"...when I run I always buy water at the kiosk in the park"

01 - 3 designers of the TUE need to run interviews in different parts of the city

02 - In order to save time they decide to split up

03 - Each team member can run the interview and record it with his device

04 - Easily share the interview with others without listening to it again and writing it down

05 - Visualize the sum of the interviews...

06 - ...and see possible patterns and similarities emerge.

Prototype

For the physical recording device of Coci we made a prototype, to be used in the scenario video and to show how we think such a device should look and behave in order to be unobtrusive in the brainstorm, focus group or interview. Just as the software is futuristic (the speech recognition and algorithms we would like are probably not readily available yet), so is the design of the product. The simple and clear shape might not be big enough now considering batteries, but wireless data transfer should become less power-consuming in the future. We experimented with shapes to get to the one we liked; to visualize it more clearly, 3D renderings were made and the final version was 3D-printed to be worn on a keycord. We think the keycord is less obtrusive than a worn microphone and more flexible, since the object can also be put on the table for e.g. an interview.


Personal Reflection Giorgo Campo s109714

The module reflected on how new multimodal interaction technologies can greatly enhance the volume of individual actions and communication among team members. Nowadays new paradigms are able to increase the interaction bandwidth between people and their technological environment. These paradigms could use many parallel modalities: communication channels that address our different senses and apply our skills of utterance through speech, gesture, movement and manipulation. The course was both theoretical and design oriented. After an introductory lecture we explored this huge topic a little further with some readings. Integrating modalities requires an understanding of how people use their various senses to perceive and interact with the world around them. Multimodal interaction is definitely a multidisciplinary field involving interaction technologies, cognitive science, linguistics and philosophy, and it would be nice to understand these approaches more deeply, along with the contribution each of them could make to a design project. The module had two assignments. The first one was about analysing how team players communicate: which channels they use, why they use these modalities rather than others, and which constraints a team needs to face in an existing and realistic scenario. This assignment was helpful in order to familiarize ourselves with the topic and it was propaedeutic for the second phase. The goal of the second assignment was to design a solution using a multimodal interaction to help a team of people complete a task. We were asked to imagine what the design power tool would be, what its role could be and how the "band's" communication would be improved. We chose a design/research team as our target and we designed a system, composed of a recorder and software, able to transform complex speech information into a simple visual overview. This is particularly helpful in a focus group or when running interviews, to easily share information with all the team members and see similarities and possible patterns emerge. In a design brainstorm it is also useful for generating new ideas and stimulating creativity, highlighting different keywords not related to each other. We could have pushed the design further by reflecting on aspects like battery dimensions and clearly choosing a technology for the wireless communication; we had some discussion about these aspects, but due to a lack of time we preferred to focus on the concept, keeping the mental model of the system as simple and consistent as possible, grounding the concept on existing examples and growing trends. The module gave me a clear understanding of the opportunities and potential of multimodal interaction. I found the topic very interesting and valid, and it will be easier to implement such ideas in a future project now that I have had this experience. The reviews/meetings we had were helpful; I appreciated the way the lecturer always showed interest in all the projects, reflecting on possible directions.

Personal Reflection Jelle Dekker s070085

First assignment (analysis): For the first assignment we (Lily Chong, Rob Dijkstra and I) looked at crane operators and the communication stream and constraints when moving a (large) object with a crane. We were looking at constraints caused by device, environment and user. I had the idea that many constraints were very logical and maybe a bit easy, but in the end we did discover some things I hadn't thought of before. One important category here is cognition: it's not only the language of the country, but so much more that we have learned and use unconsciously in communication. I think this was the most important lesson for me in this first assignment: that there are so many things we can only do because we speak the right "language". Something that I think is important in design, especially for more extreme cases: you need to speak this language first. Second assignment (design): In the second assignment I found it harder at the start to find a relevant topic for which we could make a multimodal system or otherwise add an extra modality to the situation. In the end I like that we were able to make something for a design team that can be used in multiple situations. And I think that for the limited time-frame we went quite far in the design, discussing many aspects and possible constraints of our design. For the prototype I would have liked to make a more "Wizard-of-Oz" version, where someone can control the computer to add said words to an automatically generated word-cloud, e.g. by typing them in and after that being able to click on the words in a list if they're said again. Sadly this wasn't achievable in the time-frame of the module.
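The Wizard-of-Oz prototype described here, where a hidden operator types a word once and afterwards re-selects it from a list whenever it is said again, could be sketched roughly as below. The command format and counting logic are my own assumptions, not something that was built in the module:

```python
from collections import Counter

def wizard_step(command, counts, index):
    """One operator action: type a new word, or a number to re-click a listed word."""
    if command.isdigit():
        # Operator clicked entry N in the word list: that word was said again.
        counts[index[int(command)]] += 1
    else:
        # Operator typed a word heard for the first time; add it to the list.
        word = command.lower()
        if word not in index.values():
            index[len(index) + 1] = word
        counts[word] += 1
    return counts

counts, index = Counter(), {}
for cmd in ["packaging", "edible", "1", "1"]:  # simulated operator input
    wizard_step(cmd, counts, index)
print(counts["packaging"])  # prints 3
```

Feeding `counts` into a word-cloud renderer would then make the cloud appear to grow automatically, while a human does the speech recognition behind the curtain.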

Overall: Modalities are something that we always use in design. We think about how the user would interface with our product, why it should be done in a certain way, etc. Sometimes a design will work with multiple modalities, but I don't think this has often been a conscious choice so far. For me this module has made clear that you can really focus on this and can make really nice, useful designs with it. I think this module has made me more conscious of the capabilities and constraints of information sources (modalities) and the opportunities for multimodal interaction. I think we can really add to the experience of products by utilizing more of our senses/modalities. The framework we used to map the modalities used and the user/environment/device constraints can really help to analyse a situation in more detail and to see where the opportunities and constraints for a design are.

Personal Reflection Hugo Christiaans s099544

Every day people communicate and interact with other people as well as with artificial devices to complete tasks or share thoughts. However, the modalities in these communications differ a lot. Communication and interaction between people is often rich and multimodal, as people use many channels such as vision, hearing, smell and touch, while the interaction between artificial devices and humans is often limited to the visual and touch channels, even with today's technology. The module Multimodal Interaction aims to enrich the communication between artificial products and humans as well as between humans themselves.

The module started with a one-day research exercise on the topic of multimodal interaction. The goal of this research was to get familiar with the concept and the thoughts behind the phenomenon of multimodality. During this first day we analysed the language of a baseball team, trying to figure out how they communicate and through what kinds of channels. Not surprisingly, they used auditory and visual (code) channels, as is the case in many sports. However, using these two channels has a specific reason, namely to protect the tactics of the team. As an explanation, the following example is given: the coach explains the strategy to the catcher by speech, and the catcher communicates this to the pitcher with hand gestures. By doing it this way, the opponent doesn't know what kind of strategy the other team will use. When we take a closer look at these two channels, we see that each channel has its limitations. Although the limitations of the channels in the baseball example are used in a positive way, they generally cause inefficiency in the information flow. In other words, each channel has its own constraints.

After the one-day research, the module continued with a larger research and design assignment. For this assignment, the focus was also on improving teamwork through a multimodal interaction. During the four days, a system was developed that supports a design team in gathering, sharing and presenting information. The system consists of a central software program and several physical recorder/transmitter units that are linked with this software program and are worn by the participants during different team sessions such as a brainstorm. The information that is captured by the device is sent to the software program, which filters the audio data and transforms it into a visualisation. The visualisation of the concept is a word-cloud in which the most common and interesting words are displayed. The main goal of this cloud is to help and inspire the designers during a session. The words inside the cloud differ in size, depending on their occurrence in the conversation. The cloud is a live connection, which means that the cloud reacts dynamically to what is said. To add an extra dimension, the words inside the cloud are randomly coloured to stimulate new word formations and/or connections.

In terms of modalities, this system uses two modalities to establish the communication. However, because the system is an extension of the communication between the participants, more modalities are part of the total system. To clarify: the system communicates with the participants through the audio and visual modalities, but the communication between participants can contain many more modalities, such as touch or smell. This results in a very dynamic and rich interaction between design teams and the product. However, we still limited ourselves to only two modalities.

Overall this module was very inspiring and challenging, as we were stimulated to bring the theory into practice, finding new opportunities in interaction design. Although a lot has already been done in interaction design, these interactions are often between humans and computers (or other processor-based products with a screen) using only two or three modalities (audio, visual, haptic). However, new technologies and the ongoing integration of computers in our environment create new possibilities in interaction design using no screens, which can sometimes lead to inappropriate and/or controversial ideas. This means that interaction design is also about creating acceptance of an interaction. Although this module was only one week, it was a nice module to do and it gave me some insights into the world of interaction design.

multimodal interaction_final report  
