
Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus

Fabio Valente (1) and Alessandro Vinciarelli (2)
fabio.valente@idiap.ch, alessandro.vinciarelli@glasgow.ac.uk

(1) Idiap Research Institute (Switzerland), (2) University of Glasgow (UK)
Interspeech 2011


Conversation Analysis

Conversation analysis and role recognition have been active research fields for a long time [Sachs74].

Automatic role recognition based on statistical classifiers has been studied on the CMU corpus [Banerjee06], the AMI corpus [Vinciarelli05], the ICSI corpus [Lakowski04] and various broadcast conversations [Sibel09].

Typical features consist of turn-taking patterns, turn durations, overlaps between participants, stylistic and prosodic features, as well as lexical features.

Applications include summarization, indexing and analysis.

The roles considered in those studies are mainly formal roles that stay constant over the entire duration of the conversation, e.g., the Project Manager during a professional meeting, the professor during a faculty meeting or the moderator during a broadcast talk show.


Conversation Analysis

Formal roles do not generalize across types of conversation.

Formal roles are not directly related to the type of conversation or to the phenomena that occur in it.

Several phenomena have been studied in meeting conversations, such as hotspots and engagement [Shriberg03], dominance [Japygoci08] and agreement/disagreement [Hillard03].

Socio-Emotional roles [Pianesi 2006] are a general coding scheme for small group conversations and are more related to the type of meeting and its dynamics.

They are inspired by Bales' IPA [Bales76] and characterize the relationships between group members and their roles “oriented toward the functioning of the group as a group” [Pianesi 2006].

This work is an initial investigation using the same language-independent features used for formal role recognition.


Outline

Socio-Emotional Roles definition and previous works.

AMI corpus annotations

Feature extraction

Statistical modeling:
1. Basic generative model
2. Modeling influence
3. Jointly modeling formal and social roles

Results and discussion


Socio-Emotional Roles

[PROTAGONIST] - A speaker who takes the floor, drives the conversation, asserts authority and assumes a personal perspective.

[SUPPORTER] - A speaker who shows a cooperative attitude, demonstrating attention and acceptance and providing technical and relational support.

[NEUTRAL] - A speaker who passively accepts others' ideas and serves as audience.

[GATEKEEPER] - A speaker who acts as group moderator, mediating and encouraging communication.

[ATTACKER] - A speaker who deflates the status of others, expresses disapproval and attacks other speakers.


Socio-Emotional Roles

1. A participant has exactly one of these roles at any given time instant.

2. Multiple participants can have the same role at a given time instant.

They can be related to a number of phenomena studied in meetings, such as engagement, dominance and hot-spots.

Automatic recognition of social roles has been studied on the Mission Survival Corpus 2 (CHIL project).

Activity features (audio and video), i.e., non-linguistic features, were used to train statistical classifiers like SVM, HMM or coupled HMMs.


Dataset and Annotations

The AMI Meeting Corpus is a collection of meetings captured in specially instrumented meeting rooms, which record the audio and video for each meeting participant.

In the scenario meetings, four participants play the roles of a design team, i.e., Project Manager (PM), Marketing Expert (ME), User Interface Designer (UI) and Industrial Designer (ID), tasked with designing a new remote control.

The meeting is supervised by the Project Manager (PM).


Dataset and Annotations

The annotation guidelines are the same as for the Mission Survival Corpus [Pianesi 2006] and include a number of physical-behaviour and inferential questions.

Five scenario meetings were annotated; annotators were provided with audio and video.

Given a set of participants {S} and the role set {R} = {P, S, N, G, A}, the speaker-to-role mapping $\varphi_t(S) \to R$ is available at each time t.

Roles are post-processed so that a speaker's role at time t becomes the most frequent role that the speaker has in a one-minute window centered on t.
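The post-processing step can be sketched as a majority vote over a sliding window. This is a minimal illustration, not the authors' code: it assumes the raw annotation is available as regularly sampled `(time, role)` pairs for one speaker, and the function name `smooth_roles` is hypothetical. Ties are resolved in favour of the role seen first in the window, which is also an assumption.

```python
from collections import Counter

def smooth_roles(labels, window=60.0):
    """Replace each label by the most frequent role in a window centred on it.

    labels: list of (time_in_seconds, role) pairs for ONE speaker,
            assumed to be sampled at regular intervals (assumption).
    window: window length in seconds (one minute in the slides).
    """
    half = window / 2.0
    smoothed = []
    for t, _ in labels:
        # Collect every role annotated within [t - half, t + half].
        in_window = [r for (u, r) in labels if t - half <= u <= t + half]
        # Majority vote; ties go to the role encountered first.
        role = Counter(in_window).most_common(1)[0][0]
        smoothed.append((t, role))
    return smoothed

raw = [(0, "N"), (10, "P"), (20, "P"), (30, "N"), (40, "P"), (100, "N")]
smoothed = smooth_roles(raw)
```

In this toy input the isolated Neutral label at 30 s is absorbed by the surrounding Protagonist labels, while the label at 100 s has no neighbours in its window and is left unchanged.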


Dataset and Annotations

Resulting role distribution (percentage of total speaking time) over the five meetings:

[Figure: Role Distribution. Bar chart of the percentage of speaking time (axis from 0 to 40) for Protagonist, Supporter, Neutral, Gatekeeper and Attacker.]

Most of the time is attributed to the Protagonist/Supporter/Neutral roles and only 5% of the time is attributed to the Gatekeeper.

No speaker is labeled as Attacker because of the collaborative nature of the professional meeting.

[Figure: Social role distribution conditioned on formal roles. Bars for PM, ID, UI and ME (axis from 0 to 0.8) over Protagonist, Supporter, Neutral and Gatekeeper.]

The Gatekeeper role, i.e., the moderator of the discussion, is consistently taken by the Project Manager.


Feature Extraction

A meeting is a sequence of speaker turns (simplified turn definition [Shriberg01]). To further simplify the problem, the time in overlapping regions is given to the floor holder.

Prosodic features are extracted for each turn and collected in a vector $X_n$: F0 (mean, standard deviation, minimum, maximum and median over the turn), energy (mean and standard deviation over the turn) and mean speech rate over the turn.

A meeting is represented as $M = \{(t_1, d_1, X_1, s_1, r_1, f_1), \ldots, (t_N, d_N, X_N, s_N, r_N, f_N)\}$, where:
$t_n$ is the beginning time of the n-th turn.
$d_n$ is its duration.
$X_n$ is the prosodic feature vector.
$s_n$ is the speaker associated with the turn.
$r_n$ is the social role associated with the turn.
$f_n$ is the formal role.
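The turn representation above maps naturally onto a small record type. The sketch below is illustrative only: the names `Turn` and `prosodic_vector` are hypothetical, the F0 and energy tracks are assumed to be already extracted per frame by some external tool, and the use of the population standard deviation is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, median, pstdev
from typing import List

@dataclass
class Turn:
    start: float          # t_n, beginning time of the turn (seconds)
    duration: float       # d_n, turn duration (seconds)
    prosody: List[float]  # X_n, prosodic feature vector
    speaker: str          # s_n
    social_role: str      # r_n, e.g. "P", "S", "N", "G", "A"
    formal_role: str      # f_n, e.g. "PM", "ME", "UI", "ID"

def prosodic_vector(f0, energy, speech_rate):
    """Build the 8-dimensional X_n from per-frame tracks of one turn.

    f0, energy: non-empty lists of per-frame values (assumed given).
    speech_rate: mean speech rate over the turn (assumed given).
    """
    return [
        mean(f0), pstdev(f0), min(f0), max(f0), median(f0),  # 5 F0 statistics
        mean(energy), pstdev(energy),                        # 2 energy statistics
        speech_rate,                                         # 1 speech-rate value
    ]

x = prosodic_vector([100.0, 120.0, 140.0], [1.0, 3.0], 4.5)
turn = Turn(start=0.0, duration=2.5, prosody=x,
            speaker="A", social_role="P", formal_role="PM")
```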


Statistical modeling

Simple generative conversation model as a first-order Markov chain:

$$p(M) = \prod_{n=1}^{N} P(X_n \mid r_n)\, P(d_n \mid r_n)\, P(r_n \mid r_{n-1})$$

$P(r_n \mid r_{n-1})$ represents the turn-taking patterns (bigram LM).

$P(X_n \mid r_n)$ represents the prosodic feature distribution (GMM).

$P(d_n \mid r_n)$ represents the turn duration (Gamma distribution).

Scaling factors are introduced to bring the distributions to comparable ranges.
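A minimal sketch of how the model score could be evaluated in the log domain is given below. It is not the authors' implementation: the function names, the `"<s>"` start symbol and the interpretation of the scaling factors as log-domain weights are all assumptions, and the prosodic term is passed in as a callable so that any density (for instance a trained GMM) can be plugged in.

```python
import math

def gamma_logpdf(d, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at duration d > 0."""
    return ((shape - 1.0) * math.log(d) - d / scale
            - shape * math.log(scale) - math.lgamma(shape))

def meeting_loglik(turns, log_trans, log_prosody, gamma_params,
                   weights=(1.0, 1.0, 1.0), start="<s>"):
    """Weighted log p(M) for a sequence of turns.

    turns:        list of (duration, prosody_vector, role) tuples.
    log_trans:    dict mapping (previous_role, role) to log P(r_n | r_{n-1}).
    log_prosody:  callable (x, role) -> log P(X_n | r_n), e.g. a GMM score.
    gamma_params: dict mapping role to (shape, scale) of the duration model.
    weights:      scaling factors for the prosody, duration and turn-taking
                  terms (applied as log-domain weights; an assumption).
    """
    w_x, w_d, w_r = weights
    total, prev = 0.0, start
    for d, x, r in turns:
        shape, scale = gamma_params[r]
        total += w_x * log_prosody(x, r)
        total += w_d * gamma_logpdf(d, shape, scale)
        total += w_r * log_trans[(prev, r)]
        prev = r
    return total

# Toy example with a flat prosody model and exponential durations.
log_trans = {("<s>", "P"): math.log(0.5), ("P", "N"): math.log(0.25)}
gamma_params = {"P": (1.0, 1.0), "N": (1.0, 1.0)}
toy_turns = [(1.0, [], "P"), (2.0, [], "N")]
score = meeting_loglik(toy_turns, log_trans, lambda x, r: 0.0, gamma_params)
```

Working with sums of log-probabilities avoids numerical underflow for long turn sequences, and the weights make it easy to balance terms whose dynamic ranges differ.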


Experimental Setup and Evaluation

Leave-one-out approach on the five annotated meetings.

The social role of each speaker is assumed constant over the one-minute long window both during training and testing.

The center of the window is then progressively shifted by 20 seconds and the procedure is repeated till the end of the meeting.

All possible speaker-to-role mappings $\varphi_t^*(S) \to R$ are searched and the one that maximizes the model probability $p(M)$ is selected.
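The windowed exhaustive search can be sketched as follows. This is an illustration under stated assumptions: `window_centres` and `best_mapping` are hypothetical names, the first window is assumed to start at the beginning of the meeting, and `score` stands for any function that returns the model (log-)probability of the window given a candidate mapping. Since several participants may share a role, the candidates are all role tuples with repetition, not permutations.

```python
from itertools import product

def window_centres(meeting_length, window=60.0, shift=20.0):
    """Centres of one-minute windows shifted by 20 s that fit in the meeting."""
    centres = []
    c = window / 2.0
    while c + window / 2.0 <= meeting_length:
        centres.append(c)
        c += shift
    return centres

def best_mapping(speakers, roles, score):
    """Try every speaker-to-role assignment and keep the best-scoring one.

    score: callable taking a dict {speaker: role} and returning a number
           to maximize (placeholder for the model probability).
    """
    best, best_score = None, float("-inf")
    for assignment in product(roles, repeat=len(speakers)):
        mapping = dict(zip(speakers, assignment))
        s = score(mapping)
        if s > best_score:
            best, best_score = mapping, s
    return best, best_score

centres = window_centres(100.0)
toy_score = lambda m: (m["a"] == "P") + (m["b"] == "N")
mapping, value = best_mapping(["a", "b"], ["P", "N"], toy_score)
```

With four participants and four or five candidate roles the search covers 256 or 625 assignments per window, so brute force is cheap.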

Accuracy of the individual components:

            Random   Turns (Unigram)   Turns (Bigram)   Duration   Prosody
Accuracy    0.26     0.35              0.49             0.43       0.41

Accuracy of the complete model:

            Total   Protagonist   Supporter   Neutral   Gatekeeper
Model 1     0.59    0.61          0.62        0.68      0


Modeling Influence

Social roles are indicative of group behaviors: the influence that a speaker has on others has been pointed out as a central factor in determining those roles [Dong08] in the MSC corpus.

The influence is observed not only in the speech activity but also in the prosodic behavior, body movement and focus of attention.

            Total   Protagonist   Supporter   Neutral   Gatekeeper
Model 1     0.59    0.61          0.62        0.68      0
Influence   0.65    0.70          0.63        0.79      0


Formal and Social Roles

Although Gatekeeper is a rare role, it is consistently taken by the PM in the AMI meetings.

This information can be modeled simply by computing the probabilities $P(r_n \mid r_{n-1}, f_n)$.

The formal role $f_n$ of the speaker taking turn n is assumed to be known and constant over the entire meeting.

$$p(M) = \prod_{n=1}^{N} P(d_n \mid r_n, r_{n-1})\, P(X_n \mid r_n, r_{n-1})\, P(r_n \mid r_{n-1}, f_n)$$
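One simple way to obtain the role transition probabilities conditioned on the formal role is relative-frequency counting over the annotated turn sequences. The sketch below is an assumption-laden illustration: the function name `estimate_transitions` is hypothetical, and the add-alpha smoothing is not mentioned in the slides; it is included only so that unseen roles do not receive zero probability.

```python
from collections import defaultdict

def estimate_transitions(sequences, roles, alpha=1.0):
    """Estimate P(r_n | r_{n-1}, f_n) by smoothed counting.

    sequences: list of meetings; each meeting is a list of
               (social_role, formal_role) pairs, one per turn.
    roles:     list of all social roles to distribute probability over.
    alpha:     add-alpha smoothing constant (assumption, not from the slides).
    Returns a dict: (previous_role, formal_role) -> {role: probability}.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        # Pair each turn with its predecessor.
        for (prev_r, _), (r, f) in zip(seq, seq[1:]):
            counts[(prev_r, f)][r] += 1.0
    probs = {}
    for context, c in counts.items():
        total = sum(c.values()) + alpha * len(roles)
        probs[context] = {r: (c.get(r, 0.0) + alpha) / total for r in roles}
    return probs

toy = [[("P", "PM"), ("N", "ID"), ("G", "PM"), ("N", "ID")]]
probs = estimate_transitions(toy, ["P", "S", "N", "G"])
```

Conditioning on the formal role lets a context such as "previous role Neutral, current speaker is the PM" assign noticeably more mass to Gatekeeper than the unconditioned bigram would.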

                   Total   Protagonist   Supporter   Neutral   Gatekeeper
Influence          0.65    0.70          0.63        0.79      0
Influence+Formal   0.68    0.72          0.65        0.80      0.15


Discussion and Conclusions

Social roles characterize the relationships between group members and they can be related to several phenomena studied in conversations, e.g., engagement, hot-spots and dominance.

They are universal and thus could generalize to any type of discussion.

Using turn-taking patterns, turn duration and prosodic features, social roles are recognized with an accuracy of 59%.

When influence is introduced, the accuracy becomes 65%.

Integrating the formal role information in the conversation model increases the recognition rate to 68% and permits the recognition of Gatekeeper instances.

Future work: more data, more features (possibly lexical) and more types of conversations...


Thank You

