Multimodality in a Speech Aid System

Laszlo Czap, Judit Maria Pinter
Department of Automation and Communication Technology, University of Miskolc, Miskolc, Hungary

Abstract: Face-to-face human interaction is performed on several channels: all organs of sense are involved in conversation. Speech, the most comfortable and natural way of human communication, is a multimodal phenomenon. Beyond articulation, mimicry, gesture, posture, tactile sensation and other cues influence the perception of the keynote and the emotions of the speaker. Nowadays, sophisticated computers can act in a multimodal manner and gather information from different input media. Our talking head technology is being utilized in a project that aims at developing a speech assistant application that trains deaf people to speak.

Keywords: talking head; face animation; expressing emotions; speech aid

I. Introduction

Intelligibility of speech can be improved by showing the articulation of the speaker. This visual support is essential in a noisy environment and for people with hearing impairment. An artificial talking head can be a natural supplement to acoustic speech synthesis. A sophisticated 3D model can not only imitate the articulation but is also able to control the atmosphere of the conversation. Expressing emotions is an essential element of human communication. The acceptance and attractive force of a 3D head model can be enhanced by adding emotional content to synthesized speech.


The pioneering work on face animation for modeling articulation started decades ago with 2D models [1, 2]. The development of 3D body modeling, the evolution of computers and advances in the analysis of human articulation have enabled the development of realistic models.

Figure 1. Photorealistic and transparent visualization

Teaching hearing impaired people to speak can be aided by an accurately articulating virtual speaker, which can make its face transparent and show details of the movements associated with the utterance (Fig. 1). Over the last decade, the area of visual speech has been developing dynamically, with more and more applications being developed. Existing systems focus on high quality modeling of articulation, but have been restricted by polygon-count issues; e.g., simulating falling hair needs more computational effort than the speech production itself.

A visual feature database for speech-reading and for the development of 3D modeling has been elaborated, and a Hungarian talking head has already been created [3]. In this research, the general approach was to use both static and dynamic analysis of natural speech to guide facial animation. A three-level dominance model has been introduced that takes co-articulation into consideration.


Each articulatory feature has been arranged into one of three classes: dominant, flexible or uncertain. Analysis of the standard deviation and the trajectory of the features guides the evaluation process. The acoustic features of speech and the articulation are then linked to each other by a synchronizing process.


The aim of this research is to use the talking head technology to show the correct articulation of sounds, sound transitions, words and sentences, according to the age and skill of the trainee. In particular, rendering with a transparent face can show the tongue movement better than a real speaker can. Expressing emotions can improve the feedback of the system, praising the student or encouraging him or her to keep exercising.

II. Talking Head

A talking head for this purpose should model photorealistic appearance and sophisticated human-like articulation close to the natural movements of the speaker, leading to four additional research results:

- Pre-articulation. Prior to an utterance, a silent period is inserted, imitating breathing by opening the mouth; then the first dominant viseme is approached from the neutral starting position.
- Realizing the temporal asynchrony effect. A filtering and smoothing algorithm has been developed for adaptation to the tempo of either synthesized or natural speech.
- Head motion, gaze, eyebrow raising, and eye blink. An algorithm has been developed to control these movements semi-randomly and manually.
- Emotion movements. Following the definitions of Ekman, a scalable and blended algorithm was developed to express emotions.

A. Method and Material


In this line of research, a 3D transformation of a geometric surface model has been used [4, 5]. The deformation-based articulation has been translated into a parametric model to overcome the restrictions of the morphing technique. Facial movements have not been carried out by deformations of the face; instead, a collection of polygons has been manipulated using a set of parameters. This process allows control of a wide range of motions using a set of parameters associated with different articulation functions. These features can be directly matched to particular movements of the lips, tongue, chin, eyes, eyelids, eyebrows and the whole face.


The visual representation of a speech sound (mostly representing the phoneme) is called a viseme. A set of visemes has fewer elements than the set of phonemes. Static positions of the speech organs for the production of Hungarian phonemes can be found in seminal works [6, 7].

The features of Hungarian visemes have been constructed using the word models of [7]. The prime features of the visemes have been adopted from the published sound maps and albums [6]. These features have been transformed using the properties of the articulation model of [10]. Features controlling the lips and tongue are crucial. Basic lip parameters include the opening and the width; their movement is related to lip rounding. The lip opening and the visibility of the teeth depend upon jaw movement. The tongue is specified by its horizontal and vertical position, its bending and the shape of the tongue tip (Fig. 2).

According to the static features, the articulation parameters characteristic of the stationary section of the viseme can be set.
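As a minimal illustration, the static articulation targets of one viseme can be stored as a simple record. The parameter names below mirror the lip and tongue features listed above, but the fields and the example values are assumptions, not the actual model.

```python
from dataclasses import dataclass

@dataclass
class VisemeTarget:
    """Static articulation targets for the stationary section of one viseme.
    Field names and units are illustrative only."""
    lip_opening: float        # vertical inner-lip distance
    lip_width: float          # horizontal inner-lip distance (lip rounding)
    jaw_opening: float        # drives lip opening and teeth visibility
    tongue_horizontal: float  # front-back tongue position
    tongue_vertical: float    # tongue height
    tongue_bending: float     # bending of the tongue body
    tongue_tip: float         # shape of the tongue tip

# A hypothetical target for the vowel [e:] (values are made up):
e_long = VisemeTarget(lip_opening=0.35, lip_width=0.80, jaw_opening=0.30,
                      tongue_horizontal=0.70, tongue_vertical=0.60,
                      tongue_bending=0.20, tongue_tip=0.10)
```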

B. Modelling Dynamic Operation

The dynamic properties of conversational Hungarian have not been described yet; the current research begins to give this description. The usefulness of the motion phases represented in voice albums is limited, since they can be related only to the particular word given in the album.

Figure 2. Illustration of tongue positions for the sounds n (left) and k-g (right).

The dynamic interpretation is taken from the authors' own studies in speech reading [3]. Specifically, this work has provided the trajectories of the width and height of the oral cavity and the visibility of the teeth and tongue. These data provide the basis for the movement between visemes.

Some parameters take on their characteristic value, while others do not reach their target value during pronunciation. All properties of the visemes (e.g., lip shape, tongue position) have been grouped according to their dominance, with every articulation feature being assigned to a dominance class. This is different from the general approach, where visemes are classified by their dominance only. For instance, the Hungarian visemes [ɟ, c, j, ɲ] are dominant on lip opening and tongue position and are dubious with respect to lip width. This grouping is based on the ranges produced by the speech reading data. Features of the parametric model can be divided into three grades:

- dominant: coarticulation has (almost) no effect on them,
- flexible: the neighboring visemes affect them,
- uncertain: the neighborhood defines the feature.

In addition to the range, the distribution of the transitional and stationary periods of the visible properties helps to determine the grade of dominance.
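A minimal sketch of how such a grading could be derived from the speech-reading measurements: the spread of one feature of one viseme across many phonetic contexts decides its class. The statistic and the thresholds are assumptions, not values from this work.

```python
import numpy as np

def dominance_grade(samples, dominant_thr=0.05, flexible_thr=0.20):
    """Grade one articulation feature of one viseme.

    samples: measured values of the feature for the same viseme in many
    different contexts, normalized to the feature's full range.
    Thresholds are illustrative.
    """
    spread = np.std(samples)
    if spread < dominant_thr:
        return "dominant"    # coarticulation has (almost) no effect
    if spread < flexible_thr:
        return "flexible"    # neighboring visemes affect it
    return "uncertain"       # the neighborhood defines the feature

# e.g. dominance_grade(lip_opening_samples_for_e) -> "dominant"
```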


The trajectory of viseme features may also be needed for determining dominance classes. Fig. 3 (above) presents the trajectory of the inner lip width (horizontal axis) and the lip opening (vertical axis) of the viseme [e:]. These curves cannot be traced one by one, but they go through a dense area regardless of the starting and final states. The dominant nature of the vowel's lip shape is manifest.

Figure 3. Trajectory of lip sizes (width and height) in pixels of viseme e: (above) and h (below).

In contrast, unsteady features do not provide a consistent pattern. The trajectory of h is shown in Fig. 3 (below). (To be able to track them, only a few curves are presented.)

The dominance grade of the previously classified parameters, which reflects the deformability and context dependence of the viseme, controls the interpolation of features. Forward and backward co-articulation formulas are applied in such a way that articulation parameters are influenced by the less elastic properties.
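The co-articulation formulas themselves are not reproduced here; the sketch below only illustrates the principle that the less elastic (more dominant) neighbour pulls a transitional feature value toward its own target. The weights and the formula are assumptions.

```python
# Illustrative elasticity weights per dominance grade (assumed values).
WEIGHT = {"dominant": 1.0, "flexible": 0.5, "uncertain": 0.1}

def transitional_value(prev, nxt):
    """Feature value in the transition between two visemes.

    prev, nxt: (target_value, dominance_grade) of the previous and the
    next viseme for this feature.  A dominant neighbour dominates the
    transition; between two uncertain visemes the result is their average.
    """
    (v_prev, g_prev), (v_next, g_next) = prev, nxt
    w_prev, w_next = WEIGHT[g_prev], WEIGHT[g_next]
    return (w_prev * v_prev + w_next * v_next) / (w_prev + w_next)

# transitional_value((0.8, "dominant"), (0.2, "uncertain")) stays close to 0.8
```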


C. Pre-Articulation

Prior to an utterance, an approximately 300 ms silent period is inserted, created to imitate inhalation through the mouth; then the first dominant viseme is approached from the neutral starting position. Because of this pre-articulation movement, the sound is produced as in natural speech. If the last sound of the sentence is bilabial, the mouth is slightly opened after the sound fades (post-articulation).
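A rough sketch of this behaviour, assuming the articulation is stored as a list of parameter vectors at a fixed frame rate; the linear ramp, the frame rate and the post-articulation blend are assumptions, not the actual implementation.

```python
import numpy as np

def add_pre_post_articulation(frames, first_dominant, last_is_bilabial,
                              fps=25.0, silence_ms=300.0):
    """Prepend a silent pre-articulation phase and append post-articulation.

    frames: list of np.ndarray parameter vectors of the utterance;
    first_dominant: parameter vector of the first dominant viseme.
    """
    n = int(round(silence_ms / 1000.0 * fps))
    neutral = frames[0]
    # Silent phase: the mouth opens from the neutral pose toward the first
    # dominant viseme, imitating inhalation through the mouth.
    pre = [neutral + (first_dominant - neutral) * (t / n) for t in range(n)]
    post = []
    if last_is_bilabial:
        # After a final bilabial the closed mouth opens slightly again.
        post = [0.9 * frames[-1] + 0.1 * neutral]
    return pre + list(frames) + post
```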


D. Audiovisual Asymmetry

Audiovisual speech recognition results indicate that performance declines monotonically and asymmetrically when there is a delay between the acoustic and visual signals.

Figure 4. Decline of the recognition rate as a function of audio lead and lag (ms).


Fig. 4 shows the audiovisual recognition rates of diphones, measured as a function of the delay between the acoustic and visual signals. The time step of the audiovisual delay is 20 ms, from -400 ms to 400 ms. Fig. 5 illustrates the same results in another interpretation, clearly showing the audiovisual asymmetry.

Figure 5. Diphone recognition rate (%) as a function of audio delay (ms); negative values mean audio leading.

Based on the audiovisual asymmetry observed in the diphone recognition study, significant improvements have been achieved in our audiovisual speech synthesis system. The filter utilizing the temporal asynchrony advances the articulation relative to the acoustic signal, as shown in Fig. 6.
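A minimal sketch of such a filter, assuming the articulation frames are stored time-aligned with the audio: the visual track is simply advanced by a small fixed lead. The lead value below is an assumption, not the one used in the system.

```python
import numpy as np

def advance_visual_track(visual_frames, fps=25.0, visual_lead_ms=80.0):
    """Shift the articulation parameters so that they slightly lead the audio.

    visual_frames: array of shape (n_frames, n_parameters), originally
    time-aligned with the acoustic signal.  Audio lag (visual lead) hurts
    intelligibility less than audio lead, so a small visual advance is safe.
    """
    shift = int(round(visual_lead_ms / 1000.0 * fps))
    if shift == 0 or shift >= len(visual_frames):
        return visual_frames
    # Drop the first frames and hold the last pose to keep the length.
    return np.vstack([visual_frames[shift:],
                      np.repeat(visual_frames[-1:], shift, axis=0)])
```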


E. Adapting to the Tempo of Speech and Filtering

During synchronization with human or synthesized speech, we have observed various speech tempos. When speech is slow, viseme properties approach their nominal values, while fast speech is articulated with less precision, as in natural speech. For flexible parameters, the rounding off is stronger in fast speech. A median filter is applied for the interpolation of flexible parameters: the values of neighboring frames are sorted and the median is chosen. A feature trajectory is created by the following steps:

- linear interpolation among the values of dominant and flexible features, neglecting the uncertain ones,
- median filtering where flexible features are juxtaposed,
- values are then filtered by a weighted sum of the two previous frames, the actual frame and the next one.

The weights of the filter are fixed; thus, knowledge of the speech tempo is not needed. The smoothing filter refines the motions and reduces the peaks during fast speech. By considering the two previous frames, the timing asymmetry of articulation is approximated: Ref. [9] has shown that the mouth starts to form a viseme before the corresponding phoneme is audible, and the filtering takes this phenomenon into consideration. Other improvements, such as inserting a stationary phase into long vowels and synchronizing the phases of a viseme to a phoneme at several points, refine the articulation. Fig. 6 shows the effect of median filtering and smoothing. In this example, the slow speech has twice as many frames as the fast one; the horizontal axis presents the frame index, while the vertical axis shows the amplitude of the feature.
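The three steps can be sketched as follows for a single feature trajectory; the three-frame median window and the particular smoothing weights (two previous frames, the current frame and the next one) follow the description above, but the concrete weight values are assumptions.

```python
import numpy as np

def median_filter_flexible(values):
    """Three-frame median over a flexible-feature trajectory."""
    out = np.array(values, dtype=float)
    for i in range(1, len(out) - 1):
        out[i] = np.median(values[i - 1:i + 2])
    return out

def smooth(values, weights=(0.1, 0.2, 0.4, 0.3)):
    """Weighted sum of the two previous frames, the current one and the next.

    The fixed weights make the filter independent of the speech tempo; by
    looking back two frames and ahead one, the timing asymmetry of
    articulation is approximated.  The weight values are illustrative.
    """
    w2, w1, w0, wn = weights
    out = np.array(values, dtype=float)
    for i in range(2, len(out) - 1):
        out[i] = (w2 * values[i - 2] + w1 * values[i - 1]
                  + w0 * values[i] + wn * values[i + 1])
    return out

# trajectory = smooth(median_filter_flexible(linearly_interpolated_feature))
```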


Figure 6. The interpolation of dominant (first peak) and flexible (second peak) parameters for slow (above) and fast (below) speech after linear interpolation (…), median filtering (___) and smoothing (---).

III. Improving Naturalness


Visible properties (e.g., nodding, eyebrow raising, blinking) are meaningful gestures, especially for people who are hard of hearing. Gestures can support turn taking in dialogues. For instance, a lift of the eyebrows can represent paying attention, while nodding can indicate acknowledgement. An algorithm for head movement and mimicry cannot easily be produced from prosody alone, since the communicative context of the utterance needs to be taken into consideration. To link the automatically generated movements with the intent of the utterance, properties can be managed manually using a graphical editor (e.g., lifting eyebrows or nodding at a sentence accent, or controlling the eyes to imitate a glance into a paper).

By observing the head movements of professional speakers, moderate nodding, tilting and blinking have been introduced. We have analyzed more than 15 minutes of speech from different announcers delivering the news on television. For each video frame, the eyes can easily be identified and traced using a conventional algorithm for obtaining motion vectors for video compression, such as in the MPEG standards. Horizontal motion reveals panning, while vertical movement means tilting. Unbalanced movement of the left and right eyes shows head inclination. A series of subjective tests has been used to fine tune these parameters.

A. Facial Gestures

Head movement, gaze, eyebrow raising, and eye blink can be regulated semi-randomly or manually. Automatic generation of facial gestures is organized in a semi-random manner, since a rigid rule-based system would result in mechanical, boring and unnatural motions. Tilting and nodding head movements are related to the short-time (200 ms) average energy of the acoustic signal. A downward head movement is observed at sentence accents: the bigger the average sound energy is, the higher the probability of a downward head tilt. Moderate and slow head turning (panning) and side inclination are guided randomly; the amplitude of these head movements is not more than 2-3 degrees. Gaze is controlled by monitoring head movements to keep the face looking into the camera (i.e., the observer's eyes): vertical and horizontal head movements are compensated by moving both eyes in the opposite direction.
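A minimal sketch of the semi-random generation of downward nods from the short-time energy; the mapping from energy to probability and the cap are assumptions, not the tuned values of the system.

```python
import random

def nod_decisions(frame_energies, energy_ref, max_prob=0.6, seed=None):
    """Decide, frame by frame, whether to start a downward head tilt.

    frame_energies: 200 ms average energies of the acoustic signal;
    energy_ref: a reference energy (e.g. the utterance maximum).
    The louder the frame, the more probable a nod, but it is never certain,
    so repeated accents do not produce mechanically identical gestures.
    """
    rng = random.Random(seed)
    return [rng.random() < max_prob * min(e / energy_ref, 1.0)
            for e in frame_energies]
```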


Based on observation of professional announcers and the existing literature, eyebrow movement is controlled by taking the probable strength relationship between the potential prosodic and visual cues into consideration. The interaction between acoustic intonation (F0) gestures and eyebrow movements has been observed in production, as in [10]. A preliminary hypothesis is that direct coupling is highly unnatural, but that prominence and eyebrow movement may co-occur. In our experiment, the brows have been noted to rise subtly at the beginning of declarative sentences and then approach the neutral position. A larger raising movement is likely to be interpreted as surprise. When emotions are expressed, the inner and the outer eyebrows are raised independently. Eyebrows can be used to disclose disgust, anger and sadness.

B. Expressing Emotions

Basic research results are published in [11, 12] and focus on Darwin's first argument that facial expressions have evolved. If expressions have evolved, then humans must share a common, universal set. Many facial expressions are involuntary in animals, and probably in humans as well. As a consequence, much of the research has turned toward understanding what facial expressions show about the emotional state of the speaker [13]. Emotional content should be obvious to others because everyone shares the same universal assortment of expressions. In multimodal speech, we can confirm or disprove the verbal message by using gestures and body language.


Researchers have found that there are possibly six distinct emotions that are read with ease across cultures. These emotions include anger, disgust, happiness, sadness, fear, and surprise. Facial expressions signal a particular emotional state held by the speaker. This perspective leads to the conclusion that there are identifiable states that are easily perceived.




Figure 7. Eyebrows displaying surprise and worry

A couple of features characterizing the basic emotions:

Sadness: The inner corners of the eyebrows are drawn up. The skin below the eyebrow is triangulated, with the inner corner up. The inner corner of the upper eyelid is raised. The corners of the lips are drawn down.

Disgust: The upper lip is raised. The lower lip is also raised, pushed up against the upper lip and slightly protruding. The nose is wrinkled. The cheeks are raised. Lines show below the lower lid, and the lid is pushed up but not tense. The brow is lowered, lowering the upper lid.

Happiness: The corners of the lips are drawn back and up. The cheeks are raised. The lower eyelid shows wrinkles below it, and may be raised but not tense. Crow's-feet wrinkles go outward from the outer corners of the eyes.

Figure 8. Expression of the neutral face, sadness, disgust, happiness, fear and (nice) surprise


Surprise: The brows are raised, so that they are curved and high. The skin below the brow is stretched. Horizontal wrinkles go across the forehead. The eyelids are opened; the upper lid is raised and the lower lid drawn down; the white of the eye (the sclera) shows above the iris, and often below as well. The jaw drops open so that the lips and teeth are parted, but there is no tension or stretching of the mouth. Fig. 8 depicts six examples of emotions.


During the utterance of a sentence, facial expressions progress from a neutral look to the target display of emotion. Emotions can be controlled in a scalable and blended manner; for example, 20% fear can be blended with 30% surprise, or 20% happiness with 30% surprise.

Acceptability is a crucial feature of a training system. It can be highly improved by a friendly appearance and a kind manner of feedback, praising the student for a nice utterance or encouraging him or her when falling behind his or her own level.
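A sketch of this scalable, blended control under the assumption that each basic emotion is stored as a displacement of the facial parameters from the neutral face; the representation, names and values below are illustrative, not those of the actual system.

```python
import numpy as np

# Hypothetical displacements of a few facial parameters from the neutral
# face for some basic emotions (values are made up for illustration).
EMOTION_BASIS = {
    "fear":      np.array([0.6, 0.4, 0.1, 0.5]),
    "surprise":  np.array([0.9, 0.2, 0.0, 0.8]),
    "happiness": np.array([0.1, 0.8, 0.7, 0.0]),
}

def blend_emotions(neutral, mix, progress=1.0):
    """Blend scaled emotions on top of the neutral face.

    mix: e.g. {"fear": 0.2, "surprise": 0.3} for 20% fear and 30% surprise;
    progress: 0..1, how far the expression has evolved from the neutral
    look toward the target display during the utterance.
    """
    target = neutral.copy()
    for emotion, weight in mix.items():
        target = target + weight * EMOTION_BASIS[emotion]
    return neutral + progress * (target - neutral)

# face = blend_emotions(neutral_face, {"fear": 0.2, "surprise": 0.3}, 0.5)
```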

IV. Human-Machine Interactions

The first step of the system development is preparing training patterns: phonemes, transitions, words and sentences. After rendering them with the talking head from different viewpoints, with different zooming and lighting, we will decide on the appearance of the 3D model. By testing the renderings with trainee students and analyzing their answers, we will refine the settings; presumably they will differ for big computer monitors and for smart phones. The talking head is going to be one of the outputs of the speech assistant.

The other output is the visualized speech signal. An audiovisual transcoder converts the speech into specially designed graphic symbols. The voice can be represented in real time in a matrix format, or the whole sample can be shown in a column diagram. Both the training pattern and the actual exercise signal of the trainee student can be visualized, so that the trainee can try to reproduce the training pattern.


Intelligibility is highly influenced by prosody, the suprasegmental features of speech. The pitch frequency (F0) and the word stress are displayed both for the training sample and for the result of the actual exercise, so that they can be compared.
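The way the F0 contours are obtained is not specified here; as one possible illustration, a standard pitch tracker (librosa's pyin, an assumed choice) can extract the two contours that are then displayed together for comparison. The file names are hypothetical.

```python
import librosa
import matplotlib.pyplot as plt

def f0_contour(wav_path, fmin=65.0, fmax=400.0):
    """Extract a pitch (F0) contour from a recording."""
    y, sr = librosa.load(wav_path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return librosa.times_like(f0, sr=sr), f0

# Overlay the training pattern and the trainee's attempt (hypothetical files).
t_ref, f0_ref = f0_contour("training_pattern.wav")
t_usr, f0_usr = f0_contour("trainee_exercise.wav")
plt.plot(t_ref, f0_ref, label="training pattern")
plt.plot(t_usr, f0_usr, label="trainee")
plt.xlabel("time (s)")
plt.ylabel("F0 (Hz)")
plt.legend()
plt.show()
```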


The speech assistant automatically analyzes the actual utterance and gives an assessment to the trainee student according to his or her skill level. The evaluation result influences the selection of the next training pattern.


Inputs of the system, besides the user interactions, are:

- the actual voice of the trainee, to be visualized and automatically assessed,
- (optionally) a 2D or 3D video of the articulation of the trainee, to be analyzed with the assistance of a professional trainer.

Conclusions

This paper describes the audiovisual speech synthesizer that forms the base of a larger research project aiming at a Hungarian audio-visual speech aid, a cognitive development system that trains hearing impaired people to learn speech articulation. Fine tuning of the probabilities for natural animation, and the avoidance of mechanical, rule-based repetition of gestures, resulted from a detailed study of speech production. Pre- and post-articulation, median filtering for adaptation to the speech rhythm, and filtering for the temporal asymmetry of speech production have been introduced as directions for continued fine tuning of human-like pronunciation. In this phase, further refinement of co-articulation is performed. Improving naturalness and expressing emotions make the performance of the talking head more attractive. Sample videos can be found at http://mazsola.iit.uni-miskolc.hu/~czap.

ACKNOWLEDGMENT

This research was carried out as part of the TAMOP-4.2.2.C-11/1/KONV-2012-0002 project, with support from the European Union, co-financed by the European Social Fund.


References

[1] E. Cosatto and H. P. Graf, "2D photo-realistic talking head," Computer Animation, Philadelphia, Pennsylvania, 1998, pp. 103-110.
[2] M. M. Cohen and D. W. Massaro, "Modeling coarticulation in synthetic visual speech," in N. M. Thalmann and D. Thalmann (Eds.), Models and Techniques in Computer Animation, Tokyo: Springer-Verlag, 1993.
[3] L. Czap, Audio-visual speech recognition and synthesis, PhD Thesis, Budapest University of Technology and Economics, 2004.
[4] L. E. Bernstein and E. T. Auer, "Word recognition in speechreading," in Speechreading by Humans and Machines, Springer-Verlag, Berlin, Heidelberg, Germany, 1996, pp. 17-26.
[5] D. W. Massaro, Perceiving Talking Faces, The MIT Press, Cambridge, Massachusetts; London, England, 1998, pp. 359-390.
[6] K. Bolla, A Phonetic Conspectus of Hungarian, Tankönyvkiadó, Budapest, 1995.
[7] J. Molnár, The Map of Hungarian Sounds, Tankönyvkiadó, Budapest, 1986.
[8] J. Mátyás, Visual speech synthesis, MSc Thesis, University of Miskolc, 2003.
[9] G. Feldhoffer, T. Bárdi, Gy. Takács and A. Tihanyi, "Temporal asymmetry in relations of acoustic and visual features of speech," Proc. 15th European Signal Processing Conference, Poznan, Poland, 2007.
[10] C. Cavé, I. Guaïtella, R. Bertrand, S. Santi, F. Harlay and R. Espesser, "About the relationship between eyebrow movements and F0 variations," in H. T. Bunnell and W. Idsardi (Eds.), Proceedings of ICSLP 96, pp. 2175-2178, Philadelphia, PA, USA, 1996.
[11] P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions From Facial Expressions, Malor Books, 2003.
[12] P. Ekman, "Facial expressions," in C. Blakemore and S. Jennett (Eds.), Oxford Companion to the Body, London: Oxford University Press, 2001.
[13] N. Mana and F. Pianesi, "Modelling of emotional facial expressions during speech in synthetic talking heads using a hybrid approach," Auditory-Visual Speech Processing, 2007.
