IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 11, NO. 2, APRIL-JUNE 2020
Analysis and Classification of Cold Speech Using Variational Mode Decomposition

Suman Deb, Samarendra Dandapat, Member, IEEE, and Jarek Krajewski, Member, IEEE

Abstract—This paper presents the analysis and classification of a type of pathological speech called cold speech, which is recorded when a person is suffering from the common cold. The common cold affects the nose and throat. As the nose and throat play an important role in speech production, the speech characteristics are altered during this pathology. In this work, variational mode decomposition (VMD) is used for analysis and classification of cold speech. VMD decomposes the speech signal into a number of sub-signals or modes. These sub-signals may better exploit the pathological information for characterization of cold speech. Four statistics (mean, variance, kurtosis and skewness) are extracted from each of the decomposed sub-signals. Along with those statistics, center frequency, energy, peak amplitude, spectral entropy, permutation entropy and Rényi's entropy are evaluated and used as features. Mutual information (MI) is further employed to assign weight values to the features. In terms of classification rates, the proposed feature outperforms the linear prediction coefficients (LPC), mel frequency cepstral coefficients (MFCC), Teager energy operator (TEO) based feature and ComParE feature sets (IS09-emotion and IS13-ComParE). The proposed feature shows an average recognition rate of 90.02 percent for the IITG cold speech database and 66.84 percent for the URTIC database.

Index Terms—Cold speech, variational mode decomposition, mutual information, SVM classifier
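As a rough illustration of the per-mode statistics named in the abstract, the following Python sketch computes mean, variance, skewness, kurtosis, energy and peak amplitude for each sub-signal. It assumes the modes have already been obtained from some VMD implementation and are supplied as rows of an array. The function names `mode_statistics` and `feature_vector` are hypothetical, and the moment definitions used here (population variance, non-excess kurtosis) are assumptions that may differ from those used in the paper. The entropy features, center frequency and MI weighting are not covered.

```python
import numpy as np

def mode_statistics(mode):
    """Basic statistics of one decomposed sub-signal (hypothetical helper)."""
    x = np.asarray(mode, dtype=float)
    mean = x.mean()
    var = x.var()                      # population variance (assumed convention)
    std = np.sqrt(var)
    centered = x - mean
    # Standardized third and fourth moments; return 0 for a constant signal.
    skew = float(np.mean(centered ** 3) / std ** 3) if std > 0 else 0.0
    kurt = float(np.mean(centered ** 4) / std ** 4) if std > 0 else 0.0
    return {
        "mean": float(mean),
        "variance": float(var),
        "skewness": skew,
        "kurtosis": kurt,              # non-excess kurtosis (assumed convention)
        "energy": float(np.sum(x ** 2)),
        "peak_amplitude": float(np.max(np.abs(x))),
    }

def feature_vector(modes):
    """Concatenate the statistics of every mode into one 1-D feature vector."""
    rows = [list(mode_statistics(m).values()) for m in modes]
    return np.asarray(rows, dtype=float).ravel()

# Toy input standing in for two VMD modes of a speech frame.
toy_modes = np.array([[1.0, -1.0, 1.0, -1.0],
                      [0.0, 2.0, 0.0, 2.0]])
features = feature_vector(toy_modes)
```

With K modes this yields a vector of 6K values, which could then be weighted and passed to a classifier.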
1 INTRODUCTION
Cold speech is a kind of pathological speech recorded from a person suffering from the common cold. The common cold is a viral infectious disease that primarily affects the nose [1]. It normally causes sore throat, sneezing, coughing, runny nose, fever and headache [2]. Speech is produced by linear filtering of the excitation source information by the vocal tract. Since the common cold affects the nose and throat, and both influence the vocal tract during speech production, the characteristics of cold speech differ from those of normal speech. Fig. 1 shows two speech signals and their spectrograms for the same sentence: one is normal speech and the other is cold speech. Both signals are taken from the IITG cold speech database. Visible differences can be noticed in amplitude and duration. The cold speech has a higher average amplitude and a shorter duration than the normal speech. The spectrograms show that the signal intensities are spread more broadly across the time and frequency scales in the cold speech than in the normal speech. These results show that cold speech has different signal characteristics from normal speech.

S. Deb and S. Dandapat are with the Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India. E-mail: {suman.2013, samaren}@iitg.ernet.in.
J. Krajewski is with the Experimental Industrial Psychology, Wuppertal 42119, Germany and the Industrial Psychology, Rhenish University of Applied Sciences, Cologne 50678, Germany. E-mail: krajewsk@uni-wuppertal.de.
Manuscript received 27 Feb. 2017; revised 18 Sept. 2017; accepted 24 Sept. 2017. Date of publication 10 Oct. 2017; date of current version 29 May 2020. (Corresponding author: Suman Deb.) Recommended for acceptance by S. Steidl. Digital Object Identifier no. 10.1109/TAFFC.2017.2761750

Classification of cold speech is defined as recognizing from a person's voice whether he/she is suffering from the common cold. It is beneficial in the following two cases. (i) Normally, speech recognition and speaker recognition systems are trained using normal speech, and their performance is also tested using normal speech. As the speech characteristics change under the common cold, the performance of these systems may degrade when they are tested using cold speech. Therefore, analysis of cold speech may help in improving the performance of speech recognition [3], speaker recognition and man-machine interaction [3], [4] when these systems are tested using speech recorded from a person suffering from the common cold. (ii) Analysis and classification of cold speech can also provide useful information that may help in automatic detection and monitoring of the health of a person suffering from the common cold.

Research on automatic assessment of speech disorders from pathological speech has attracted increasing interest in the past decade because of its non-invasive nature. Most of the studies are based on acoustic measures such as pitch frequency [5], [6], amplitude and pitch perturbation [7], [8], [9], glottal to noise excitation ratio [8], harmonic to signal ratio [10], linear prediction coefficients (LPC) [7] and mel frequency cepstral coefficients (MFCC) [9], [11], [12]. All these measures are derived from the linear source-filter concept. Some studies show that speech is produced by non-linear motion of airflow through the vocal tract [13]. Berry et al. [14] have introduced three different categories (periodic, aperiodic and irregular) based on non-linear dynamics, and they have shown that voice disorders
1949-3045 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.