Automatic Determination of Vocal Percussive Classes Using Unsupervised Learning

Feature extraction and aggregation are performed within the FeatureExtractor class. The name of each audio feature (e.g., ‘mfcc’) and its method of aggregation (e.g., ‘stack’) are stored as key-value pairs in a Python dictionary, feature_dict:

feature_dict = {
    "mfcc": "stack",
    "delta_mfcc": "stack",
    "delta_delta_mfcc": "stack",
}

The dictionary is passed as a parameter to the FeatureExtractor constructor and stored as a member variable, self.features, for use in the _perform_extraction method. The extracted and aggregated features are appended to an overall feature vector, which _perform_extraction returns as a NumPy array. The length of this feature vector is its dimensionality.
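The flow described above might be sketched as follows. This is a minimal illustration, not the project's actual implementation: the extractors argument, the toy_mfcc and delta helpers, and the interpretation of ‘stack’ as flattening a frame-wise matrix into one vector are all assumptions made for the example. A real extractor would compute MFCCs from audio rather than reshape a toy signal.

```python
import numpy as np


def stack(frames):
    # Assumed meaning of 'stack': flatten a (coefficients, frames) matrix
    # into a single one-dimensional vector.
    return np.asarray(frames, dtype=float).reshape(-1)


# Hypothetical lookup from aggregation name to aggregation function.
AGGREGATORS = {"stack": stack}


def toy_mfcc(signal):
    # Stand-in for a real MFCC computation: 2 "coefficients" x 3 frames.
    return np.asarray(signal, dtype=float).reshape(2, 3)


def delta(frames):
    # Simple first-order difference along the frame axis, same shape as input.
    return np.diff(frames, axis=1, prepend=frames[:, :1])


class FeatureExtractor:
    def __init__(self, features, extractors):
        self.features = features       # e.g. {"mfcc": "stack", ...}
        self._extractors = extractors  # hypothetical: name -> callable

    def _perform_extraction(self, signal):
        parts = []
        for name, aggregation in self.features.items():
            frames = self._extractors[name](signal)
            parts.append(AGGREGATORS[aggregation](frames))
        # Append every aggregated feature to one overall feature vector.
        return np.concatenate(parts)


feature_dict = {
    "mfcc": "stack",
    "delta_mfcc": "stack",
    "delta_delta_mfcc": "stack",
}
extractors = {
    "mfcc": toy_mfcc,
    "delta_mfcc": lambda s: delta(toy_mfcc(s)),
    "delta_delta_mfcc": lambda s: delta(delta(toy_mfcc(s))),
}

extractor = FeatureExtractor(feature_dict, extractors)
vector = extractor._perform_extraction(np.arange(6.0))
print(vector.shape)  # the length of the vector is its dimensionality
```

Keeping the feature names and aggregation methods in a dictionary means a feature can be added or removed without changing the extraction loop itself.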

3.6 Feature Engineering

Audio is considered high-dimensional data because of its many features, so it is often beneficial to apply dimensionality reduction to the feature vector prior to clustering. Principal Component Analysis (PCA), also known as the Karhunen-Loève transform, projects the data onto a lower-dimensional subspace, referred to in the literature as a principal subspace (Baniya, Joonwhoan Lee and Ze-Nian Li, 2014). Another consideration is the scaling of the feature set to achieve a standard range and variance (Choi et al., 2018).

The following feature handling steps were selected according to the work of Stowell (2010) on engineering audio features for QBV. Following extraction, features were either standardised to have a mean of 0 and a standard deviation of 1, or normalised to the range 0 to 1, across all vocal percussive elements. The scaled feature vector was then reduced in dimensionality. PCA was the most widely used reduction technique in this work, but other options were explored, including Factor Analysis (FA) and Independent Component Analysis (ICA).

PCA is employed in this project to reveal principal components (PCs): latent axes that best represent the variance of the data (fig 8). The number of components (i.e., the number of dimensions to reduce to) was chosen through a combination of manual hyperparameter testing, scree plot analysis, and automated hyperparameter tuning (this process is described at length in chapter 3.10). A scree plot (fig 7) shows how much of the feature set’s variance is captured by each PC in the principal subspace, which can assist with selecting an appropriate number of components when creating a PCA model.
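The scaling and reduction steps can be illustrated with a short NumPy sketch. It is written under stated assumptions rather than taken from the project: the function names standardise, normalise and pca are hypothetical, the data are synthetic, and PCA is computed directly from a singular value decomposition instead of through whichever library the project used. The explained_ratio array holds the values a scree plot would display.

```python
import numpy as np


def standardise(X):
    # Scale each feature (column) to mean 0 and standard deviation 1.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / std


def normalise(X):
    # Rescale each feature (column) to the range 0 to 1.
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0  # leave constant features unscaled
    return (X - low) / span


def pca(X, n_components):
    # Centre the data, then take the SVD; the rows of `components`
    # are the principal axes, ordered by decreasing variance.
    centred = X - X.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centred, full_matrices=False)
    variance = singular_values ** 2
    explained_ratio = variance / variance.sum()
    projected = centred @ components[:n_components].T
    return projected, explained_ratio


# Synthetic stand-in for a feature matrix: 50 sounds x 6 features,
# generated from 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 6))

X_scaled = standardise(X)
X_unit = normalise(X)
reduced, explained_ratio = pca(X_scaled, n_components=2)

print(reduced.shape)
print(np.round(explained_ratio, 3))  # one value per PC, as on a scree plot
```

Reading explained_ratio from left to right, the point at which the values level off is the usual visual cue for choosing the number of components.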


Automatic Determination of Vocal Percussive Classes Using Unsupervised Learning by RachelLockeDigital - Issuu