Figure 20. Example of Cluster Formations that are not Clearly Defined (top) Upon Which the kelbow Metric Performs Poorly (bottom)
5. Conclusion

This paper focused on the analysis and automatic class determination of vocal percussive sounds for offline and online audio durations. First, three vocal percussive datasets were curated containing three, five and six vocal percussive classes. Ground truth class tags were provided in the CSV files associated with the datasets to enable the use of the Rand-Index performance metric, crucial in the configuration of the clustering pipeline. Frame-based features were then extracted, engineered, and reduced in