Automatic Determination of Vocal Percussive Classes Using Unsupervised Learning


