
II. LITERATURE REVIEW
Figure 2. Stages of an Onset Detection Algorithm (Bello et al., 2005)
Onsets are often categorised into two types. These are:
1) Hard onsets, e.g., drums or percussion, which exhibit sudden increases in energy.
2) Soft onsets, e.g., strings, which exhibit a gradual change in energy over time.
Early detection functions were based on observing the signal in the time domain and recording fluctuations in the amplitude envelope. This often involved low-pass filtering to reduce the effect of noise (Bello et al., 2005). Although it is important to acknowledge methods based on time-domain signal features, they are not discussed further in this paper.
Instead, spectral-based and deep learning-based methods are examined, as these methods better reflect the trends of high-performing algorithms within the timeframe of interest (Böck, Arzt, Krebs, & Schedl, 2012; Eck & Lacoste, 2007; Eyben et al., 2010; Roebel, 2005; Roebel, 2009).
The majority of spectral onset detection algorithms discussed in this paper are based on the Short-Time Fourier Transform (STFT), which is used to analyse the signal’s spectral properties at specific time frames (Müller, 2016). A window function (often a Hamming window) is moved along the signal, multiplying a short segment (frame) of the signal at each position, and a Fast Fourier Transform (FFT) is performed on each windowed frame. This technique reveals which frequencies are present and at what times (Fig. 3).
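As a rough illustration of this framing-and-FFT procedure, the following Python/NumPy sketch computes an STFT with a Hamming window. The function name `stft` and the default frame and hop sizes (1024 and 512 samples) are illustrative assumptions, not parameters taken from the cited works.

```python
import numpy as np

def stft(x, frame_size=1024, hop_size=512):
    """Complex STFT of a mono signal x (illustrative sketch).

    Returns an array of shape (n_frames, frame_size // 2 + 1):
    rows are time frames, columns are frequency bins.
    """
    if len(x) < frame_size:
        raise ValueError("signal is shorter than one frame")
    window = np.hamming(frame_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(x) - frame_size) // hop_size
    # Slide the window along the signal in steps of hop_size samples
    frames = np.stack([x[i * hop_size : i * hop_size + frame_size] * window
                       for i in range(n_frames)])
    # One real-input FFT per windowed frame
    return np.fft.rfft(frames, axis=1)

# Usage: one second of a 1 kHz sine sampled at 44.1 kHz
sr = 44100
t = np.arange(sr) / sr
X = stft(np.sin(2 * np.pi * 1000 * t))
bin_hz = sr / 1024                                  # spacing between bins (~43 Hz)
peak_bin = int(np.argmax(np.abs(X).mean(axis=0)))   # strongest bin over all frames
print(X.shape, peak_bin * bin_hz)                   # peak lies in the bin nearest 1000 Hz
```

The frame size trades time resolution against frequency resolution: longer frames give finer frequency bins but blur the timing of onsets.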

Figure 3. Time-Frequency Analysis via the Short-Time Fourier Transform (Gao & Yan, 2006)
When the signal is observed in the time-frequency domain, increases in energy across frequency bins suggest the presence of an onset. This is known as energy-based onset detection and has been shown to be highly effective for non-pitched percussive or ‘hard’ onsets (Reiss & Zhou, 2010).
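A minimal sketch of an energy-based detection function follows, assuming a magnitude spectrogram `mag` of shape (frames, bins) is already available, for example the magnitude of an STFT. The names `energy_onset_function` and `pick_onsets` are hypothetical, and the fixed threshold is used only for simplicity; practical systems typically use more careful peak picking.

```python
import numpy as np

def energy_onset_function(mag):
    """Energy-based detection function (illustrative sketch).

    mag: magnitude spectrogram, shape (n_frames, n_bins).
    Returns one value per frame: the increase in total energy
    (sum of squared magnitudes over all bins) since the previous frame.
    """
    energy = np.sum(mag ** 2, axis=1)
    increase = np.diff(energy, prepend=energy[0])  # first frame: no change
    return np.maximum(increase, 0.0)               # ignore decreases in energy

def pick_onsets(odf, threshold):
    """Hypothetical helper: frames where the detection function exceeds a fixed threshold."""
    return np.flatnonzero(odf > threshold)

# Toy 'hard' onset: silence, then a broadband burst at frame 5 that decays
mag = np.zeros((10, 8))
decay = 0.5 ** np.arange(5)
mag[5:] = decay[:, None] * np.ones(8)
odf = energy_onset_function(mag)
print(pick_onsets(odf, threshold=1.0))  # frame 5 is the only detection
```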
Reiss and Zhou (2007) presented a technique combining energy-based and pitch-based detection, which received the best overall score in the onset detection category of MIREX 2007. The algorithm classified each onset as soft or hard, applying a pitch-based method to soft onsets and an energy-based method to hard onsets (Reiss & Zhou, 2007).
When changes in energy are summed across frequency bins, only the positive changes (increases) give a good indication of onsets. The energy of a previous note often decreases as the energy of the following note rises. If all changes in energy were summed, the rise of the new note could be cancelled by the decay of the previous one, masking the onset at points where positive and negative changes cross over (Reiss & Zhou, 2010). The sum of the positive changes in energy across frequency bins is referred to as ‘spectral flux’. Eyben et al. (2010) described methods that harnessed spectral flux as “some of the best performing so far”.
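The masking effect and its remedy can be shown with a small Python/NumPy sketch. It computes spectral flux by half-wave rectifying the per-bin magnitude differences between consecutive frames before summing; using magnitudes rather than squared magnitudes, and the name `spectral_flux`, are assumptions made for illustration.

```python
import numpy as np

def spectral_flux(mag):
    """Spectral flux of a magnitude spectrogram, shape (n_frames, n_bins).

    Per-bin differences between consecutive frames are half-wave rectified
    (negative changes set to zero) and then summed over bins.
    The first frame is assigned a flux of 0.
    """
    diff = np.diff(mag, axis=0)           # shape (n_frames - 1, n_bins)
    rectified = np.maximum(diff, 0.0)     # keep only increases in each bin
    return np.concatenate([[0.0], rectified.sum(axis=1)])

# Toy crossover: the note in bin 0 decays while a new note in bin 1 rises
mag = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
total_change = np.diff(mag.sum(axis=1))  # both changes are zero: the new note is masked
flux = spectral_flux(mag)                # flux values: 0.0, 0.5, 0.5
print(total_change, flux)
```

Because each bin is rectified separately, the decay in bin 0 cannot cancel the rise in bin 1, which is exactly the masking problem described above.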
Onset detection methods involving spectral flux have been shown to perform well on both hard and soft onsets (Böck & Widmer, 2013). For this reason, they offer an alternative to the algorithm proposed by Reiss and Zhou, in which separate methods were used for soft (pitch-based) and hard (energy-based) onsets (Reiss & Zhou, 2007).
Pitch-based methods also rely on the time-frequency decomposition of the signal, whereby the energy changes of two consecutive notes with different pitches fall into separate frequency channels (Dixon, 2006; Reiss & Zhou, 2007).
Whilst energy-based methods make use of the magnitude information in the time-frequency domain, phase-based methods consider the phase information across frequency bins. The underlying assumption is that the phase of a steady tone advances predictably from frame to frame, so a new musical event appears as a deviation from this expected phase. Phase-based methods have been shown to perform better than energy-based methods when detecting soft onsets (Dixon, 2006). In short, energy increases and phase irregularities in the frequency domain are good indicators of onsets.
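A sketch of a phase deviation detection function, in the spirit of the phase-based measures reviewed by Bello et al. (2005) and Dixon (2006), is given below. It assumes a complex STFT `X` of shape (frames, bins) is available; the names `princarg` and `phase_deviation`, and averaging the absolute deviation over bins, are illustrative choices rather than a reproduction of any specific published implementation.

```python
import numpy as np

def princarg(phase):
    """Wrap phase values into the interval [-pi, pi)."""
    return (phase + np.pi) % (2 * np.pi) - np.pi

def phase_deviation(X):
    """Phase deviation detection function for a complex STFT X, shape (n_frames, n_bins).

    For a steady partial the phase advances by a constant amount per frame, so the
    wrapped second difference of phase is close to zero; a new event disturbs it.
    The first two frames are assigned a value of 0.
    """
    phi = np.angle(X)
    d2 = princarg(phi[2:] - 2 * phi[1:-1] + phi[:-2])  # second difference of phase
    return np.concatenate([[0.0, 0.0], np.mean(np.abs(d2), axis=1)])

# Usage: four steady partials whose phase jumps by 1 rad from frame 6 onwards
n = np.arange(10)[:, None]
omega = np.array([0.3, 0.7, 1.1, 1.9])        # phase advance per frame (rad)
phi = omega * n + np.where(n >= 6, 1.0, 0.0)
pd = phase_deviation(np.exp(1j * phi))
print(np.round(pd, 3))                        # non-zero only at frames 6 and 7
```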
In 2007, Eck and Lacoste commented that few bodies of work at the time combined onset detection and machine learning. The improvement they presented involved feeding the frames of a spectrogram as inputs to a feed-forward neural network (FNN) in order to classify frames as onsets or non-onsets. A single-net and a multi-net approach were submitted as entries to the very first onset detection contest at MIREX 2005, where the multi-net variant performed best, winning first place in the category (Eck & Lacoste, 2007). In the ‘further work’ section of the accompanying paper, Convolutional Neural Networks (CNNs) are suggested as an improvement upon their work.
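To make the input/output arrangement concrete, the following Python/NumPy sketch stacks each spectrogram frame with its neighbours and passes it through a one-hidden-layer feed-forward network that outputs an onset probability per frame. This is a hypothetical illustration, not Eck and Lacoste's architecture: the context width, layer size, `tanh`/sigmoid activations, and the names `frame_contexts` and `OnsetFNN` are all assumptions, and the weights are random because training on labelled onset frames is omitted.

```python
import numpy as np

def frame_contexts(spec, context=2):
    """Stack each frame with its +/- context neighbours (edges zero-padded).

    spec: (n_frames, n_bins) -> returns (n_frames, (2 * context + 1) * n_bins)
    """
    n_frames, _ = spec.shape
    padded = np.pad(spec, ((context, context), (0, 0)))  # zero rows at both ends
    return np.stack([padded[i : i + 2 * context + 1].ravel() for i in range(n_frames)])

class OnsetFNN:
    """Hypothetical one-hidden-layer feed-forward network (weights untrained)."""

    def __init__(self, n_inputs, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict_proba(self, inputs):
        hidden = np.tanh(inputs @ self.W1 + self.b1)
        logits = hidden @ self.W2 + self.b2
        return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # onset probability per frame

# Usage with a random stand-in for a real spectrogram
rng = np.random.default_rng(1)
spec = rng.random((20, 8))
inputs = frame_contexts(spec, context=2)    # shape (20, 40)
model = OnsetFNN(n_inputs=inputs.shape[1])
probs = model.predict_proba(inputs)
is_onset = probs > 0.5                      # onset / non-onset decision per frame
```

Including neighbouring frames gives the network a view of how the spectrum changes over time, which a single frame cannot provide.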