Acta Psychologica 41 (1977) 309-320 © North-Holland Publishing Company
TEMPORAL STRUCTURE OF PERFORMED MUSIC: SOME PRELIMINARY OBSERVATIONS*
Dirk-Jan POVEL Dept. of Psychology, University of Nijmegen, The Netherlands Received August 1976
After some preliminary remarks concerning the scope and aim of our study, the present paper describes the temporal analysis of a piece of music (the first prelude of J. S. Bach’s Wohltemperiertes Clavier) as performed by three professional musicians. A number of temporal characteristics are derived that possibly play a role in the perception and appreciation of musical performances.
‘Die Gegenstände des Vortrages sind die Stärke und Schwäche der Töne, ihr Druck, Schnellen, Ziehen, Stossen, Beben, Brechen, Halten, Schleppen und Fortgehen. Wer diese Dinge entweder gar nicht oder zur unrechten Zeit gebrauchet, der hat einen schlechten Vortrag.’ C. Ph. E. Bach, Versuch über die wahre Art das Clavier zu spielen (1753)
Anyone who has ever listened to performances of the same piece of music by different interpreters will have been struck by the differences of approach and the varying degrees of success in conveying the musical content to the listener. One then naturally wonders which characteristics are involved and whether they may be specified in the acoustic domain.
* I am very grateful to Mr. Paul van der Mast for his accurate analysis of the tone oscillograms and his help in making the computations.
The present study is an attempt to discover whether specific features that codetermine the attractiveness of a piece can be found in professionally performed music. In an investigation of this kind the following questions must be considered.
(1) What characteristics, rules or laws of relationship can be ascertained in professional music performances? In what dimensions of the acoustic signal are they to be found, and how can they be isolated?
(2) Are the differences between performances by different, appreciated professional musicians only gradual, or are they of a structural kind?
(3) Connected with the latter problem is the question whether there can be an ‘ideal performance’.
(4) Which of the discovered characteristics, rules, etc. are perceptually relevant? Is there any hierarchy in the saliency of the different characteristics? How can perceptual relevance be assessed?
Because of the complexity of the problem it will later appear to be necessary to restrict ourselves to a certain number of aspects.
The type of research referred to here can be of great importance to the practice of music teaching if it can formulate explicitly the manner in which the different dimensions of a melody are manipulated by the professional musician. On the one hand this knowledge can help to make the pupil sensitive to those aspects of the acoustic signal that play a role in the interpretation of music; on the other hand it enables him to gain an accurate idea of what the professional player does while interpreting a piece of music. The untrained musician cannot be expected to infer these features from merely listening to the music (certainly not in detail), nor will the professional musician be able to make explicit in all details what he does when he plays. Textbooks on music theory are not very helpful either, since they are for the greater part imprecise and sometimes even contain contradictions.
Research on the actual characteristics of performed musical melodies has been very limited. We mention the research at the Institute of Musicology in Uppsala and especially the work of Gabrielson (1974), who studied the temporal aspects of the production of simple rhythmic patterns on a piano (2 Ss) and on a bongo drum (1 S), and of a number of simple melodies played in synchrony with a metronome. The relevance of his analyses for musical interpretation is limited because of the very simple melodies used and the rather unnatural manner of production. Besides, the music was played by three professional or semiprofessional
D.-J. Povel/Temporal structure of performed music
musicians (Gabrielson, 1973) who seem to have had rather idiosyncratic and individually variable ways of performing the rhythms, which makes it very difficult to generalize the findings. We also mention the work of Fraisse (1956) and Michon (1967) on the synchronization and continuation of given stationary or modulated temporal patterns.
We shall now look at the different dimensions that a performer has at his disposal when interpreting a piece of music, and at the degree to which the use of these dimensions is determined by the musical composition itself. We shall here only be concerned with a composition represented in musical notation. It is a remarkable fact that the traditional notation of musical compositions, the score, does not fix all aspects of the composition, but leaves room for interpretation on the part of the performer. Below we shall list these different ‘expressive means’. Whether they can actually be used depends, naturally, on the type of instrument employed. We shall return to this later.
(1) Pitch. In the score the approximate height of the tone is indicated, but the performer himself determines the exact height of the tone. During the production he can also change the pitch of the tone or he can add a vibrato. He is also free to modulate the speed of the vibrato.
(2) Intensity. A musical score may contain dynamic symbols which indicate the relative loudness of a tone or a group of tones (p, f, etc.), but the performer himself again decides the exact loudness with which he executes the tones and thus the ratio of the different signs used. J. Ph. Rameau is said to be the first to have used symbols that indicate a dynamic movement (diminuendo, crescendo, etc.). Still later in the history of music several symbols were introduced to describe various shades of loud and soft. Since all these symbols have a relative meaning, the musician must constantly adjust the parameters concerned.
Moreover, the performer is free to introduce dynamic variations in places where nothing is indicated in the score.
(3) Timbre. Indications related to timbre given in the score include the stops to be used (organ, harpsichord), the position in which a tone has to be produced (on stringed and plucked instruments) and the manner and place in which it must be produced (e.g. col legno, near the bridge). The timbre dimension is typically different from the above-mentioned ones in that it is less continuous and more steplike. Here again it is true that the performer is free to give his own interpretation of the musical symbols.
(4) Temporal features. The temporal aspects of a piece of music can be defined at several levels. On the lowest level are tone duration and interval duration. On a higher level more complex temporal phenomena of a structural character can be discovered, e.g. accelerations, decelerations and all types of temporal patterns such as alternations of longer and shorter segments in all possible combinations. Overall speed in performance (tempo) is another important aspect of the temporal features of a musical performance. From some moment in musical history onwards symbols have been used to indicate tempo (adagio, allegro, dance forms). These terms are highly variable in meaning. Indeed, in practice, very considerable differences are found between performances of the same piece of music. In the score the relative durations of the tones are given; additional symbols are used for marking minor nuances, e.g. barlines, phrasing and articulation slurs and symbols indicating accent. Moreover, there are a number of unwritten conventions, specific for each period, that have to be taken into account by the musician.
It will be clear from the preceding that the musician who is to perform a piece of music is confronted with the very complicated task of transforming the musical transcription into a sequence of tones in such a way that the freedoms are used ‘correctly’. This implies that he must have a good understanding of the ‘content’ of a musical composition on the one hand, and of the way the perceptual system of the listener reacts to the different dimensions and their interaction on the other.
Table 1. Acoustic dimensions that can be varied on different musical instruments.

Instrument             Frequency  Vibration  Intensity  Timbre  Temporal structure  Tempo
Human voice, strings   +          +          +          +       +                   +
[entries for the remaining instruments are not legible in the source]

* Variations are contaminated by other dimensions.
** Variations are negligible (see text).
*** Stops not included.
It is this behavior we wish to examine. Before discussing the procedure followed, we shall look at the most commonly used instruments and the possibilities that exist to vary on these instruments the different dimensions mentioned above. This information is given schematically in table 1.
In order to avoid complicated interactions of the different expressive means, we decided to confine ourselves to only one dimension, i.e. temporal variations. We therefore studied the performance of music played on a harpsichord, which only allows variations in the temporal domain, stop variations being excluded (see table 1). From an examination of the possible dynamic variation that can be made on a harpsichord, it appears that this never exceeds 5 dB. For practical purposes it may therefore be neglected. Even with a harpsichord, however, the interaction of temporal structure and tempo remains.
There is some evidence that the variations that can be brought about in the temporal domain are the most fundamental. This might already be inferred from the definition that performing music consists in realizing tones in the temporal domain. It is indeed true that almost all grouping and accentuation effects can be achieved by taking advantage of the possible temporal variations. Actually, Gregorian music only uses this type of variation. Finally, we mention the fact that many expressive means were introduced only later in the history of music, as, for instance, dynamic variations and vibrato. The choice of the harpsichord has an advantage in another respect: the onset time of tones produced on this instrument can be determined relatively easily because of their sharp initial edge.
The following working plan has been set up. In the first stage the temporal structure of a piece of music as performed by a number of professional musicians will be analysed. From these analyses we hope to be able to infer characteristic features and rules that are applied by the musicians.
In the second stage the perceptual relevance of the characteristics found will be determined with the help of specially designed perceptual experiments. The present study is only concerned with the first stage; research on the second stage is in progress.
Since we intended to determine onset times of the tones from the acoustic signal, only music for one voice could be considered. Moreover, since we wished to trace general laws in the execution of musical sequences, we looked for a piece of music with a simple melodic and rhythmic structure, in which the same rhythmic structure is repeated several times. A piece of music that satisfies both requirements is the first prelude from J. S. Bach’s ‘Wohltemperiertes Clavier’. The piece contains 35 bars, of which the first 32 possess an identical temporal structure and a roughly similar melodic structure (broken chords). Bars 33, 34, and 35 have a melodic structure deviating from the rest. The first 32 bars each consist of 16 tones which, according to the musical notation, must all appear at equal intervals, although some of the tones are held longer. Each bar is composed of two identical parts of eight tones. The first four bars of the composition are shown in fig. 1.
Fig. 1. The first four measures of the piece of music.
The material used in the present study consisted of realizations of the piece by three professional musicians: Gustav Leonhardt (G.L.), Zuzana Ruzickova (Z.R.) and Helmut Walcha (H.W.), available on gramophone records. First, the performances were copied onto tape (Philips PRO 12), which enabled us to replay the piece without loss of quality. Next, the signal was fed through a special circuit which facilitated the determination of the onsets of the tones. This circuit (see fig. 2) contains a high-pass filter (3 dB point at 5769 Hz). The output of the filter, after amplification, was divided into two branches: one was fed directly into a UV recorder, the other only after rectification. Thus two oscillograms were obtained (see fig. 3).
Fig. 2. Block diagram of the apparatus used for the visual display of the music.
Fig. 3. Oscillograms of the music. Top: high-pass filtered and rectified. Bottom: high-pass filtered.
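The signal chain described above (high-pass filtering followed by rectification) can be imitated digitally. The following is a minimal sketch and not the apparatus or procedure used in this study, in which onsets were read from the oscillograms by eye. The first-order filter, the full-wave rectification, the threshold rule and all function names are assumptions made for illustration.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter (an assumed stand-in for the analogue circuit)."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def rectify(samples):
    """Full-wave rectification: keep only the magnitude of each sample."""
    return [abs(s) for s in samples]


def detect_onsets(envelope, threshold, min_gap):
    """Return the sample indices at which the envelope reaches the threshold.

    Crossings closer than min_gap samples to the previously accepted onset
    are ignored, so that one tone is marked only once (hypothetical rule).
    """
    onsets = []
    last = None
    for i, value in enumerate(envelope):
        if value >= threshold and (last is None or i - last >= min_gap):
            onsets.append(i)
            last = i
    return onsets


# Synthetic test signal: silence, a short burst, silence, a second burst.
burst = [1.0, -1.0] * 5
signal = [0.0] * 10 + burst + [0.0] * 10 + burst
print(detect_onsets(rectify(signal), threshold=0.5, min_gap=15))
```

Dividing each onset index by the sampling rate gives the onset time in seconds. A fixed threshold is the simplest possible rule; it would not resolve the doubtful cases discussed in the text.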
The moments of onset of the tones were determined by eye, making use of both oscillograms. Sometimes the one oscillogram gave more information, sometimes the other. Occasionally (in about 3% of the cases) it appeared to be rather difficult to decide exactly the moment of onset of a tone. On the average the moment of onset could be assessed with great precision: within 1-2 msec. When we see how difficult it is to determine the moment of onset of a tone from the oscillogram, even for the harpsichord with its steeply edged tone, we realize that we are in need of a theory on the procedure followed by the human perceptual system in determining the moment of onset of tones played on musical instruments. We need such a theory all the more when it is considered that on several instruments the moment of onset is much less clearly defined, while at the same time there is no empirical evidence that the actual performance of music on these instruments is less critical.
Results
From the analyses, matrices were composed, one for each of the three musicians. Each matrix contains the duration of each of the tones, duration being defined as the time elapsed from the beginning of one tone to the beginning of the next, which means that the continuation of a given tone after the onset of the following one was not taken into account. It should be mentioned, however, that the first
Fig. 4. Average temporal patterns as realized by the three performers. The vertical dashes indicate the standard deviation; in cases of overlap the dashes are given in only one direction.
two tones actually do continue to sound during approximately the first half of the bar. The analysis used did not enable us to assess this aspect. Thus, a matrix was obtained consisting of 35 rows (the 35 bars) and 16 columns (the 16 tones per bar). A measure of the average temporal structure used by the musician was found by determining the mean values of corresponding tones over all bars, i.e. the column means. Actually, we took the mean of the first 32 bars, since, as mentioned before, the last three bars deviate considerably from the rest. Fig. 4 presents the mean temporal structure of the three musicians and the standard deviation of the tones. In order to get an indication of the representativeness of the measure used, a number of new profiles were made based on the calculated mean temporal structures. In making these profiles, a variable criterion was used to decide whether a difference between the durations of two successive tones counted as a real difference, the preceding tone being used as reference. If the actual duration of a tone did not
meet the criterion, it was considered to be of the same duration as the former tone. Criteria of 5, 10, 15, ..., 50 msec were used. As could be expected, it appears from these analyses that the greater the differences, the more resistant they are, which presumably means that these are the ones that are systematically made by the performer, whereas the smaller deviations are supposedly random. However, whether the deviations are intended or random cannot be inferred with any degree of certainty from these data; at most it may be assumed. Moreover, for each of the three performers we computed the Kendall Coefficient of Concordance (Siegel 1956) on the data ranked per bar and found the following values: G.L.: 0.44; Z.R.: 0.28; H.W.: 0.26. The observed values, though not very high, all appear to be highly significant (p < 0.001).
Returning to the average temporal structures as produced by the three performers (see fig. 4), firstly, the differences in tempo among the three musicians are apparent: the overall mean durations of the tones are 181.8 (G.L.), 212.1 (Z.R.) and 249.7 msec (H.W.) respectively. Secondly, there are the more interesting differences between the mean temporal structures the musicians use. First, there are considerable differences with respect to the variation in tone duration applied by the three performers. The mean standard deviations around the bar means are, respectively, 19.72 for G.L., 13.47 for Z.R., and 5.88 for H.W. These data are the more remarkable when it is considered that the duration differences are greater for the faster playing performers. One would expect a priori the inverse relation: the longer the mean duration of the tones, the greater the applied deviations. A second and striking point is the fact that in G.L.’s interpretation the last tone of each bar is on the average longer, whereas in the other two performances the same tone is relatively shorter.
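The three computations described above (column means over bars, the criterion-based profiles, and the Kendall coefficient of concordance on data ranked per bar) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original computation: the reference for each tone is taken to be the value already assigned to the preceding tone, tied values receive average ranks, the coefficient W = 12S / (m²(n³ − n)) is used without a correction for ties, and all function names and the small data matrix are hypothetical.

```python
from statistics import mean


def mean_profile(bars):
    """Column means: mean duration of each tone position over all bars."""
    return [mean(column) for column in zip(*bars)]


def criterion_profile(durations, criterion_ms):
    """Treat a tone as equal to the former tone unless it differs by at
    least criterion_ms from it (assumed reading of the procedure)."""
    profile = [durations[0]]
    for d in durations[1:]:
        reference = profile[-1]
        profile.append(d if abs(d - reference) >= criterion_ms else reference)
    return profile


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kendalls_w(bars):
    """Kendall's W for m bars (rows) ranking n tone positions (columns).
    No tie correction is applied (assumption)."""
    m, n = len(bars), len(bars[0])
    rank_sums = [sum(col) for col in zip(*(average_ranks(row) for row in bars))]
    mean_sum = m * (n + 1) / 2.0
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical durations in msec (3 bars of 4 tones); not data from the study.
bars = [[190.0, 175.0, 180.0, 200.0],
        [185.0, 170.0, 178.0, 210.0],
        [195.0, 172.0, 176.0, 205.0]]
profile = mean_profile(bars)
print(profile)
print(criterion_profile(profile, 10.0))
print(kendalls_w(bars))
```

With the matrices of the study, `bars` would hold the first 32 rows of the 35 x 16 duration matrix of one performer, and `criterion_profile` would be called once for each criterion from 5 to 50 msec.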
Another interesting point is the fact that, though each bar is composed of two identical sequences of eight tones, there are considerable differences between the performed temporal structures of the two halves, especially in the case of H.W. The profiles of the two halves as played by Z.R. are similar, but the differences in the second part are much more pronounced.
There is one other characteristic in the data that should be mentioned, since it may also have perceptual relevance. As a matter of routine we determined the correlation between the row means and the corresponding row variances, expecting to find a high correlation. Such a high correlation is to be expected if one assumes that the higher the overall tempo, the smaller the differences in tone durations that are produced deliberately. This assumption is reasonable if Weber’s law (and Fechner’s extension) applies in the case of discrimination of duration. Michon (1964) has found that Weber’s law approximately holds for the perception of j.n.d.s of (empty) intervals between 100 and 250 msec, in which region most tone durations in our study lie. Small and Campbell (1962), who studied the temporal discrimination of tones, report an increase of j.n.d. for lower values: 0.17 for 400 msec and 0.20 for 40 msec. Unfortunately, they did not study the region between these two values. For the three performers, the following correlations were observed: G.L.: 0.93, Z.R.: 0.31 and H.W.: 0.20. These are conspicuous differences. They show that G.L. very exactly adjusts the size of the produced duration deviations to the speed at which he is playing at each moment, such that when playing faster he makes the deviations correspondingly smaller. The other two do this to a considerably lesser degree.
Operationalization of momentary tempo as the mean duration of tones per bar is perhaps rather inadequate, since it is unlikely that tempo changes are made per bar; they are more likely to be made continuously. Michon (1966) has pointed out that, since it does not discriminate between temporal irregularity and a slow shift in tempo, the standard deviation is not a suitable measure for produced duration differences. He proposes instead another measure: the absolute value of the differences (|Δt|) between successive intervals. Therefore we also computed the correlation between mean duration and (Σ|Δt|)/n for each bar. The correlations we thus found are: 0.58 (G.L.), 0.16 (Z.R.) and 0.20 (H.W.). Although the observed differences are now smaller, it is still worth trying to discover whether this feature influences musical judgment.
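The two per-bar correlations described above can be sketched as follows: one between the bar means and the bar variances, and one between the bar means and the mean absolute difference between successive tone durations. This is a minimal illustration under assumptions, not the original computation: the population variance is used, the sum of |Δt| is divided by the number of successive differences (the text does not say what n counts), and the function names and data are hypothetical.

```python
import math
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_abs_successive_difference(durations):
    """(sum of |dt|) / n, with n taken as the number of successive differences."""
    diffs = [abs(b - a) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)


def tempo_variability_correlations(bars):
    """Correlate the mean duration per bar with two variability measures."""
    bar_means = [mean(bar) for bar in bars]
    bar_variances = [pvariance(bar) for bar in bars]
    bar_masd = [mean_abs_successive_difference(bar) for bar in bars]
    return pearson_r(bar_means, bar_variances), pearson_r(bar_means, bar_masd)


# Hypothetical durations in msec; deviations grow with bar duration by construction.
bars = [[100.0, 110.0, 100.0, 110.0],
        [200.0, 220.0, 200.0, 220.0],
        [300.0, 330.0, 300.0, 330.0]]
r_variance, r_masd = tempo_variability_correlations(bars)
print(r_variance, r_masd)
```

A slow drift in tempo within a bar inflates the variance but adds little to the successive differences, which is the reason given in the text for preferring the second measure.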
Discussion
Of course, the data thus far collected provide only a first step towards an understanding of the determinants that make a musical performance interesting. Until now we have only traced a number of temporal characteristics of one piece of music performed by three professional musicians. Moreover, because of limitations in the acoustic analysis used, we were not able to examine other temporal characteristics which might equally well play an important role, as, for instance, overlap of tones or the release of a tone just before the following tone arrives, an articulation technique frequently used in harpsichord playing. Temporal phenomena of this type can presumably be studied more satisfactorily with the help of an acoustic analysis that makes use of the frequency aspects, as, for instance, the spectrum analysis technique used by Pols et al. (1973). The temporal resolving power of this technique is limited since the spectrum is determined only every 10 msec; it can be improved, however, by slowing down the tape speed.
The next stage of research will be to determine the perceptual relevance of the different characteristics of the temporal structure. For this purpose a number of perceptual experiments have been designed. In these experiments we make use of a computer-controlled electronic organ (Maarse 1974) that can generate the piece of music with any desired temporal structure. In order to determine the perceptual relevance of different features, subjects are asked to compare different realizations of the piece. However, the whole piece has a duration of about two minutes, which is, of course, too long to enable an efficient collection of various judgments. It was therefore decided to use only
Fig. 5. Successive reduction of a temporal pattern (see text). Horizontal axis: tone number.
the first four bars and to impose different temporal structures on these. Subjects will be asked to compare the first four bars in which all tones have their original durational values with the four bars in which each has the computed mean temporal structure in order for us to ascertain the representativeness of this average concept. Secondly, we will try to determine the saliency of the mean temporal patterns by investigating to what extent subjects are able to discern the mean temporal pattern as produced and the division into the four bars when all tones have the same duration, viz. the mean value of the mean temporal structure. Next we shall try to find out which aspects of the mean temporal pattern are perceptually relevant by reducing the pattern in successive steps whereby at each step one feature of the pattern is omitted. An example of such a possible successive reduction is given in fig. 5. From
the comparisons of the transformed patterns with the original it should be possible to determine a hierarchy of importance of the various features for perception. The problem remains that the perceptual task used may have only little bearing upon the complicated perceptual processes that are involved in listening to music and in its appreciation. Perhaps at a later stage, when our understanding of the determinants of music appreciation has grown, we may make more realistic experimental designs. The computer-coupled organ system mentioned above can provide instantaneous visual feedback of the temporal pattern of a piece of music performed on the organ. This feature would suggest its usefulness as a training device. The latter possibility will be investigated in the near future.
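The stimulus variants planned for the perceptual experiments (original durations, the mean temporal pattern imposed on every bar, and an equal-duration version) can be sketched as onset schedules. This is a hypothetical illustration only; it does not describe the organ system of Maarse (1974), and the function names and the example pattern are assumptions.

```python
def onsets_from_durations(durations_ms, start_ms=0.0):
    """Convert successive tone durations into onset times (msec)."""
    onsets = []
    t = start_ms
    for d in durations_ms:
        onsets.append(t)
        t += d
    return onsets


def impose_pattern(pattern_ms, n_bars):
    """Repeat one temporal pattern (e.g. a mean profile) for n_bars bars."""
    return [list(pattern_ms) for _ in range(n_bars)]


def isochronous_pattern(pattern_ms):
    """Give every tone the mean value of the pattern, removing all structure."""
    m = sum(pattern_ms) / len(pattern_ms)
    return [m] * len(pattern_ms)


# Hypothetical 4-tone mean pattern in msec; not data from the study.
pattern = [190.0, 172.0, 178.0, 205.0]
mean_version = impose_pattern(pattern, 4)
flat_version = impose_pattern(isochronous_pattern(pattern), 4)
flat_durations = [d for bar in flat_version for d in bar]
print(onsets_from_durations(flat_durations))
```

Because the equal-duration version keeps the total duration of the pattern, the two variants differ only in their internal temporal structure and not in overall tempo.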
References
Fraisse, P., 1956. Les structures rythmiques. Louvain: Studia Psychologica.
Gabrielson, A., 1973. Similarity ratings and dimension analysis of auditory rhythm patterns. Scandinavian Journal of Psychology 14, 138-160.
Gabrielson, A., 1974. Performance of rhythm patterns. Scandinavian Journal of Psychology 15, 63-72.
Maarse, J., 1974. Een programmeerbaar electronisch orgel. Univ. of Nijmegen (intern. rep.).
Michon, J. A., 1964. Studies on subjective duration I. Differential sensitivity in the perception of repeated intervals. Acta Psychologica 22, 441-450.
Michon, J. A., 1966. Tapping regularity as a measure of perceptual motor load. Ergonomics 9, 401-412.
Michon, J. A., 1967. Timing in temporal tracking. Assen: Van Gorcum.
Pols, L. C. W., H. R. C. Tromp, R. Plomp, 1973. Frequency analysis of Dutch vowels from 50 male speakers. Journal of the Acoustical Society of America 53, 1093-1102.
Small, A. M. and R. A. Campbell, 1962. Temporal sensitivity for auditory stimuli. American Journal of Psychology 75, 401-410.
Siegel, S., 1956. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill.