Psychol Res (1996) 58:274-283
© Springer-Verlag 1996
Exploring the elementary harmonic forces in the tonal system
Received: 8 June 1995/Accepted: 21 August 1995
Abstract To study the elementary expectations that arise in the perception of tonal music, an exploratory study was performed in which subjects heard a sequence of four chords (to induce a key or tonality) followed by one tone (1 of the 12 tones within an octave), and indicated which tone they expected to be the next tone. Responses were divided into three groups: a random, a fifth, and a tonal group. The expectations in the tonal group are of main interest, as they seem to originate from a tonal representation. In this representation the tonic attracts almost all tones, although to different degrees, while the dominant and the mediant function as centripetal centers that attract only nearby tones. A preliminary model for the formation of elementary expectations, based on a 3-level mental representation of the tones in a key, is proposed. The predictive capability of the model is compared to that of existing models.
D.J. Povel, Nijmegen Institute for Cognition and Information (NICI), PO Box 9104, 6500 HE Nijmegen, The Netherlands; E-mail: Povel@NICI.KUN.NL
Introduction
At certain moments when listening to music, one may have strong and clear expectations concerning the future course of a melody or chord progression. These expectations are an intrinsic part of the process of music listening and appear to play an important role in the appreciation of music, especially in the evocation of emotional and esthetic reactions. The perceptual significance of the phenomenon of expectation in music listening has been recognized by many scholars and is a central issue in theoretical work of Cooke (1959), Jones (1981, 1982), Meyer (1956), Narmour (1990, 1992), Zuckerkandl (1956), and in the research of
Bharucha (1984, 1994), Bigand (1993), Larson (1992, 1994), Schmuckler (1989), Schmuckler and Boltz (1994), among others. Expectations that develop while one is listening to music are somehow related to, and result from, perceived structural characteristics present in music. Two different types of structure in music underlie the generation of expectancies.
(1) Sequential regularities in the contour of pitch sequences. Expectations related to this structural characteristic basically result from the extrapolation of a discovered regularity. For instance, the melodic fragment CDECDEFD, given that it is heard as CDEC DEFD, may give rise to the expected continuation EFGE (assuming that the fragment has induced the C major scale in the listener's mind, thus providing a set of elements to base the extrapolation on). This type of structural regularity is describable as operations on alphabets (Deutsch & Feroe, 1981; Simon & Sumner, 1968). Recently, two studies have investigated the formation of expectancies based on the rules proposed by Narmour's (1990) implication-realization model (Cuddy & Lunney, 1995; Krumhansl, 1995).
(2) Harmonic relations between the elements in the tonal system. These relations come into existence when the tones in a sequence are conceived of as members of a specific key or scale. In the above example, the tones CDE were conceived as the first three degrees of the scale of C major, but they could also have been conceived as members of other keys (e.g., F or G). Depending on the scale in which the tones are interpreted, the tones acquire specific characteristics that determine possible expectations.
As this study especially aims at discovering the principles that underlie the formation of the most basic expectations caused by harmonic factors, we shall first discuss the nature of these characteristics and look at the process that creates these expectancies.
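The extrapolation described under (1) can be illustrated with a small sketch. It treats the C major scale as an alphabet and assumes that the listener has grouped the fragment as CDEC DEFD and continues by repeating the step between the two groups. The names `C_MAJOR`, `to_degrees`, and `next_group` are hypothetical and serve only this illustration; they are not taken from Deutsch and Feroe (1981) or Simon and Sumner (1968).

```python
# The C major scale used as an "alphabet" (an assumption of this sketch).
C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]


def to_degrees(tones, alphabet=C_MAJOR):
    """Map tone names to their positions in the alphabet."""
    return [alphabet.index(t) for t in tones]


def next_group(groups, alphabet=C_MAJOR):
    """Extrapolate the next group by repeating the step between the last two groups."""
    prev = to_degrees(groups[-2], alphabet)
    last = to_degrees(groups[-1], alphabet)
    steps = {b - a for a, b in zip(prev, last)}
    if len(prev) != len(last) or len(steps) != 1:
        raise ValueError("groups are not related by a single step along the alphabet")
    step = steps.pop()
    return [alphabet[(d + step) % len(alphabet)] for d in last]


print(next_group([list("CDEC"), list("DEFD")]))  # ['E', 'F', 'G', 'E']
```

Interpreting the same tones in another alphabet (e.g., the scale of F or G) would change the degrees but not the rule, which is why this type of expectation is distinguished here from the harmonic type discussed under (2).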
When a tone is recognized as a member of a particular key, it acquires a musical meaning or function that has two main aspects. First, it obtains a certain stability, as all tones of a key are part of a hierarchy of stabilities (Krumhansl, 1979). Secondly, the tone may acquire a tendency to be attracted by another tone, depending on its relative stability. In general, less stable tones tend to be attracted by more stable tones. Thus, an unstable tone will be perceived as having a dynamic character, setting up a vector pointing in the direction of a nearby more stable tone (Bharucha, 1984). A well-known example of an expectation-creating unstable tone is the seventh scale degree of the major scale (the ti), which strongly evokes the last tone of the scale (the do). This tendency is reflected in the musical name for that degree: the leading tone. The tendency of less stable tones to be attracted by more stable ones causes an experience of motion in music and forms an essential source for expectancies. The question is how.
The process by which harmonic expectations arise may be outlined as follows. After a sufficient number of pitches have been presented, a frame of interpretation will be induced in the listener that captures the relations between the tones in a key (assuming that a tonal piece was presented). Several models describing the intrinsic characteristics of the tonal system have been proposed (Bharucha, 1987; Krumhansl, 1979, 1990; Lerdahl, 1988; Longuet-Higgins, 1962/1987; Schoenberg, 1969/1954). The pitches presented are then identified as tones (e.g., do, re, mi, or more adequately: tonic, supertonic, mediant). By this identification, the tones acquire a certain stability value as well as one or more vectors that point toward more stable tones to which they are attracted (Bharucha, 1984; Povel & van Egmond, 1993). These vectors can thus be seen as expectations evoked by these tones.
A number of experimental studies have investigated issues directly or indirectly related to the present study.
Lipps (1905) studied what was called "melodic trend" in melodic intervals. This trend consists in a preference for either the first or the second tone of a melodic interval (two tones presented consecutively) as an end tone. The findings can be summarized as follows: the preferred final tone for the fifth (2:3) is the lower tone; for the fourth (3:4) it is the upper; for the major third (4:5) it is the lower; for the minor third there is no tendency; for the major second (8:9) it is the lower; and for the minor second (15:16) it is the upper. Lipps described his results in the "law of the number 2", which states that the preferred end tone is always the tone whose ratio number is a power of two. If there is no such tone, as in the case of the minor third (5:6) or the major sixth (3:5), there is no melodic trend. Lipps's study was replicated and extended by Van Dyke Bingham (1910) in an experiment in which all of the melodic intervals within one octave (octave not included) were presented, both ascending and descending (in random order to prevent context effects), and subjects were asked "Can you make this second tone a final tone? Does this melody end?" (p. 23). From the results he concluded that the descending perfect fifth and the descending major third show the strongest tendencies of all the melodic intervals studied. In both cases the tendency is to hear the second tone as final. From a detailed analysis of all his results he concluded that: "Two melodically 'related' tones tend to establish a tonality, and the melody [the melodic interval] is judged to end only when the final tone is one of the members of the tonic triad - preferably the tonic itself" (p. 34). This is an important observation, showing that in interpreting these results one should reckon with the possibility that two tones may be enough to establish a tonality (key), which subsequently greatly influences the results.
Carlsen (1981) studied the melodic expectancy of melodic intervals in an experiment in which he presented all two-tone intervals (called "melodic beginnings") within the equal 12-tone tempered octave. Like Lipps (1905) and Van Dyke Bingham (1910), he presented the intervals without any (tonal) context. Subjects sang the tone(s) they expected to follow the melodic interval, of which only the first sung tone was analyzed. The following findings were reported: virtually all response intervals (the intervals between the tone presented last and the tone first sung) were less than an octave (above or below the tone last presented). The most frequent response interval was the descending minor second (19.9%), followed by the ascending minor second (14.4%), the ascending major second (13%), the descending major second (11.6%), the prime (6.4%), etc.; 83.7% of the response intervals were less than a perfect fourth. The results show an enormous variety in the responses, depending on the size of the interval presented: the minimal number of different responses is three, while for most stimuli the number of different responses is seven or more.
This large diversity may have been caused by the fact that the stimuli (the melodic intervals) were not presented in a tonal context. As a result of this, the induced tonal context, if any, may have been different at different occasions, resulting in the noisy data. Also, the option that the subjects could sing either one or more tones may have added to the large variation. Since we do not know what the tonal quality of the presented intervals has been (e.g., whether the interval of an ascending minor third was conceived of as the interval between the tonic and the mediant or as the interval between the submediant and the tonic), we must conclude that these data are not directly relevant for the topic under investigation. Krumhansl (1979) performed a study in which key induction was deliberately controlled. She presented a scale or a chord (to establish a context) followed by two tones, whereupon subjects judged the similarity of the two tones. The direction of the similarity was emphasized as the subjects had to indicate how similar the first tone was to the second. In a replication of this
study (Krumhansl, 1990, pp. 111 ff.) that used so-called circular tones (Shepard, 1964) to eliminate the effects of the height factor, and also included a minor-key context, subjects were asked to rate "how well the second tone followed the first in the context provided" (p. 124). The results of these experiments can be represented graphically by a cone in which the 12 chromatic tones are ordered by pitch proximity on the surface of the cone and in which the vertical dimension represents the tonal hierarchy as determined by Krumhansl and Kessler (1982). On this cone the tones of the tonic triad, the diatonic tones (excluding those contained in the tonic triad), and the nondiatonic tones appear on different levels (see Krumhansl, 1979, p. 357). However, the conical representation does not capture the distinct asymmetries found in the data (e.g., C-B received a rating of 6.00 on a 7-point scale and B-C a rating of 6.53 in the 1979 study; for the 1990 study these ratings were 3.67 and 6.42 respectively; the difference between the studies may be due to the different types of sound used). In general it was found that subjects gave higher ratings to intervals of which the second tone was more closely related to the key (tonality), especially when the second tone was one of the three elements of the tonic triad. For the present investigation this finding of asymmetry is most relevant, as it may be seen as a tendency for unstable tones to move toward more stable tones. Here I want to mention that Krumhansl's (1979) research was not primarily aimed at studying expectations, nor is the procedure well suited for this purpose. First, if a subject indicates that two tones are similar (in the context presented), this does not mean that the first tone generates an expectation for the second.
Secondly, by the presentation of a pair of tones and the query of how well the second tone followed the first, a continuation is suggested that may be conceivable and thus acceptable for many reasons, which may be absent in spontaneous expectancy generation. In this study we have investigated musical expectations using a procedure that controls key induction and directly asks subjects to indicate which tone they expect to follow a presented tone.
In the present study, which is the first of a series in which we trace the mechanisms underlying the formation of harmonic expectations, we have tested the basic "tensions" that exist between the different tones in the tonal system according to Cooke (1959, pp. 46-47). In Cooke's view the tensions between tones originate from their relative position in the harmonic series: the earlier a tone appears in that series, the more basic it is, and the stronger it attracts tones that appear later in the series (p. 46). Applied to the C major scale, this leads to the following dynamic configuration (all chromatic tones denoted as sharps): C is the fundamental note; and G and E are also relatively fundamental, although they are pulled back to C. The other notes are pulled back to these three: D and C# to C; F to E; F#, G#, and A to G; A# to A; and B to C. D# is pulled towards E, although this tone supposedly has a special position being (in its enharmonic form E flat) the third degree of the minor mode. This dynamic configuration is schematically displayed in Figure 1. Similar ideas have been proposed by Zuckerkandl (1956, pp. 34 ff.), and can partly be inferred from Lerdahl's (1988) model of the tonal pitch space as we shall see later.
Fig. 1 Attractions (tensions) between the tones of a major key (displayed in the key of C major) according to Cooke (1959)
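Cooke's configuration, as listed above, can be written down as a simple lookup table. The sketch below is only an illustration: the dictionary `COOKE` restates the attractions given in the text, while the helper `resolution_path`, which chains attractions until the tonic is reached, is a hypothetical convenience and not something Cooke (1959) proposes.

```python
# Cooke's (1959) attractions in C major as listed in the text (sharps only).
# C has no entry because it is the fundamental note.
COOKE = {
    "C#": "C", "D": "C", "D#": "E", "E": "C", "F": "E", "F#": "G",
    "G": "C", "G#": "G", "A": "G", "A#": "A", "B": "C",
}


def resolution_path(tone, attractions=COOKE):
    """Follow the attractions from `tone` until a tone without an attractor is reached.

    Chaining successive attractions is an assumption made for illustration only.
    """
    path = [tone]
    while path[-1] in attractions:
        path.append(attractions[path[-1]])
    return path


print(resolution_path("A#"))  # ['A#', 'A', 'G', 'C']
print(resolution_path("D#"))  # ['D#', 'E', 'C']
```

In the experiment reported here only the first step of such a path is at issue, since subjects indicate a single expected tone.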
Method
Subjects. Thirty-one subjects, students and staff of the Psychology Department of Nijmegen University, participated in the experiment. The subjects had widely varying levels of musical experience, but all had played a musical instrument for several years. None of the subjects had any formal musical training.
Stimulus material. Each experimental trial consisted of a cadence followed by one tone. The cadence involved was the chord progression I-IV-V7-I that was formed using common rules of voice leading. Thus the respective chords comprised the following tones (translated to the key of C): I: C2-E3-G3-C4; IV: F2-F3-A3-C4; and V7: G2-F3-G3-B3. The tone following the cadence could be one of the 12 tones between C4 and B4 (also translated to the key of C). The stimuli were generated on a Rhodes 770 synthesizer, using a harpsichord sound (Harpsichord 1). The synthesizer has seven octaves. Stimulus presentation and response collection were controlled by an Atari 1040STf personal computer.
Design. During an experimental session a subject reacted five times to each of the 12 stimuli. Stimuli were presented in random order, and in a different key in each trial, to avoid undesired sequential effects. The key was varied by the shifting of the lowest tone of the first chord between 5 semitones below and 6 semitones above C2. The tone following the cadence was transposed accordingly. All subjects performed the experiment twice, so that for each subject 10 repeated measures were obtained for each tone.
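The design of one session can be sketched as follows. This is a hedged reconstruction and not the original Atari control program: the MIDI numbering (C4 = 60), the function name `make_session`, and the reading of "a different key in each trial" as "a key different from that of the preceding trial" are all assumptions made for the example.

```python
import random

# MIDI note numbers, assuming the common convention C4 = 60.
CADENCE_IN_C = [
    [36, 52, 55, 60],  # I:  C2-E3-G3-C4
    [41, 53, 57, 60],  # IV: F2-F3-A3-C4
    [43, 53, 55, 59],  # V7: G2-F3-G3-B3
    [36, 52, 55, 60],  # I:  C2-E3-G3-C4
]
PROBE_TONES_IN_C = list(range(60, 72))  # C4 .. B4
KEY_SHIFTS = list(range(-5, 7))         # 5 semitones below to 6 above C2
REPETITIONS = 5


def make_session(rng):
    """Return 60 trials in random order, each in a key differing from the previous one."""
    probes = PROBE_TONES_IN_C * REPETITIONS
    rng.shuffle(probes)
    trials, previous_shift = [], None
    for probe in probes:
        shift = rng.choice([s for s in KEY_SHIFTS if s != previous_shift])
        trials.append({
            "shift": shift,
            "chords": [[note + shift for note in chord] for chord in CADENCE_IN_C],
            "probe": probe + shift,  # the tone is transposed along with the cadence
        })
        previous_shift = shift
    return trials


session = make_session(random.Random(0))
print(len(session))  # 60
```

Two such sessions per subject give the 10 repeated measures per tone mentioned above.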
Procedure. Subjects were seated in front of the keyboard of the synthesizer. At their right-hand side was a computer that controlled stimulus presentation. The monitor of this computer was placed above the synthesizer. After a subject had pressed a mouse button, a trial, consisting of a cadence followed by one tone, was presented. Subsequently the subject was asked to sing the tone (s)he expected to follow the presented one and to find and play that tone on the synthesizer. To facilitate the finding of the sung tone, the keys of the keyboard were provided with small numbered labels and the number of the tone presented was displayed on the computer screen. Subjects could repeat the presentation of the trial as many times as desired. Stimulus presentation and experimental set-up are shown in Figures 2 and 3. When the subjects had found and played the expected tone, they pressed the highest tone on the synthesizer to signal the computer
Fig. 2 Scheme and timing of the stimulus presentation (labels in figure: optional repeat; I, IV, V7, I; 800 ms; Response; press time)
Fig. 3 Experimental set-up (labels in figure: Rhodes 770, Computer, Subject)
program that the trial was finished. Before the beginning of the experiment, subjects were familiarized with the procedure by participating in 10 practice trials, which were repeated once if needed.
Results
It became apparent during the experimental sessions that the task was not easy for most subjects. For some subjects it proved difficult to find the expected note on the synthesizer, but the major problem for the subjects was to decide which tone they actually expected to follow the presented tone. Most subjects needed some time to develop a stable strategy of responding. Because the responses of the second session were much more consistent, only those will be discussed. It was also evident that not all subjects used the same strategy of responding: some subjects seemed to respond globally in accordance with Cooke's (1959) hypothesis discussed in the Introduction and displayed in Figure 1, while others seemed to expect a tone that was either a fifth below the presented tone or a fourth above it. A third group of subjects did not seem to respond at all systematically.
The data pooled over all subjects and expressed in terms of pitch classes (indicated as tone names rather than as numbers) are shown in Table 1. All data are translated to the key of C. The data indicate the percentage of times a specific tone was chosen after each presented tone (for the sake of clarity, values below 5% have been omitted). The bottom row of Table 1 represents the frequency of occurrence of the different responses (expressed as percentages); this row was computed from the complete matrices, not from the matrix shown in the Table, in which all cells with values below 5% were deleted. The response tendencies of the first two groups are clearly discernible in the table. Those subjects who follow Cooke's (1959) hypothesis produce a profile in which the tones C, E, and G occur quite
Table 1 Expectations obtained for all 31 subjects
C C# D D# E F F# G G# A A# B
C 26 39 26 29 46 14 10 67 13 24 14 53
35 10 12 5 8
F 28 6
A 5 5 6
26 7 9
44 6 8
G 21 5 27
34 26 8 6
6 5 5
Note. Values in percentages (values < 5% omitted; except bottom row)
often, whereas the responses of the second group mainly lie on diagonal lines, indicating the selection of a tone at a fixed interval (e.g., when presented with a C they tend to respond with an F, when given a C# with an F#, etc.).
In order to separate the three groups in a statistically reliable manner, we proceeded as follows. For each of the 31 subjects we compiled a separate matrix (comparable to the one shown in Table 1) representing the percentage of responses given for each presented tone. Next we created a theoretical matrix representing the responses that would be obtained if Cooke's hypothesis were strictly followed. A second theoretical matrix was created, representing a subject who either produces a tone a fifth below or a fourth above the presented tone. Subsequently, the subject matrices were correlated with the two theoretical matrices, and those subjects that showed a significant correlation with the Cooke matrix were assigned to a tonal group, while those subjects that showed a significant correlation with the second theoretical matrix were assigned to a fifth group.
Thus we obtained a tonal group comprising 15 subjects and a fifth group comprising 9 subjects. The remaining 7 subjects were assigned to a random group. We used a second criterion to verify this last assignment. For all subjects a random score was calculated, measuring the extent to which a subject responded randomly. This random score was obtained by counting, for each presented tone, the number of different responses given and summing these counts over the 12 tones. A completely random way of responding, that is, if a subject responds differently in each trial, would theoretically yield a score of 60 (12 tones × 5 trials), whereas a completely systematic way of responding would yield a score of 12. For the subjects in the random group we obtained an average random score of 36, in the tonal group of 21.8, and in the fifth group of 28.5, justifying the division into the three groups. The response matrices of the three groups are shown in Tables 2, 3, and 4. We shall now deal with the three groups in succession.
Table 2 Expectations obtained in the random group (7 subjects)
C C# D D# E F F# G G# A A# B Total %
Expected C 14 17 11 11 9 14 6 29
11 6 6
11 9 17
D 11 9 9 23 9 6 6 11 14 9
D# 6 14
F 20 11
9 9 6
17 17 17 29 17 6 17 9 6
G 17 34
29 9 6 6 11 6 20
20 6 14 6 6 11
17 11 17 9 11 12.4
29 6 9
A 20 9 11
11 6 6
17 11 9 14 11
6 23 6 14 6.7
Note. Values in percentages (values < 5% omitted; except bottom row)
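The grouping procedure and the random score described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the analysis actually run: the function names are hypothetical, matrices are stored as presented tone by expected tone, the correlation is a plain Pearson coefficient over all 144 cells, and no significance test is included because the text does not specify one.

```python
def response_matrix(responses):
    """Turn (presented_pc, expected_pc) pairs into a 12x12 matrix of row percentages."""
    counts = [[0] * 12 for _ in range(12)]
    for presented, expected in responses:
        counts[presented][expected] += 1
    return [[100 * c / max(sum(row), 1) for c in row] for row in counts]


def theoretical_matrix(rule):
    """Matrix of a hypothetical subject who always answers rule(presented)."""
    return response_matrix([(p, rule(p)) for p in range(12)])


def pearson(a, b):
    """Pearson correlation between two matrices, taken cell by cell."""
    x = [v for row in a for v in row]
    y = [v for row in b for v in row]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    var_x = sum((u - mx) ** 2 for u in x)
    var_y = sum((v - my) ** 2 for v in y)
    return cov / (var_x * var_y) ** 0.5


def random_score(responses):
    """Sum over the 12 presented tones of the number of different responses given."""
    return sum(len({e for p, e in responses if p == tone}) for tone in range(12))


def fifth_rule(pc):
    """A fourth above or a fifth below the presented tone, as a pitch class."""
    return (pc + 5) % 12


# A perfectly consistent "fifth" subject: 5 identical responses per tone.
consistent = [(p, fifth_rule(p)) for p in range(12) for _ in range(5)]
print(random_score(consistent))  # 12
print(round(pearson(response_matrix(consistent), theoretical_matrix(fifth_rule)), 2))  # 1.0
```

A theoretical matrix for Cooke's hypothesis could be built in the same way by passing a rule that maps each presented pitch class to its attractor in Figure 1.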
Table 3 Expectations obtained in the fifth group (9 subjects)
C C# D D# E F F# G G# A A# B Total %
7 16 11 11 82 27 27
F 71 9
F# 64 7 7
G 11 13 64 7
13 62 7
Note. Values in percentages (values < 5% omitted; except bottom row)
Table 4 Expectations obtained in the tonal group (14 subjects)
C C# D D# E F F# G G# A A# B
Expected C 44 69 48 51 81 15 11 76 9 28 24 95
5 40 32 11 65 7 13 7
7 81 5
61 52 13
Note. Values in percentages (values < 5% omitted; except bottom row)
The random group. The main observation about this group, whose data are displayed in Table 2, is that, despite the largely random response behavior of the subjects, the diatonic tones are selected more often than the chromatic tones, contrary to what would be expected under random selection. (If the different alternatives had been chosen equally often, each pitch class would have been selected in 8.3% of the cases.)
The fifth group. The most important characteristic of the responses in this group is shown in the diagonally filled areas in the matrix of Table 3. This pattern results from the subjects expecting a tone that is either a fifth below the presented one or a fourth above it. This behavior cannot be described as a strategy always to choose a tone that forms a fixed interval with the presented tone. In that case we would have found that subjects, having been presented C4 in the context of C major, would have responded with either F3 or G4. It rather seems that the presented tone is conceived of as the root of an interval that, depending on its inversion, forms either a fifth or a fourth (Thomson, 1993). This behavior might be understood as an attempt to produce a melodic cadence (sol-do) mirroring the cadential gesture in the stimulus (albeit in a different key).
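The difference between a fixed-interval strategy and the pattern observed in the fifth group can be made concrete with note numbers. The sketch assumes MIDI numbering with C4 = 60; the helper `name` is hypothetical and used only to print note names.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def name(midi):
    """Note name with octave, assuming the convention C4 = 60."""
    return f"{NAMES[midi % 12]}{midi // 12 - 1}"


presented = 60  # C4

# Fixed-interval strategy: always a perfect fifth (7 semitones) away.
fixed_interval = [presented - 7, presented + 7]
# Observed pattern: a fifth below or a fourth (5 semitones) above.
observed = [presented - 7, presented + 5]

print([name(m) for m in fixed_interval])  # ['F3', 'G4']
print([name(m) for m in observed])        # ['F3', 'F4']
print({m % 12 for m in observed})         # {5}
```

Both observed responses share one pitch class (F), which is what produces the single diagonal in Table 3 and what a fixed-interval account would not predict.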
Fig. 4 Expectations in the tonal group. Expectations are shown as arrows pointing toward the three main attractors: tonic (C), dominant (G), and mediant (E). The length of the arrows is proportional to the frequency of expectations (only attractions that occur in at least 10% of the cases are displayed). The C arrow in the G circle is placed in parentheses, as these responses probably result from an experimental artifact
The tonal group. Table 4, presenting the results of the tonal group, first of all reveals that the distribution of responses is completely different from that of the fifth group. Here we see an even more peaked distribution, showing that the candidate tones for the expectations are drawn from only a small set of tones. If we look at the overall results (bottom row of Table 4), 45.9% of the responses opted for the tonic, 21.3% for the dominant, and 14.9% for the mediant. That is, 82.1% of the responses pertain to the components of the tonic triad. The subjects in this group hardly show any tendency to choose a tone that is a fifth downward or a fourth upward.
The data of the tonal group are displayed in Figures 4 and 5 in a form that may help to understand their meaning better. As in the Tables, in these Figures all data have been transposed to C major so that we can speak of C, D, E, etc., instead of the abstract terms tonic, supertonic, mediant, etc. In Figure 4 the three main attractors C, G, and E, have been put in the center of three circles while the diatonic and chromatic tones have been placed on the circumferences. From some of these latter tones an arrow points toward the tone in the center; the length of this arrow is proportional to the extent that a tone on the circle is attracted by the tone in the center.
Fig. 5 Expectations in the tonal group, shown from the perspective of the presented tones. The lengths of the arrows show the percentage of times when the indicated attractors (targets) were chosen (only attractions that occur in at least 10% of the cases are displayed; axes: % chosen, presented tones)
The Figure clearly shows that (1) the tonic is a strong and general attractor that attracts almost all tones; the tonic attracts proximal tones (B, C#) as well as more distant tones (E, G); (2) the dominant is the second strongest attractor, which attracts the nearby tones F#, G#, A, and A#, in order of frequency. Surprisingly, the G also attracts the C quite often, but this is probably an experimental artifact caused by a bias in the subjects to expect another tone than the one presented; (3) the mediant, although chosen less frequently than the dominant, has a similar character in that it functions as a local attractor for F, D, and D#. Figure 5 shows the same data from the perspective of the presented tones, showing which tone(s) function as attractors or targets for each of the presented tones. This Figure again clearly indicates that the tonic, dominant, and mediant are the most important attractors. It further shows that some tones have mainly one attractor, e.g., B and C# (target: C), while others have more than one attractor, as, for instance, D (targets C and E), A (targets G and C), and A# (targets A, C, and G).
Discussion
In analyzing the results of this experiment we have identified three distinct groups: the random, the fifth, and the tonal groups. Because our main interest is in the so-called tonal expectations we shall focus on the results of the tonal group, discuss a few relevant aspects of it, and then try to model the results. Next, we shall briefly discuss the meaning of the three different response behaviors.
First we want to judge the correctness of Cooke's (1959) hypothesis, which formed the starting point of this study. Comparing the results of the tonal group with Cooke's hypothesis, schematically presented in Figure 1, we may conclude that the hypothesis is overall correct, although it is too limited in a number of respects. First, Cooke's hypothesis does not predict that several tones generate the multiple expectations that we have found. Such multiple expectancies are especially observed for D, D#, A, and A#. Second, D# more often generates an expectation for C than for E, contrary to Cooke's prediction for this tone, according to which it always resolves to E. The tones D, A, and A# generate alternative expectations to the one predicted by Cooke in a considerable number of instances.
A second issue relates to the notions of stability and attraction. As was stated in the Introduction, these two notions are usually considered complementary, in the sense that stable tones function as attractors for the less stable ones. So the hierarchy profile for the major key reported by Krumhansl may be compared with the attractor profile obtained in this study. Using a probe-tone technique, Krumhansl and Shepard (1979) and Krumhansl and Kessler (1982) collected probe-tone ratings, showing how well all 12 chromatic tones completed (1979 study), or fitted with (1982 study), the previously presented tonal context (a scale or a cadence). Tones receiving high ratings are supposed to be highly stable and considered better representatives of the tonal system or key. If there indeed is a direct relation between stability and attraction, the major key profile should predict how often each of the tones functions as an attractor. In other words, the hierarchy profile observed by Krumhansl and colleagues should resemble the attractor profile obtained for the tonal group in the present study. In Figure 6 the major key profile (taken from Krumhansl and Kessler, 1982) has been plotted together with the attractor profile from the tonal group (Table 4, bottom row).
Although the two profiles are globally quite similar (as expressed by a correlation of .89 between the two variables), there are also some interesting differences: in Krumhansl's profile all tones, especially the diatonic tones, have a distinct level of stability, with the strongest levels for C, G, and E respectively, followed by the diatonic tones F, A, D, and B. From the attractor profile inferred from the tonal-group data, one is tempted to conclude that in the mental representation of the subjects there are only three elements that have a distinct stability and that consequently function as powerful attractors: namely the three elements of the tonic triad. The F, which in Krumhansl's hierarchy has a relatively high degree of stability, hardly seems to function as an attractor for any of the tones. Although these differences may suggest that there is no direct relationship between stability and attraction, I rather believe that the probe-tone technique allows one to measure how well tones fitted in the presented context, but is not necessarily very suited for determining the attractive forces between tones. On the other hand, it is easily conceivable that the production task of this experiment only reveals the stronger attractors, those
Fig. 6 A comparison of the hierarchy profile observed by Krumhansl and Kessler (1982) (upper panel) with the attractor profile obtained from the tonal group (lower panel) in this study (panel titles: Data from Krumhansl & Kessler 1982; Data from present study; horizontal axis: Tones)
Fig. 7 The basic space ("oriented toward the tonic triad") from the tonal-pitch space model of Lerdahl (1988); numbers denote pitch classes
that are part of the tonic triad (suggestion of Krumhansl, personal communication).
Modeling
We shall now try to model the expectations generated in the tonal group. The aim is to discover what mental representation underlies the formation of the observed expectations and to characterize the process of expectancy generation. Considering the nature of the expectations in the tonal group, it is hypothesized that expectancy generation is guided by two factors: stability and pitch distance. First, from the studies of Krumhansl and Shepard (1979) and Krumhansl and Kessler (1982) we know that the most stable elements of the key are tonic, dominant, and mediant, in this order. In our data we have found that these three tones are the most important attractors, which supports the role of stability in expectancy formation. Secondly, the data also show that the smaller the distance between two tones, the more often one of them will function as an attractor. The following list shows that C more often functions as an attractor for closer tones than for tones farther away: B → C 95%; D → C 48%; A → C 28%; F → C 15%. This finding supports the assumption that pitch distance also plays a role in expectancy formation.
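The relation between pitch distance and attraction in the list above can be checked with a short calculation. The sketch measures distance as the shortest path between pitch classes on the chromatic circle, which is an assumption of this example; the percentages are those quoted in the text for the tonal group.

```python
PC = {"C": 0, "D": 2, "F": 5, "A": 9, "B": 11}


def pc_distance(a, b):
    """Shortest distance in semitones between two pitch classes on the chromatic circle."""
    d = abs(PC[a] - PC[b]) % 12
    return min(d, 12 - d)


# Percentage of trials in which C was expected after each tone (tonal group).
chose_c = {"B": 95, "D": 48, "A": 28, "F": 15}

for tone, pct in sorted(chose_c.items(), key=lambda item: pc_distance(item[0], "C")):
    print(tone, pc_distance(tone, "C"), pct)
# B 1 95
# D 2 48
# A 3 28
# F 5 15
```

For these four tones the percentage decreases monotonically with distance, in line with the assumption stated above.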
The "basic space" in Lerdahl's (1988) tonal pitch space model may be a good candidate for the model we are searching for, as it incorporates both factors, stability and pitch distance. The basic space is displayed in Figure 7. The units in the space are pitch classes (pcs), which are configured at five levels, each representing a specific type of pitch similarity (Shepard, 1982). The vertical dimension represents stability, as may be inferred from Lerdahl's (1988) remark: "Pcs at larger (higher) levels are more stable than pcs at smaller levels" (p. 320). Strictly, the horizontal dimension does not represent pitch distance, as pcs do not reflect absolute pitch, but it can be treated as such if we imagine the pcs as lying on a circle. If we assume that expectancy formation within Lerdahl's model consists in moving from the presented tone to a nearby tone on a higher level (Lerdahl says: "Moving down in the space means increasing dissonance, to be relieved by moving up again", p. 322), we may conclude that the model predicts the expectancies for the diatonic tones D, F, G, A, B, and C correctly. But for E the model predicts that either C or G would be expected (both lying on the open-fifth level), with a preference for G, which is closer to E than C is. However, the present data show that C is the predominant expectation created by E. The model also does not correctly predict the expectations for the chromatic tones. According to the model, the chromatic tones should create expectations for a close tone on the diatonic level. But except for A#, which in 39% of the cases evokes A, the chromatic tones chiefly resolve to the nearest element of the tonic triad.
On the basis of these considerations we conclude that Lerdahl's model correctly predicts part of the data, but cannot explain all of them. We therefore propose an alternative model that aims at explaining the major part of the expectations observed in the tonal group.
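The reading of Lerdahl's basic space used in the preceding comparison can be sketched as follows. The five levels are filled in as the basic space is usually given for a major key (octave, open fifth, triad, diatonic, chromatic), which is an assumption here, as is the rule itself: it is the interpretation adopted in this paper (move to the nearest pitch class one level up), not an algorithm stated by Lerdahl (1988). All names are hypothetical.

```python
# Basic space for C major as it is usually given; pitch classes, C = 0.
BASIC_SPACE = [
    ("octave", {0}),
    ("fifth", {0, 7}),
    ("triadic", {0, 4, 7}),
    ("diatonic", {0, 2, 4, 5, 7, 9, 11}),
    ("chromatic", set(range(12))),
]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def circle_distance(a, b):
    """Shortest distance between two pitch classes on the chromatic circle."""
    d = abs(a - b) % 12
    return min(d, 12 - d)


def predicted_targets(pc, space=BASIC_SPACE):
    """Nearest pitch classes one level above the highest level on which `pc` appears."""
    top = min(i for i, (_, members) in enumerate(space) if pc in members)
    if top == 0:
        return []  # the tonic has nowhere to move to
    candidates = space[top - 1][1]
    best = min(circle_distance(pc, c) for c in candidates)
    return [NAMES[c] for c in sorted(candidates) if circle_distance(pc, c) == best]


for tone in ["D", "E", "F", "A#"]:
    print(tone, predicted_targets(NAMES.index(tone)))
# D ['C', 'E']
# E ['G']
# F ['E']
# A# ['A', 'B']
```

The output for E (G rather than C) and for A# (a diatonic neighbor rather than a triad tone) corresponds to the two points on which the data depart from this reading of the model.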
Given the finding that the majority of tones (both diatonic and chromatic) tend to resolve to one of the elements of the tonic triad, it seems warranted to advance a hierarchical explanation in which stability is quantized on three levels. The explanation takes the form of a topological representation of the tones in a key in which the tones are represented on three levels: tonic, tonic-triad, and diatonic/chromatic. The spatial model has the form of an inverted cone; it shares features with the models of Lerdahl (1988) and Larson (1992), while the actual form closely resembles Krumhansl's (1979) representation of the relations between the elements in the key. The model is displayed in Figure 8. The process of expectancy formation on the basis of this representation can be specified as follows: starting from the lowest representation of the presented tone, move downward to the nearest element on the next lower level. (The precondition "start at the lowest level" is necessary to avoid uncertainty concerning the location of the presented tone.) The model explains the expectations for all diatonic tones (including the approximately equal expectations for C and E created by the tone D) and for all chromatic tones that resolve on one of the elements of the tonic triad. It thus explains about 80% of the responses.

Fig. 8 A model for the mental representation of the tones in a key yielding harmonic expectations (level labels in the figure: diatonic/chromatic, triadic, tonic)

As has been stated, the model resembles the conical representation proposed by Krumhansl (1979). However, Krumhansl's configuration resulted from a multidimensional analysis, whereas the present model is an informal attempt to describe the data. A few differences between the two models should be noted: (1) both models have three levels, but Krumhansl's levels are different, namely tonic-triad, diatonic, and chromatic; in the present model we have not added a separate chromatic level on top of the diatonic level because most chromatic tones resolve to one of the elements of the tonic triad (except for the tone A#; see Figure 5); (2) in contrast with Krumhansl's model, in the present model various tones are represented at different levels (e.g., the C appears at all three levels), a feature also found in the models of Deutsch and Feroe (1981) and Lerdahl (1988). In the current model this feature is needed to describe the tendency of the nondiatonic tones and the diatonic tones not contained in the tonic triad to resolve to elements of the tonic triad, and the tendency of the members of the tonic triad to resolve to the tonic. Krumhansl's conical configuration was originally used to describe only those regularities in the data that can be captured by means of a geometrical model, that is, those regularities that are insensitive to time-order effects, whereas I use a similar type of configuration to model the time-order-dependent relations between tones. Actually, this possibility was already suggested by Krumhansl when she observed that the asymmetries found in her data can be described as a tendency to move towards the vertex of the conical configuration (Krumhansl, 1979, p. 363). The preliminary model presented here needs to be elaborated further in future work.
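The resolution rule of the three-level model described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the key is fixed to C major, "nearest" is measured in semitones on the pitch-class circle, ties are reported as several equally expected tones, and the tonic itself is returned unchanged because no lower level exists. The names `LEVELS` and `expected_tones` are hypothetical.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The three levels for C major, ordered from the vertex of the cone upward.
LEVELS = [
    {0},             # tonic level
    {0, 4, 7},       # tonic-triad level
    set(range(12)),  # diatonic/chromatic level
]


def circular_distance(a: int, b: int) -> int:
    """Smallest number of semitones between two pitch classes on the circle."""
    diff = abs(a - b) % 12
    return min(diff, 12 - diff)


def expected_tones(tone: str) -> list[str]:
    """Start at the tone's lowest representation, then move to the nearest
    element(s) on the next lower level."""
    pc = NAMES.index(tone)
    # Lowest (most stable) level on which the presented tone is represented.
    lowest = next(i for i, level in enumerate(LEVELS) if pc in level)
    if lowest == 0:
        return [tone]  # assumption: the tonic has no lower level to move to
    targets = LEVELS[lowest - 1]
    best = min(circular_distance(pc, t) for t in targets)
    return [NAMES[t] for t in sorted(targets) if circular_distance(pc, t) == best]


for name in NAMES:
    print(f"{name:2} -> {', '.join(expected_tones(name))}")
```

In this sketch E and G resolve to C, D yields C and E as equally near targets, and every chromatic tone is sent to a member of the tonic triad. The rule therefore sends A# to C, so the responses in which A# evoked A fall outside what the sketch covers.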
Differences between groups. A final comment is in order with respect to the different behaviors of the subjects in the three groups discerned. These differences remind us of the study of Krumhansl and Shepard (1979) and that of Frankland and Cohen (1990), who observed considerable differences between subjects, depending on their musical experience. One may therefore wonder whether the responses of the fifth group can be seen as reflecting an earlier stage of musical development (keeping in mind the early occurrence of the fifth in music history). There are two reasons why this supposition is probably incorrect: (a) both groups contained subjects with considerable musical experience; (b) it is difficult to see how the behavior of the fifth group could be considered as a stage in the development towards the expectations in the tonal group. It is indeed surprising, given that most subjects had quite some musical experience, that only some of them showed the expectancies considered essential for perceiving and enjoying tonal music. It is possible that this mode of listening was not easily evoked in the musically impoverished situation of this study. We expect to obtain more homogeneous data in future work in which we intend to study tonal expectations in a richer environment.

Acknowledgements Part of this research has been reported at the 3rd International Conference on Music Perception and Cognition, Liège, 23-27 July 1994. I thank Anne-Sophie Melenhorst, Karolien de Ridder, and Roeland Arnold for running part of the experiments and René van Egmond for some thoughtful remarks on the text. I also wish to thank René van Egmond and Eric Maris for their help with the analysis of the data. I am grateful to Carol Krumhansl for several helpful comments and suggestions regarding this study.
References
Bharucha, J. J. (1984). Anchoring effects in music: The resolution of dissonance. Cognitive Psychology, 16, 485-518.
Bharucha, J. J. (1987). Music cognition and perceptual facilitation: A connectionist framework. Music Perception, 5, 1-30.
Bharucha, J. J. (1994). Tonality and expectation. In R. Aiello & J. Sloboda (Eds.), Musical perceptions. New York: Oxford University Press.
Bigand, E. (1993). Contributions of music to research on human auditory cognition. In S. McAdams & E. Bigand (Eds.), Thinking in sound. Oxford: Clarendon Press.
Carlsen, J. (1981). Some factors which influence melodic expectancy. Psychomusicology, 2, 12-29.
Cooke, D. (1959). The language of music. Oxford: Oxford University Press.
Cuddy, L. L., & Lunney, C. A. (1995). Expectancies generated by melodic intervals: Perceptual judgments of melodic continuity. Perception & Psychophysics, 57, 451-462.
Deutsch, D., & Feroe, J. (1981). The internal representation of pitch sequences. Psychological Review, 88, 503-522.
Frankland, B., & Cohen, A. J. (1990). Expectancy profiles generated by major scales: Group differences in ratings and reaction time. Psychomusicology, 9, 173-192.
Jones, M. R. (1981). Music as a stimulus for psychological motion: Part I. Some determinants of expectancies. Psychomusicology, 1, 34-51.
Jones, M. R. (1982). Music as a stimulus for psychological motion: Part II. An expectancy model. Psychomusicology, 2, 1-13.
Krumhansl, C. L. (1979). The psychological representation of pitch sequences in tonal context. Cognitive Psychology, 11, 346-374.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
Krumhansl, C. L. (1995). Music psychology and music theory: Problems and prospects. Music Theory Spectrum, 17, 53-80.
Krumhansl, C. L., & Kessler, E. J. (1982). Tracing the dynamic changes in perceived tonal organization of musical keys. Psychological Review, 89, 334-368.
Krumhansl, C. L., & Shepard, R. N. (1979). Quantification of the hierarchy of tonal functions within a diatonic context. Journal of Experimental Psychology: Human Perception and Performance, 5, 579-594.
Larson, S. (1992). Modeling musical expectation: Using three "musical forces" to predict melodic continuations. Proceedings of the 15th Annual Conference of the Cognitive Society, pp. 629-634.
Larson, S. (1994). Musical forces, step collections, tonal pitch space and melodic expectation. Proceedings of the 3rd International Conference on Music Perception and Cognition, pp. 227-229.
Lerdahl, F. (1988). Tonal pitch space. Music Perception, 5, 315-350.
Lipps, Th. (1905). Psychologische Studien. 2nd ed. Leipzig: Verlag der Dürr'schen Buchhandlung.
Longuet-Higgins, H. C. (1962). Two letters to a musical friend. Music Review, 23, 244-248 and 271-280. Reprinted in H. C. Longuet-Higgins, Mental processes: Studies in cognitive science. Cambridge, MA: The MIT Press, 1987.
Meyer, L. B. (1956). Emotion and meaning in music. Chicago: Chicago University Press.
Narmour, E. (1990). The analysis and cognition of basic melodic structures. Chicago: University of Chicago Press.
Narmour, E. (1992). The analysis and cognition of melodic complexity. Chicago: University of Chicago Press.
Povel, D. J., & Van Egmond, R. (1993). The function of accompanying chords in the recognition of melodic fragments. Music Perception, 11, 101-115.
Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. Music Perception, 7, 109-150.
Schmuckler, M. A., & Boltz, M. G. (1994). Harmonic and rhythmic influences on musical expectancy. Perception and Psychophysics, 56, 313-325.
Schoenberg, A. (1969). Structural functions of harmony (rev. ed.). New York: Norton. (Original publication 1954.)
Shepard, R. N. (1964). Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36, 2346-2353.
Shepard, R. N. (1982). Geometrical approximations to the structure of musical pitch. Psychological Review, 89, 305-333.
Simon, H. A., & Sumner, R. K. (1968). Pattern in music. In B. Kleinmuntz (Ed.), Formal representation of human judgment (pp. 210-250). New York: Wiley.
Thomson, W. (1993). The harmonic root: A fragile marriage of concept and percept. Music Perception, 10, 385-416.
Van Dyke Bingham, W. (1910). Studies in melody. Psychological Review, Monograph Supplements, Vol. XII, Whole No. 50.
Zuckerkandl, V. (1956). Sound and symbol. Princeton University Press.