
8.3.2 Maximum key-profile correlation

The maximum key-profile correlation (MKC) is an algorithm for finding the most prominent key in a music sample [Krumhansl & Kessler, 1982; Krumhansl, 1990].

Figure 8.1. The first measure of the First Prelude in C major from Book I composed by J.S. Bach.

Figure 8.2. The first eight graphs present the harmonically compressed spectra for each note in the first measure of a piano performance of the First Prelude in C major from Book I composed by J.S. Bach. The frequency positions of the musical pitches over several octaves corresponding to the chromas are marked in the spectra. The last graph at the bottom presents the chroma spectrum of all eight notes, combining the information from all eight spectra.
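The folding of octaves into chromas that this caption describes can be made concrete with a small sketch. The following Python fragment is a minimal illustration under assumed conventions (equal temperament, A4 = 440 Hz), not the chapter's actual front-end:

    import numpy as np

    def chroma_class(freq_hz, ref_a4=440.0):
        """Map a frequency to a chroma class (0 = C, 1 = C#, ..., 11 = B).

        All octaves fold onto the same 12 classes, which is how the
        eight note spectra combine into a single chroma spectrum.
        """
        midi = 69 + 12 * np.log2(freq_hz / ref_a4)  # fractional MIDI note number
        return int(round(midi)) % 12

    # Example: every octave of C (261.63 Hz, 523.25 Hz, ...) maps to class 0.
    print(chroma_class(261.63), chroma_class(523.25))  # both print 0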


Figure 8.3. Mean probe tone ratings (or key profiles) in the context of the key C major (a) and the key C minor (b).

It has shown its value in research on psychological tonality measures for music [Takeuchi, 1994] and on harmonic progression in improvised jazz music [Järvinen, 1995]. Originally, the algorithm was devised for symbolic encodings of music (i.e., MIDI, notated music). Here, it is used as a back-end to a signal processing step that works on raw audio data.

The MKC algorithm is based on key profiles that represent the perceived stability of each chroma within the context of a particular musical key. Krumhansl and Kessler [Krumhansl, 1990; Krumhansl & Kessler, 1982] derived the key profiles from a probe tone rating task. In this task, subjects were asked to rate, on a scale of 1 to 7, the suitability of various concluding pitches after they had listened to a preceding musical sample that established a particular key. The mean ratings constitute the key profiles used in the current algorithm. These key profiles are shown for the keys C major and C minor in Figure 8.3. The graph clearly indicates differences in the perceived stability of the chromas: the highest ratings are given to the tonic (C) and the other two pitches of the triad (G, E), followed by the remaining pitches of the scale (F, A, D, B), and finally the non-scale pitches (all sharps and flats).

Key profiles depend only on the relationship between a pitch and a tonal center, not on absolute pitches. Consequently, profiles for different major or minor keys are all transpositions of each other. For instance, the key profile for C major can be rotated seven positions to arrive at the key profile for G major, since G lies seven semitones above C.
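As a minimal numeric illustration of this transposition, the sketch below rotates the C major profile of Figure 8.3 (the published Krumhansl-Kessler mean ratings) to obtain the G major profile:

    import numpy as np

    # Krumhansl-Kessler mean probe tone ratings for C major (Figure 8.3a),
    # indexed by chroma: C, C#, D, D#, E, F, F#, G, G#, A, A#, B.
    C_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                        2.52, 5.19, 2.39, 3.66, 2.29, 2.88])

    # G lies seven semitones above C, so rotating the vector seven positions
    # moves the tonic rating from the C bin to the G bin.
    G_MAJOR = np.roll(C_MAJOR, 7)

    assert G_MAJOR[7] == C_MAJOR[0]  # the tonic rating now sits on G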

As discussed in Section 8.2, if music is composed in a particular key, the pitches are likely to be drawn from a single major or minor scale, and some pitches are more stable than others. The MKC algorithm is based on the assumption that the most stable chromas occur most often in a music sample. This has been found to hold at least for Classical tonal compositions [Knopoff & Hutchinson, 1983]. The MKC algorithm computes the correlation (i.e., Pearson's product-moment correlation) between the distribution of chroma occurrences in the musical sample and all 24 key profiles. Recall that the chroma spectrum, given as a vector with 12 elements, takes the role of this distribution of chroma occurrences. The key profile that provides the maximum correlation with the chroma spectrum is taken as the most probable key of the musical sample. The correlation value can be used as the salience of the perceived key, or as the degree of tonal structure of the music sample.
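The complete key-finding step is compact enough to sketch in full. The fragment below is a minimal re-implementation of the idea, not the author's code; it assumes the published Krumhansl-Kessler mean ratings as profiles and a hypothetical chroma spectrum as input:

    import numpy as np

    # Krumhansl-Kessler mean probe tone ratings (Figure 8.3), chroma order C..B.
    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                      2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                      2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
    CHROMAS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

    def mkc_key(chroma_spectrum):
        """Return (key name, correlation) for the maximally correlated key profile."""
        best_key, best_r = None, -np.inf
        for tonic in range(12):
            for mode, profile in (('major', MAJOR), ('minor', MINOR)):
                # Rotate the C-based profile so its tonic lands on `tonic`, then
                # take Pearson's product-moment correlation with the input.
                r = np.corrcoef(chroma_spectrum, np.roll(profile, tonic))[0, 1]
                if r > best_r:
                    best_key, best_r = f'{CHROMAS[tonic]} {mode}', r
        return best_key, best_r

    # Hypothetical chroma spectrum dominated by C, E and G, as in Figure 8.2.
    example = np.array([8.0, 0.5, 2.0, 0.5, 5.0, 3.0,
                        0.5, 6.0, 0.5, 2.0, 0.5, 1.5])
    print(mkc_key(example))  # expected to report C major with a high correlation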

8.4 Evaluation

The evaluation of the algorithm assessed how often it finds the correct key in a set of 237 performances of Classical piano sonatas on CD.

The correct key was defined as the main key in which the musical composition was originally composed. Recall that music composers use various key-modulating techniques to build up tension and relaxation in the music. However, many compositions start and end in the same key; such pieces are called monotonal. All recordings of the following CDs were used in the experiment.

J.S. Bach, played by Rosalyn Tureck. The Well-Tempered Clavier, Books I & II (48 Preludes and Fugues). Deutsche Grammophon, 1999.

J.S. Bach, played by Jenő Jandó. The Well-Tempered Clavier, Book I (24 Preludes and Fugues). Naxos Classical, 1995.

D. Shostakovich, played by Vladimir Ashkenazy. 24 Preludes & Fugues, op. 87. Decca, 1999.

J. Brahms, played by Glenn Gould. The Glenn Gould Edition. Sony Classical, 1993.

F.F. Chopin, played by Evgeny Kissin. 24 Preludes, op. 28; Sonata no. 2 (Marche funèbre); Polonaise, op. 53. Sony Classical, 1999.

The original main key of the composition was compared with the key extracted from the CD-PCM data of the complete piano performances. In the left-hand graph of Figure 8.4, the results are shown in terms of percentage of correct classification. In 86.1% of the recordings, the algorithm correctly identified the key in which the work was originally composed and performed. If we are only interested in the tonic of the key, we obtain the same accuracy, as there were no confusions between parallel keys. If we are only interested in the mode of the key, the accuracy is 92.4%. As discussed in Section 8.2, 'friendly' keys are often a source of confusion. If we consider the 'friendly' keys, that is, the exact key, its relative key, its dominant key (V), and its sub-dominant key (IV), as equal keys, the accuracy rises to 96.2%. In other words, for only 3.8% (nine recordings out of a total of 237), the algorithm provided a key whose incorrectness could not easily be interpreted on music-theoretic grounds.

In the right-hand graph of Figure 8.4, the same results are projected across composers. The year and time period when the pieces were composed are shown as well.

Figure 8.4. Key extraction accuracy: (Left-hand) accuracy for finding the exact main key, the tonic of the key, the mode of the key, and all 'friendly' keys in 237 complete piano sonatas; 'friendly' keys are the exact main key, its relative key, its dominant key (V), and its sub-dominant key (IV). (Right-hand) accuracy for finding the exact main key for works of different composers.

It is evident that the algorithm was less accurate for the works of Shostakovich (i.e., 75%). In addition, for five pieces, the algorithm identified a key that was not one of the 'friendly' keys. These works can still be considered highly tonal, but, being the most recent in the test set, they contain modern interpretations of harmony.

8.5 Conclusion

Musical key extraction from music audio is a prerequisite for distinctive end-user applications such as advanced music playback (e.g., an automatic DJ) and ambiance creation (e.g., music and lights). We presented an algorithm that correlates a chroma spectrum computed from audio data with profiles for all 24 possible Western keys. The key profile that has the highest correlation with the provided chroma spectrum is taken as the key of the musical fragment. The algorithm has only minimal computational requirements; it runs at about 100 times real time on a P4 2-GHz platform.

The algorithm correctly identifies the exact main key in 86.1% of the cases when analyzing the complete CD recordings of Classical piano sonatas. If we treat exact, relative, dominant, sub-dominant, and parallel keys as similar, it achieves an accuracy of 96.2%. We have no data on recordings with other instrumentation or from other musical idioms.

The following points of the current algorithm need attention.

Modelling the tone of an instrument as a harmonic series is highly idealized, since it accounts only for a single tone in a steady state. For instance, it does not account for instrument, playing, and tuning characteristics.

A signal pre-processing stage might reveal which fragments in a musical performance contain key-relevant information and which do not. This stage may check for masking effects, harmonicity, and transients to discern fragments with harmonic instruments carrying perceived information on musical key from noisy, percussive instruments.

Currently, the key profiles in use are the result of empirical work. Alternatively, the profiles could be trained by a supervised machine learning method; the whole approach then turns into a classification problem (see the sketch after this list).

Music-perceptual and cognitive factors that establish a musical key in a human listener could be further integrated into the algorithm. Temporal, rhythmic, and harmonic factors of pitches are not modelled, whereas it is known that the temporal order of pitches and the position of pitches in a metrical organization (e.g., the first beat, strong accents) influence the perception of a tonal center (i.e., the tonic of the key).

Music-theoretical and compositional constructs are not modelled in the algorithm. Composers use various key-modulation techniques with which they signify how strongly a new key will be established.
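As a hedged illustration of the supervised alternative mentioned above, the sketch below learns one profile per key as the mean chroma vector of labelled training samples and classifies by maximum correlation; the function names and the mean-based estimator are assumptions for this sketch, not a method from the chapter:

    import numpy as np

    def train_profiles(chroma_vectors, key_labels, n_keys=24):
        """Learn one profile per key as the mean of its training chroma vectors.

        chroma_vectors: (n_samples, 12) array of chroma spectra.
        key_labels: (n_samples,) array of key indices in 0..n_keys-1.
        Assumes at least one training sample per key.
        """
        chroma_vectors = np.asarray(chroma_vectors, dtype=float)
        key_labels = np.asarray(key_labels)
        profiles = np.zeros((n_keys, 12))
        for k in range(n_keys):
            profiles[k] = chroma_vectors[key_labels == k].mean(axis=0)
        return profiles

    def classify_key(chroma_spectrum, profiles):
        """Assign the key index whose learned profile correlates best with the input."""
        corrs = [np.corrcoef(chroma_spectrum, p)[0, 1] for p in profiles]
        return int(np.argmax(corrs))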

References

Chafe, C., B. Mont-Reynaud, and L. Rush [1982]. Toward an intelligent editor of digital audio: Recognition of musical constructs. Computer Music Journal, 6: 30–41.

Chew, E. [2002]. An algorithm for determining key boundaries. In: Proceedings of the 2nd International Conference on Music and Artificial Intelligence.

Cohen, A.J. [1977]. Tonality and perception: Musical scales prompted by excerpts from Das Wohltemperierte Clavier of J.S. Bach. Paper presented at the Second Workshop on Physical and Neuropsychological Foundations of Music, Ossiach, Austria.

Eliason, S.R. [1993]. Maximum Likelihood Estimation: Logic and Practice. SAGE Publications.

Gómez, E., and P. Herrera [2004]. Estimating the tonality of polyphonic audio files: Cognitive versus machine learning modelling strategies. In: Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), Barcelona, Spain.

Hermes, D. [1988]. Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83(1): 257–264.

Holtzman, S.R. [1977]. A program for key determination. Interface, 6: 29–56.

Huron, D., and R. Parncutt [1993]. An improved model of tonality perception incorporating pitch salience and echoic memory. Psychomusicology, 12: 154–171.

Järvinen, T. [1995]. Tonal hierarchies in jazz improvisation. Music Perception, 12(4): 415–437.

Kastner, M.P., and R.G. Crowder [1990]. Perception of the major/minor distinction: IV. Emotional connotations in young children. Music Perception, 8(2): 189–202.

Knopoff, L., and W. Hutchinson [1983]. Entropy as a measure of style: The influence of sample length. Journal of Music Theory, 27: 75–97.

Krumhansl, C.L. [1990]. Cognitive Foundations of Musical Pitch. Oxford Psychological Series, no. 17, Oxford University Press, New York.

Krumhansl, C.L., and E.J. Kessler [1982]. Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89(4): 334–368.

Leman, M. [1994]. Schema-based tone center recognition of musical signals. Journal of New Music Research, 23: 169–204.

Longuet-Higgins, H.C., and M.J. Steedman [1971]. On interpreting Bach. Machine Intelligence, 6: 221–241.

Purwins, H., B. Blankertz, and K. Obermayer [2000]. A new method for tracking modulations in tonal music in audio data format. In: International Joint Conference on Neural Networks (IJCNN'00), 6: 270–275, IEEE Computer Society.

Purwins, H., T. Graepel, B. Blankertz, and K. Obermayer [2003]. Correspondence analysis for visualizing interplay of pitch class, key and composer. In: Perspectives in Mathematical Music Theory, E. Luis-Puebla, G. Mazzola, and T. Noll (eds.).

Shalev-Shwartz, S., S. Dubnov, N. Friedman, and Y. Singer [2002]. Robust temporal and spectral modelling for query by melody. In: Proceedings of SIGIR'02, Tampere, Finland.

Shenoy, A., R. Mohapatra, and Y. Wang [2004]. Key determination of acoustic musical signals. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2004).

Shmulevich, I., and O. Yli-Harja [2000]. Localized key-finding: Algorithms and applications. Music Perception, 17(4): 531–544.

Takeuchi, A.H. [1994]. Maximum key-profile correlation (MKC) as a measure of tonal structure in music. Perception & Psychophysics, 56(3): 335–346.

Temperley, D. [1997]. An algorithm for harmonic analysis. Music Perception, 15(1): 31–68.

Vos, P.G., and E.W. van Geenen [1996]. A parallel processing key-finding model. Music Perception, 14(2): 185–224.
