

7.3.2 Experiment 2: Percussiveness for classifying polyphonic music

In this second experiment we checked whether the concept of percussiveness is useful for music genre classification. Our approach is a three-stage process. First, we extracted single sounds from a 6-second audio stream and estimated their percussiveness using an adequately trained classifier (first classifier). Second, we computed secondary features (e.g. statistics) over these percussiveness estimations per audio file. Third, we used those secondary features for the final genre classification of the audio file (second classifier).
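The first stage, cutting a continuous stream into single sound events, can be sketched with a simple energy-based onset detector. This is a deliberately simplified stand-in for the chapter's actual extraction algorithm (Section 7.2); the function name, frame size and threshold are illustrative assumptions:

```python
import numpy as np

def extract_sound_events(signal, sr, frame_ms=10, threshold=0.1):
    """Mark a sound onset wherever the frame energy jumps above a
    threshold. Returns onset positions in samples."""
    frame = int(sr * frame_ms / 1000)
    n = len(signal) // frame
    energy = np.array([np.mean(signal[i*frame:(i+1)*frame] ** 2)
                       for i in range(n)])
    # rising edges: quiet frame followed by a loud frame
    onsets = np.flatnonzero((energy[1:] > threshold) &
                            (energy[:-1] <= threshold)) + 1
    return onsets * frame

sr = 8000
t = np.arange(6 * sr) / sr          # a 6-second test stream
signal = np.zeros_like(t)
signal[sr:sr + 400] = 1.0           # one burst at t = 1 s
signal[3*sr:3*sr + 400] = 1.0       # another at t = 3 s
print(extract_sound_events(signal, sr))  # onsets at samples 8000 and 24000
```

Each detected event would then be handed to the percussiveness classifier of stage one.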

The idea behind this approach is straightforward: we assume that different music genres contain different proportions of percussive sounds; e.g. dance music should contain a relatively greater percentage of percussive sounds than classical music.

Percussiveness predictor. The task of the first classifier is to predict the percussiveness of the extracted sounds. Based on the results in Section 7.3.1, we decided to implement seven predictors consisting of the different combinations of the selected feature sets (Table 7.5), in order to figure out which predictor performs best. The training material for this first classifier consisted of sounds automatically extracted by the algorithm from 455 audio files. With the help of a user interface, a human annotator assigned each sound to one of three categories: percussive, between percussive and non-percussive, or non-percussive.
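Assuming three selected feature sets, the seven predictor variants would correspond to all non-empty combinations of them. The set names below are placeholders, since Table 7.5 is not reproduced in this section:

```python
from itertools import combinations

# Placeholder names for the selected feature sets of Table 7.5.
feature_sets = ["envelope", "spectral", "temporal"]

# All non-empty combinations: 3 singles + 3 pairs + 1 triple = 7 predictors.
predictors = [
    combo
    for r in range(1, len(feature_sets) + 1)
    for combo in combinations(feature_sets, r)
]
print(len(predictors))  # 7
```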

Janto Skowronek and Martin McKinney

Secondary features. The secondary features used for the second classifier (music genre classification) were computed per audio file. First, we took all percussiveness predictions from the first classifier belonging to one audio file.

Then we computed, per audio file, how often (in %) the sounds were assigned to the different classes of percussiveness, e.g. 50 % percussive, 30 % between percussive and non-percussive, 20 % non-percussive. Finally, we used these percentages as secondary features.
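The percentage computation described above amounts to a per-file class histogram; a minimal sketch, with class labels chosen for illustration:

```python
from collections import Counter

def percussiveness_percentages(predictions):
    """Turn one file's per-sound predictions into percentage features,
    e.g. 50 % percussive, 30 % in between, 20 % non-percussive."""
    classes = ["percussive", "between", "non-percussive"]
    counts = Counter(predictions)
    n = len(predictions)
    return [100.0 * counts[c] / n for c in classes]

# Example file with 10 extracted sounds:
preds = ["percussive"] * 5 + ["between"] * 3 + ["non-percussive"] * 2
print(percussiveness_percentages(preds))  # [50.0, 30.0, 20.0]
```

These three percentages form the feature vector passed on to the genre classifier.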

Experiment setup. As test and training data for the final classification (second classifier) we used the above-mentioned audio database, but following the experiments in [McKinney & Breebaart, 2003], we took only those files from that database (188 in total) that belong to one of seven music genres: Jazz, Folk, Electronica, R&B, Rock, Reggae and Vocal music. We used the same performance evaluation method as described in Section 7.3.1: QDA and a 70/30-fold bootstrapping procedure. We computed the performances for three different tasks: classifying all seven genres at once, detecting each genre out of the other six, and discriminating between two music genres.
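The evaluation protocol, as we read it, is repeated random 70/30 train/test splits scored with a quadratic discriminant analysis (QDA) classifier. A sketch on synthetic data (in the experiment, X would hold the secondary percussiveness features and y the genre labels; the split count is an assumption):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.RandomState(0)
# Synthetic two-class stand-in data: 60 samples per class, 3 features.
X = np.vstack([rng.randn(60, 3) + offset for offset in (0.0, 2.0)])
y = np.repeat([0, 1], 60)

scores = []
for _ in range(20):                      # repeated 70/30 bootstrap splits
    idx = rng.permutation(len(y))
    split = int(0.7 * len(y))
    train, test = idx[:split], idx[split:]
    qda = QuadraticDiscriminantAnalysis().fit(X[train], y[train])
    scores.append(np.mean(qda.predict(X[test]) == y[test]))

# Report mean and spread, matching the "accuracy ± deviation" style of
# Tables 7.6 and 7.7.
print(f"{np.mean(scores)*100:.1f} +/- {np.std(scores)*100:.1f} %")
```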

Results. Table 7.6 shows the average classification performance of the seven predictors for the 'all at once' experiment. First, we see that the differences between the percussiveness predictors are smaller than the performance variability within the single predictors; that means there is no dominant percussiveness predictor that outperforms the others. Second, we see that the results are in general quite poor: only average performance values far below 50 % are achieved.

The poor performance shows that the approach used here does not allow classification of all seven music genres at once. However, the algorithm might still be able to detect certain classes out of the remaining ones. In order to check this, we assigned one music genre to class 1 and all others to class 2, and reran the classification; this we repeated for all music genres. Since there was no clear best-performing percussiveness predictor, we performed these tests for all seven percussiveness predictors. Table 7.7 shows the detection performance for the seven music genres of those predictors that achieved the highest accuracy. First, we see that different predictors allowed the best detection of different classes. Second, the detection performance is moderate (around 70 %). The results suggest that this algorithm might be useful for tasks in which a certain type of music is to be detected.

In a last test we investigated the algorithm's ability to discriminate between two music genres. For that purpose the algorithm had to classify two types of music, while all other music genres were excluded. This we did for each combination of genres and for all seven predictors. Figure 7.3 shows a matrix with the discrimination performance.
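Both follow-up tests can be expressed as simple relabeling steps on the genre labels; the genre values below are illustrative:

```python
import numpy as np

y = np.array(["jazz", "rock", "folk", "jazz", "reggae", "electronica"])

def one_vs_rest(y, target):
    """Detection task: the target genre becomes class 1, all others class 2."""
    return np.where(y == target, 1, 2)

def pairwise_mask(y, genre_a, genre_b):
    """Discrimination task: keep only samples of the two genres in question."""
    return np.isin(y, [genre_a, genre_b])

print(one_vs_rest(y, "jazz"))            # [1 2 2 1 2 2]
print(pairwise_mask(y, "jazz", "rock"))  # [ True  True False  True False False]
```

The detection experiment retrains the classifier on the relabeled data; the discrimination experiment retrains it on the masked subset, once per genre pair.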

Features for Audio Classification: Percussiveness of Sounds


Table 7.6. Classification performance for classifying seven music genres.

percussiveness predictor    classification performance
1                           22.96 ± 14.24 %
2                           24.65 ± 13.30 %
3                           30.65 ± 15.40 %
4                           33.51 ± 13.34 %
5                           22.80 ± 15.25 %
6                           31.94 ± 12.93 %
7                           34.44 ± 14.07 %

Table 7.7. Accuracy of correct detections of a music genre out of the remaining six.

music genre    best perc. predictor    detection performance
jazz           7                       70.41 ± 8.56 %
folk           3                       65.58 ± 11.8 %
electronica    3                       72.25 ± 10.86 %
R&B            6                       73.18 ± 7.58 %
rock           7                       72.46 ± 9.25 %
reggae         7                       80.5 ± 9.91 %
vocal          4                       84.25 ± 12.49 %

The numbers below the main diagonal are the average discrimination accuracy of each class pair, while the numbers above the main diagonal denote the best predictor for each class pair. With average values of about 60 %, the genres Jazz, Folk and Rock are difficult to distinguish from each other on the basis of the percussiveness predictions. In contrast, several class pairs can be discriminated quite well (about 90 %): Electronica vs. Folk, Electronica vs. Vocal, Reggae vs. Folk, as well as Reggae vs. Vocal.

Figure 7.3. Accuracy of discriminating between two music genres. The values below the main diagonal are the average accuracies; the numbers above the main diagonal denote the best predictors.

7.4 Summary

In an experimental study we tested percussiveness as a new feature for automatic classification of music. In a first experiment we tested an algorithm that describes the envelope of a sound with a parametric three-phase description (Attack, Decay/Sustain, Release). Performing classification runs, we saw that some of these features were able to predict the percussiveness of single-instrument sounds quite well (around 90 % accuracy).

Encouraged by these findings, we did a second experiment in which we tried to use predictions of percussiveness as input for a seven-genre music classifier.

First the algorithm extracted sound events from continuous audio streams and predicted their percussiveness. Then simple statistics on these predictions were computed and used as input features in order to classify the seven music genres.

When classifying all seven genres at once, we obtained disappointingly poor results. Nevertheless, two further tests showed that at least the detection and discrimination of some music genres is possible using these percussiveness features.

In fact, we expected that percussiveness, as a single descriptor of the timbre of sounds, would not be sufficient for classifying all types of music genres. Additional descriptors that analyze other aspects of timbre (e.g. spectral structure) will be necessary in order to improve the classification.

In summary, we have introduced the percussiveness of sounds as a higher-level feature for classifying (at least some) music genres. Although the algorithm still needs improvement, its major advantage is that percussiveness is a tangible feature for users. It enables the development of classification applications in which the user has more control over the features used. For instance, a user could adjust the search criteria of a music database system in terms of how percussive the desired music should be.

References

McKinney, M.F., and D.J. Breebaart [2003]. Features for audio and music classification. 4th International Symposium on Music Information Retrieval, Baltimore, Maryland.

Jensen, K. [1999]. Timbre Models of Musical Sounds. Ph.D. thesis, University of Copenhagen, DIKU Report 99/7.

Gouyon, F., and P. Herrera [2001]. Exploration of techniques for automatic labeling of audio drum tracks instruments. Proceedings of the MOSART Workshop on Current Research Directions in Computer Music, Barcelona.







Moore, B.C.J., and B.R. Glasberg [1996]. A revision of Zwicker's loudness model. Acta Acustica, 82:335–345.

Schrader, J.E. [2003]. Detecting and interpreting musical note onsets in polyphonic music. M.Sc. thesis, Technische Universiteit Eindhoven, Eindhoven, The Netherlands.

Duda, R., and P. Hart [1973]. Pattern Classification and Scene Analysis. Wiley, New York.

Papoulis, A. [1991]. Probability, Random Variables and Stochastic Processes. McGraw-Hill, New York.

