
7.3.1 Experiment 1: Percussiveness of single instrument sounds

Experiment setup. We used single instrument sounds as test and training data, because for these sounds the envelope approximation algorithm worked best. This minimized the probability that misclassifications occurred due to incorrect detection of the A-D&S-R phases. Thus the classification results were as independent as possible from the feature extraction algorithm, meaning that the results were mainly affected by the type of features themselves.

The database comprised 722 single instrument sounds (classical instruments, acoustical and synthesizer drums) that we grouped into three classes:

(1) percussive and non-harmonic, (2) percussive and harmonic, (3) non-percussive. The labels were assigned per instrument and play style. For instance, all sounds coming from a cello played pizzicato were assigned to class 2, and all sounds coming from a cello played sustained were assigned to class 3.

Goals and method. Some of the questions that we investigated with this experiment were: What is a good number of features? What is the value of the more complex features? What is the best feature set?

Since these questions are meant in terms of classification performance, we performed several classification runs, which consisted of four steps:

1. Define the conditions of the classification run:

(a) The feature set that we want to investigate.

(b) The number of features that the available ranking method shall select.

2. Run the feature ranking procedure (a code sketch follows step 4 below):

(a) Take the complete feature set.

(b) Eliminate one feature and estimate the error probability ε of the remaining set, based on the so-called Bhattacharyya distances [Papoulis, 1991].

(c) Repeat this for all other features.

(d) Take the feature whose elimination from the feature set yielded the lowest error as the least important feature (last ranking place).

(e) Repeat steps (b) to (d) for the remaining feature set.

(f) Continue this procedure until all features are ranked.

3. Estimate the classification performance using repeated 70/30 train/test splits with 25 bootstrap repetitions:

(a) Randomly pick 30% of the feature vectors as test data and 70% as training data.

(b) Estimate the classes of the test data using the QDA method and compare them with their real labels.

(c) Store the number of correct and incorrect classifications per class in a confusion matrix.

(d) Put the test data back into the training data ("bootstrap") and repeat this procedure 25 times.

(e) Compute the mean value and the standard deviation over the resulting 25 confusion matrices, leading to a confusion matrix containing the average classification rates and their deviation intervals.

4. Compute the mean classification performance by averaging the values in the main diagonal (correct classifications) of the mean confusion matrix.
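The ranking in step 2 is a sequential backward elimination. Below is a minimal sketch of steps 2(a)-(f), assuming Gaussian class distributions so that the Bhattacharyya bound on the pairwise error probability ε has a closed form [Papoulis, 1991]; the function names and the exact error estimate are illustrative, not the authors' implementation:

```python
import itertools
import numpy as np

def bhattacharyya_distance(X1, X2):
    # Gaussian Bhattacharyya distance between two classes of feature vectors.
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = (S1 + S2) / 2.0
    d = m1 - m2
    maha = 0.125 * d @ np.linalg.solve(S, d)
    logdet = 0.5 * np.log(np.linalg.det(S)
                          / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return maha + logdet

def error_estimate(X, y):
    # Bhattacharyya upper bound on the Bayes error eps, summed over
    # all class pairs (a common multi-class extension of the bound).
    eps = 0.0
    for c1, c2 in itertools.combinations(np.unique(y), 2):
        X1, X2 = X[y == c1], X[y == c2]
        p1, p2 = len(X1) / len(y), len(X2) / len(y)
        eps += np.sqrt(p1 * p2) * np.exp(-bhattacharyya_distance(X1, X2))
    return eps

def rank_features(X, y):
    # Steps 2(a)-(f): repeatedly eliminate the feature whose removal
    # yields the lowest error; the ranking fills up from last place.
    remaining = list(range(X.shape[1]))
    ranking = []  # least important feature first
    while len(remaining) > 1:
        errors = [error_estimate(X[:, [f for f in remaining if f != cand]], y)
                  for cand in remaining]
        least_important = remaining[int(np.argmin(errors))]
        ranking.append(least_important)
        remaining.remove(least_important)
    ranking.extend(remaining)
    return ranking[::-1]  # best-ranked feature first
```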

Investigation A: Dependency of performance on the number of features.

Here we checked how the classification accuracy depends on the number of features by computing the classification performance for the different feature sets using the 3, 6 and 9 best ranked features, as well as all features from each set.

We see in Table 7.1 that the classification performance increases in most cases with an increasing number of used features. The improvement between using the best three and using the best six features is in most cases significant.

A further addition of features has a slightly lower impact on the classification performance, indicating a saturation effect. Therefore we had to find a good compromise between classification performance and classification effort in terms of the number of used features: we chose to use nine features.
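The numbers in Table 7.1 result from steps 3 and 4 above. The following is a minimal sketch of that bootstrap evaluation, here using scikit-learn's QuadraticDiscriminantAnalysis as a stand-in for the QDA method mentioned above; X, y and ranking are assumed inputs (e.g. from the ranking sketch):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

def bootstrap_performance(X, y, n_repeats=25, test_size=0.3, seed=0):
    # Steps 3(a)-(e): repeated 70/30 splits, QDA classification, and
    # row-normalized confusion matrices averaged over the repetitions.
    # (The stratified split is an assumption; the text only says that
    # the test data are picked randomly.)
    rng = np.random.RandomState(seed)
    mats = []
    for _ in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y,
            random_state=rng.randint(2**31 - 1))
        clf = QuadraticDiscriminantAnalysis().fit(X_tr, y_tr)
        mats.append(confusion_matrix(y_te, clf.predict(X_te),
                                     normalize="true"))
    mats = np.stack(mats)
    return mats.mean(axis=0), mats.std(axis=0)

# Step 4 and Investigation A, for the 3, 6 and 9 best-ranked features
# (X, y and `ranking` are assumed to exist, e.g. from the sketch above):
# for k in (3, 6, 9):
#     mean_cm, std_cm = bootstrap_performance(X[:, ranking[:k]], y)
#     print(f"{k} features: {100 * mean_cm.diagonal().mean():.0f}%")
```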


Table 7.1. Classification performance depending on the number of used features. For a detailed description of the feature groups see Section 7.2.4.

Investigation B: Classification performance dependent on feature complexity. If we have a closer look at the nine general feature groups, we see that each group requires a different computational effort. Based on the computation time, we assigned each feature set to one of three groups (A, B, C) of computational efficiency (A being most efficient, C being least efficient). Since the multi-band features are a kind of secondary features based on primary features computed per band, we distinguish between the feature complexity for the primary (per-band) and secondary (across-bands) features.

Note that most of the different feature sets are computed by a systematic step-by-step increase of computational complexity. Only group 3 does not fit into that system and is therefore omitted in the following discussion. Table 7.2 gives an overview of the remaining feature groups, their major computational requirements and their assigned levels for both the primary and secondary features. Especially for future applications it is interesting to see which complexity level is actually needed in order to achieve a good performance. In order to be fair, we decided to use the same number of features per set.


Table 7.2. Complexity levels of the different feature sets.

feature set | primary (per-band) features | level | secondary (across-bands) features | level
group 1 | three-phase approximation + some basic calculations | A | — | —
group 2 | three-phase approximation + optimization method for finding the best curve form parameter | B | — | —
group 4 | three-phase approximation + some basic calculations | A | mean-value operations | A
group 5 | three-phase approximation + optimization method for finding the best curve form parameter | B | mean-value operations | A
group 6 | three-phase approximation + some basic calculations | A | variance-value operations | B
group 7 | three-phase approximation + optimization method for finding the best curve form parameter | B | variance-value operations | B
group 8 | three-phase approximation + some basic calculations | A | optimization method for finding the best shape parameter over bands | C
group 9 | three-phase approximation + optimization method for finding the best curve form parameter | B | optimization method for finding the best shape parameter over bands | C

For that reason we chose the six best-ranked features per set (six being the size of the smallest feature set).

First we considered the different complexity levels of the primary (per-band) features for each level of secondary (across-bands) features separately.

That means that we compared the performance results of group 1 with group 2, group 4 with group 5, and so on. We see in Table 7.3 that using the curve shape parameters (primary level B) has a slightly positive influence on the results, except for the last case, where the performance even decreases significantly. Regarding this case, we saw in the data that group 9 showed a bad performance in general.

Table 7.3. Classification performance depending on the primary level of feature complexity.

In a second comparison we considered the different levels of secondary features for both primary feature levels separately (e.g. group 4 vs. group 6 vs. group 8). Table 7.4 shows that the mean values (level A) as secondary across-bands features led to the best performance, followed by the variances (level B) and the shape parameters (level C).

Table 7.4. Classification performance depending on the secondary level of feature complexity.

An open issue is whether it is beneficial to use the group 3 features (synchrony of envelopes). Because they are not based on the single-band features, we had to examine them differently. We compared the classification performances between group 3 and the other feature sets. With a performance of 72±8%, the group 3 features belong to the lowest-performing feature sets (compare with the results in Table 7.3). As a consequence we skipped these group 3 features.

Investigation C: Best feature set. Following the discussion in Investigation A, we were interested in the best nine features. Considering Investigation B, we decided to keep both levels of primary features (low-level parameters & curve shape parameters). Regarding the secondary features, we chose only the mean values due to their best performance among the multi-band features. This pre-selection finally led us to the following feature groups: 1, 2, 4 and 5.

We tested various combinations of the preselected groups: each group alone, both single-band groups together, both multi-band groups together, and all four groups together. The classification accuracies in Table 7.5 are relatively high (about 80 to 90%). In addition, we see a slight tendency that, with about 90%, the combinations no. 3 (group 4 alone) and no. 6 (groups 4 & 5 together) achieved the best results.
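Testing these combinations amounts to concatenating the feature columns of each selected group and re-running the bootstrap evaluation. A brief sketch, reusing bootstrap_performance from the earlier sketch; the per-group column indices below are hypothetical placeholders, not the actual feature assignment:

```python
# Hypothetical column indices of the nine selected features per group;
# X and y are assumed to be the feature matrix and labels from above.
group_cols = {1: [0, 1, 2], 2: [3, 4], 4: [5, 6], 5: [7, 8]}
combos = [(1,), (2,), (4,), (5,), (1, 2), (4, 5), (1, 2, 4, 5)]

for no, combo in enumerate(combos, start=1):
    cols = [c for g in combo for c in group_cols[g]]
    mean_cm, std_cm = bootstrap_performance(X[:, cols], y)
    print(f"{no}: groups {combo} -> "
          f"{100 * mean_cm.diagonal().mean():.0f}±"
          f"{100 * std_cm.diagonal().mean():.0f}%")
```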



Table 7.5. Classification performance for different combinations of the feature sets.

no. | feature set | classification performance
1 | group 1 | 87±8%
2 | group 2 | 82±8%
3 | group 4 | 90±9%
4 | group 5 | 86±7%
5 | groups 1 & 2 | 85±8%
6 | groups 4 & 5 | 89±8%
7 | groups 1, 2, 4 & 5 | 83±8%

In summary, we can state that we are able to predict the percussiveness of sounds with the extracted features.

7.3.2 Experiment 2: Percussiveness for classifying