
To avoid the trivial solution of describing a concept perfectly with only one keyword, another experiment on the Corel data is conducted that uses only the visual modalities: color, texture and face. The aim of the experiment is to show by how much content-only classification can be improved with the help of feature selection and dimensionality reduction methods.

The dataset now has 118 attributes, which means that the ratio between the number of attributes and the number of instances is more appropriate for classification on the full feature set. Otherwise, the experimental setup is the same as before. Due to the smaller dimensionality of the dataset, sequential feature forward selection (SFS) and sequential feature backward selection (SBS) could also be run. Figure 7.4 plots the average classification error and the average calculation time for the content-only Corel dataset and all methods.
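The greedy wrapper search behind SFS can be sketched as follows. This is a minimal illustration on synthetic data; the scoring function (a toy nearest-centroid training accuracy) is a hypothetical stand-in, not the classifier used in these experiments:

```python
import numpy as np

def sequential_forward_selection(X, y, score, max_features):
    """Greedy bottom-up search: start from the empty set and repeatedly
    add the single feature that most improves the score."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_features:
        cand = [(score(X[:, selected + [f]], y), f) for f in remaining]
        s, f = max(cand)
        if s <= best_score:          # no candidate improves the score: stop
            break
        best_score = s
        selected.append(f)
        remaining.remove(f)
    return selected, best_score

def centroid_score(Xs, y):
    """Toy wrapper criterion: training accuracy of a nearest-centroid rule."""
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return (pred == y.astype(bool)).mean()

# Synthetic data: only feature 0 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 5))
X[:, 0] += 3 * y
sel, acc = sequential_forward_selection(X, y, centroid_score, max_features=3)
```

SBS works analogously in the top-down direction, starting from the full feature set and greedily removing the feature whose removal hurts the score least, which is why its calculation time grows much faster with the number of attributes.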

It is apparent that the overall and relative performance of the approaches has changed significantly compared to the previous experiment, which also used the annotations of the images. The classification error of the full feature baseline, which here is the concatenation of all visual attributes, has improved compared to the baseline in the previous experiment.

The classification results on the color and texture modalities are the same as before. The random projection methods attain results similar to their counterparts in the previous experiment. Generally, the greedy sequential feature selection methods perform only as well as the best single modality, texture. On large training sets, the forward selection achieves improved performance. Unsurprisingly, the forward, and hence bottom-up, search achieves better results than the backward search.
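The Gaussian and sparse random projections compared here can be sketched with plain NumPy; the sparse variant below follows the well-known Achlioptas scheme (entries ±√3 with probability 1/6 each, zero otherwise), which is one common choice, not necessarily the exact one used in the experiments:

```python
import numpy as np

def gaussian_random_projection(X, k, rng):
    """Project onto k random Gaussian directions, scaled so that
    squared norms are preserved in expectation."""
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def sparse_random_projection(X, k, rng):
    """Achlioptas-style sparse projection: entries are +/-sqrt(3)
    with probability 1/6 each and 0 with probability 2/3."""
    R = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)],
                   size=(X.shape[1], k), p=[1 / 6, 2 / 3, 1 / 6])
    return X @ (R / np.sqrt(k))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 118))   # 118 attributes, as in this dataset
Xg = gaussian_random_projection(X, 20, rng)
Xs = sparse_random_projection(X, 20, rng)
```

Because the projection matrix is data-independent, both variants are extremely cheap to compute, which matches their low calculation times in Figure 7.4(b).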

The plot shows that for PCA, SVD, FS with the MRMR criterion, FS with the relevant features, and the FS/FC methods, classification works less well when only the image content is used, because their success in the previous experiment relied heavily on the text attributes. SVD and PCA still perform very well on the content-only dataset, because they target feature diversity, which seems to capture the more complex, visual description of the concepts well. As observed before, FS with the Maximum Relevance Minimum Redundancy (mrmr) criterion works well for high dimensional, complex learning targets, and thus outperforms the FS/FC algorithm on the content-only dataset.
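The variance-targeting behavior of PCA (and, equivalently, a truncated SVD of the centered data) can be sketched in a few lines; the low-rank synthetic data below is purely illustrative:

```python
import numpy as np

def pca_via_svd(X, k):
    """Project X onto its first k principal components, computed from
    the SVD of the mean-centered data matrix."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, S

rng = np.random.default_rng(2)
# Low-rank structure plus noise: most variance lies in a few directions.
Z = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 118))
X = Z + 0.1 * rng.normal(size=(100, 118))
Xk, S = pca_via_svd(X, 3)
explained = (S[:3] ** 2).sum() / (S ** 2).sum()
```

When the visual attributes share such dominant directions of variation, a small number of components already captures most of the signal, which is consistent with the strong performance of PCA and SVD here.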

As was discussed in the introduction of this chapter, in high dimensional real world settings the feature relevance detector only works for redundant features, of which this dataset does not seem to contain many. Therefore, FS with the redundant relevant features attains consistently higher classification errors than the full feature baseline. It can be concluded that the descriptions of the class labels are dominated by high dimensional synergies and hence are hard to learn. The FS/FC algorithm does not work very well; it is outperformed by the SVD, PCA and MRMR methods over all concepts and training set sizes. In the previous experiment, this was observed only for the very complex concepts when a lot of training data was used.
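The greedy MRMR selection referred to throughout can be sketched as follows for discretized features; the mutual information estimator and the synthetic data are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete integer arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for i, j in zip(a, b):
        joint[i, j] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return (joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])).sum()

def mrmr(Xd, y, k):
    """Greedy Maximum Relevance Minimum Redundancy selection:
    maximize I(f; y) minus the mean I(f; s) over already selected s."""
    relevance = [mutual_info(Xd[:, f], y) for f in range(Xd.shape[1])]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_f = -np.inf, None
        for f in range(Xd.shape[1]):
            if f in selected:
                continue
            red = np.mean([mutual_info(Xd[:, f], Xd[:, s]) for s in selected])
            if relevance[f] - red > best:
                best, best_f = relevance[f] - red, f
        selected.append(best_f)
    return selected

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 300)
X0 = y ^ (rng.random(300) < 0.1)   # informative feature (10% label noise)
X1 = X0.copy()                     # exact redundant copy of X0
X2 = rng.integers(0, 2, 300)       # irrelevant noise feature
Xd = np.stack([X0, X1, X2], axis=1).astype(int)
sel = mrmr(Xd, y, 2)
```

The redundancy penalty is what makes the criterion skip the copy `X1` in favor of the uninformative but non-redundant `X2`; note that MRMR evaluates features individually against the selected set and therefore cannot model the high dimensional synergies discussed above.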

This malfunction is explained by the more complex semantic description, in terms of both the interaction type and the number of attributes involved, that is needed to describe an object or concept based on visual characteristics alone. The result shows that PCA, SVD and the MRMR criterion do work well in this situation. The FS/FC algorithm cannot keep up with this performance, because its search is stopped at level 5 to keep the computation tractable. A deeper search is needed to learn the higher dimensional structural models that can model these difficult concepts. Contrary to the previous experiment, the FS/FC without previous relevance detection does not improve over the results attained when only the relevant features are searched.

7.3 Corel dataset - fusion of visual features 95

Fig. 7.4: Average classification error (a) and calculation time (b) for the content-only Corel dataset for the FS/FC algorithm with and without previous relevance detection (mifeatFin, mifeatFinNR), compared to the full feature baseline, classification on single features (texture, color), FS with the relevant features (relfeat), SVD, PCA, Gaussian and sparse random projection, sequential feature forward and backward selection (SFS, SBS), and Maximum Relevance Minimum Redundancy (mrmr) FS

Fig. 7.5: Minimum, average and maximum classification error (7.5(a)) and search level k where the minimum classification error is observed (7.5(b)) for each class (see ID in Table 7.2) in the content-only Corel dataset for the FS/FC algorithm with and without previous relevance detection (mifeatFin, mifeatFinNR), compared to the full feature baseline, mono-modal classification (texture, color), FS with the relevant features, SVD, PCA, Gaussian and sparse random projection, sequential feature forward and backward selection (SFS, SBS) and FS with the Maximum Relevance Minimum Redundancy (mrmr) criterion, based on 15 positive examples

Overall, the computation times are much smaller than in the last experiment due to the reduced number of attributes. The sequential backward selection is very slow: its calculation time exceeds the maximum value of Figure 7.4(b) and hence is not plotted, to keep the time differences between the other methods visible. Compared to this, the FS/FC without previous relevance detection runs significantly faster, but is still slow for large training datasets. This is mainly caused by the complexity of the learning problems. When the search for feature interactions is limited to the relevant features, only a few seconds are needed to learn. The FS/FC with previous relevance detection needs slightly less time to compute than the calculation of the MRMR criterion.

Figure 7.5(a) plots the minimum, average and maximum classification error per topic and method for the largest training set. Interestingly, the concepts (4) wolves and (9) owls are still the easiest to solve. Furthermore, it can be seen that the RP methods outperform all other methods for some of the concepts.

For about half the concepts, (2) beach, (3) clouds, (4) wolves, (8) Wales and (10) autumn, FS with the redundant relevant features improves over the full feature baseline. These are the easier topics, for which one or several methods achieve better results than the full feature baseline and than classification on the single features. The other topics seem to be very complex, and the tested dimensionality reduction and feature selection methods lead, if at all, only to small performance improvements, mainly achieved by PCA, SVD and/or MRMR FS.

These results can now be compared to the average search level k where the minimum classification error is observed for all search-based methods (Figure 7.5(b)). The FS with the Maximum Relevance Minimum Redundancy criterion was cross-validated over feature sets of size Nc = [1, 70] and the dimensionality reduction methods over the best k = [1, 117] dimensions.
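Such a cross-validation over the number of retained dimensions k can be sketched as follows; the holdout split and the toy nearest-centroid classifier are simplifying assumptions standing in for the actual protocol:

```python
import numpy as np

def holdout_error(Xtr, ytr, Xte, yte):
    """Toy classifier: nearest class centroid; returns holdout error."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = np.linalg.norm(Xte - c1, axis=1) < np.linalg.norm(Xte - c0, axis=1)
    return (pred != yte.astype(bool)).mean()

def best_k_pca(X, y, ks, n_train):
    """Pick the number of principal components k with the lowest holdout
    error, mirroring the validation of the best k = [1, 117] dimensions."""
    Xc = X - X.mean(axis=0)                  # sketch: centered on all data
    _, _, Vt = np.linalg.svd(Xc[:n_train], full_matrices=False)
    errs = {}
    for k in ks:
        Xk = Xc @ Vt[:k].T                   # project onto first k components
        errs[k] = holdout_error(Xk[:n_train], y[:n_train],
                                Xk[n_train:], y[n_train:])
    return min(errs, key=errs.get), errs

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 30))
X[:, 0] += 2.5 * y                           # one informative direction
k_best, errs = best_k_pca(X, y, ks=range(1, 31), n_train=100)
```

The selected k_best per class is exactly the quantity plotted in Figure 7.5(b) for the dimensionality reduction methods.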

An extreme case can be observed for the simplest concept (4): PCA, SVD and MRMR use many dimensions or features, respectively, yet attain poor performance with them compared to the RP and FS/FC algorithms, which use a lower dimensional solution. The inverse holds for the visually more complex concepts, (7) roses and (9) owls, where PCA, SVD and MRMR attain significantly higher dimensional solutions and therefore outperform RP and the FS/FC method.

On the basis of the search level k at which the minimum classification error of the best performing method is observed for each class, one can derive the approximate problem size that underlies the concept. According to this, classes (3), (4), (6) and (8) should be low dimensional with Nc ≤ 20. But since a learning problem's complexity is also influenced by the interaction type that dominates the data, this does not necessarily mean that these concepts are easy to learn. Generally, the other FS and dimensionality reduction methods find much higher dimensional solutions, which proved to perform better on complex visual-only datasets.

Table 7.2 gives an overview of the selected image regions and the average number of features that are found with the FS/FC algorithm. Due to the complexity of the learning tasks, generally more features are needed to model the class labels, and no concept has a trivial solution.

Tab. 7.2: Relevant color features in the form of relevant image regions on a 3x3 grid, and their average number, as detected with the FS/FC algorithm for the content-only Corel data (based on 15 positive examples; columns: (ID) class, color regions, # of features)

Again, color features were selected almost exclusively. Only for concept (1) apes was a texture feature used, and for (1) apes and (4) wolves the face feature was used for modeling. It can be noted that for the concepts that were learned most easily, (4) and (9), many features were selected, whereas for other, supposedly more complex concepts fewer were selected, which obviously did not suffice to describe the class labels well.

This indicates that the FS/FC algorithm, due to its limited search depth and pruning strategy, is incapable of selecting and constructing solutions with more than 10 attributes, which was the maximum relevant feature set size observed for the Corel data in the previous experiment (see Table 7.1). This suffices if the dataset contains some relevant features and many irrelevant ones, as was the case for the fusion of image and text, but not to detect many weakly relevant features, as is necessary for the fusion of only the visual features in the Corel data.