


4.3 MARSYAS: Audio Content Features

4.3.3 Pitch Features

The pitch features describe melody and harmony information in a music signal.

A pitch detection algorithm decomposes the signal into two frequency bands, and an amplitude envelope is extracted for each band via half-wave rectification and low-pass filtering. The envelopes are summed and an enhanced autocorrelation function is computed so as to reduce the effect of integer multiples of the peak frequencies on multiple pitch detection.
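As a rough illustration (not the exact MARSYAS implementation), the following NumPy/SciPy sketch follows the same pipeline: a two-band split, per-band envelope extraction via half-wave rectification and low-pass filtering, envelope summation, and an enhanced autocorrelation that suppresses harmonic multiples by subtracting a time-stretched copy of the autocorrelation. The 1 kHz band edge, the filter orders, and the envelope cutoff are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, lfilter

def envelope(x, fs, cutoff=500.0):
    """Amplitude envelope: half-wave rectification then low-pass filtering."""
    rect = np.maximum(x, 0.0)                         # half-wave rectification
    b, a = butter(2, cutoff / (fs / 2), btype="low")  # assumed 500 Hz cutoff
    return lfilter(b, a, rect)

def enhanced_autocorr(env, stretch=2):
    """Autocorrelation with harmonic multiples suppressed: subtract a
    time-stretched copy and clip negative values to zero."""
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    ac = np.maximum(ac, 0.0)
    lags = np.arange(len(ac))
    # a peak at lag L reappears at lag stretch*L in the stretched copy,
    # so subtracting it attenuates integer multiples of the pitch period
    stretched = np.interp(lags / stretch, lags, ac)
    return np.maximum(ac - stretched, 0.0)

def pitch_frontend(x, fs, split_hz=1000.0):
    """Two-band decomposition, per-band envelopes, summed enhanced ACF."""
    b_lo, a_lo = butter(4, split_hz / (fs / 2), btype="low")
    b_hi, a_hi = butter(4, split_hz / (fs / 2), btype="high")
    low, high = lfilter(b_lo, a_lo, x), lfilter(b_hi, a_hi, x)
    summed = envelope(low, fs) + envelope(high, fs)
    return enhanced_autocorr(summed - summed.mean())
```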

The dominant peaks of the autocorrelation function are accumulated into pitch histograms, and the pitch content features are extracted from the pitch histograms. The pitch content features typically include the amplitudes and periods of the maximum peaks in the histogram, the pitch interval between the two most prominent peaks, and the overall sums of the histograms.
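A minimal sketch of the histogram accumulation and feature readout is given below. The mapping of the Table 4.1 pitch feature names (FA0, UP0, FP0, IP0, pitch sum) onto these specific computations is our reading of the description above, and the peak picking is deliberately simplified; both are assumptions rather than MARSYAS's exact definitions.

```python
import numpy as np

def pitch_histogram_features(acf, fs, n_peaks=3):
    """Accumulate dominant autocorrelation peaks into unfolded and folded
    pitch histograms and read out the pitch content features."""
    unfolded = np.zeros(128)   # MIDI-note-indexed histogram
    folded = np.zeros(12)      # same peaks folded into one octave
    # simple local maxima of the (enhanced) autocorrelation function
    peaks = [k for k in range(1, len(acf) - 1)
             if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]]
    for lag in sorted(peaks, key=lambda k: acf[k], reverse=True)[:n_peaks]:
        f0 = fs / lag                                   # lag -> frequency
        note = int(round(69 + 12 * np.log2(f0 / 440.0)))
        if 0 <= note < 128:
            unfolded[note] += acf[lag]
            folded[note % 12] += acf[lag]
    top2 = np.argsort(unfolded)[-2:]                    # two strongest pitches
    return {
        "FA0": float(folded.max()),          # amplitude of max folded peak
        "UP0": int(np.argmax(unfolded)),     # pitch of max unfolded peak
        "FP0": int(np.argmax(folded)),       # pitch class of max folded peak
        "IP0": int(abs(top2[1] - top2[0])),  # interval between two top peaks
        "SUM": float(unfolded.sum()),        # overall histogram sum
    }
```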

Table 4.1 summarizes the objective feature vector description.

Table 4.1 Feature vector of MARSYAS

Feature ID   Feature name
1            Mean centroid
2            Mean rolloff
3            Mean flux
4            Mean zero-crossings
5            STD of centroid
6            STD of rolloff
7            STD of flux
8            STD of zero-crossings
9            Low energy
[10–19]      MFCCs
20           Beat A0
21           Beat A1
22           Beat RA
23           Beat P1
24           Beat P2
25           Beat sum
26           Pitch FA0
27           Pitch UP0
28           Pitch FP0
29           Pitch IP0
30           Pitch sum
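Reading Table 4.1 off directly, the complete descriptor is the concatenation of four feature groups: nine timbral statistics, ten MFCC means, six beat features, and five pitch features. The sketch below assumes hypothetical per-group extractors and only enforces the 30-dimensional layout implied by the table.

```python
import numpy as np

def marsyas_feature_vector(timbral, mfcc_means, beat, pitch):
    """Concatenate the Table 4.1 groups into one 30-dimensional vector.
    The arguments are placeholders for per-group feature extractors."""
    vec = np.concatenate([
        np.asarray(timbral, dtype=float),     # features 1-9
        np.asarray(mfcc_means, dtype=float),  # features 10-19
        np.asarray(beat, dtype=float),        # features 20-25
        np.asarray(pitch, dtype=float),       # features 26-30
    ])
    assert vec.shape == (30,), "Table 4.1 defines a 30-dimensional vector"
    return vec
```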

References

1. Camastra, F., Vinciarelli, A.: Machine Learning for Audio, Image and Video Analysis: Theory and Applications. Springer, London (2007)

2. Cord, M., Cunningham, P.: Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval. Springer, Heidelberg (2008)

3. Divakaran, A.: Multimedia Content Analysis: Theory and Applications. Springer, Heidelberg (2008)

4. Foote, J.: Content-based retrieval of music and audio. In: Proceedings of Storage and Retrieval for Image and Video Databases (SPIE), vol. 3229, pp. 138–147 (1997)

5. Gong, Y., Xu, W.: Machine Learning for Multimedia Content Analysis. Springer, New York (2007)

6. Kim, H.G., Moreau, N., Sikora, T.: MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval. Wiley, New York (2005)

7. Kompatsiaris, Y., Hobson, P.: Semantic Multimedia and Ontologies: Theory and Applications. Springer, Heidelberg (2008)

8. Manjunath, B.S., Salembier, P., Sikora, T.: Introduction to MPEG-7: Multimedia Content Description Interface. Wiley, New York (2002)

9. Martinez, J.M.: MPEG-7: overview of MPEG-7 description tools, part 2. IEEE Multimed. 9(3), 83–93 (2002). doi:10.1109/MMUL.2002.1022862

10. MPEG-7: Visual experimentation model (XM), version 10.0, ISO/IEC JTC1/SC29/WG11, doc. N4063 (2001)

11. MPEG-7: MPEG-7 overview, version 10.0, ISO/IEC JTC1/SC29/WG11, doc. N6828 (2004)

12. MPEG-7: Multimedia content description interface, reference software, ISO/IEC 15938-6 (2005). Software http://standards.iso.org/ittf/PubliclyAvailableStandards/c035364_ISO_IEC_15938-6(E)_Reference_Software.zip

13. MPEG-7: MPEG-7 audio reference software toolkit, ISO/IEC 15938-4 Part 6 (2007). Software http://mpeg7.doc.gold.ac.uk/mirror/index.html

14. Nixon, M.S., Aguado, A.S.: Feature Extraction and Image Processing. Academic Press, London (2008)

15. Petrou, M., Bosdogianni, P.: Image Processing: The Fundamentals. Wiley, New York (1999)

16. Petrushin, V.A., Khan, L.: Multimedia Data Mining and Knowledge Discovery. Springer, Heidelberg (2007)

17. Sikora, T.: The MPEG-7 visual standard for content description—an overview. IEEE Trans. Circuits Syst. Video Technol. 11(6), 696–702 (2001)

18. Sotiropoulos, D.N., Lampropoulos, A.S., Tsihrintzis, G.A.: MUSIPER: a system for modeling music similarity perception based on objective feature subset selection. User Model. User-Adapt. Interact. 18(4), 315–348 (2008)

19. Thuraisingham, B.M.: Managing and Mining Multimedia Databases. CRC Press Inc., Boca Raton (2001)

20. Tsihrintzis, G.A., Jain, L.C. (eds.): Multimedia Services in Intelligent Environments—Advanced Tools and Methodologies. Studies in Computational Intelligence, vol. 120. Springer, Berlin (2008)

21. Tsihrintzis, G.A., Jain, L.C.: Multimedia Services in Intelligent Environments—Integrated Systems. Studies in Computational Intelligence. Springer, Berlin (2010)

22. Tsihrintzis, G.A., Virvou, M., Howlett, R.J., Jain, L.C. (eds.): New Directions in Intelligent Interactive Multimedia. Studies in Computational Intelligence, vol. 142. Springer, Berlin (2008)

23. Tsihrintzis, G.A., Virvou, M., Jain, L.C. (eds.): Multimedia Services in Intelligent Environments—Software Development Challenges and Solutions. Studies in Computational Intelligence. Springer, Berlin (2010)

24. Tzanetakis, G.: Manipulation, analysis and retrieval systems for audio signals. Ph.D. thesis, Princeton University (2002)


25. Tzanetakis, G., Cook, P.: Marsyas: a framework for audio analysis. Organ. Sound 4(3), 169–175 (2000)

26. Vaseghi, S.V.: Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications. Wiley, New York (2007)

27. Virvou, M., Jain, L.C. (eds.): Intelligent Interactive Systems in Knowledge-Based Environments. Studies in Computational Intelligence, vol. 104. Springer, Berlin (2008)

28. Wu, J.K.K., Hong, D., Kankanhalli, M.S., Lim, J.H.: Perspectives on Content-Based Multimedia Systems. Kluwer Academic Publishers, Norwell (2000)

Similarity Measures for Recommendations Based on Objective Feature Subset Selection

Abstract In this chapter, we present a content-based RS for music files, called MUSIPER, in which individualized (subjective) music similarity perception models of the system users are constructed from objective audio signal features by associating different music similarity measures with different users. Specifically, our approach in developing MUSIPER is based on investigating certain subsets of the objective feature set and their relation to the subjective music similarity perception of individuals.

5.1 Introduction

Our starting point has been the fact that each individual perceives the information features contained in a music file differently and assigns different degrees of importance to music features when assessing similarity between music files. This leads to the hypothesis that different individuals possibly assess music similarity via different feature sets and that there might even exist certain features that are entirely unidentifiable by certain users. On the basis of this assumption, we utilize relevance feedback from individual users in a neural network-based incremental learning process in order to identify the feature subset and the corresponding similarity measure that exhibit the maximum possible accordance with the user's music similarity perception. The approach followed in MUSIPER can be applied to any type of multimedia data.

For example, a variation of MUSIPER has been implemented and evaluated to function with image data [5].
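The sketch below illustrates only the general idea of a per-user similarity measure tuned by relevance feedback: a weighted similarity over the objective feature vector whose weights are nudged by user ratings. It is not MUSIPER's actual neural network-based incremental learning scheme; the weighted cosine measure, the heuristic update rule, and all parameter names are illustrative assumptions.

```python
import numpy as np

def weighted_similarity(a, b, w):
    """Cosine similarity on feature vectors scaled by per-user weights."""
    aw, bw = a * w, b * w
    return float(aw @ bw / (np.linalg.norm(aw) * np.linalg.norm(bw) + 1e-12))

def update_weights(w, a, b, rating, lr=0.05):
    """Nudge weights toward features on which a pair the user rated
    similar agrees, and away from them for pairs rated dissimilar.
    rating: +1 (user judged the pair similar) or -1 (dissimilar)."""
    agreement = 1.0 - np.abs(a - b)          # per-feature agreement in [0, 1]
    w = np.clip(w + lr * rating * agreement, 0.0, None)
    return w / (w.sum() + 1e-12)             # keep weights normalized
```

In this reading, each user effectively carries their own weight vector: features the user appears insensitive to are driven toward zero weight, which plays the role of the per-user feature subset selection described above.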
