
1.4 What Can You Find in This Book?

As in any collection of creative material, the chapters of this book differ in style, topics, mathematical rigor, level of generality, and readiness for deployment. Together, however, they form a mosaic of ongoing research in the fast-changing, dynamic field of multimedia data mining.

The book consists of five parts. The first part, which includes two chapters, gives an introduction to multimedia data mining. Chapter 1, which you are reading now, gives an overview of multimedia data mining as an industry. Chapter 2 presents an overview of MDM techniques. It describes a typical architecture of MDM systems and covers approaches to supervised and unsupervised concept mining and event discovery.

The second part includes five chapters. It is devoted to multimedia data exploration and visualization. Chapter 3 deals with exploring images. It presents a clustering method based on unsupervised neural nets and self-organizing maps, which is called the dynamic growing self-organizing tree algorithm (DGSOT). The chapter shows that the suggested algorithm outperforms the traditional hierarchical agglomerative clustering algorithm.

Chapter 4 presents a multiresolution clustering of time series. It uses the Haar wavelet transform to represent time series at different resolutions and applies k-means clustering sequentially at each level, starting from the coarsest representation. The algorithm stops when two sequential membership assignments are equal or when it reaches the finest representation. The advantages of the algorithm are that it runs faster and produces better clusterings. The algorithm has been applied to clustering images, using color and texture features.
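To give a concrete picture of this scheme, the sketch below implements multiresolution k-means in the same spirit: the series are coarsened by Haar-style pairwise averaging, k-means runs level by level with each level seeded by the previous centers, and the loop stops when two successive assignments coincide. It is a minimal illustration only; the function names, the seeding rule, and all parameters are ours, not the chapter's.

```python
import numpy as np
from sklearn.cluster import KMeans

def haar_levels(X):
    """Return representations of X (n_series x length, length a power of 2),
    ordered from coarsest (length 1) to finest (the original series)."""
    levels = [X]
    while levels[-1].shape[1] > 1:
        coarse = levels[-1].reshape(X.shape[0], -1, 2).mean(axis=2)
        levels.append(coarse)
    return levels[::-1]

def multiresolution_kmeans(X, k, random_state=0):
    labels, centers = None, None
    for level in haar_levels(X):
        if centers is None:
            km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        else:
            # Seed this level by stretching the coarser centers to the new length.
            seed = np.repeat(centers, level.shape[1] // centers.shape[1], axis=1)
            km = KMeans(n_clusters=k, init=seed, n_init=1, random_state=random_state)
        new_labels = km.fit_predict(level)
        if labels is not None and np.array_equal(new_labels, labels):
            return new_labels  # assignments stable across two levels: stop early
        labels, centers = new_labels, km.cluster_centers_
    return labels  # reached the finest representation

# Toy example: 100 random series of length 64 clustered into 3 groups.
X = np.random.rand(100, 64)
print(multiresolution_kmeans(X, k=3)[:10])
```

Seeding each level with the coarser centers is what lets the later, more expensive runs converge in only a few iterations, which is where the speedup comes from.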

Chapter 5 describes a method for unsupervised classification of events in multicamera indoor surveillance video. The self-organizing map approach has been applied to event data for clustering and visualization. The chapter presents a tool for browsing clustering results, which allows exploring units of the self-organizing maps at different levels of hierarchy, clusters of units, and distances between units in 3D space for searching for rare events.

Chapter 6 presents density-based data analysis and similarity search. It introduces a visualization technique called the reachability plot, which allows visually exploring a data set in multiple representations and comparing multiple similarity models. The chapter also presents a new method for automatically extracting cluster hierarchies from a given reachability plot and describes a system prototype, which serves both for visual data analysis and for a new way of object retrieval called navigational similarity search.
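For readers unfamiliar with reachability plots, the short sketch below shows the general idea using scikit-learn's OPTICS implementation: objects are ordered by density-based reachability, and plotting the reachability distance along that ordering makes clusters visible as valleys. It only illustrates the plot itself; the chapter's cluster-extraction method and similarity models are not reproduced here, and the data are synthetic.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
# Two dense blobs plus sparse background noise.
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[4, 4], scale=0.3, size=(100, 2)),
    rng.uniform(-2, 6, size=(30, 2)),
])

optics = OPTICS(min_samples=10).fit(X)
reach = optics.reachability_[optics.ordering_]                  # reachability in cluster order
reach[np.isinf(reach)] = np.nanmax(reach[np.isfinite(reach)])   # cap the leading infinity for plotting

plt.bar(range(len(reach)), reach, width=1.0)
plt.xlabel("objects in cluster ordering")
plt.ylabel("reachability distance")
plt.title("Reachability plot: valleys correspond to clusters")
plt.show()
```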

Chapter 7 presents an approach to the exploration of variable-length multiattribute motion data captured by data glove sensors and 3-D human motion cameras. It suggests using the singular value decomposition technique to regularize multiattribute motion data of different lengths and classifying them with support vector machine classifiers. Classification of motion data using support vector machine classifiers is compared with classification by related similarity measures in terms of accuracy and CPU time.
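As a rough illustration of how SVD can turn variable-length motion matrices into fixed-size inputs for an SVM, consider the sketch below: each sample of shape (frames x attributes) is reduced to its leading right singular vectors weighted by the corresponding singular values. The exact feature construction and kernel used in the chapter may differ; everything here, including the toy data, is an assumption made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

def svd_features(motion, n_components=2):
    """Map a (frames x attributes) motion matrix to a fixed-length vector
    built from the top right singular vectors and singular values."""
    _, s, vt = np.linalg.svd(motion, full_matrices=False)
    feats = []
    for i in range(n_components):
        # Fix the sign ambiguity of each singular vector so samples are comparable.
        v = vt[i] * np.sign(vt[i][np.argmax(np.abs(vt[i]))])
        feats.append(s[i] / s.sum() * v)
    return np.concatenate(feats)

# Toy data: motions with 22 attributes (e.g., data-glove channels) and
# varying numbers of frames; labels stand in for gesture classes.
rng = np.random.default_rng(0)
motions = [rng.normal(size=(rng.integers(50, 200), 22)) for _ in range(40)]
labels = rng.integers(0, 2, size=40)

X = np.vstack([svd_features(m) for m in motions])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```

The point of the SVD step is that the right singular vectors have a fixed dimensionality (the number of attributes) regardless of how many frames a motion contains, so motions of different lengths become directly comparable.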

The third part is devoted to multimedia data indexing and retrieval. It consists of three chapters. Chapter 8 focuses on developing an image retrieval methodology, which includes a new indexing method based on fuzzy logic, a hierarchical indexing structure, and the corresponding hierarchical elimination-based A* retrieval algorithm with logarithmic search complexity in the average case. It also uses user relevance feedback to tailor the semantic retrieval to each user's individualized query preferences.

Chapter 9 presents a methodology that uses both clustering and characterization rules to reduce the search space and produce a summarized view of an annotated image database. These data mining techniques are performed separately on visual descriptors and on textual information such as annotations and keywords. A visual ontology is derived from the textual part and enriched with representative images associated with each concept of the ontology. Ontology-based navigation is used as a user-friendly tool for retrieving relevant images.

Chapter 10 describes a methodology and a tool that allow end users to create image classifiers. The process consists of two stages: First, small image fragments called patches are classified. Second, frequency vectors of these patch classifications are fed into a second-level classifier for global scene classification (e.g., city, portrait, or countryside). The first-stage classifiers can be seen as a set of highly specialized feature detectors that define a domain-specific visual alphabet. The end user builds the second-level classifiers interactively by simply indicating positive examples of a scene. The scene classifier approach has been successfully applied to several problem domains, such as content-based video retrieval in television archives, automated sewer inspection, and pornography filtering.
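The two-stage scheme is easy to picture in code. The sketch below is a minimal mock-up under assumed details: patches are non-overlapping 16 x 16 blocks, the first stage is a nearest-neighbor patch classifier standing in for the specialized detectors, and the second stage is a logistic regression over patch-label histograms. None of these choices, names, or sizes come from the chapter.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

PATCH = 16  # patch size in pixels (assumed)

def patches(img):
    """Cut a grayscale image (H x W) into non-overlapping PATCH x PATCH blocks."""
    h, w = (img.shape[0] // PATCH) * PATCH, (img.shape[1] // PATCH) * PATCH
    blocks = img[:h, :w].reshape(h // PATCH, PATCH, w // PATCH, PATCH)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, PATCH * PATCH)

def histogram_features(img, patch_clf, n_patch_classes):
    """Stage 1: label every patch, then return the normalized label histogram."""
    labels = patch_clf.predict(patches(img))
    hist = np.bincount(labels, minlength=n_patch_classes).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)

# Stage 1: train the patch classifier on labeled patches (the "visual alphabet").
patch_X = rng.random((500, PATCH * PATCH))
patch_y = rng.integers(0, 8, size=500)            # 8 assumed patch classes
patch_clf = KNeighborsClassifier(n_neighbors=5).fit(patch_X, patch_y)

# Stage 2: train the scene classifier on patch-label histograms.
images = [rng.random((128, 128)) for _ in range(30)]
scene_y = rng.integers(0, 3, size=30)             # e.g., city / portrait / countryside
scene_X = np.array([histogram_features(im, patch_clf, 8) for im in images])
scene_clf = LogisticRegression(max_iter=1000).fit(scene_X, scene_y)
print(scene_clf.predict(scene_X[:3]))
```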

The fourth part is a collection of eight chapters that describe approaches to multimedia data modeling and evaluation. Chapter 11 presents an approach to automatic novelty detection in a video stream and compares it experimentally to human performance. The evaluation of human versus machine-based novelty detection is quantified by metrics based on the location of novel events, the number of novel events, etc.

Chapter 12 presents an effective approach for event detection using both audio and visual features, with its application to the automatic extraction of goal events in soccer videos. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The approach uses the nearest neighbor classifier with a generalization scheme. The proposed approach has been tested using soccer videos of different styles produced by different broadcasters.

Chapter 13 describes an approach for mining and automatically discovering mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in hierarchical data. It uses structure-based mining of relationships, which provides high degrees of precision. The approach works even when the mappings are imperfect, fuzzy, and many-to-many.

Chapter 14 proposes a new approach to high-level concept recognition in images using salient objects as semantic building blocks. The approach uses support vector machine techniques for automatic detection of the salient objects, which serve as a basic visual vocabulary. A high-level concept is then modeled by a Gaussian mixture model of weighted dominant components, i.e., salient objects. The chapter demonstrates the efficiency of the modeling approach using the results of broad experiments on nature images.

Chapter 15 is devoted to a fundamental problem: image segmentation. It introduces a new MPEG-7 friendly system that integrates a user's feedback with image segmentation and region recognition, supporting the user in the extraction of image region semantics in highly dynamic environments. The results obtained for aerial photos are provided and discussed.

Chapter 16 describes techniques for query by example in an image database, when the exemplar image can have a different position and scale in the target image and/or be captured by a different sensor. The approach matches images against the exemplar by comparing the local entropies in the images at corresponding positions. It employs a search strategy that combines sampling in the space of exemplar positions, the Fast Fourier Transform for efficiently evaluating object translations, and iterative optimization for pose refinement. The techniques are applied to matching exemplars with real images such as aerial and ground reconnaissance photos. Strategies for scaling this approach to multimedia databases are also described.
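To make the entropy-matching idea concrete, here is a stripped-down sketch: both images are turned into maps of local intensity entropy, and FFT-based correlation scores every translation of the exemplar over the target at once. Scale sampling and the iterative pose refinement described in the chapter are left out, and the window size, bin count, and toy data are our own choices.

```python
import numpy as np

def local_entropy(img, win=7, bins=16):
    """Shannon entropy of pixel intensities in a win x win window around each pixel."""
    h, w = img.shape
    q = np.clip((img * bins).astype(int), 0, bins - 1)
    out = np.zeros((h - win + 1, w - win + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            counts = np.bincount(q[i:i + win, j:j + win].ravel(), minlength=bins)
            p = counts[counts > 0] / counts.sum()
            out[i, j] = -(p * np.log2(p)).sum()
    return out

def best_translation(exemplar, target):
    """Correlate the two entropy maps via FFT and return the highest-scoring offset."""
    e, t = local_entropy(exemplar), local_entropy(target)
    e, t = e - e.mean(), t - t.mean()
    pad = [(0, t.shape[0] - e.shape[0]), (0, t.shape[1] - e.shape[1])]
    corr = np.real(np.fft.ifft2(np.fft.fft2(t) * np.conj(np.fft.fft2(np.pad(e, pad)))))
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(0)
target = rng.random((128, 128))
exemplar = target[40:72, 50:82]            # a 32 x 32 crop of the target
print(best_translation(exemplar, target))  # expected near (40, 50)
```

Working on entropy maps rather than raw intensities is what gives the method some robustness to sensor differences, since entropy depends on the local intensity distribution rather than on absolute gray levels.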

Chapter 17 presents a neural experts architecture that enables faster neural network training for datasets that can be decomposed into loosely interacting sets of attributes. It describes the expressiveness of this architecture in terms of functional composition. The experimental results show that the proposed neural experts architecture can achieve classification performance that is statistically identical to that of a fully connected feedforward neural network, while significantly improving training efficiency.

Chapter 18 describes a methodology for building models of expressive music performance. The methodology consists of three stages. First, acoustic features are extracted from recordings of both expressive and neutral musical performances. Then, using data mining techniques, a set of transformation rules is derived from the performance data. Finally, the rules are applied to the description of an inexpressive melody to synthesize an expressive monophonic melody in MIDI or audio format. The chapter describes, explores, and compares different data mining techniques for creating expressive transformation models.

Finally, the fifth part unites seven chapters that describe case studies and applications. Chapter 19 presents a new approach to supporting the design and redesign of virtual collaborative workspaces, based on combining integrated data mining techniques for refining the lower-level models with a reverse engineering cycle for creating upper-level models. The methodology builds on a new model of vertical information integration related to virtual collaboration, called the information pyramid of virtual collaboration.

Chapter 20 presents a time-constrained sequential pattern mining method for extracting patterns associated with semantic events in produced videos. The video is separated into shots, and 13 streams of metadata are extracted for each shot. The metadata cover not only such shot attributes as duration, low-level color, texture, shape and motion features, and sound volume, but also advanced attributes such as the presence of weapons in the shot and the sound type, i.e., silence, speech, or music. All metadata are represented as quantized values forming finite alphabets for each stream. Data mining techniques are applied to the streams of metadata to derive patterns that represent semantic events.

Chapter 21 describes an approach for people localization and tracking in an office environment using a sensor network that consists of video cameras, infrared tag readers, a fingerprint reader, and a pan-tilt-zoom camera. The approach is based on a Bayesian framework that uses noisy but redundant data from multiple sensor streams and combines it with contextual and domain knowledge. The experimental results are presented and discussed.

Chapter 22 presents an approach to estimating the potential attractiveness of a banner image based on its attributes. A banner image is an advertisement that is designed to attract Web users to view and eventually buy the advertised product or service. The approach uses a Bayesian classifier to predict the level of the click-thru rate based on features extracted from the banner image GIF file, such as image dimensions, color features, the number of frames in an animated file, and the frames' dynamic and chromatic features. The experimental results are discussed.
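As a schematic of this kind of prediction, the sketch below trains a naive Bayes classifier on a handful of banner-level features and predicts a binary click-thru-rate level. The feature set, the two-level target, and the synthetic data are assumptions for illustration and do not reproduce the chapter's experiments.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Columns: width, height, n_frames, mean brightness, colorfulness (illustrative features).
X = np.column_stack([
    rng.integers(120, 729, size=200),      # width in pixels
    rng.integers(60, 301, size=200),       # height in pixels
    rng.integers(1, 20, size=200),         # frames in the animated GIF
    rng.random(200),                       # mean brightness
    rng.random(200),                       # colorfulness score
])
y = rng.integers(0, 2, size=200)           # 0 = low click-thru rate, 1 = high

clf = GaussianNB().fit(X, y)
new_banner = np.array([[468, 60, 5, 0.7, 0.4]])
print("predicted CTR level:", clf.predict(new_banner)[0])
```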

Chapter 23 presents an approach for deriving users' profiles from their performance data when they work with a video search engine. The approach suggests a two-level model. The goal of the first level is modeling and clustering a user's behavior on a single video sequence (intravideo behavior). The goal of the second level is modeling and clustering a user's behavior on a set of video sequences (intervideo behavior). A two-phase clustering algorithm is presented and experimental results are discussed.

Chapter 24 presents a method for automatic identification of a person by iris recognition. The method uses arrays of pixels extracted from a raster image of a human iris and searches for similar patterns using the latent semantic indexing approach. The comparison process is expedited by replacing the time-consuming singular value decomposition algorithm with a partial symmetric eigenproblem. The results of experiments using a real biometric data collection are discussed.
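The following sketch illustrates, under assumed sizes and names, how such a latent semantic indexing retrieval can be set up without a full SVD: the top eigenvectors of A^T A (a partial symmetric eigenproblem) yield the latent space, stored irises are represented by their coordinates in it, and a query iris is folded in and matched by cosine similarity. The chapter's actual preprocessing, dimensionality, and eigensolver are not reproduced here.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
A = rng.random((4096, 300))            # columns: 300 stored irises, 64 x 64 pixels each

k = 20                                  # latent-space dimensionality (assumed)
# Top-k eigenpairs of the symmetric matrix A^T A instead of a full SVD of A.
vals, V = eigsh(A.T @ A, k=k)           # V: 300 x k, columns are eigenvectors
sing = np.sqrt(vals)                    # singular values of A
U = (A @ V) / sing                      # 4096 x k left singular vectors
docs = V * sing                         # latent coordinates of the stored irises

def match(query_pixels):
    """Fold a query iris into the latent space and return the index of the best match."""
    q = query_pixels @ U                # k-dimensional query representation
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))

print(match(A[:, 42]))                  # querying with stored iris 42 returns 42
```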

Chapter 25 deals with data mining of medical images. It presents research results obtained for texture extraction, classification, segmentation, and retrieval of normal soft tissues in computed tomography images of the chest and abdomen. The chapter describes various data mining techniques, which allow identifying different tissues in images, segmenting and indexing images, and exploring different similarity measures for image retrieval. Experimental results for tissue segmentation, classification, and retrieval are presented.
