
Multimedia Data Mining and Knowledge Discovery


Valery A. Petrushin and Latifur Khan (Eds)


Valery A. Petrushin, MS, PhD
Accenture Technology Labs, Accenture Ltd

161 N. Clark St.

Chicago, IL 60601, USA

Latifur Khan, BS, MS, PhD
University of Texas at Dallas, EC 31

2601 N. Floyd Rd.

Richardson, TX 75080-1407, USA

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library.

Library of Congress Control Number: 2006924373

ISBN-10: 1-84628-436-8
ISBN-13: 978-1-84628-436-6

Printed on acid-free paper

© Springer-Verlag London Limited 2007

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Whilst we have made considerable efforts to contact all holders of copyright material contained in this book, we may have failed to locate some of them. Should holders wish to contact the Publisher, we will be happy to come to some arrangement with them.

9 8 7 6 5 4 3 2 1

Springer Science+Business Media springer.com


Contents

Preface... xvii

List of Contributors... xix

Part I. Introduction

1. Introduction into Multimedia Data Mining and Knowledge Discovery... 3

Valery A. Petrushin

1.1 What Is Multimedia Data Mining? ... 3

1.2 Who Does Need Multimedia Data Mining?... 5

1.3 What Shall We See in the Future? ... 8

1.4 What Can You Find in This Book? ... 8

References ... 12

2. Multimedia Data Mining: An Overview... 14

Nilesh Patel and Ishwar Sethi

2.1 Introduction ... 14

2.2 Multimedia Data Mining Architecture ... 15

2.3 Representative Features for Mining ... 18

2.3.1 Feature Fusion ... 21

2.4 Supervised Concept Mining... 21

2.4.1 Annotation by Classification ... 21

2.4.2 Annotation by Association ... 23

2.4.3 Annotation by Statistical Modeling... 24

2.5 Concept Mining Through Clustering... 25

2.6 Concept Mining Using Contextual Information... 27

2.7 Events and Feature Discovery ... 29

2.8 Conclusion ... 33

References ... 33


Part II. Multimedia Data Exploration and Visualization

3. A New Hierarchical Approach for Image Clustering... 41

Lei Wang and Latifur Khan

3.1 Introduction ... 41

3.2 Related Works ... 42

3.3 Hierarchy Construction and Similarity Measurement... 43

3.3.1 Object Clustering... 44

3.3.2 Vector Model for Images ... 47

3.3.3 Dynamic Growing Self-Organizing Tree (DGSOT) Algorithm .. 47

3.4 Experiment Results ... 52

3.5 Conclusion and Future Works ... 54

References ... 55

4. Multiresolution Clustering of Time Series and Application to Images... 58

Jessica Lin, Michail Vlachos, Eamonn Keogh, and Dimitrios Gunopulos

4.1 Introduction ... 58

4.2 Background and Related Work ... 59

4.2.1 Background on Clustering ... 59

4.2.2 Background on Wavelets ... 61

4.2.3 Background on Anytime Algorithms ... 62

4.2.4 Related Work ... 62

4.3 Our Approach—the ik-means Algorithm... 62

4.3.1 Experimental Evaluation on Time Series ... 64

4.3.2 Data Sets and Methodology ... 65

4.3.3 Error of Clustering Results ... 65

4.3.4 Running Time... 68

4.4 ik-means Algorithm vs. k-means Algorithm ... 69

4.5 Application to Images... 71

4.5.1 Clustering Corel Image Data sets ... 74

4.5.2 Clustering Google Images ... 75

4.6 Conclusions and Future Work ... 77

Acknowledgments... 77

References ... 77

5. Mining Rare and Frequent Events in Multi-camera Surveillance Video... 80

Valery A. Petrushin

5.1 Introduction ... 80

5.2 Multiple Sensor Indoor Surveillance Project ... 82

5.3 Data Collection and Preprocessing ... 83

5.4 Unsupervised Learning Using Self-Organizing Maps... 86

5.4.1 One-Level Clustering Using SOM ... 86


5.4.2 Two-Level Clustering Using SOM... 89

5.4.3 Finding Unusual Events... 90

5.5 Visualization Tool... 91

5.6 Summary ... 92

References ... 92

6. Density-Based Data Analysis and Similarity Search... 94

Stefan Brecheisen, Hans-Peter Kriegel, Peer Kröger, Martin Pfeifle, Matthias Schubert, and Arthur Zimek

6.1 Introduction ... 94

6.2 Hierarchical Clustering ... 96

6.3 Application Ranges ... 98

6.3.1 Data Analysis ... 98

6.3.2 Navigational Similarity Search ... 100

6.4 Cluster Recognition for OPTICS... 100

6.4.1 Recent Work ... 101

6.4.2 Gradient Clustering ... 102

6.4.3 Evaluation ... 106

6.5 Extracting Cluster Hierarchies for Similarity Search... 108

6.5.1 Motivation ... 108

6.5.2 Basic Definitions ... 109

6.5.3 Algorithm... 110

6.5.4 Choice of ε in the i-th Iteration... 112

6.5.5 The Extended Prototype CLUSS ... 113

6.6 Conclusions ... 114

References ... 114

7. Feature Selection for Classification of Variable Length Multiattribute Motions... 116

Chuanjun Li, Latifur Khan, and Balakrishnan Prabhakaran

7.1 Introduction ... 116

7.2 Related Work ... 118

7.3 Background ... 120

7.3.1 Support Vector Machines ... 120

7.3.2 Singular Value Decomposition ... 121

7.4 Feature Vector Extraction Based on SVD ... 123

7.4.1 SVD Properties of Motion Data ... 123

7.4.2 Feature Vector Extraction... 125

7.5 Classification of Feature Vectors Using SVM ... 127

7.6 Performance Evaluation ... 128

7.6.1 Hand Gesture Data Generation ... 128

7.6.2 Motion Capture Data Generation... 128

7.6.3 Performance Evaluation... 129

7.6.4 Discussion ... 133

7.7 Conclusion ... 135


Acknowledgments... 136

References ... 136

Part III. Multimedia Data Indexing and Retrieval

8. FAST: Fast and Semantics-Tailored Image Retrieval... 141

Ruofei Zhang and Zhongfei (Mark) Zhang

8.1 Introduction ... 141

8.2 Fuzzified Feature Representation and Indexing Scheme ... 144

8.2.1 Image Segmentation ... 144

8.2.2 Fuzzy Color Histogram for Each Region ... 146

8.2.3 Fuzzy Representation of Texture and Shape for Each Region ... 147

8.2.4 Region Matching and Similarity Determination... 148

8.3 Hierarchical Indexing Structure and HEAR Online Search ... 150

8.4 Addressing User’s Subjectivity Using ITP and ARWU... 153

8.5 Experimental Evaluations ... 157

8.6 Conclusions ... 165

References ... 165

9. New Image Retrieval Principle: Image Mining and Visual Ontology... 168

Marinette Bouet and Marie-Aude Aufaure

9.1 Introduction ... 168

9.2 Content-Based Retrieval ... 170

9.2.1 Logical Indexation Process... 171

9.2.2 Retrieval Process ... 172

9.3 Ontology and Data Mining Against Semantics Lack in Image Retrieval... 173

9.3.1 Knowledge Discovery in Large Image Databases... 174

9.3.2 Ontologies and Metadata ... 175

9.4 Toward Semantic Exploration of Image Databases ... 176

9.4.1 The Proposed Architecture ... 176

9.4.2 First Experimentations ... 179

9.5 Conclusion and Future Work... 181

References ... 182

10. Visual Alphabets: Video Classification by End Users... 185

Menno Israël, Egon L. van den Broek, Peter van der Putten, and Marten J. den Uyl

10.1 Introduction ... 185

10.2 Overall Approach ... 186

10.2.1 Scene Classification Procedure... 187


10.2.2 Related Work ... 187

10.2.3 Positioning the Visual Alphabet Method ... 189

10.3 Patch Features ... 189

10.3.1 Distributed Color Histograms ... 190

10.3.2 Histogram Configurations ... 190

10.3.3 Human Color Categories ... 191

10.3.4 Color Spaces ... 191

10.3.5 Segmentation of the HSI Color Space ... 192

10.3.6 Texture ... 193

10.4 Experiments and Results ... 194

10.4.1 Patch Classification... 195

10.4.2 Scene Classification ... 196

10.5 Discussion and Future Work ... 197

10.6 Applications... 198

10.6.1 Vicar... 199

10.6.2 Porn Filtering ... 200

10.6.3 Sewer Inspection ... 201

10.7 Conclusion ... 203

Acknowledgments ... 203

References... 203

Part IV. Multimedia Data Modeling and Evaluation

11. Cognitively Motivated Novelty Detection in Video Data Streams... 209

James M. Kang, Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Roger Gaborski

11.1 Introduction ... 209

11.2 Related Work ... 211

11.2.1 Video Streams ... 211

11.2.2 Image Novelty ... 212

11.2.3 Clustering Novelty in Video Streams ... 212

11.2.4 Event vs. Novelty Clustering ... 213

11.3 Implementation... 213

11.3.1 Machine-Based Process... 213

11.3.2 Human-Based System... 217

11.3.3 Indexing and Clustering of Novelty ... 220

11.3.4 Distance Metrics ... 223

11.4 Results ... 225

11.4.1 Clustering and Indexing of Novelty ... 225

11.4.2 Human Novelty Detection... 228

11.4.3 Human vs. Machine ... 228

11.5 Discussion... 229

11.5.1 Issues and Ideas ... 229


11.5.2 Summary ... 231

Acknowledgments ... 231

References... 231

12. Video Event Mining via Multimodal Content Analysis and Classification... 234

Min Chen, Shu-Ching Chen, Mei-Ling Shyu, and Chengcui Zhang

12.1 Introduction ... 234

12.2 Related Work ... 236

12.3 Goal Shot Detection ... 238

12.3.1 Instance-Based Learning ... 238

12.3.2 Multimodal Analysis of Soccer Video Data ... 239

12.3.3 Prefiltering ... 248

12.3.4 Nearest Neighbor with Generalization (NNG) ... 251

12.4 Experimental Results and Discussions ... 252

12.4.1 Video Data Source ... 252

12.4.2 Video Data Statistics and Feature Extraction ... 253

12.4.3 Video Data Mining for Goal Shot Detection... 254

12.5 Conclusions ... 255

Acknowledgments ... 256

References... 256

13. Identifying Mappings in Hierarchical Media Data... 259

K. Selçuk Candan, Jong Wook Kim, Huan Liu, Reshma Suvarna, and Nitin Agarwal

13.1 Introduction ... 259

13.1.1 Integration of RDF-Described Media Resources... 259

13.1.2 Matching Hierarchical Media Objects ... 260

13.1.3 Problem Statement... 261

13.1.4 Our Approach... 262

13.2 Related Work ... 262

13.3 Structural Matching ... 264

13.3.1 Step I: Map Both Trees Into Multidimensional Spaces ... 265

13.3.2 Step II: Compute Transformations to Align the Common Nodes of the Two Trees in a Shared Space ... 266

13.3.3 Step III: Use the Identified Transformations to Position the Uncommon Nodes in the Shared Space... 272

13.3.4 Step IV: Relate the Nodes from the Two Trees in the Shared Space... 272

13.4 Experimental Evaluation ... 272

13.4.1 Synthetic and Real Data ... 273

13.4.2 Evaluation Strategy ... 275

13.4.3 Experiment Synth1–Label Differences ... 276

13.4.4 Experiment Synth2-Structural Differences... 277

13.4.5 Experiment Real1: Treebank Collection... 279


13.4.6 Execution Time... 280

13.4.7 Synth3: When the Corresponding Nodes in the Two Trees Match Imperfectly... 282

13.4.8 Synth4: Many-to-Many Correspondences Between Nodes... 282

13.4.9 Execution Time with Fuzzy, Many-to-Many Mappings ... 286

13.4.10 Real2: Experiments with the CNN Data... 286

13.5 Conclusions ... 286

Acknowledgments ... 287

References... 287

14. A Novel Framework for Semantic Image Classification and Benchmark Via Salient Objects... 291

Yuli Gao, Hangzai Luo, and Jianping Fan

14.1 Introduction ... 291

14.2 Image Content Representation Via Salient Objects... 292

14.3 Salient Object Detection... 294

14.4 Interpretation of Semantic Image Concepts... 298

14.5 Performance Evaluation ... 302

14.6 Conclusions ... 304

References... 305

15. Extracting Semantics Through Dynamic Context... 307

Xin Li, William Grosky, Nilesh Patel, and Farshad Fotouhi

15.1 Introduction ... 307

15.2 Related Work ... 308

15.3 System Architecture ... 309

15.4 Segmentation ... 310

15.4.1 Chi-Square Method... 310

15.4.2 Kruskal–Wallis Method... 314

15.5 Extracting Semantics... 316

15.5.1 Image Resegmentation... 317

15.6 Experimental Results... 321

15.7 Supporting MPEG-7... 322

15.8 Conclusion ... 322

References... 324

16. Mining Image Content by Aligning Entropies with an Exemplar... 325

Clark F. Olson

16.1 Introduction ... 325

16.2 Related Work ... 327

16.3 Matching with Entropy ... 328

16.4 Toward Efficient Search ... 330

16.5 Results ... 332

16.6 Large Image Databases ... 333


16.7 Future Work... 336

16.7.1 Tracking ... 336

16.7.2 Grouping... 337

16.8 Summary ... 337

Acknowledgments ... 338

References... 338

17. More Efficient Mining Over Heterogeneous Data Using Neural Expert Networks... 340

Sergio A. Alvarez, Carolina Ruiz, and Takeshi Kawato

17.1 Introduction ... 340

17.1.1 Scope of the Chapter ... 340

17.1.2 Related Work ... 341

17.1.3 Outline of the Chapter... 342

17.2 Artificial Neural Networks ... 343

17.2.1 Network Topologies ... 343

17.2.2 Network Training ... 345

17.3 Efficiency and Expressiveness... 346

17.3.1 Time Complexity of Training... 346

17.3.2 Expressive Power ... 347

17.4 Experimental Evaluation ... 350

17.4.1 Web Images Data ... 351

17.4.2 Networks ... 352

17.4.3 Performance Metrics ... 354

17.4.4 Evaluation Protocol... 354

17.5 Results ... 355

17.5.1 Classification Performance... 355

17.5.2 Time Efficiency... 356

17.5.3 Discussion... 356

17.5.4 Additional Experimental Results... 356

17.6 Conclusions and Future Work ... 359

References... 360

18. A Data Mining Approach to Expressive Music Performance Modeling... 362

Rafael Ramirez, Amaury Hazan, Esteban Maestre, and Xavier Serra

18.1 Introduction ... 362

18.2 Melodic Description... 363

18.2.1 Algorithms for Feature Extraction ... 363

18.2.2 Low-Level Descriptors Computation... 363

18.2.3 Note Segmentation... 366

18.2.4 Note Descriptor Computation ... 366

18.3 Expressive Performance Knowledge Induction ... 366

18.3.1 Training Data ... 367

18.3.2 Musical Analysis... 367


18.3.3 Data Mining Techniques... 368

18.3.4 Results ... 373

18.4 Expressive Melody Generation... 375

18.5 Related Work ... 376

18.6 Conclusion ... 378

Acknowledgments ... 378

References... 379

Part V. Applications and Case Studies

19. Supporting Virtual Workspace Design Through Media Mining and Reverse Engineering... 383

Simeon J. Simoff and Robert P. Biuk-Aghai

19.1 Introduction ... 383

19.2 Principles of the Approach Toward Reverse Engineering of Processes Using Data Mining ... 388

19.2.1 The Information Pyramid Formalism ... 388

19.2.2 Integrated Collaboration Data and Data Mining Framework... 391

19.3 Method for Reverse Engineering of Processes ... 392

19.4 Example of Reverse Engineering of Knowledge-Intensive Processes from Integrated Virtual Workspace Data ... 394

19.4.1 Task Analysis ... 397

19.4.2 Process Analysis ... 398

19.4.3 Temporal Analysis ... 401

19.5 Conclusions and Future Work ... 402

Acknowledgments ... 403

References... 403

20. A Time-Constrained Sequential Pattern Mining for Extracting Semantic Events in Videos... 404

Kimiaki Shirahama, Koichi Ideno, and Kuniaki Uehara

20.1 Introduction ... 404

20.2 Related Works ... 406

20.2.1 Video Data Mining ... 406

20.2.2 Sequential Pattern Mining... 408

20.3 Raw Level Metadata... 409

20.4 Time-Constrained Sequential Pattern Mining ... 412

20.4.1 Formulation ... 412

20.4.2 Mining Algorithm... 414

20.4.3 Parallel Algorithm ... 418

20.5 Experimental Results... 419

20.5.1 Evaluations of Assigning Categorical Values of SM... 420

20.5.2 Evaluation of Semantic Event Boundary Detections ... 420


20.5.3 Evaluations of Extracted Semantic Patterns... 421

20.6 Conclusion and Future Works ... 424

Acknowledgments ... 424

References... 424

21. Multiple-Sensor People Localization in an Office Environment... 427

Gang Wei, Valery A. Petrushin, and Anatole V. Gershman

21.1 Introduction ... 427

21.2 Environment ... 428

21.3 Related Works ... 431

21.4 Feature Extraction ... 433

21.4.1 Camera Specification... 433

21.4.2 Background Modeling ... 434

21.4.3 Visual Feature Extraction... 436

21.4.4 People Modeling ... 437

21.5 People Localization... 438

21.5.1 Sensor Streams ... 438

21.5.2 Identification and Tracking of Objects ... 439

21.6 Experimental Results... 444

21.7 Summary ... 446

References... 446

22. Multimedia Data Mining Framework for Banner Images... 448

Qin Ding and Charles Daniel

22.1 Introduction ... 448

22.2 Naïve Bayesian Classification ... 450

22.3 The Bayesian Banner Profiler Framework ... 450

22.3.1 GIF Image Attribute Extraction ... 451

22.3.2 Attribute Quantization Algorithm... 452

22.3.3 Bayesian Probability Computation Algorithm... 453

22.4 Implementation... 454

22.5 Conclusions and Future Work ... 456

References... 456

23. Analyzing User’s Behavior on a Video Database... 458

Sylvain Mongy, Fatma Bouali, and Chabane Djeraba

23.1 Introduction ... 458

23.2 Related Work ... 460

23.2.1 Web Usage Mining ... 460

23.2.2 Video Usage Mining... 461

23.3 Proposed Approach... 462

23.3.1 Context... 462

23.3.2 Gathering Data ... 463

23.3.3 Modeling User’s Behavior: A Two-Level Based Model ... 464


23.4 Experimental Results... 468

23.4.1 Creation of the Test Data Sets ... 468

23.4.2 Exploiting the Intravideo Behavior ... 468

23.4.3 Multiple Subsequence Cluster Modeling ... 469

23.5 Future work ... 470

References... 470

24. On SVD-Free Latent Semantic Indexing for Iris Recognition of Large Databases... 472

Pavel Praks, Libor Machala, and Václav Snášel

24.1 Introduction ... 472

24.2 Image Retrieval Using Latent Semantic Indexing ... 473

24.2.1 Image Coding... 473

24.2.2 Document Matrix Scaling ... 474

24.3 Implementation Details of Latent Semantic Indexing... 476

24.3.1 LSI and Singular Value Decomposition ... 476

24.3.2 LSI1—The First Enhancement of the LSI Implementation ... 477

24.3.3 LSI2—The Second Enhancement of LSI... 479

24.4 Iris Recognition Experiments... 479

24.4.1 The Small-Size Database Query ... 480

24.4.2 The Large-Scale Database Query ... 481

24.4.3 Results of the Large-Scale Database Query... 483

24.5 Conclusions ... 483

Acknowledgments ... 485

References... 485

25. Mining Knowledge in Computer Tomography Image Databases... 487

Daniela Stan Raicu

25.1 Introduction ... 487

25.2 Mining CT Data: Classification, Segmentation, and Retrieval ... 489

25.2.1 Image Classification: Related Work ... 489

25.2.2 Image Segmentation: Related Work ... 490

25.2.3 Image Retrieval: Related Work... 491

25.3 Materials and Methods... 492

25.3.1 Image Database... 492

25.3.2 Texture Features... 493

25.3.3 Classification Model... 497

25.3.4 Similarity Measures and Performance Evaluation ... 498

25.4 Experimental Results and Their Interpretation ... 501

25.4.1 Tissue Classification Results... 501

25.4.2 Tissue Segmentation Results... 502

25.4.3 Tissue Retrieval Results ... 504


25.5 Conclusions ... 505

References... 506

Author Index... 509

Subject Index... 511


Preface

In recent years we have witnessed a real revolution in media recording and storage.

Because of advances in electronics, computer engineering, storage manufacturing, and networking, the market is flooded with cheap computers, mass memory, camera phones, and electronic devices for digitizing and producing visual and audio information. Ten years ago only a professional studio could create an audio CD or a DVD, but today everybody can do it using a home computer. Under the pressure of this digital expansion, the entertainment, show, and education industries have started changing their business models. Tomorrow nobody will be surprised to find that a personal computer has a terabyte hard disk and four gigabytes of RAM. This phenomenon has made corporate, public, and personal multimedia repositories widespread and fast growing.

Currently, there are some commercial tools for managing and searching multimedia audio and image collections, but the need for tools to extract hidden useful knowledge embedded within multimedia collections is becoming pressing and central for many decision-making applications. The tools needed today are tools for discovering relationships between objects or segments within images, classifying images on the basis of their content, extracting patterns in sound, categorizing speech and music, recognizing and tracking objects in video streams, etc.

Today data mining efforts are going beyond databases to focus on data collected in fields like art, design, hypermedia and digital media production, medical multimedia data analysis, and computational modeling of creativity, including evolutionary computation. These fields use a variety of data sources and structures, interrelated by the nature of the phenomenon. As a result, there is an increasing interest in new techniques and tools that can detect and discover patterns that lead to new knowledge in the problem domain where the data have been collected.

There is also an increasing interest in the analysis of multimedia data generated by different distributed applications, like collaborative virtual environments, virtual communities, and multiagent systems. The data collected from such environments include records of the actions taken in them, audio and video recordings of meetings and collaborative sessions, a variety of documents that are part of the business process, asynchronous threaded discussions, transcripts of synchronous communications, and other data records. These heterogeneous multimedia data records require sophisticated preprocessing, synchronization, and other transformation procedures before even getting to the analysis stage.

On the other hand, researchers in multimedia information systems, in their search for techniques to improve the indexing and retrieval of multimedia information, are looking into new methods for discovering indexing information. A variety of techniques from machine learning, statistics, databases, knowledge acquisition, data visualization, image analysis, high-performance computing, and knowledge-based systems have been used, mainly as a research handcraft activity. The development of ontologies and vocabularies for multimedia data fosters the adoption of, and merging with, Semantic Web technology. The emerging international standards for multimedia content description (MPEG-7) and multimedia resource delivery and usage (MPEG-21) promise to expedite progress in the field by giving it a uniform data representation and an open multimedia framework.

This book is based mostly on extended and updated papers that were presented at the two Multimedia Data Mining Workshops, MDM/KDD 2003 and MDM/KDD 2004, held in conjunction with the ACM SIGKDD Conference in Washington, DC, in August 2003 and the ACM SIGKDD Conference in Seattle, WA, in August 2004, respectively. The book also includes several invited surveys and papers. The book chapters give a snapshot of research and applied activities in multimedia data mining.

The editors are grateful to the founders and active supporters of the Multimedia Data Mining Workshop series: Simeon Simoff, Osmar Zaiane, and Chabane Djeraba.

We also thank the reviewers of the book chapters for a job well done and the organizers of the ACM SIGKDD Conferences for their support.

We thank Springer-Verlag's Wayne Wheeler, who initiated the book project, and Catherine Brett and Frank Ganz for their help in coordinating the publication and for their editorial assistance.

Chicago, IL Valery A. Petrushin

Dallas, TX Latifur Khan

January 2006


List of Contributors

Nitin Agarwal

Department of Computer Science and Engineering

Arizona State University Tempe, AZ 82857 USA

nitin.agarwal@asu.edu Muhammad A. Ahmad Rochester Institute

of Technology

102 Lomb Memorial Drive Rochester, NY 14623-5608 USA

maa2454@cs.rit.edu Sergio A. Alvarez

Computer Science Department Boston College

Chestnut Hill, MA 02467 USA

alvarez@cs.bc.edu Marie-Aude Aufaure

Computer Science Department SUPELEC

Plateau du Moulon 3, rue Joliot Curie

F-91 192 Gif-sur-Yvette Cedex France

Marie-Aude.Aufaure@supelec.fr

Robert P. Biuk-Aghai

Faculty of Science and Technology University of Macau

Avenida Padre Tom´as Pereira, S.J.

Taipa

Macau S.A.R.

China

robertb@umac.mo Fatma Bouali

LIFL UMR CNRS 8022 France

Fatma.Bouali@univ-lille2.fr Marinette Bouet

LIMOS – UMR 6158 CNRS Blaise Pascal University of Clermont-Ferrand II

Campus des C´ezeaux 24, avenue des Landais

F-63173 Aubiere Cedex France

Marinette.Bouet@cust.univ- bpclermont.fr

Stefan Brecheisen

Institute for Computer Science University of Munich

Oettingenstr. 67, 80538 Munich, Germany

brecheis@dbs.ifi.lmu.de


K. Selcuk Candan

Department of Computer Science and Engineering

Arizona State University, Tempe, AZ 82857

USA

candan@asu.edu Min Chen

Distributed Multimedia Information System Laboratory

School of Computing & Information Sciences

Florida International University Miami, FL 33199

USA

mchen005@cs.fiu.edu Shu-Ching Chen

Distributed Multimedia Information System Laboratory

School of Computing & Information Sciences

Florida International University Miami, FL 33199

USA

chens@cs.fiu.edu Charles Daniel

Pennsylvania State University – Harrisburg

Middletown, PA 17057 USA

Marten J. den Uyl VicarVision b.v.

Singel 160, 1015 AH Amsterdam

The Netherlands denuyl@vicarvision.nl Qin Ding

Pennsylvania State University – Harrisburg

Middletown, PA 17057

USA

qding@psu.edu Chabane Djeraba

LIFL – UMR USTL CNRS 8022 Universite de Lille1, Bˆat. M3 59655 Villeneuve d’Ascq Cedex France

djeraba@lifl.fr Jianping Fan

Dept of Computer Science

University of North Carolina – Charlotte Charlotte, NC 28223

USA

jfan@uncc.edu Farshad Fotouhi

Department of Computer Science Wayne State University

Detroit, MI 48202 USA

fotouhi@wayne.edu Roger Gaborski

Rochester Institute of Technology 102 Lomb Memorial Drive Rochester, NY 14623-5608 USA

rsg@cs.rit.edu Yuli Gao

Dept of Computer Science

University of North Carolina – Charlotte Charlotte, NC 28223

USA

ygao@uncc.edu Anatole V. Gershman Accenture Technology Labs Accenture Ltd.

161 N. Clark St.

Chicago, IL 60601 USA

anatole.v.gershman@accenture.com


William Grosky

Department of Computer and Information Science

University of Michigan – Dearborn Dearborn, MI 48128

USA

wgrosky@umich.edu Dimitrios Gunopulos

Computer Science & Engineering Department

University of California, Riverside Riverside, CA 92521

USA

dg@cs.ucr.edu Amaury Hazan

Music Technology Group Pompeu Fabra University Ocata 1, 08003 Barcelona Spain

ahazan@iua.upf.es Koichi Ideno

Graduate School of Science and Technology

Kobe University, Nada, Kobe, 657-8501 Japan

ideno@ai.cs.scitec.kobe-u.ac.jp Menno Isra¨el

ParaBot Services b.v.

Singel 160, 1015 AH Amsterdam The Netherlands

m.israel@parabots.nl James M. Kang

Department of Computer Science and Engineering

University of Minnesota Minneapolis, MN 55455 USA

jkang@cs.umn.edu

Takeshi Kawato

Computer Science Department Worcester Polytechnic Institute Worcester, MA 01609

USA

takeshi@wpi.edu Eamonn Keogh

Computer Science & Engineering Department

University of California, Riverside Riverside, CA 92521

USA

eamonn@cs.ucr.edu Latifur Khan

Department of Computer Science University of Texas at Dallas 75083 Richardson, Texas USA

lkhan@utdallas.edu Jong Wook Kim

Department of Computer Science and Engineering

Arizona State University Tempe, AZ 82857 USA

jong@asu.edu Hans-Peter Kriegel

Institute for Computer Science University of Munich

Oettingenstr. 67, 80538 Munich Germany

kriegel@dbs.ifi.lmu.de Peer Kr¨oger

Institute for Computer Science University of Munich

Oettingenstr. 67, 80538 Munich, Germany

kroegerp@dbs.ifi.lmu.de


Chuanjun Li

Department of Computer Science The University of Texas at Dallas Richardson, Texas 75083 USA

chuanjun@utdallas.edu Xin Li

Department of Computer Science Oklahoma City University Oklahoma City

OK 73106 USA

xinli@okcu.edu Jessica Lin

Department of Information and Software Engineering,

George Mason University Fairfax, VA 22030 USA

jessica@ise.gmu.edu Huan Liu

Department of Computer Science and Engineering

Arizona State University Tempe, AZ 82857 USA

huan.liu@asu.edu Hangzai Luo

Dept of Computer Science

University of North Carolina – Charlotte Charlotte, NC 28223

USA

hluo@uncc.edu Libor Machala

Department of Experimental Physics Palacky University

Svobody 26 779 00 Olomouc Czech Republic machala@sloup.upol.cz

Esteban Maestre Music Technology Group Pompeu Fabra University Ocata 1, 08003 Barcelona, Spain emaestre@iua.upf.es

Sylvain Mongy

LIFL – UMR USTL CNRS 8022 Universite de Lille1, Bˆat. M3 59655 Villeneuve d’Ascq Cedex France

mongy@lifl.fr Clark F. Olson

Computing and Software Systems University of Washington, Bothell 18115 Campus Way NE

Campus Box 358534 Bothell, WA 98011-8246 USA

cfolson@u.washington.edu Nilesh Patel

Department of Computer and Information Science

University of Michigan – Dearborn Dearborn, MI 48128

USA

patelnv@umich.edu Valery A. Petrushin Accenture Technology Labs Accenture Ltd.

161 N. Clark St.

Chicago, IL 60601 USA

valery.a.petrushin@accenture.com Martin Pfeifle

Institute for Computer Science University of Munich

Oettingenstr. 67 80538 Munich Germany

pfeifle@dbs.ifi.lmu.de


Balakrishnan Prabhakaran Department of Computer Science University of Texas at Dallas, 75083 Richardson, Texas USA

praba@utdallas.edu Pavel Praks

Department of Mathematics and Descriptive Geometry

Department of Applied Mathematics VSB – Technical University of Ostrava

17 Listopadu 15, 708 33 Ostrava, Czech Republic

pavel.praks@vsb.cz Daniela Stan Raicu

Intelligent Multimedia Processing Laboratory

School of Computer Science, Telecommunications, and Information Systems (CTI) DePaul University

Chicago, Illinois USA

draicu@cs.depaul.edu Rafael Ramirez

Music Technology Group Pompeu Fabra University Ocata 1

08003 Barcelona Spain

rafael@iua.upf.es Carolina Ruiz

Computer Science Department Worcester Polytechnic Institute Worcester, MA 01609

USA

ruiz@cs.wpi.edu Matthias Schubert

Institute for Computer Science

University of Munich Oettingenstr. 67, 80538 Munich Germany

schubert@dbs.ifi.lmu.de Xavier Serra

Music Technology Group Pompeu Fabra University Ocata 1, 08003 Barcelona Spain

xserra@iua.upf.es Ishwar K. Sethi

Department of Computer Science and Engineering

Oakland University Rochester, MI 48309 USA

isethi@oakland.edu Kimiaki Shirahama

Graduate School of Science and Technology

Kobe University Nada, Kobe 657-8501 Japan

kimi@ai.cs.scitec.kobe-u.ac.jp Mei-Ling Shyu

Department of Electrical and Computer Engineering

University of Miami Coral Gables, FL 33124 USA

shyu@miami.edu Simeon J. Simoff

Faculty of Information Technology University of Technology,

Sydney P.O. Box 123

Broadway NSW 2007 Australia

simeon@it.uts.edu.au


Vaclav Snasel

Department of Computer Science VSB - Technical University of Ostrava

17 Listopadu 15, 708 33 Ostrava Czech Republic

vaclav.snasel@vsb.cz Reshma Suvarna

Department of Computer Science and Engineering

Arizona State University Tempe, AZ 82857, USA reshma.suvarna@asu.edu Ankur Teredesai

Rochester Institute of Technology 102 Lomb Memorial Drive Rochester, NY 14623-5608, USA amt@cs.rit.edu

Kuniaki Uehara

Graduate School of Science and Technology

Kobe University Nada, Kobe 657-8501 Japan

uehara@ai.cs.scitec.kobe-u.ac.jp Egon L. van den Broek

Center for Telematics and Information Technology (CTIT) and

Institute for Behavioral Research (IBR) University Twente

P.O. box 217, 7500 AE Enschede, The Netherlands

e.l.vandenbroek@utwente.nl Peter van der Putten LIACS

University of Leiden P.O. Box 9512 2300 RA Leiden The Netherlands putten@liacs.nl

Michail Vlachos

IBM T. J. Watson Research Center

Hawthorne, NY 10532 USA

vlachos@ibm.com Lei Wang

Department of Computer Science University of Texas at Dallas, 75083 Richardson, Texas USA

leiwang@utdallas.edu Gang Wei

Accenture Technology Labs Accenture Ltd.

161 N. Clark St.

Chicago, IL 60601 USA

gang.wei@accenture.com Chengcui Zhang

Department of Computer & Information Science

University of Alabama at Birmingham Birmingham, AL 35294-1170

USA

zhang@cis.uab.edu Ruofei Zhang

Computer Science Department Watson School

SUNY Binghamton

Binghamton, NY, 13902-6000 USA

rzhang@cs.binghamton.edu Zhongfei (Mark) Zhang Computer Science Department Watson School,

SUNY Binghamton

Binghamton, NY 13902-6000 USA

zhongfei@cs.binghamton.edu


Arthur Zimek

Institute for Computer Science University of Munich

Oettingenstr. 67, 80538 Munich Germany

zimek@dbs.ifi.lmu.de


Part I Introduction


1. Introduction into Multimedia Data Mining and Knowledge Discovery

Valery A. Petrushin

Summary. This chapter briefly describes the purpose and scope of multimedia data mining and knowledge discovery. It identifies industries that are major or potential users of this technology, outlines the current state of the art and future directions, and overviews the chapters collected in this book.

1.1 What Is Multimedia Data Mining?

The traditional definition of data mining is that it “is the process of automating information discovery” [1], which improves decision making and gives a company advantages in the market. Another definition is that it “is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules” [2]. It is also assumed that the discovered patterns and rules are meaningful for business. Indeed, data mining is an applied discipline that grew out of statistical pattern recognition, machine learning, and artificial intelligence and is coupled with business decision making to optimize and enhance it.

Initially, data mining techniques were applied to structured data from databases.

The term “knowledge discovery in databases,” which is now obsolete, reflects this period. However, knowledge is the interpretation of the meaning of data, and knowledge discovery goes beyond finding simple patterns and correlations in data to identifying concepts and finding relationships. Knowledge-based modeling creates a consistent logical picture of the world. In recent years the term “predictive analytics” has been widely adopted in the business world [3].

On the one hand, growing computer power has made data mining techniques affordable for small companies; on the other, the emergence of cheap mass memory and digital recording devices, such as scanners, microphones, cameras, and camcorders, has allowed digitizing all kinds of corporate, governmental, and private documents. Many companies consider these electronic documents valuable assets and additional sources of data for data mining. For example, e-mail messages from customers and recordings of telephone conversations between customers and operators can serve as valuable sources of knowledge about both customers’ needs and the quality of service. The unprecedented growth of information on the World Wide Web has made it an indispensable source of data for business intelligence. However, processing new sources of semistructured (Web pages, XML documents) and unstructured (text, images, audio, and video recordings) information has required new data mining methods and tools.

Recently, two branches of data mining, text data mining and Web data mining, have emerged [4, 5]. They have their own research agendas, communities of researchers, and supporting companies that develop technologies and tools. Unfortunately, multimedia data mining is still in an embryonic state today. This can be explained by immature technology, the high cost of storing and processing media data, and the absence of success stories that show the benefits and a high rate of return on investments in multimedia data mining.

To understand multimedia data mining in depth, let us consider its purpose and scope. First, let us describe what kinds of data belong to multimedia data.

According to the MPEG-7 standard [6], there are four types of multimedia data: audio data, which includes sounds, speech, and music; image data (black-and-white and color images); video data, which includes time-aligned sequences of images; and electronic or digital ink, which is a sequence of time-aligned 2D or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar device. All these data are generated by specific kinds of sensors.

Second, let us take a closer look at the term multimedia data mining. The word multimedia suggests that several data sources of different modalities are processed at the same time. This may or may not be the case: a data mining project can deal with only one modality of data, for example, customers’ audio recordings or surveillance video. It would be better to use the term media data mining instead, but the word media usually connotes mass media such as radio and television, which may or may not be the data source for a data mining project. The term sensor data mining extends the scope too far, covering such sensors as radars, speedometers, accelerometers, echo locators, thermometers, etc. This book is devoted to the first three media types mentioned above, and we shall use the term multimedia data mining and the acronym MDM keeping the above discussion in mind.

MDM’s primary purpose is to process media data, alone or in combination with other data, to find patterns useful for business. For example, one can analyze customer traffic in a retail store using video recordings to find the optimal location for a new product display. Besides explicit data mining projects, MDM techniques can be used as part of complex operational or manufacturing processes, for example, using images to find defective products or indexing a video database of a company’s meetings.

The MDM is a part of multimedia technology, which covers the following areas [7, 8]:

- Media compression and storage.
- Delivering streaming media over networks with the required quality of service.
- Media restoration, transformation, and editing.
- Media indexing, summarization, search, and retrieval.
- Creating interactive multimedia systems for learning/training and creative art production.
- Creating multimodal user interfaces.


The major challenge that MDM shares with multimedia retrieval is the so-called semantic gap: the difficulty of deriving a high-level concept such as “mountain landscape” or “abnormal customer behavior” from low-level features such as color histograms, homogeneous texture, contour-based shapes, motion trajectories, etc., that are extracted from the media data. A solution to this problem requires creating an ontology that covers different aspects of the concept and a hierarchy of recognizers that deduce the probability of the concept from the probabilities of its components and the relationships among them.
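To make the idea of a hierarchy of recognizers concrete, here is a minimal sketch in Python (not taken from the book): hypothetical component detectors return probabilities, and the probability of the high-level concept “mountain landscape” is deduced from them with a simple product and noisy-OR combination. The detector names, probability values, and the 0.5 weighting are illustrative assumptions only.

```python
# Minimal sketch (not from the book): deducing a high-level concept from the
# outputs of hypothetical low-level component recognizers.

def p_all(probs):
    """Probability that all required components are present
    (assumes independent detectors)."""
    p = 1.0
    for x in probs:
        p *= x
    return p

def p_any(probs):
    """Noisy-OR: probability that at least one supporting cue is present."""
    q = 1.0
    for x in probs:
        q *= 1.0 - x
    return 1.0 - q

# Hypothetical detector outputs for one image.
detections = {"sky": 0.93, "rock_texture": 0.81, "snow": 0.40, "forest": 0.25}

# "Mountain landscape": sky and rocky texture are required; snow or forest
# provide additional support (the 0.5 weighting is an arbitrary choice).
required = p_all([detections["sky"], detections["rock_texture"]])
support = p_any([detections["snow"], detections["forest"]])
p_mountain_landscape = required * (0.5 + 0.5 * support)

print(f"P(mountain landscape) = {p_mountain_landscape:.2f}")
```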

1.2 Who Does Need Multimedia Data Mining?

To answer the above question, we consider the application areas of MDM and the related industries and companies that are (potential) users of the technology. The following are the five major application areas of MDM:

Customer Insight—Customer insight includes collecting and summarizing information about customers’ opinions of products or services, customers’ complaints, customers’ preferences, and the level of customer satisfaction with products or services. All product manufacturers and service providers are interested in customer insight. Many companies have help desks or call centers that accept telephone calls from customers. These calls are recorded and stored. If an operator is not available, a customer may leave an audio message. Some organizations, such as banks, insurance companies, and communication companies, have offices where customers meet the company’s representatives. The conversations between customers and sales representatives can be recorded and stored. The audio data serve as input for data mining that pursues the following goals:

- Topic detection—The speech from recordings is separated into turns, i.e., speech segments spoken by only one speaker. The turns are then transcribed into text and keywords are extracted. The keywords are used for detecting topics and estimating how much time was spent on each topic. This information, aggregated by day, week, and month, gives an overview of hot topics and allows management to plan future training taking the emerging hot topics into account.

- Resource assignment—Call centers have a high turnaround rate and only a small percentage of experienced operators, who are a valuable resource. When a call center collects messages, the problem is how to assign an operator who will call back. To solve the problem, the system transcribes the speech and detects the topic. Then it estimates the emotional state of the caller. If the caller is agitated, angry, or sad, the message is given a higher priority to be responded to by an experienced operator. Based on the topic, the emotional state of the caller, and operators’ availability, the system assigns a proper operator to call back [9].

- Evaluation of quality of service—At a call center with thousands of calls per day, the evaluation of quality of service is a laborious task. It is done by people who listen to selected recordings and subjectively estimate the quality of service. Speech recognition in combination with emotion recognition can be used to automate the evaluation and make it more objective. Without going into a deep understanding of the conversation’s meaning, the system assigns a score to the conversation based on the emotional states of the speakers and the keywords. The average score over conversations serves as a measure of quality of service for each operator. (A small sketch of these call-center goals follows the list.)
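The sketch below shows, in Python, how these three call-center goals could be wired together. It assumes that transcription, topic detection, and emotion recognition are performed by upstream components whose outputs are represented here as plain data; the field names, keyword lists, and scoring weights are illustrative assumptions, not a description of any real system.

```python
# Minimal sketch: outputs of upstream transcription, topic detection, and
# emotion recognition are given as plain data; all names and rules are
# illustrative assumptions.
from collections import Counter

NEGATIVE_EMOTIONS = {"angry", "agitated", "sad"}

calls = [  # hypothetical per-call analysis results
    {"operator": "op1", "topic": "billing", "caller_emotion": "angry",
     "keywords": ["refund", "late fee"]},
    {"operator": "op1", "topic": "billing", "caller_emotion": "neutral",
     "keywords": ["invoice"]},
    {"operator": "op2", "topic": "outage", "caller_emotion": "sad",
     "keywords": ["no service"]},
]

def priority(call):
    """Resource assignment: agitated, angry, or sad callers are called back first."""
    return 1 if call["caller_emotion"] in NEGATIVE_EMOTIONS else 2

def service_score(call):
    """Quality-of-service score from the caller's emotional state and keywords
    (higher is better); the weights are arbitrary."""
    score = 1.0
    if call["caller_emotion"] in NEGATIVE_EMOTIONS:
        score -= 0.5
    if any(k in {"refund", "complaint", "cancel"} for k in call["keywords"]):
        score -= 0.2
    return max(score, 0.0)

# Topic detection summary (aggregate by day, week, or month in practice).
print(Counter(call["topic"] for call in calls))

# Callback queue, most urgent messages first.
for call in sorted(calls, key=priority):
    print(call["topic"], call["caller_emotion"])

# Average score per operator as a measure of quality of service.
for op in sorted({call["operator"] for call in calls}):
    scores = [service_score(call) for call in calls if call["operator"] == op]
    print(op, round(sum(scores) / len(scores), 2))
```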

Currently, there are several small companies that provide tools and solutions for customer management at call centers. Some tools include procedures that are based on MDM techniques. This market is expected to expand in the future.

Surveillance—Surveillance consists of collecting, analyzing, and summarizing audio, video, or audiovisual information about a particular area, such as battlefields, forests, agricultural areas, highways, parking lots, buildings, workshops, malls, retail stores, offices, homes, etc. [10]. Surveillance is often associated with intelligence, security, and law enforcement, and the major users of this technology are the military, police, and private companies that provide security services. The U.S. Government supports long-term research and development activities in this field through the Video Analysis and Content Extraction (VACE) program, which is oriented toward developing technology for military and intelligence purposes. The National Institute of Standards and Technology has conducted an annual evaluation of content-based retrieval in video (TRECVID) since 2000 [11]. However, many civilian companies use security services for protecting their assets and monitoring their employees, customers, and manufacturing processes. They can get valuable insight from mining their surveillance data. There are several goals of surveillance data mining:

- Object or event detection/recognition—The goal is to find an object in an image, in a sequence of images, or in a soundtrack that belongs to a certain class of objects or represents a particular instance. For example, detect a military vehicle in a satellite photo, detect a face in a sequence of frames taken by a camera that is watching an office, recognize the person whose face has been detected, or recognize whether a sound represents music or speech. Another variant of the goal is to identify the state or attributes of the object, for example, classify an X-ray image of lungs as normal or abnormal or identify the gender of a speaker. Finally, the goal can be more complex: detecting or identifying an event, which is a sequence of objects and relationships among them. For example, detect a goal event in a soccer game, detect violence on the street, or detect a traffic accident. This goal is often a part of the higher level goals that are described below.

- Summarization—The goal is to aggregate data by summarizing activities in space and/or time. It covers summarizing the activities of a particular object (for example, drawing the trajectory of a vehicle or indicating the periods of time when a speaker was talking) or creating a “big picture” of the activities that happened during some period of time. For example, assuming that a bank has a surveillance system with multiple cameras that watch tellers and ATM machines indoors and outdoors, the goal is to summarize the activities that happened in the bank during 24 h. To meet this goal, unsupervised learning and visualization techniques are used. Summarization also serves as a prerequisite step for achieving another goal—finding frequent and rare events. (A small sketch of this goal follows the list.)


- Monitoring—The goal is to detect events and generate responses in real time. The major challenges are real-time processing and generating a minimum of false alarms. Examples are monitoring areas with restricted access, monitoring a public place for threat detection, or monitoring elderly or disabled people at home.
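A small Python sketch of the summarization goal, assuming that an upstream tracker has already produced time-stamped detections; the camera names, hours, object labels, and the “seen only once” rarity criterion are illustrative assumptions.

```python
# Minimal sketch: aggregate detections produced by an upstream tracker into an
# activity summary and flag rare combinations; all values are hypothetical.
from collections import Counter

detections = [  # (camera_id, hour_of_day, object_label)
    ("lobby", 9, "person"), ("lobby", 9, "person"), ("lobby", 13, "person"),
    ("parking", 2, "vehicle"), ("parking", 9, "vehicle"),
]

# Summarization: activity counts per camera and hour over a 24-h period.
activity = Counter((cam, hour) for cam, hour, _ in detections)
for (cam, hour), n in sorted(activity.items()):
    print(f"{cam} {hour:02d}:00  {n} events")

# Combinations seen only once are candidates for the rare-events report.
rare = [event for event, n in Counter(detections).items() if n == 1]
print("rare:", rare)
```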

Today we witness a real boom in video processing and video mining research and development [12–14]. However, only a few companies provide tools that have elements of data mining.

Media Production and Broadcasting—The proliferation of radio stations and TV channels forces broadcasting companies to search for more efficient approaches to creating programs and monitoring their content. MDM techniques are used to achieve the following goals:

- Indexing archives and creating new programs—A TV program maker uses raw footage called rushes and clips from commercial libraries called stock shot libraries to create a new program. A typical “shoot-to-show” ratio for a TV program is in the range from 20 to 40; it means that 20–40 hours of rushes go into one hour of a TV show. Many broadcasting companies have thousands of hours of rushes, which are poorly indexed and contain redundant, static, and low-value episodes. Using rushes for TV programs is inefficient. However, managers believe that rushes could become valuable if MDM technology could help program makers extract “generic” episodes with high potential for reuse.

- Program monitoring and plagiarism detection—A company that pays for broadcasting its commercials wants to know how many times the commercial has actually been aired. Some companies are also interested in how many times their logo has been viewed on TV during a sporting event. The broadcasting companies or independent third-party companies can provide such a service. Other goals are detecting plagiarism in, and unlicensed broadcasting of, music or video clips. This requires MDM techniques for audio/video clip recognition [15] and robust object recognition [16].

Intelligent Content Service—According to Forrester Research, Inc. (Cambridge, MA), the Intelligent Content Service (ICS) is “a semantically smart content-centric set of software services that enhance the relationship between information workers and computing systems by making sense of content, recognizing context, and understanding the end user’s requests for information” [17]. Currently, in spite of many Web service providers’ efforts to extend their search services beyond basic keyword search to ICS, it is not yet available for multimedia data. This area will be the major battlefield among Web search and service providers in the next 5 years.

The MDM techniques can help to achieve the following goals:

- Indexing Web media and using advanced media search—This includes creating indexes of images and audio and video clips posted on the Web using audio and visual features; creating ontologies for concepts and events; implementing advanced search techniques, such as searching for images or video clips by example or by sketch and searching for music by humming; using Semantic Web techniques to infer the context and improve search [18]; and using context and user feedback to better understand the user’s intent.

- Advanced Web-based services—Taking into account the exponential growth of information on the Web, such services promise to be an indispensable part of everybody’s everyday life. The services include summarization of events according to the user’s personal preferences, for example, summarizing a game of the user’s favorite football or basketball team; generating a preview of a music album or a movie; or finding a relevant video clip for a news article [19].

Knowledge Management—Many companies consider their archives of documents a valuable asset. They spend a lot of money to maintain their archives and provide employees with access to them. Besides text documents, these archives can contain design drawings, photos and other images, audio and video recordings of meetings, multimedia data for training, etc. MDM approaches can provide the ICS for supporting companies’ knowledge management.

1.3 What Shall We See in the Future?

As to future development in the multimedia data mining field, I believe that we shall see essential progress during the next 5 years. The major driving force behind it is the creation of techniques for advanced Web search to provide intelligent content services to companies in support of their knowledge management and business intelligence. This will require creating general and industry-specific ontologies, developing recognizers for the entities of these ontologies, merging probabilistic and logical inference, and using metadata and reasoning represented in different vocabularies and languages, such as MPEG-7 [6], MPEG-21 [20, 21], Dublin Core [22], RDF [23], SKOS [24], TGM [25], OWL [26], etc.

Another force, driven mostly by funding from governmental agencies, is video surveillance and mass media analysis for improving intelligence, military, and law enforcement decision making.

Mining audio and video recordings for customer insight and using MDM to improve broadcasting companies’ production will also grow and mature in the future.

From the viewpoint of media types, most advances in video processing are expected in research and development for surveillance and broadcasting applications; audio processing will benefit from research on customer insight and on creating advanced search engines; and image processing will benefit from the development of advanced search engines. As to electronic ink, I believe this type of multimedia data will be accumulated and used in data mining in the future, for example, for estimating the skill level of a user of an interactive tool or for summarizing the contribution of each participant of a collaborative tool to a project.

1.4 What Can You Find in This Book?

As in any collection of creative material, the chapters of this book differ in style, topic, mathematical rigor, level of generality, and readiness for deployment. But altogether they build a mosaic of ongoing research in the fast-changing, dynamic field that is multimedia data mining.


The book consists of five parts. The first part, which includes two chapters, gives an introduction to multimedia data mining. Chapter 1, which you are reading now, overviews multimedia data mining as an industry. Chapter 2 presents an overview of MDM techniques. It describes a typical architecture of MDM systems and covers approaches to supervised and unsupervised concept mining and event discovery.

The second part includes five chapters. It is devoted to multimedia data exploration and visualization. Chapter 3 deals with exploring images. It presents a clustering method based on unsupervised neural nets and self-organizing maps, which is called the dynamic growing self-organizing tree algorithm (DGSOT). The chapter shows that the suggested algorithm outperforms the traditional hierarchical agglomerative clustering algorithm.

Chapter 4 presents multiresolution clustering of time series. It uses the Haar wavelet transform to represent time series at different resolutions and applies k-means clustering sequentially at each level, starting from the coarsest representation. The algorithm stops when two sequential membership assignments are equal or when it reaches the finest representation. The advantages of the algorithm are that it works faster and produces better clustering. The algorithm has been applied to clustering images using color and texture features.
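The scheme can be illustrated with a simplified Python sketch (this is not the authors’ code): each series is repeatedly halved by Haar-style pairwise averaging, k-means is run on the coarsest level, and the centers found there, upsampled, seed k-means at the next finer level. The toy k-means routine, the random data, and the early-stopping test are assumptions of the sketch.

```python
# Simplified illustration (not the authors' code) of multiresolution k-means:
# Haar-style averaging gives coarse versions of each series; k-means at each
# level is seeded with the centers found at the coarser level.
import numpy as np

def kmeans(data, centers, iters=20):
    for _ in range(iters):
        labels = np.argmin(((data[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(0)
series = rng.normal(size=(100, 64))              # 100 time series of length 64
k = 4

levels = [series]                                # build coarser resolutions
while levels[-1].shape[1] > 2:
    x = levels[-1]
    levels.append((x[:, 0::2] + x[:, 1::2]) / 2.0)   # Haar averaging
levels.reverse()                                 # coarsest first

labels_prev = None
centers = levels[0][rng.choice(len(series), k, replace=False)]
for x in levels:
    if centers.shape[1] != x.shape[1]:
        centers = np.repeat(centers, 2, axis=1)  # upsample centers to this level
    centers, labels = kmeans(x, centers)
    if labels_prev is not None and np.array_equal(labels, labels_prev):
        break                                    # memberships stabilized: stop early
    labels_prev = labels

print("cluster sizes:", np.bincount(labels, minlength=k))
```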

Chapter 5 describes a method for unsupervised classification of events in multicamera indoor surveillance video. The self-organizing map approach has been applied to event data for clustering and visualization. The chapter presents a tool for browsing clustering results that allows exploring units of the self-organizing maps at different levels of the hierarchy, clusters of units, and distances between units in 3D space when searching for rare events.
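As a rough Python illustration of mapping events onto a self-organizing map (a toy SOM written from scratch; the grid size, feature dimensionality, decay schedules, and the use of sparsely populated units as rare-event candidates are assumptions of the sketch, not the chapter’s settings):

```python
# Toy self-organizing map: events are mapped onto a small grid of units;
# units that attract few events point at candidate rare events.
import numpy as np

rng = np.random.default_rng(4)
events = rng.normal(size=(500, 6))               # hypothetical event feature vectors
grid = (8, 8)
weights = rng.normal(size=grid + (6,))
coords = np.argwhere(np.ones(grid))              # (row, col) of every unit

n_steps = 2000
for t in range(n_steps):
    x = events[rng.integers(len(events))]
    bmu = np.unravel_index(((weights - x) ** 2).sum(axis=2).argmin(), grid)
    lr = 0.5 * (1 - t / n_steps)                 # decaying learning rate
    sigma = 3.0 * (1 - t / n_steps) + 0.5        # decaying neighborhood width
    dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=1).reshape(grid)
    h = np.exp(-dist2 / (2 * sigma ** 2))        # neighborhood function
    weights += lr * h[..., None] * (x - weights)

# Count how many events each unit attracts; sparsely populated units are
# candidates for closer inspection.
hits = np.zeros(grid, dtype=int)
for x in events:
    hits[np.unravel_index(((weights - x) ** 2).sum(axis=2).argmin(), grid)] += 1
print(hits)
```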

Chapter 6 presents density-based data analysis and similarity search. It introduces a visualization technique called the reachability plot, which allows visually exploring a data set in multiple representations and comparing multiple similarity models. The chapter also presents a new method for automatically extracting cluster hierarchies from a given reachability plot and describes a system prototype, which serves both for visual data analysis and for a new way of object retrieval called navigational similarity search.

Chapter 7 presents an approach to the exploration of variable-length multiattribute motion data captured by data glove sensors and 3-D human motion cameras. It suggests using the singular value decomposition technique to regularize multiattribute motion data of different lengths and classifying the data with support vector machine classifiers. Classification of motion data using support vector machine classifiers is compared with classification by related similarity measures in terms of accuracy and CPU time.
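The core idea can be sketched in Python as follows (this is not the chapter’s exact procedure): each motion is a frames-by-attributes matrix with its own duration, and the top right singular vectors of that matrix, which do not depend on the number of frames, are flattened into a fixed-length feature vector that an SVM or any other classifier could consume. The data shapes and the number of singular vectors kept are assumptions of the sketch.

```python
# Minimal sketch: SVD-based fixed-length features for motions that share the
# same attributes but have different durations.
import numpy as np

rng = np.random.default_rng(1)

def svd_features(motion, n_vectors=2):
    """motion: (frames x attributes) array; returns the top right singular
    vectors flattened into one fixed-length feature vector."""
    _, _, vt = np.linalg.svd(motion - motion.mean(axis=0), full_matrices=False)
    v = vt[:n_vectors]
    v = v * np.sign(v[:, :1])    # resolve the sign ambiguity of singular vectors
    return v.ravel()

# Two hypothetical gesture recordings of different lengths, 22 attributes each.
gesture_a = rng.normal(size=(180, 22))
gesture_b = rng.normal(size=(95, 22))

features = np.vstack([svd_features(gesture_a), svd_features(gesture_b)])
print(features.shape)            # (2, 44): same length regardless of duration
```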

The third part is devoted to multimedia data indexing and retrieval. It consists of three chapters. Chapter 8 focuses on developing an image retrieval methodology, which includes a new indexing method based on fuzzy logic, a hierarchical indexing structure, and the corresponding hierarchical elimination-based A* retrieval (HEAR) algorithm with logarithmic search complexity in the average case. It also uses user relevance feedback to tailor the semantic retrieval to each user’s individualized query preferences.


Chapter 9 presents a methodology that uses both clustering and characterization rules to reduce the search space and produce a summarized view of an annotated image database. These data mining techniques are performed separately on visual descriptors and on textual information such as annotations and keywords. A visual ontology is derived from the textual part and enriched with representative images associated with each concept of the ontology. Ontology-based navigation is used as a user-friendly tool for retrieving relevant images.

Chapter 10 describes a methodology and a tool that allow end users to create classifiers for images. The process consists of two stages. First, small image fragments called patches are classified. Second, frequency vectors of these patch classifications are fed into a second-level classifier for global scene classification (e.g., city, portrait, or countryside). The first-stage classifiers can be seen as a set of highly specialized feature detectors that define a domain-specific visual alphabet. The end user builds the second-level classifiers interactively by simply indicating positive examples of a scene. The scene classifier approach has been successfully applied to several problem domains, such as content-based video retrieval in television archives, automated sewer inspection, and pornography filtering.
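A toy Python sketch of this two-stage scheme, in which patch_label() is a stand-in for the first-stage patch classifiers and the “visual alphabet” labels are invented for illustration: each image patch receives a label, and the normalized label histogram becomes the scene-level feature vector that a second-level classifier would consume.

```python
# Toy sketch of the two-stage scheme: per-patch labels -> label histogram
# -> input vector for a second-level scene classifier.
import numpy as np

ALPHABET = ["sky", "grass", "brick", "skin", "water"]   # hypothetical patch classes

def patch_label(patch):
    """Stand-in for the first-stage patch classifier (here: by mean intensity)."""
    return ALPHABET[int(patch.mean()) % len(ALPHABET)]

def scene_vector(image, patch=16):
    counts = dict.fromkeys(ALPHABET, 0)
    for i in range(0, image.shape[0] - patch + 1, patch):
        for j in range(0, image.shape[1] - patch + 1, patch):
            counts[patch_label(image[i:i + patch, j:j + patch])] += 1
    v = np.array([counts[a] for a in ALPHABET], dtype=float)
    return v / v.sum()           # frequency vector fed to the scene classifier

rng = np.random.default_rng(3)
print(scene_vector(rng.integers(0, 256, size=(64, 64))))
```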

The fourth part is a collection of eight chapters that describe approaches to multimedia data modeling and evaluation. Chapter 11 presents an approach to automatic novelty detection in video streams and compares it experimentally to human performance. The evaluation of human versus machine-based novelty detection is quantified by metrics based on the location of novel events, the number of novel events, etc.

Chapter 12 presents an effective approach for event detection using both audio and visual features, with application to the automatic extraction of goal events in soccer videos. The extracted goal events can be used for high-level indexing and selective browsing of soccer videos. The approach uses the nearest neighbor classifier with a generalization scheme. The proposed approach has been tested using soccer videos of different styles produced by different broadcasters.

Chapter 13 describes an approach for mining and automatically discovering mappings in hierarchical media data, metadata, and ontologies, using the structural information inherent in hierarchical data. Structure-based mining of relationships provides a high degree of precision, and the approach works even when the mappings are imperfect, fuzzy, and many-to-many.

Chapter 14 proposes a new approach to high-level concept recognition in images that uses salient objects as the semantic building blocks. The approach applies support vector machine techniques to automatically detect the salient objects, which serve as a basic visual vocabulary. A high-level concept is then modeled by a Gaussian mixture model of weighted dominant components, i.e., salient objects. The chapter demonstrates the efficiency of this modeling approach with the results of extensive experiments on nature images.
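
As an illustration of modeling a concept as a mixture over salient-object features, here is a hedged sketch with invented data; the number of components, the features, and the scoring are assumptions rather than the chapter's configuration.

```python
# Sketch: a high-level concept modeled as a Gaussian mixture over features
# of detected salient objects (data and features are invented).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)
# Hypothetical feature vectors of salient objects detected (e.g., by SVM
# detectors) in training images of a concept such as "beach".
salient_features = np.vstack([rng.normal(m, 0.5, size=(80, 4))
                              for m in (0.0, 2.0, 4.0)])   # sand, water, sky

concept_model = GaussianMixture(n_components=3, covariance_type="full")
concept_model.fit(salient_features)

def concept_score(image_object_features):
    # Average log-likelihood of an image's salient objects under the concept.
    return concept_model.score(image_object_features)

print(round(concept_score(rng.normal(2.0, 0.5, size=(10, 4))), 2))
```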

Chapter 15 is devoted to a fundamental problem: image segmentation. It introduces a new MPEG-7-friendly system that integrates user feedback with image segmentation and region recognition, supporting the user in the extraction of image region semantics in highly dynamic environments. The results obtained for aerial photos are presented and discussed.

Chapter 16 describes techniques for query by example in an image database, where the exemplar image can appear at a different position and scale in the target image and/or be captured by a different sensor. The approach matches images against the exemplar by comparing local entropies at corresponding positions. It employs a search strategy that combines sampling in the space of exemplar positions, the Fast Fourier Transform for efficiently evaluating object translations, and iterative optimization for pose refinement. The techniques are applied to matching exemplars with real images such as aerial and ground reconnaissance photos. Strategies for scaling this approach to multimedia databases are also described.
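
The following sketch, under simplifying assumptions (a naive windowed entropy, grayscale images in [0, 1], translation search only, no scale or sensor change), shows how local entropy maps can be compared over all translations with the FFT; it illustrates the principle rather than the chapter's algorithm.

```python
# Sketch of entropy-based template matching with FFT-accelerated
# translation search (illustrative only).
import numpy as np

def local_entropy(img, win=7, bins=16):
    """Shannon entropy of the gray-level histogram in a win x win window."""
    q = np.clip((img * bins).astype(int), 0, bins - 1)
    h, w = q.shape
    ent = np.zeros((h - win + 1, w - win + 1))
    for y in range(ent.shape[0]):
        for x in range(ent.shape[1]):
            counts = np.bincount(q[y:y+win, x:x+win].ravel(), minlength=bins)
            p = counts[counts > 0] / counts.sum()
            ent[y, x] = -(p * np.log2(p)).sum()
    return ent

def best_translation(exemplar, target):
    """Score every translation of the exemplar's entropy map over the
    target's entropy map via FFT cross-correlation."""
    e, t = local_entropy(exemplar), local_entropy(target)
    e = e - e.mean(); t = t - t.mean()
    pad = [(0, t.shape[i] - e.shape[i]) for i in range(2)]
    corr = np.real(np.fft.ifft2(np.fft.fft2(t) *
                                np.conj(np.fft.fft2(np.pad(e, pad)))))
    return np.unravel_index(np.argmax(corr), corr.shape)

rng = np.random.default_rng(2)
target = rng.random((64, 64))
exemplar = target[20:44, 10:34]            # hypothetical 24x24 sub-image
print(best_translation(exemplar, target))  # expected near (20, 10)
```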

Chapter 17 presents a neural experts architecture that enables faster neural network training for data sets that can be decomposed into loosely interacting sets of attributes. It describes the expressiveness of this architecture in terms of functional composition. The experimental results show that the proposed neural experts architecture can achieve classification performance statistically identical to that of a fully connected feedforward neural network, while significantly improving training efficiency.
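
A possible reading of such an architecture, sketched here in PyTorch with invented attribute groupings and layer sizes (the chapter's actual topology and training procedure are not reproduced), assigns each loosely interacting attribute group to its own small expert subnetwork and combines the expert outputs in a thin top layer.

```python
# Sketch of a "neural experts" layout: one small expert per attribute group,
# with a shared output layer on top (illustrative sizes only).
import torch
import torch.nn as nn

class NeuralExperts(nn.Module):
    def __init__(self, group_sizes, hidden=8, n_classes=3):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(g, hidden), nn.ReLU()) for g in group_sizes
        )
        self.head = nn.Linear(hidden * len(group_sizes), n_classes)
        self.group_sizes = group_sizes

    def forward(self, x):
        outs, start = [], 0
        for size, expert in zip(self.group_sizes, self.experts):
            outs.append(expert(x[:, start:start + size]))
            start += size
        return self.head(torch.cat(outs, dim=1))

model = NeuralExperts(group_sizes=[4, 6, 5])    # hypothetical attribute groups
print(model(torch.randn(2, 15)).shape)          # torch.Size([2, 3])
```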

Chapter 18 describes a methodology for building models of expressive music performance. The methodology consists of three stages. First, acoustic features are extracted from recordings of both expressive and neutral musical performances. Then, using data mining techniques, a set of transformation rules is derived from the performance data. Finally, the rules are applied to a description of an inexpressive melody to synthesize an expressive monophonic melody in MIDI or audio format. The chapter describes, explores, and compares different data mining techniques for creating expressive transformation models.

Finally, the fifth part unites seven chapters that describe case studies and applications. Chapter 19 presents a new approach for supporting the design and redesign of virtual collaborative workspaces, which combines integrated data mining techniques for refining lower level models with a reverse engineering cycle for creating upper-level models. The methodology rests on a new model of vertical information integration related to virtual collaboration, called the information pyramid of virtual collaboration.

Chapter 20 presents a time-constrained sequential pattern mining method for extracting patterns associated with semantic events in produced videos. The video is segmented into shots, and 13 streams of metadata are extracted for each shot. The metadata cover not only such shot attributes as duration, low-level color, texture, shape, and motion features, and sound volume, but also advanced attributes such as the presence of weapons in the shot and the sound type, i.e., silence, speech, or music. All metadata are represented as quantized values forming a finite alphabet for each stream. Data mining techniques are applied to the streams of metadata to derive patterns that represent semantic events.

Chapter 21 describes an approach for people localization and tracking in an office environment using a sensor network that consists of video cameras, infrared tag readers, a fingerprint reader, and a pan-tilt-zoom camera. The approach is based on a Bayesian framework that fuses noisy but redundant data from multiple sensor streams with contextual and domain knowledge. The experimental results are presented and discussed.
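
A toy sketch of the underlying Bayesian fusion step, with a discretized office map and made-up sensor likelihoods (the chapter's actual state space and sensor models are richer), is shown below.

```python
# Sketch of Bayesian fusion of noisy sensor observations for person
# localization over a discretized office map (values are illustrative).
import numpy as np

zones = 6                                   # hypothetical office zones
prior = np.full(zones, 1.0 / zones)         # uniform prior over zones

def update(belief, likelihood):
    """One Bayes step: combine the current belief with a sensor likelihood."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Likelihood of each zone given a camera detection and a tag-reader hit;
# both are noisy but point toward zone 2.
camera = np.array([0.05, 0.10, 0.50, 0.20, 0.10, 0.05])
tag    = np.array([0.10, 0.15, 0.45, 0.15, 0.10, 0.05])

belief = update(update(prior, camera), tag)
print(belief.round(3), "-> most likely zone:", int(belief.argmax()))
```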

Chapter 22 presents an approach to estimating the potential attractiveness of a banner image based on its attributes. A banner image is an advertisement designed to attract Web users to view and eventually buy the advertised product or service. The approach uses a Bayesian classifier to predict the level of the click-thru rate based on features extracted from the banner image GIF file, such as image dimensions, color features, the number of frames in an animated file, and the frames' dynamic and chromatic features. The experimental results are discussed.
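
To illustrate the flavor of such a predictor, the sketch below trains a naive Bayes classifier on invented banner-image attributes and discretized click-thru levels; the feature set and the choice of Gaussian naive Bayes are assumptions for illustration only.

```python
# Sketch: predicting a discretized click-thru level from banner-image
# attributes with a naive Bayes classifier (features and data are invented).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
# Hypothetical features per banner: width, height, animation frames,
# mean saturation, distinct colors, frame-to-frame change rate.
X = np.column_stack([
    rng.choice([468, 728, 300], 500),      # width
    rng.choice([60, 90, 250], 500),        # height
    rng.integers(1, 20, 500),              # animation frames
    rng.random(500),                       # mean saturation
    rng.integers(2, 256, 500),             # distinct colors
    rng.random(500),                       # frame-to-frame change
])
y = rng.integers(0, 3, 500)                # low / medium / high click-thru

clf = GaussianNB().fit(X, y)
print(clf.predict(X[:5]), clf.predict_proba(X[:1]).round(2))
```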

Chapter 23 presents an approach to deriving users' profiles from their performance data when they work with a video search engine. The approach suggests a two-level model. The goal of the first level is modeling and clustering a user's behavior on a single video sequence (intravideo behavior); the goal of the second level is modeling and clustering a user's behavior on a set of video sequences (intervideo behavior). A two-phase clustering algorithm is presented and experimental results are discussed.

Chapter 24 presents a method for the automatic identification of a person by iris recognition. The method uses arrays of pixels extracted from a raster image of a human iris and searches for similar patterns using the latent semantic indexing approach. The comparison process is expedited by replacing the time-consuming singular value decomposition algorithm with a partial symmetric eigenproblem. The results of experiments on a real biometric data collection are discussed.
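
The trick of obtaining the needed factors from a partial symmetric eigenproblem rather than a full singular value decomposition can be sketched as follows; the matrix sizes, the number of latent dimensions, and the similarity measure are illustrative assumptions.

```python
# Sketch of LSI-style matching where the full SVD is replaced by a partial
# symmetric eigenproblem on A^T A (illustrative, not the chapter's code).
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(4)
A = rng.random((4096, 200))       # columns: flattened iris images (hypothetical)
k = 20

# The top-k eigenpairs of the symmetric matrix A^T A yield the k largest
# singular values and right singular vectors without a full SVD of A.
vals, V = eigsh(A.T @ A, k=k)
sigma = np.sqrt(np.clip(vals, 0, None))
U = A @ V / sigma                 # corresponding left singular vectors

def latent(query):
    """Fold a pixel-space query into the k-dimensional latent space."""
    return (U.T @ query) / sigma

def most_similar(query):
    q = latent(query)             # approximately the query's row of V
    sims = (V @ q) / (np.linalg.norm(V, axis=1) * np.linalg.norm(q) + 1e-12)
    return int(np.argmax(sims))

print(most_similar(A[:, 17]))     # expected to return 17
```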

Chapter 25 deals with data mining of medical images. It presents research results obtained for texture extraction, classification, segmentation, and retrieval of normal soft tissues in computed tomography images of the chest and abdomen. The chapter describes various data mining techniques that allow identifying different tissues in images, segmenting and indexing images, and exploring different similarity measures for image retrieval. Experimental results for tissue segmentation, classification, and retrieval are presented.

References

1. Groth R. Data Mining. A Hands-On Approach for Business Professionals. Prentice Hall, Upper Saddle River, NJ, 1998.
2. Berry MJA, Linoff G. Data Mining Techniques for Marketing, Sales, and Customer Support. Wiley Computer Publishing, New York, 1997.
3. Agosta L. The future of data mining—Predictive analytics. IT View Report, Forrester Research, October 17, 2003. Available at: http://www.forrester.com/Research/LegacyIT/0,7208,32896,00.html.
4. Berry MW (Ed.). Survey of Text Mining: Clustering, Classification, and Retrieval. Springer-Verlag, New York, 2004, 244 p.
5. Zhong N, Liu J, Yao Y (Eds.). Web Intelligence. Springer, New York, 2005, 440 p.
6. Manjunath BS, Salembier Ph, Sikora T (Eds.). Introduction to MPEG-7. Multimedia Content Description Interface. Wiley, New York, 2002, 371 p.
7. Maybury MT. Intelligent Multimedia Information Retrieval. AAAI Press/MIT Press, Cambridge, MA, 1997, 478 p.
8. Furht B (Ed.). Handbook of Multimedia Computing. CRC Press, Boca Raton, FL, 1999, 971 p.
9. Petrushin V. Creating emotion recognition agents for speech signal. In Dautenhahn K, Bond AH, Canamero L, Edmonds B (Eds.), Socially Intelligent Agents. Creating Relationships with Computers and Robots. Kluwer, Boston, 2002, pp. 77–84.
10. Remagnino P, Jones GA, Paragios N, Regazzoni CS. Video-Based Surveillance Systems. Computer Vision and Distributed Processing. Kluwer, Boston, 2002, 279 p.
11. TRECVID Workshop website. http://www-nlpir.nist.gov/projects/trecvid/.
12. Furht B, Marques O (Eds.). Handbook of Video Databases. Design and Applications. CRC Press, Boca Raton, FL, 2004, 1211 p.
13. Rosenfeld A, Doermann D, DeMenthon D. Video Mining. Kluwer, Boston, 2003, 340 p.
14. Zaiane OR, Simoff SJ, Djeraba Ch. Mining Multimedia and Complex Data. Lecture Notes in Artificial Intelligence, Vol. 2797. Springer, New York, 2003, 280 p.
15. Kulesh V, Petrushin VA, Sethi IK. Video clip recognition using joint audio-visual processing model. In Proceedings of 16th International Conference on Pattern Recognition (ICPR'02), 2002, Vol. 1, pp. 1051–1054.
16. Helmer S, Lowe DG. Object recognition with many local features. In Workshop on Generative Model Based Vision, Washington, DC, 2004.
17. Brown M, Ramos L. Searching for a Better Search. IT View Tech Choices, Forrester Research, August 29, 2005. Available at: http://www.forrester.com/Research/Document/0,7211,37615,00.html.
18. Stamou G, Kollias S. Multimedia Content and the Semantic Web. Wiley, New York, 2005, 392 p.
19. Kulesh V, Petrushin VA, Sethi IK. PERSEUS: Personalized multimedia news portal. In Proceedings of IASTED Intl. Conf. on Artificial Intelligence and Applications, September 4–7, 2001, Marbella, Spain, pp. 307–312.
20. Kosch H. Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21. CRC Press, Boca Raton, FL, 2004, 258 p.
21. Bormans J, Hill K. MPEG-21 Multimedia Framework. Overview. Available at: http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg-21.htm.
22. The Dublin Core Metadata Initiative website. http://dublincore.org/.
23. Resource Description Framework (RDF). http://www.w3.org/RDF/.
24. Simple Knowledge Organisation System (SKOS). http://www.w3.org/2004/02/skos/.
25. Thesaurus for Graphic Materials (TGM). http://www.loc.gov/rr/print/tgm1/.
26. OWL Web Ontology Language Overview. http://www.w3.org/TR/owl-features/.
