Dealing with Television Archives: Television Structuring


HAL Id: inria-00507752

https://hal.inria.fr/inria-00507752

Submitted on 30 Jul 2010


Xavier Naturel, Patrick Gros

To cite this version:

Xavier Naturel, Patrick Gros. Dealing with Television Archives: Television Structuring. [Research Report] RR-7301, INRIA. 2010, pp.24. ⟨inria-00507752⟩


9-6399 ISRN INRIA/RR--7301--FR+ENG

Vision, Perception and Multimedia Understanding


N° 7301

June 2010


Theme: Vision, Perception and Multimedia Understanding

Perception, Cognition, Interaction

Project-Team texmex

Research Report n° 7301, June 2010, 23 pages

Abstract: This paper investigates the problem of managing very large digital television archives. This problem is called television structuring (or TV broadcast macro-segmentation) and is defined as the process of identifying the structure of a television stream as watchers perceive it: a succession of programs. This is the very first step in order to manage a television collection. In this paper, a complete solution for television structuring is proposed, which makes use of simple yet efficient methods in order to deal with huge datasets. Methods from commercial detection are generalized to be able to distinguish regular programs from non-programs. It is shown how television program guides can be used to label the identified programs. It is finally shown how an update procedure can improve the segmentation results over time. Results are provided on 3 weeks of French television.

Key-words: Video indexing, Television structuring, macro-segmentation, Perceptual hashing


Résumé: This research report addresses the structuring of large volumes of television archives. By structuring, we mean identifying the television programs in the stream, with their start and end, and thus cutting the stream into a succession of programs. This is the very first step in indexing a television stream so that it can be easily browsed and queried. We present a complete solution based on simple methods, so that very large amounts of data can be processed. We generalize methods from television commercial detection in order to distinguish programs from inter-programs. We also show how program guides can be used to label the identified programs. Finally, we propose an update procedure that keeps the results stable over time. Results on three weeks of French television verify the effectiveness of the methods.

Mots-clés (keywords): Video indexing, Television structuring, macro-segmentation, perceptual hashing


Contents

1 Introduction

2 Previous work

3 Structuring method

3.1 Definitions and overview

3.2 Segmentation and classification

3.2.1 Separation detection

3.2.2 Repetition detection

3.2.3 Classification and fusion

3.2.4 Results

3.3 Labeling

3.3.1 Alignment using DTW

3.3.2 Improvements

3.3.3 Results

4 Updating the reference video dataset

4.1 Identification of unknown non-programs

4.2 Identifying trailers

4.2.1 Identifying sequences

4.2.2 Identifying and labeling trailers

4.3 Update procedure and results

5 Conclusion


1 Introduction

Television is an important source of information today, as well as an important part of our cultural heritage. Information retrieval from television streams is however still in its infancy, while the amount of television content keeps increasing, with an ever larger set of available channels. For example, since September 2006, the National Institute of Audiovisual in France (ina) has been archiving 540,000 hours of television per year. There is thus a need to develop methods to retrieve information from very large television collections, which might come with little or no additional information apart from the video stream itself.

The context of this work is assumed to be television archives. This means that weeks, months, or years of continuous recordings have to be analyzed to extract relevant information. We assume that television program guides are available together with the video data. While this is actually the case in France at ina, it may be different in other countries. In this context, we are interested in finding the structure of the television stream as TV watchers perceive it: a succession of well-identified programs, together with some non-program events (commercials, trailers, sponsoring...). This is what we call television structuring.

More precisely, the goal is first to segment the television stream into programs and non-programs and, in a second step, to label these programs with information coming from the program guide. This process may also be viewed as metadata refinement: it takes as input the video stream as well as some metadata (the program guide), and outputs a corrected version of the program guide, in which the given schedules have been checked in order to match the actual video stream.

This may be seen as a trivial or unnecessary problem, since the EPG can be thought of as self-sufficient. Berrani et al. [1] have shown that this is not the case, and that precise television structuring cannot be achieved with only the information provided by channels and/or broadcasters. They assert the necessity of content-based techniques.

Even if our first goal is archiving, quite a large number of applications can benefit from this conceptually simple task of finding programs in a television stream. In the case of television archives, exact program boundaries should clearly be available when browsing or querying a television corpus. Manual structuring is a very tedious and time-consuming task, and automatic or semi-automatic methods should be investigated to reduce the need for manual parsing and labeling. Monitoring television may also be an application, for example to verify compliance with legal regulations about commercials, or to provide statistics about delays between the actual broadcast time and the scheduled one. A more user-oriented application might be to manage recorded programs, allowing features like skipping commercials or finding programs that may not be in the EPG¹.

¹ In this paper, program guides and EPG (Electronic Program Guides) are used as synonyms.

Our approach is stream based and bottom-up. Three kinds of information are mainly used. Joint silence and monochrome image detection makes it possible to find program boundaries and to detect non-program segments. Repetition detection [15] using a reference video dataset (RVD) makes it possible to find similar segments appearing several times in the stream, and is useful to characterize many non-programs. Finally, the program guide is used to assign labels to program segments [16].

The paper is organized as follows. Section 2 provides a brief overview of the state of the art. Section 3 presents the structuring technique. Section 4 explains the problem of updating the RVD. Section 5 concludes the work.

2 Previous work

Video structuring for specific programs like sports or news is a well-studied domain. The aim is to infer the structure of the program by analyzing the video and audio streams [10, 6, 2]. Studies have also been conducted on collections of programs, leading to non-obvious tasks like topic threading [9]. These works deal with sets of homogeneous programs (mainly news or a specific sport), and look for a structure inside the program itself. They are thus quite different from our task, and are not likely to be suited to finding program boundaries.

A more relevant topic is commercial detection. This is a well-studied domain, where effective solutions have been proposed to identify commercials in a TV stream. Some simple but effective rule-based methods have been proposed, which use detection of monochrome frames and silence between commercials [12, 14, 19]. Classification techniques have also been proposed [5, 22, 13], as well as techniques based on recognition [8, 4, 21, 18]. Commercial detection is of major importance for television structuring because it can detect non-programs. However, some non-programs are not commercials, and these techniques thus have to be extended to handle all types of non-programs.

Very few works have considered television structuring. Liang et al. [11] proposed to detect programs by their lead-in/lead-out. Interesting results are obtained on their dataset, but they cannot be generalized to other TV channels, which may not flag their programs with a systematic lead-in and lead-out.

A very different work is proposed by Poli [17]. The basic idea is to implement a top-down approach using a very large set of already annotated data to learn a model of the TV stream. Poli proposes to use a hidden Markov model and a decision tree for that purpose. The result is a weekly program providing an approximate start time and duration and the type for each program and non-program during the week. This model is eventually checked with the stream.

To the best of our knowledge, these two methods are the only ones dealing with television structuring.

3 Structuring method

3.1 Definitions and overview

First, let us define more precisely the notions of program and non-program. Programs are regular television broadcasts which make up the core of a channel's broadcasting (e.g. news, weather forecast, movies, shows...). Non-programs are commercials, sponsoring, or channel promotion (e.g. trailers, jingles).

An overview of the proposed method is shown in figure 1. Three inputs are used: the video stream itself, the program guide, and a Reference Video Dataset (RVD). This RVD is a dataset composed of manually labeled programs and non-programs, with the following information:

• a category (program or non-program);

• a title.

[Figure 1: Overview of the structuring method. Diagram labels: Stream, EPG, Reference video dataset, Segmentation, Labeling, Labeled stream.]

The structuring method has two steps: segmentation and labeling. Segmentation cuts the stream into segments², which are then classified as either program or non-program. Labeling assigns a label to every segment classified as a program.

To test the method, a corpus of three weeks of television has been recorded from a French channel (France 2) from 5/9/2005 to 5/30/2005. This corpus is composed of 21 files, each one representing 24h of TV. Manual structuring has been performed on this corpus to obtain ground truth.

3.2 Segmentation and classification

The aim of the segmentation is to find the different segments of programs and non-programs. Methods developed in the context of commercial detection are well suited to this segmentation. However, as stated in section 2, the task has to be generalized to detect all kinds of non-programs, e.g. commercials, trailers, jingles, sponsoring... Since trailers and jingles have different characteristics from commercials, in terms of shot length and visual activity, approaches based on classification are not likely to perform well. We focus on methods that are able to generalize to all non-programs: recognition-based methods, and joint detection of silence and monochrome frames.

3.2.1 Separation detection

We call separations the simultaneous occurrences of monochrome frames and silence that happen between commercials. This is a very popular feature for detecting commercials [12, 19, 14] and it is used on every French TV channel.

² The term segment will be used in the remainder of the paper to denote the result of this segmentation process.

[Figure 2: Overview of the segmentation and classification processes. Diagram labels: Detection phase, Detection of Separations, Detection of Repetitions, Segmentation phase, Pre-segmentation, Classification/fusion, Final segmentation.]

To detect monochrome frames, a 48-bin histogram on the luminance channel is first computed. Detecting monochrome frames is then achieved by thresholding the histogram entropy. For a histogram $h$ quantized into $N$ bins, its entropy is given by:

$$H = -\sum_{i=1}^{N} p_i \log p_i \quad \text{with} \quad p_i = \frac{h(i)}{\sum_{k} h(k)}$$

Figure 3 shows a sample of the histogram entropy on 1 hour of our corpus. The threshold is set experimentally to 2.

[Figure 3: Variation of the luminance histogram entropy on one hour of TV]
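The entropy test above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8-bit luminance range and the choice of a base-2 logarithm are assumptions made for illustration (the text does not state the base), so the threshold of 2 is only meaningful under that assumption.

```python
import math

def luminance_entropy(pixels, bins=48, max_value=256):
    """Entropy of the luminance histogram of one frame.

    `pixels` is a flat iterable of luminance values in [0, max_value).
    The base-2 logarithm is an assumption; the report does not give the base.
    """
    hist = [0] * bins
    for value in pixels:
        hist[int(value) * bins // max_value] += 1
    total = sum(hist)
    entropy = 0.0
    for count in hist:
        if count:  # empty bins contribute nothing to the sum
            p = count / total
            entropy -= p * math.log2(p)
    return entropy

def is_monochrome(pixels, threshold=2.0):
    """A frame is flagged monochrome when its entropy falls below the threshold."""
    return luminance_entropy(pixels) < threshold
```

A frame made of a single gray level has entropy 0 and is flagged, while a frame whose luminance values are spread over all bins is not.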

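To detect silence, a very simple method is used. It consists in building overlapping audio frames of 10 ms, and computing the log-energy of each frame using the standard formula:

$$E_{dB}(i) = 10 \log_{10} \sum_{n=1}^{N} x_n^2(i)$$

where $x_n(i)$ is the $n$-th sample of frame $i$ and $N$ is the number of samples per frame.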

The threshold is set to 60 (see figure 4), and only segments longer than 30 ms are kept. The reason for using such a simple method is given by figure 4, which shows the variation of energy on 1 hour of TV. One can easily see two separations, where the energy is actually zero. This phenomenon has been observed on every French channel. It might be different in other countries.

[Figure 4: Variation of the audio energy on a few minutes of TV]
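As an illustration of the silence detector, here is a small sketch under stated assumptions: the hop between overlapping frames (5 ms), the integer PCM sample scale, and all function and parameter names are hypothetical, since the text only fixes the 10 ms frame length, the threshold of 60 and the 30 ms minimum duration.

```python
import math

def frame_log_energy(frame):
    """10 * log10 of the summed squared samples; -inf for an all-zero frame."""
    energy = sum(x * x for x in frame)
    return 10.0 * math.log10(energy) if energy > 0 else float("-inf")

def detect_silence(samples, sample_rate, threshold_db=60.0,
                   frame_ms=10, hop_ms=5, min_duration_ms=30):
    """Return (start_s, end_s) intervals made only of frames below the threshold.

    hop_ms (the frame overlap) is an assumption, not a value from the report.
    """
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    n_frames = max(0, (len(samples) - frame_len) // hop + 1)
    segments, start = [], None
    for i in range(n_frames):
        frame = samples[i * hop:i * hop + frame_len]
        quiet = frame_log_energy(frame) < threshold_db
        if quiet and start is None:
            start = i * hop                      # a silent run begins
        elif not quiet and start is not None:
            segments.append((start, (i - 1) * hop + frame_len))
            start = None                         # the silent run ended
    if start is not None:
        segments.append((start, (n_frames - 1) * hop + frame_len))
    min_len = sample_rate * min_duration_ms / 1000
    return [(s / sample_rate, e / sample_rate)
            for s, e in segments if e - s > min_len]
```

Because the energy drops to exactly zero during separations on the channels described above, the all-zero case is handled explicitly rather than left to the logarithm.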

The results of the silence and monochrome frame detection are then merged using a successive analysis. Since the audio feature is far more discriminative than the image one, the process consists in taking the segments detected by the audio as candidate segments, and then checking their correctness using the image feature. Results are shown in table 1, where precision and recall are computed over the number of images correctly detected as belonging to a separation.

Modality    Precision    Recall
Audio       0.82         0.9
Image       0.41         0.89
Fusion      1            0.9

Table 1: Separation detection results
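The successive analysis can be sketched as follows. The text does not say how the image feature confirms an audio candidate, so this example assumes (hypothetically) that a silence interval is kept when at least one monochrome frame falls inside it; `fuse_separations` and its arguments are illustrative names.

```python
from bisect import bisect_left

def fuse_separations(silences, monochrome_times):
    """Keep the audio candidates confirmed by the image feature.

    silences: list of (start_s, end_s) intervals from the silence detector.
    monochrome_times: sorted timestamps (seconds) of monochrome frames.
    Confirmation rule (assumed): one monochrome frame inside the interval.
    """
    confirmed = []
    for start, end in silences:
        i = bisect_left(monochrome_times, start)  # first frame at or after start
        if i < len(monochrome_times) and monochrome_times[i] <= end:
            confirmed.append((start, end))
    return confirmed
```

For example, `fuse_separations([(1.0, 1.2), (5.0, 5.1)], [1.1, 3.0])` keeps only the first interval, since no monochrome frame occurs during the second silence.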

3.2.2 Repetition detection

Television streams are highly redundant. Detecting repetitions can greatly help to uncover the structure of the stream. In particular, all kinds of non-programs are frequently repeated with no modification except for broadcast and compression noise. Because only minor transformations exist between two instances of the same video clip, it is quite easy to detect those repetitions. However, it is important to be able to deal with a very large database and to keep complexity low. Therefore, a repetition detection method has to put the emphasis on those two aspects. A popular method to achieve both efficiency and effectiveness in the context of commercial detection is perceptual hashing [4, 8]. Note that this method is particularly suited to our context because it can deal with all kinds of non-programs.

[Figure 5: Image signature computation. Diagram labels: Shots, Image, DCT, 64 first coefficients (except DC) C1 ... C64, Binarization w.r.t. median value, Signature.]

To detect repeated video clips using perceptual hashing, the method proposed in [15] is used. Shots are considered as the recognition unit, i.e. we detect repeated shots. A visual signature is built for each image of each shot. The signature extraction is presented in figure 5. For each image, the DCT is applied on the whole image, on the luminance channel only, and the 8x8 top-left sub-matrix is extracted from the lowest AC frequency coefficients (the DC coefficient is not taken into account). The median value of this sub-matrix is computed, and the coefficients are then binarized according to this median, thus making a 64-bit signature.
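The signature computation can be sketched in a few lines. This is an illustration under assumptions, not the method of [15]: the text does not spell out how 64 bits are obtained once the DC coefficient is set aside, so the sketch keeps the 8x8 lowest-frequency block, computes the median over the 63 AC coefficients only, and binarizes all 64 positions. The unnormalized DCT-II, the 2D-list image format and the function names are also assumptions.

```python
import math

def dct_lowfreq(luma, size=8):
    """Top-left size x size block of the (unnormalized) 2D DCT-II of `luma`.

    `luma` is a 2D list of luminance values with at least `size` rows/columns.
    """
    h, w = len(luma), len(luma[0])
    cos_y = [[math.cos(math.pi * (y + 0.5) * u / h) for y in range(h)]
             for u in range(size)]
    cos_x = [[math.cos(math.pi * (x + 0.5) * v / w) for x in range(w)]
             for v in range(size)]
    # Separable transform: along rows first, then along columns.
    rows = [[sum(row[x] * cos_x[v][x] for x in range(w)) for v in range(size)]
            for row in luma]
    return [[sum(rows[y][v] * cos_y[u][y] for y in range(h))
             for v in range(size)] for u in range(size)]

def frame_signature(luma):
    """64-bit integer signature: one bit per low-frequency DCT coefficient."""
    coeffs = [c for row in dct_lowfreq(luma) for c in row]
    ac = sorted(coeffs[1:])       # DC (index 0) is excluded from the median
    median = ac[len(ac) // 2]     # 63 values, so this is the middle element
    signature = 0
    for c in coeffs:
        signature = (signature << 1) | (c > median)
    return signature
```

Packing the bits into a single integer makes the signature directly usable as a key in a hash table.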

This signature is sufficiently robust to noise to be queried by exact matching, allowing the use of a fast retrieval structure like a hash table. The retrieval process indeed makes use of a hash table, in which a pair (signature, shot id) is stored for every frame of every shot of the database. When a query is made, i.e. we want to know if a certain shot has a duplicate in the database, the signatures of each frame of this query shot are computed, and then queried one by one against the hash table. If an exact match occurs, that is, a signature $s_q$ of the query shot is equal to a signature $s_d$ in the hash table, a pair ($s_d$, shot id) is recovered. This shot id gives a candidate shot, which is further analyzed by computing a similarity distance between this candidate shot and the query shot.
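The retrieval step maps naturally onto a dictionary. The sketch below is illustrative only: the function names are hypothetical, and storing the frame position next to the shot id is an added assumption (the text stores the pair (signature, shot id)), kept here because the positions of the matches are needed later to align shots.

```python
from collections import defaultdict

def build_index(database):
    """database: dict mapping shot_id -> list of per-frame signatures."""
    index = defaultdict(list)
    for shot_id, signatures in database.items():
        for position, sig in enumerate(signatures):
            index[sig].append((shot_id, position))
    return index

def candidate_shots(index, query_signatures):
    """Map each candidate shot id to its exact matches (query_pos, db_pos)."""
    candidates = defaultdict(list)
    for q_pos, sig in enumerate(query_signatures):
        for shot_id, d_pos in index.get(sig, ()):
            candidates[shot_id].append((q_pos, d_pos))
    return dict(candidates)
```

With `index = build_index({"ad1": [1, 2, 3], "news": [9, 8]})`, the query `[7, 2, 3]` yields the single candidate `"ad1"`, matched at positions (1, 1) and (2, 2).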

The similarity distance between shots is defined as the average Hamming distance between the signatures of the retrieved and query shots. This distance makes use of the relative positions of the matched signatures to align the shots, thus gaining robustness to temporal variations and shot segmentation artifacts. To decide if two shots match, this distance is thresholded.
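A minimal sketch of this distance follows, assuming signatures are stored as integers and that the alignment is a simple shift given by one exact match (query frame `q_pos` equals candidate frame `d_pos`). The names and the `max_distance` parameter are hypothetical; the report does not give the threshold value.

```python
def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")

def shot_distance(query, candidate, q_pos, d_pos):
    """Average Hamming distance after aligning query[q_pos] with candidate[d_pos].

    Only the frames that overlap once the two shots are aligned are compared.
    """
    offset = d_pos - q_pos
    dists = [hamming(q, candidate[i + offset])
             for i, q in enumerate(query)
             if 0 <= i + offset < len(candidate)]
    return sum(dists) / len(dists)

def shots_match(query, candidate, q_pos, d_pos, max_distance):
    """Threshold the distance; max_distance is a value to be tuned."""
    return shot_distance(query, candidate, q_pos, d_pos) <= max_distance
```

Comparing only the overlapping frames is one way to tolerate shots whose boundaries were cut slightly differently by the shot segmentation.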

This method is used several times throughout the structuring process. It is used together with the reference video dataset (RVD) previously defined, which is chosen as the first day of our 3-week corpus (5/9/2005) and was manually labeled.

3.2.3 Classification and fusion

[Figure 6: The three steps of segmentation: pre-segmentation, classification, and fusion. Legend: pre-segments, separations, repeated non-programs, programs, non-programs.]

The detection of separations and repetitions yields a pre-segmentation of the stream, as can be seen in figure 6. Once this pre-segmentation is computed, the next step is to classify pre-segments as either program or non-program. The decision is taken by simply thresholding the length of the pre-segment: short pre-segments are classified as non-programs and long ones as programs. The threshold $T_s$ is chosen so as to maximize the F-measure of correct classification on a sample day of our corpus.
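Choosing the length threshold can be sketched as an exhaustive search over the observed durations. This is only an illustration: the text does not say which class is treated as positive when computing the F-measure, so the sketch assumes it is the program class, and the function names and input format are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (0 when nothing is retrieved)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_length_threshold(presegments):
    """presegments: list of (duration_s, is_program) from a labeled sample day.

    A pre-segment is predicted to be a program when duration >= threshold.
    Returns the (threshold, f_measure) pair with the highest F-measure.
    """
    best_t, best_f = None, -1.0
    for t in sorted({d for d, _ in presegments}):
        tp = sum(1 for d, prog in presegments if d >= t and prog)
        fp = sum(1 for d, prog in presegments if d >= t and not prog)
        fn = sum(1 for d, prog in presegments if d < t and prog)
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

Only the observed durations need to be tried as thresholds, since the classification of the sample cannot change between two consecutive values.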

Finally, contiguous segments of repetitions, separations, and non-programs are merged into a single non-program segment. Figure 6 sums up the segmentation process.
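The classification and fusion steps can be put together in a short sketch. The segment representation, the `kind` labels and the function name are assumptions made for illustration; the length threshold is passed in as a parameter since its value is not given here.

```python
def classify_and_merge(segments, length_threshold):
    """Classify pre-segments by length, then merge contiguous non-programs.

    segments: time-ordered list of (start_s, end_s, kind), where kind is
    "separation", "repetition" or "pre-segment" (hypothetical labels).
    Returns a list of (start_s, end_s, label) with label "program" or
    "non-program".
    """
    labeled = []
    for start, end, kind in segments:
        if kind == "pre-segment" and end - start >= length_threshold:
            label = "program"
        else:
            # Separations, repetitions and short pre-segments are non-programs.
            label = "non-program"
        previous = labeled[-1] if labeled else None
        if (previous and label == "non-program"
                and previous[2] == "non-program" and previous[1] == start):
            # Fusion: extend the previous non-program segment.
            labeled[-1] = (previous[0], end, "non-program")
        else:
            labeled.append((start, end, label))
    return labeled
```

On a stream made of a long pre-segment, a commercial break (separation, repeated clip, separation, short pre-segment) and another long pre-segment, the result is three segments: program, non-program, program.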