Distributed Indexing and Computing

Web data management and distribution

Serge Abiteboul Philippe Rigaux Marie-Christine Rousset Pierre Senellart

http://gemo.futurs.inria.fr/wdmd

February 2, 2010

Gemo, Lamsade, LIG, Télécom (WDMD) Distributed Indexing and Computing February 2, 2010 1 / 21

Outline

1 Introduction to Distributed Systems

2 Distributed Search Trees

3 A Case Study: Bigtable

4 Distributed Computing: MapReduce

Local Networks

Design issues for distributed trees

Basic features of the DST

Balancing the tree with a rotation

The client caching mechanism

Example of an out-of-range request followed by an adjustment

Four replication strategies in a binary distributed tree

Overview of Bigtable structure

Persistence management in Bigtable

Centralized computing with distributed data storage

Distributed computing with distributed data storage

The programming model of MapReduce

Counting term occurrences

The map() function.

mapCW (String key, String value):
    // key: document name
    // value: document contents
    for each term t in value:
        // Emit one (term, 1) pair per occurrence
        emit (t, 1);

The reduce() function.

reduceCW (String key, Iterator values):
    // key: a term
    // values: a list of counts
    int result = 0;
    // Loop on the values list; accumulate in result
    for each v in values:
        result += v;
    // Send the result
    return result;
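The two functions above can be tried out on a single machine. The following is a minimal sketch in Python, not the distributed implementation: the names `map_cw`, `reduce_cw` and `run_local` are hypothetical, terms are assumed to be whitespace-separated tokens, and the grouping step that a MapReduce runtime would perform between the two phases is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_cw(key: str, value: str) -> Iterator[Tuple[str, int]]:
    """key: document name; value: document contents."""
    # Assumption: a term is any whitespace-separated token.
    for term in value.split():
        yield (term, 1)


def reduce_cw(key: str, values: Iterable[int]) -> int:
    """key: a term; values: the counts emitted for that term."""
    result = 0
    for v in values:
        result += v
    return result


def run_local(documents: Dict[str, str]) -> Dict[str, int]:
    """Simulate map, group-by-key, and reduce in a single process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for name, contents in documents.items():
        for term, count in map_cw(name, contents):
            groups[term].append(count)
    return {term: reduce_cw(term, counts) for term, counts in groups.items()}


docs = {"d1": "the cat sat", "d2": "the dog sat down"}
counts = run_local(docs)
```

Here `counts` maps each term to its total number of occurrences over both documents. In a real deployment the grouping is done by the framework across machines, which is the part this sketch leaves out.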

int main() {
  // A specification object for MapReduce execution
  MapReduceSpecification spec;

  // Define input files
  MapReduceInput* input = spec.add_input();
  input->set_filepattern("documents.xml");
  input->set_mapper_class("MapWC");

  // Specify the output files
  MapReduceOutput* out = spec.output();
  out->set_filebase("wc.txt");
  out->set_num_tasks(100);
  out->set_reducer_class("ReduceWC");

  // Now run it
  MapReduceResult result;
  if (!MapReduce(spec, &result)) abort();

  // Done: 'result' structure contains result info
  return 0;
}
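The specification above requests 100 output tasks, presumably one per reduce task. A common way to route intermediate pairs to reduce tasks is to hash the key modulo the number of tasks, so that every pair with the same key reaches the same task. The Python sketch below illustrates that idea only: `partition` and `assign_to_reduce_tasks` are hypothetical names, and the use of CRC32 as the hash function is an assumption made to keep the result stable across runs.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition(key: str, num_tasks: int) -> int:
    """Return the index of the reduce task responsible for `key`."""
    # Assumption: CRC32 stands in for whatever hash a real system uses.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def assign_to_reduce_tasks(
    pairs: Iterable[Tuple[str, int]], num_tasks: int
) -> Dict[int, List[Tuple[str, int]]]:
    """Group intermediate (key, value) pairs by reduce task index."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_tasks)].append((key, value))
    return buckets


pairs = [("the", 1), ("cat", 1), ("the", 1), ("sat", 1)]
buckets = assign_to_reduce_tasks(pairs, 100)
```

Both `("the", 1)` pairs end up in the same bucket, which is what allows a single reduce call to see all the counts for a term.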

Distributed execution of a MapReduce job.

The end for today

Thank you
