
(1)

Introduction to Peer to Peer Systems

Based on materials by:

-  Franck Petit, Lip6

- Yang Guo and Christoph Neumann, Corporate Research, Thomson Inc.

-  Jon Crowcroft, Univ. of Cambridge

Fabien Mathieu (LIAFA, INRIA)

(2)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(3)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(4)

History, motivation and evolution

1999: Napster, first widely used P2P application

P2P represented ~65% of Internet traffic at the end of 2006.

(5)

1999: Napster, first widely used P2P application

The application:

• A P2P application for the distribution of mp3 files
– Each user can contribute their own content

How it works:

• Central index server
– Maintains a list of all active peers and their available content

• Distributed storage and download
– Client nodes also act as file servers
– All downloaded content is shared

(6)

History, motivation and evolution - Napster (cont'd)

• Initial join
– Peers connect to the Napster server
– They transmit the current listing of their shared files to the server

[Figure: peers joining the central index server]

(7)

History, motivation and evolution - Napster (cont'd)

• Content search
– Peers send queries to the Napster server
– The Napster server checks its database and returns the list of matching peers

[Figure: 1) query, 2) answer from the central index server]

(8)

History, motivation and evolution - Napster (cont'd)

• File retrieval
– The requesting peer contacts the peer holding the file directly and downloads it

[Figure: 1) request, 2) download, both bypassing the central index server]
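Since the three preceding slides fully describe the protocol (join, search, retrieve), a minimal sketch may help make it concrete. This is an illustrative Python model of a Napster-style central index, not Napster's actual protocol; all names and structures are assumptions.

from collections import defaultdict

class CentralIndexServer:
    """Toy model of a Napster-style central index server."""
    def __init__(self):
        self.index = defaultdict(set)   # file name -> peers sharing it

    def join(self, peer, shared_files):
        # Initial join: the peer transmits its listing of shared files.
        for name in shared_files:
            self.index[name].add(peer)

    def query(self, keyword):
        # Content search: return matching files and the peers holding them.
        return {name: sorted(peers)
                for name, peers in self.index.items() if keyword in name}

    def leave(self, peer):
        for peers in self.index.values():
            peers.discard(peer)

# File retrieval then happens directly between peers; the server only indexes.
server = CentralIndexServer()
server.join("peer_a:6699", ["song.mp3", "lecture.mp3"])
server.join("peer_b:6699", ["song.mp3"])
print(server.query("song"))   # {'song.mp3': ['peer_a:6699', 'peer_b:6699']}

The single dictionary is exactly what makes this design both efficient and fragile: one query answers any search, but the server is a single point of failure.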

(9)

History, motivation and evolution - File Download

• Napster was the first simple but successful P2P application. Many others followed…

P2P File Download Protocols:

• 1999: Napster
• 2000: Gnutella, eDonkey
• 2001: Kazaa
• 2002: eMule, BitTorrent

(10)

Why is P2P so successful?

• Scalable - It's all about sharing resources
– No need to provision servers or bandwidth
– Each user brings their own resources
– e.g., resistant to flash crowds
• flash crowd = a crowd of users all arriving at the same time

Resources could be:
• files to share;
• upload bandwidth;
• disk storage; …

(11)

Why is P2P so successful? (cont'd)

• Cheap - No infrastructure needed

• Everybody can bring their own content (at no cost)
– Homemade content
– Niche content
– Illegal content
– But also legal content
– …

• High availability - Content is accessible most of the time (replication)

(12)

Remark

• P2P traffic has been decreasing since 2008
– Growth of web-based streaming (YouTube)
– Online storage (MegaUpload)
– Ergonomics
– Legal issues

• But next-gen (hybrid) P2P is on the move

(13)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(14)

Definitions of P2P

• Common sense: BitTorrent, eMule, UUSee, PPLive, Skype

• DADVSI (French copyright law): whatever allows access to illegal content

• Network I: anything outside the client/server paradigm

• Network II: a distributed system made of end users that share resources

(15)

Definitions of P2P

(16)

Peer to peer systems

• Assumed to have no a priori distinguished roles

• So no single point of bottleneck or failure

• However, this means they need distributed algorithms for
– Connection protocol
– Service discovery (name, address, route, metric, etc.)
– Neighbour status tracking
– Application-layer routing (based possibly on content, interest, etc.)
– Resilience, handling link and node failures
– etc.

(17)

Overlays and peer to peer systems

• P2P systems rely on an overlay structure

(18)

Overlays and peer to peer systems

• P2P systems rely on an overlay structure

Focus at the application level

(19)

P2P-Overlay (cont'd)

[Figure: the same network seen as underlay and as overlay, with the source shown in both views]

(20)

P2P-Overlay (cont'd)

[Figure: underlay vs. overlay routes from the source]

(21)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(22)

Taxonomy of Applications

Applications

• Parallel Computation (GRID Computing)
– Intensive Computing (same task x times)
– Component Computing (y small tasks)

• Resource Management
– Content Sharing
– Shared File System
– Distributed Database

• Collaboration Applications
– Chatting
– Shared Applications (Games, Specific, …)

(23)

Taxonomy of Applications

• P2P File download
– Napster, Gnutella, KaZaA, eDonkey, BitTorrent, …

• P2P Communication
– VoIP, Skype, Messaging, …

• P2P Video-on-Demand

• P2P Computation
– GRID, scientific computation (seti@home, XtreemOS, BOINC, …)

• P2P Streaming
– PPLive, End System Multicast (ESM), …

• P2P Gaming
– WoW, City of Heroes, …

(24)

History, motivation and evolution - Applications

• P2P is not restricted to file download!

P2P Protocols:

• 1999: Napster (file download), End System Multicast (ESM, streaming)
• 2000: Gnutella, eDonkey
• 2001: Kazaa
• 2002: eMule, BitTorrent
• 2003: Skype (telephony)
• 2004: PPLive (streaming)
• 2006: TVKoo, TVAnts, PPStream, SopCast, Video-on-Demand, Gaming

Application types: file download, streaming, telephony, Video-on-Demand, gaming.

(25)

Technical taxonomies

• Indexation/distribution

• Structured/unstructured

• Push/pull/publish-subscribe

• …

(26)

Examples

• Unstructured P2P-overlays
– Generally a random overlay
– Used for content download, telephony, streaming

• Structured P2P-overlays
– Distributed Hash Tables (DHTs)
– Used for node localization, content download, streaming

(27)

To summarize

• P2P gathers a lot of techniques

• Not black or white, all grey

• Hint: P2P is all about answering the question

Who gets what from whom?

(28)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(29)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(30)

Unstructured P2P-overlays

• Unstructured P2P-overlays do not really care how the overlay is constructed
– Peers are organized in a random graph topology
• e.g., a new node randomly chooses three existing nodes as neighbors
– Flat or hierarchical
– Build your P2P service on top of this graph

• Several proposals
– Gnutella
– KaZaA/FastTrack*
– BitTorrent*

(31)

Unstructured P2P-overlays (cont'd)

• Unstructured P2P-overlays are just a framework; you can build many applications on top of them

• "Pure" P2P

• Unstructured P2P-overlays pros & cons
– Pros
• Very flexible: copes with dynamicity
• Supports complex queries (unlike structured overlays)
– Cons
• Content search is difficult: there is a tradeoff between generated traffic (overhead) and the horizon of the partial view each peer searches

(32)

One example of usage of unstructured overlays

• Typical problem in unstructured overlays: how to do content search and query?
– Flooding
– Limited scope: send only to a subset of your neighbors
– Time-To-Live (TTL): limit the number of hops per message (see the sketch below)

Example of flooding (similar to Gnutella):

[Figure: a peer searches "Britney Spears"; the query floods the overlay until a peer finds a matching entry, notifies the requester, and uploads the file]
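A minimal Python sketch of the TTL-limited flooding described above (Gnutella-like); the random three-neighbor overlay mirrors the earlier slide, but all names and parameters are illustrative assumptions.

import random

def flood_search(graph, files, start, keyword, ttl=4):
    """Flood a query hop by hop; the TTL bounds the search horizon."""
    visited = {start}
    frontier = [start]
    while frontier and ttl >= 0:
        next_frontier = []
        for peer in frontier:
            if any(keyword in f for f in files.get(peer, [])):
                return peer          # found an entry: notify the requester
            for neighbor in graph[peer]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                     # one less hop allowed
    return None                      # horizon reached without a match

# Random overlay: each of 10 peers picks three existing peers as neighbors.
random.seed(1)
peers = list(range(10))
graph = {p: random.sample([q for q in peers if q != p], 3) for p in peers}
files = {7: ["britney_spears.mp3"]}
print(flood_search(graph, files, start=0, keyword="britney"))  # 7, or None if out of reach

The tradeoff from the previous slide is visible here: a larger TTL widens the horizon but multiplies the generated traffic.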

(33)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(34)

Structured P2P-overlays

• Motivation
– Locate content efficiently

• Solution - DHT (Distributed Hash Table)
– Particular nodes hold particular keys
– Locate a key: route the search request to the particular node that holds the key
– Representative solutions
• CAN, Pastry/Tapestry, Chord, etc.

(35)

Challenges to Structured P2P-overlays

• Load balance
– Spreading keys evenly over the nodes

• Decentralization
– No node is more important than any other

• Scalability
– Lookup must be efficient even in large systems

• Peer dynamics
– Nodes may come and go, may fail
– Make sure that "the/a" node responsible for a key can always be found

(36)

Content-Addressable Network (CAN)

• Maps a 2-dimensional coordinate space

• Each node is responsible for a fraction of the space

• A hash pair is computed using two hash functions on the value to be stored

• Routing is done by forwarding to the neighbor that is closest in Cartesian distance (sketched in code below)

[Figure: square coordinate space of side 2^m containing a point (x, y)]
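A minimal sketch of the greedy forwarding rule in Python; the four-quadrant space, zone tests and neighbor lists below are illustrative assumptions (real CAN zones split dynamically as nodes join).

import math

def can_route(zones, neighbors, centers, start, target):
    """Forward to the neighbor whose zone center is closest to the
    target in Cartesian distance, until the owner zone is reached."""
    node, path = start, [start]
    while not zones[node](target):
        node = min(neighbors[node],
                   key=lambda n: math.dist(centers[n], target))
        path.append(node)
    return path

# Four nodes, each owning one quadrant of an 8x8 coordinate space.
zones = {"A": lambda p: p[0] < 4 and p[1] < 4,
         "B": lambda p: p[0] >= 4 and p[1] < 4,
         "C": lambda p: p[0] < 4 and p[1] >= 4,
         "D": lambda p: p[0] >= 4 and p[1] >= 4}
centers = {"A": (2, 2), "B": (6, 2), "C": (2, 6), "D": (6, 6)}
neighbors = {"A": ["B", "C"], "B": ["A", "D"],
             "C": ["A", "D"], "D": ["B", "C"]}
print(can_route(zones, neighbors, centers, "A", (7, 1)))  # ['A', 'B']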

(37)

CAN example

Routing table for node 2:

Node   Range
2      (2, 4) to (4, 8)
1      (0, 4) to (2, 8)
3      (4, 7) to (5, 8)
4      (4, 6) to (5, 7)
6      (0, 2) to (2, 4)

The entry hashed to (7, 1) is stored at node 10.
Search path from node 2 to (7, 1): 2 → 4 → 9 → 10

[Figure: nodes 1 to 10 partitioning the coordinate space]

(38)

CAN joins and departures

• A joining node picks a random pair (X, Y) and contacts node A
– A sends a join message to B, the node responsible for the pair
– B divides its space with A and shares its routing information appropriately
– A and B contact all their neighbors with updated info

• A departing node contacts its neighbors and finds one that can manage its space
– The node sends updated routing info to its neighbors

(39)

CAN features

• Simple to understand and thus simple to add features to:
– Overlapping regions for redundancy
– D-dimensional space (not just 2)
– Dynamic division of space for load balancing

• Hop count: O(D·n^(1/D))

• Routing table: O(D)

• Possible overload

(40)

Tapestry/Pastry

• Both based on the Plaxton Mesh
– Nodes - Act as routers, clients and servers simultaneously
– Objects - Stored on servers, searched for by clients

• Objects and nodes have unique, location-independent names
– Random fixed-length bit sequence in some base
– Typically, a hash of the hostname or of the object name
– A hashing function h assigns each node and key an m-bit identifier

• What Plaxton does:

Key = "LetItBe" → h → ID = 60
IP = "157.25.10.1" → h → ID = 101

(41)

Plaxton Mesh Routing

• Routing is done incrementally, digit by digit
– In each step, the message is sent to a node whose address is one digit closer to the destination

• Example - node 0325 sends M to node 4598
– Possible route: 0325 → B4F8 → 9098 → 7598 → 4598

(42)

Plaxton Mesh Routing Tables

• Each node has a neighbor map
– Used to find a node whose address has one more digit in common with the destination

• Size of the map: b·log_b(N), where b is the base used for the address

(43)

Plaxton  Mesh   Rou9ng  Tables  

•  x  are  wildcards.  Can  be  filled   with  any  address.  

–  Each  slot  can  have  several   addresses.  Redundancy.  

•  Example:  node  3642  receives   message  for  node  2342  

–  Common  prefix:  XX42  

•  Look  into  second  column.    

Must  send  M  to  a  node  one   digit  “closer”  to  the  

des9na9on.    

•  Any  host  with  an  address  like   X342.  

Node 3642
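A Python sketch of this digit-by-digit rule, matching the 3642 → 2342 example above (digits are matched from the right, so the next hop must look like X342); the neighbor map content is an illustrative assumption.

def shared_suffix_len(a, b):
    """Number of trailing digits two addresses have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(node, dest, neighbor_map):
    """Pick any neighbor sharing one more trailing digit with the
    destination than the current node does."""
    needed = shared_suffix_len(node, dest) + 1
    for candidate in neighbor_map[node]:
        if shared_suffix_len(candidate, dest) >= needed:
            return candidate
    return None   # no closer neighbor known

# Node 3642 holds a message for 2342: they share the digits "42",
# so the next hop must match X342 (e.g. 1342).
neighbor_map = {"3642": ["1342", "9098", "7598"]}
print(next_hop("3642", "2342", neighbor_map))   # 1342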

(44)

Chord Protocol

• A consistent hashing function (SHA-1) assigns each node and key an m-bit identifier

• Identifiers are ordered on an identifier circle modulo 2^m, called a Chord ring

• successor(k) = first node whose identifier is >= the identifier of k in identifier space

Key = "LetItBe" → SHA-1 → ID = 60
IP = "157.25.10.1" → SHA-1 → ID = 101

(45)

Consistent Hashing (cont.)

[Figure: circular 7-bit ID space with nodes N32 (IP = "198.10.10.1") and N123, and keys K5, K20 and K101 = SHA1(Key = "LetItBe")]

A key is stored at its successor: the node with the next higher ID.
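The successor rule is easy to state in code. Below is a minimal Python sketch on a 7-bit ring as in the figure; truncating SHA-1 to 7 bits and the sample addresses are illustrative assumptions.

import hashlib

M = 7                                 # identifier bits (7-bit ID space)

def chord_id(name):
    """Hash a key or node address to an m-bit identifier."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(node_ids, key_id):
    """A key is stored at its successor: the first node whose
    identifier is >= the key identifier, wrapping around the ring."""
    for n in sorted(node_ids):
        if n >= key_id:
            return n
    return min(node_ids)              # wrap around zero

nodes = {chord_id(ip) for ip in ["198.10.10.1", "157.25.10.1", "10.0.0.3"]}
key = chord_id("LetItBe")
print(key, "->", successor(nodes, key))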

(46)

Theorem

• For any set of N nodes and K keys, with high probability:

1. Each node is responsible for at most (1+ε)K/N keys.

2. When an (N+1)st node joins or leaves the network, responsibility for O(K/N) keys changes hands.

where ε = O(log N)

(47)

Simple Key Location Scheme

[Figure: Chord ring with nodes N1, N8, N14, N38, N42, N48; key K45 is located by following successor pointers around the ring until N48, the first node whose ID is >= 45]

(48)

Scalable Lookup Scheme

Finger table for N8:

N8+1  → N14
N8+2  → N14
N8+4  → N14
N8+8  → N21
N8+16 → N32
N8+32 → N42

finger[k] = first node that succeeds (n + 2^(k-1)) mod 2^m

[Figure: Chord ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56; fingers 1 to 6 of N8 drawn across the ring]

(49)

Lookup Using Finger Table

[Figure: lookup(54) issued at N8 on the ring N1, N8, N14, N38, N42, N48, N51, N56; the query hops along fingers toward the node responsible for key 54]

(50)

Scalable Lookup Scheme

• Each node forwards the query at least halfway along the remaining distance to the target

• Theorem: with high probability, the number of nodes that must be contacted to find a successor in an N-node network is O(log N)

(51)

Chord joins and departures

• A joining node gets an ID N and contacts node A
– A searches for N's predecessor B
– N becomes B's successor
– N contacts B, gets its successor pointer and finger table, and updates the finger table values
– N contacts its successor and inherits keys

• A departing node contacts its predecessor, notifies it of its departure, and sends it the new successor pointer

(52)

Chord features

• Correct in the face of routing failures, since it can still route correctly using successor pointers alone

• Replication can easily reduce the possibility of losing data

• Caching can easily place data along a likely search path for future queries

• Simple to understand

(53)

Scalable Lookup Scheme

// ask node n to find the successor of id
n.find_successor(id)
  if (id belongs to (n, successor])
    return successor;
  else
    n0 = closest_preceding_node(id);
    return n0.find_successor(id);

// search the local table for the highest predecessor of id
n.closest_preceding_node(id)
  for i = m downto 1
    if (finger[i] belongs to (n, id))
      return finger[i];
  return n;
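A runnable Python rendering of the pseudocode above, on the N1…N56 ring from the earlier slides; the static ring construction (no joins, no churn) is an illustrative assumption.

M = 6                                  # identifier bits

def in_interval(x, a, b, incl_right=False):
    """True if x lies in the circular interval (a, b) modulo 2^M."""
    if incl_right and x == b:
        return True
    return a < x < b if a < b else (x > a or x < b)

class Node:
    def __init__(self, ident):
        self.id, self.successor, self.finger = ident, None, []

    def find_successor(self, key):
        if in_interval(key, self.id, self.successor.id, incl_right=True):
            return self.successor
        return self.closest_preceding_node(key).find_successor(key)

    def closest_preceding_node(self, key):
        for f in reversed(self.finger):      # highest finger first
            if in_interval(f.id, self.id, key):
                return f
        return self

# Static ring from the slides, with successors and finger tables.
ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
ring = {i: Node(i) for i in ids}
for k, i in enumerate(ids):
    ring[i].successor = ring[ids[(k + 1) % len(ids)]]
for i in ids:
    for j in range(M):                       # finger[j] succeeds i + 2^j
        t = (i + 2 ** j) % 2 ** M
        ring[i].finger.append(ring[min((x for x in ids if x >= t),
                                       default=min(ids))])

print(ring[8].find_successor(54).id)         # 56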

(54)

Pastry: Routing procedure

if (destination D is within range of our leaf set)
  forward to the numerically closest member of the leaf set
else
  let l = length of the prefix shared with D
  let d = value of the l-th digit in D's address
  if (R[l][d] exists)          // routing table entry for prefix length l, digit d
    forward to R[l][d]
  else
    forward to a known node that
    (a) shares at least as long a prefix with D
    (b) is numerically closer to D than the current node

(55)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(56)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(57)

Full hybrid model (Servers, Leechers, Seeders)

[Figure: central servers (C), leechers (L), seeders (S) and clients exchanging content]

(58)

Notation

• Servers / leechers / seeders: C, L, S

• Population size: …

• Upload: …

• Download: …

(59)

The Bandwidth Conservation Law

• What you download is uploaded somewhere

• Assuming unbounded download capacities

• …: efficiency parameter

• Ideal systems: …
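The formulas on this slide did not survive extraction. A plausible reconstruction under assumed notation (upload capacities u_i and download rates d_i over servers, leechers and seeders together, efficiency η; all symbols are assumptions, not the slide's own):

\[
  \sum_{i \in C \cup L \cup S} d_i \;\le\; \eta \sum_{i \in C \cup L \cup S} u_i,
  \qquad 0 < \eta \le 1,
\]

with η = 1 for ideal systems: aggregate download throughput can never exceed the aggregate upload capacity provided by servers, leechers and seeders.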

(60)

The Bandwidth Conservation Law

• Assume bandwidth is evenly distributed

• Works for fixed-size or open systems

• Works for streaming or elastic rate

• Can be upgraded at will (efficiency, variable d_max, …)

(61)

Example #1: streaming

• For a given …, let … and …

• If …, the streamrate can be achieved; otherwise…

• …: normalized capacity of the L/S system

• Case 1: …
– The system is scalable (it works independently of …)
– Servers are unnecessary (except for bootstrap/security)

• Case 2: …
– … is bounded by …

(62)

Joost (the old version)

The monitored upload/download ratio gives …

• If … (no seeders), P2P increases the server capacity by 20%

• If …, this becomes 50% (the most likely behavior)

• If …, 100%

• …

• True scalability can be achieved only if … all the time!

• Conclusion: Joost was probably hybrid (not scalable)

(63)

Example #2: filesharing

• Goal: download a file of a given size

• Peers enter at rate λ as leechers

• A peer needs some time to download the file, then seeds

• After some time, a seeder leaves

• This is a dynamic system

• Focus on the stationary regime
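A standard way to fill in the missing step for the stationary regime (the symbols T for mean download time and σ for mean seeding time are assumptions, not the slide's own) is Little's law:

\[
  \mathbb{E}[L] = \lambda\,\mathbb{E}[T],
  \qquad
  \mathbb{E}[S] = \lambda\,\mathbb{E}[\sigma],
\]

in steady state, the expected numbers of leechers and seeders equal the arrival rate times the mean time spent in each state.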

(64)

Stationary regime

[Figure: flow diagram of the system in stationary regime, with peers arriving at rate λ]

(65)

Overprovisioning condition

(66)

Roadmap

• To begin with…
– Why P2P?
– Definition and key concepts
– Taxonomies

• Content indexing
– Unstructured solutions
– Structured solutions

• Content distribution
– Bandwidth provisioning
– Delay issues

(67)

One-to-one transmission

• Time needed for data to be transmitted from one peer to another

• For stripes (continuous streams of data):
latency

• For chunks (atomic units of data):
NegoTime + latency + ChunkSize/Bandwidth

• For "big" chunks:
ChunkSize/Bandwidth
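A quick worked example with illustrative numbers (the values are assumptions, not from the slides): a 1 MB chunk over a 10 Mbit/s link with 50 ms latency and 10 ms of negotiation gives

\[
  T \approx 0.01 + 0.05 + \frac{8 \times 10^{6}\ \text{bit}}{10^{7}\ \text{bit/s}} = 0.86\ \text{s},
\]

so for big chunks the ChunkSize/Bandwidth term dominates both latency and negotiation time, which justifies the approximation in the last bullet.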

(68)

P2P broadcasting delays

• Play-out delay:
– Gap between the actual event and its display
– For live content only

• Start-up delay:
– Time needed to start watching
– For any multimedia content

• Propagation delay:
– Time for data to reach all peers after injection
– Absolute lower bound for play-out

(69)

The logarithmic bound

• Homogeneous case: all values are equal
– Stripes: …
– Big chunks: …

(70)

The constant bound

• Delay grows logarithmically with …

• If … is bounded, so is the delay
– When not scalable, you necessarily have …
– Even if scalable, you may want a high …

(71)

Going further…

• DHTs
– Philippe Gauron's PhD thesis (and references)
http://tel.archives-ouvertes.fr/docs/00/14/09/01/PDF/thesePhilippeGauron.pdf

• Content distribution
– Habilitation thesis (and references)
http://gang.inria.fr/~fmathieu/pub/files/mathieu09autour.pdf
