Slides - IfIS - Technische Universität Braunschweig

Transcrição

Slides - IfIS - Technische Universität Braunschweig
Peer-to-Peer
Data Management
Hans-Dieter Ehrich
Institut für Informationssysteme
Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de
7. Unstructured P2P Networks
The transparencies of this chapter are based on the package
Unstructured Peer-to-Peer Networks
by
Wolf-Tilo Balke and Wolf Siberski
24.10.2007
●
Original slides partially provided by
►
Rüdiger Schollmeier
►
Jörg Eberspächer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-2/62
7. Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Centralized P2P
Pure P2P
Hybrid P2P
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-3/62
Review: Driving Forces of P2P – File Sharing
●
Sharing of otherwise unused resources
►
►
►
●
Storage
Bandwidth
(Processing)
No central control
►
►
►
No single point of failure
No administrative efforts
Difficult to attack with judicial means
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-4/62
Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Centralized P2P
Pure P2P
Hybrid P2P
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-5/62
How It All Began: From Arpanet to Peer-to-Peer
1.
2.
3.
How It All Began: From Arpanet to Peer-to-Peer
The Napster Story
Gnutella and its Relatives: Fully Decentralized Architectures
3)
2)
1)
1) Freenet
2) Buzzpad
3) WuWu
[Most relevant P2P-Applications in the year 2001]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-6/62
From ARPANET to Peer-to-Peer
●
Late 1960s: Establishment of the ARPANET
►
Goal: share computing resources and documents between US research
facilities
► The logical network matched the physical network to a large extent
► Applications: FTP and TelNet  client/server model
► Central steering committee to organize the network
●
1979: Development of the UseNet protocol
►
Newsgroup application to organize content
► Newsgroup server network exhibits P2P characteristics
 Self organizing approach to add and remove newsgroup servers
 Fully distributed content replication
►
●
Still client/server application with respect to endpoints
~1990 rush of the general public to join the Internet
►
Applications following the client/server approach: WWW, email, streaming
► Straightforward model to administrate and control the content distribution
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-7/62
The Napster Story
1.
2.
3.
How It All Began: From Arpanet to Peer-to-Peer
The Napster Story
Gnutella and its Relatives: Fully Decentralized Architectures
3)
2)
1)
1) Freenet
2) Buzzpad
3) WuWu
[Most relevant P2P-Applications in the year 2001]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-8/62
The Napster Story
●
MAY 1999: Disruption of the Internet community
First Generation of P2P
►
Introduction of Napster
► User not only consume and download content but also offer and provide
content to other participants
► Users establish a virtual network, entirely independent from physical network
and administrative authorities or restrictions
► Basis: UDP and TCP connections between the peers
●
December 1999: RIAA files a lawsuit against Napster Inc.
►
●
●
Target of the RIAA: the central lookup server of Napster
February 2001: 2.79 billion files exchanged via the Napster network per month
July 2001: Napster Inc. is convicted
►
Napster has to stop the operation of the Napster server
► Napster network breaks down
► BUT: Already a number of promising successors available
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-9/62
Centralized Directory Model
●
Centralized Directory Model
► The index service is provided centrally by a coordinating entity.
► Search request is issued to the coordinating entity which delivers a
list of peers having the desired files available for download.
► Requesting peer obtains the respective files directly from the peer
offering them.
●
Characteristics
►
►
►
●
Lookup of existing documents can be guaranteed.
Index service is a “Single Point of Failure”.
Centralized P2P system
Representative: Napster
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-10/62
Centralized Directory Model - Example
Mr. Müller shares on
his client several MP3files.
? Prince
purple rain ?
Mr. Arayama shares on
his client several MP3files, too.
Central database with
index of all shared files
! Prince
! Prince
purple rain !
purple rain !
@ Mr. Arayama
? Prince purple rain ?
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-11/62
Gnutella and its Relatives
1.
2.
3.
How It All Began: From Arpanet to Peer-to-Peer
The Napster Story
Gnutella and its Relatives: Fully Decentralized Architectures
3)
2)
1)
1) Freenet
2) Buzzpad
3) WuWu
[Most relevant P2P-Applications in the year 2001]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-12/62
Gnutella
●
March 2000: Nullsoft releases Gnutella as
an open source project
►
Major developer: Gene Khan
► Additionally to servent functionality, the peers also take over routing tasks
► Becomes extremely popular after Napster has to shut down
●
October 2000: introduction of hierarchical routing layers.
►
Gnutella 0.6: Ultrapeer concept
► Increases the scalability significantly
●
Variety of similar fully decentralized P2P-protocols followed soon:
►
Audiogalaxy
► FastTrack/KaZaA
► iMesh
► Freenet
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-13/62
Flooded Request Model
●
Flooded Request Model:
►
Atomic P2P system
 Without central coordination authority (all peers are equal).
 Search request is passed on to neighbors.
 If they cannot answer the request, they pass it on to various other nodes until a
predetermined search depth (ttl=time-to-live).
 When requested file has been located, positive search results are sent to the
requesting entity.
 Requesting peer can then download the desired file directly from the entity which is
offering it.
●
Characteristics:
►
Fully decentralized, no central lookup server
 no single point of failure or control
► Unreliable lookup (no guarantees)
► System doesn't scale.
► Pure P2P system
●
Representative: Gnutella 0.4
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-14/62
Flooded Request Model - Example
http://www.gnutelliums.com/
?
Mr. Müller is
searching for
Prince
No central
Database
?
?
?
?
?
?
Mr. Arayama
serves Prince
?
?
?
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-15/62
Super-peer based Flooding
●
Super-peer based Flooding:
►
Hierarchical P2P system
 Super-peers (Ultrapeers) form pure P2P subnet
 All other peers (Leaf nodes) directly attach to one super-peer
 Super-peer netwerk acts as distributed file index
o Super-peers request file list from their leaf peers
 Queries are distributed in super-peer subnet only
►
●
Combination of Pure and Central P2P
Characteristics:
►
Fully decentralized, no central lookup server
 no single point of failure or control
► Systems scales much better
► Unreliable lookup
(but more success due to smaller network)
► Hybrid P2P system
●
Representative: Gnutella 0.6
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-16/62
Super-peer based Flooding- Example
http://www.gnutelliums.com/
?
Mr. Müller is
searching for
Prince
?
?
?
?
?
?
?
?
?
Mr. Arayama
serves Prince
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-17/62
Gnutella and its Relatives: The Story Goes on
●
August 2001
►
►
●
Year 2001
►
●
►
►
Bittorrent is released
Soon causes majority of the observed traffic, due to its efficiency
Middle of 2003
►
►
●
Amount of exchanged data in KaZaA decreases, caused by a high number of defected files
(reason: weak hash keys to identify files)
Edonkey and Gnutella regain popularity
May 2003
►
●
Invention of structured P2P networks (regular instead of random network graph)
August 2002
►
●
Users adapt very fast to the breakdown of Napster
Already 3.05 billion files exchanged per months via the Gnutella network
Beyond the exchange of content, new concepts are developed to use P2P also for other
applications
Skype a Voice over P2P application is developed
Today:
►
►
Major efforts are made to increase the reliability of P2P-searches, also in mobile networks, …
In 2005 Ebay buys Skype to use the paradigm for the communication between bidders and
sellers
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-18/62
1st and 2nd Generations of P2P
Client-Server
1. Server is the central
entity and only provider
of service and content.
 Network managed
by the Server
2. Server as the higher
performance system.
3. Clients as the lower
performance system
Example: WWW
Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. Peer is provider and requestor (Servent concept)
Unstructured P2P
Structured P2P
Centralized P2P
Pure P2P
Hybrid P2P
1. All features of Peer-toPeer included
1. All features of Peer-toPeer included
1. All features of Peer-toPeer included
2. Central entity is
necessary to provide the
service
2. Any terminal entity can be
removed without loss of
functionality
2. Any terminal entity can be
removed without loss of
functionality
3. Central entity is some
kind of index/group
database
3.  No central entities
3.  dynamic central
entities
Examples: Gnutella 0.4,
Freenet
Pure P2P
Hybrid P2P
Example: Gnutella 0.6, JXTA
Example: Napster
1st Gen.
2nd Gen.
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-19/62
Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Central P2P:
Pure P2P:
Hybrid P2P:
Napster
Gnutella 0.4
Gnutella 0.6
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-20/62
Overlay Networks
●
“Virtual” signaling network established via TCP connections between the peers
●
Characteristics of the overlay topology:
►
completely independent from physical network
►
Separate addressing and routing scheme
►
No relation between physical network edges and overlay network edges
►
May include hierarchies (hub network)
(e.g. rendezvous peers in JXTA)
►
May include centralized elements (star network)
(lookup server in Napster)
►
May be a completely randomized network
(Gnutella 0.4) (randomly meshed network)
►
Overlay network can be seen as graph
 Peers as nodes
 Conceptual connections as edges
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-21/62
General Characteristics of 1st And 2nd Gen. P2P
1st and 2nd Generation P2P systems are overlay architectures, with the following
characteristics:
►
TCP/IP based
► Decentralized and self organizing (with possible centralized elements)
► Content:
 Distributed “randomly” on the network, with several replicas (due to popularity)
 Content stays at provider peer
 Content transfer:
o Out of band, i.e. on separate connections and not via
signaling connections
o Mostly via HTTP
►
Employ distributed shared resources (data storage, bandwidth)
► Generally two kinds of requests:
 Content requests: to find content in the overlay
 Keep-alive requests: stay connected in the overlay
►
Initially developed for file-sharing
► Various realizations exist
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-22/62
Basic Routing Behavior
●
Request messages:
►
Include a hop-counter, a GUID (Globally Unique Identifier) and a TTL (Time-To-Live) in the header
►
TTL determines along how many hops a message may be forwarded
►
Are flooded in the overlay network
►
●

Every node forwards every incoming message to all neighbors except the neighbor, it received the message
from

Exceptions: see below
Request messages terminate, if

Same message-type with same GUID is received more than once (loop!!)

Hop-counter=TTL
Response messages:
►
Include a hop-counter, a GUID and a TTL (Time-to-Live) in the header
►
GUID is the same as of the initializing request message
►
Are routed back on the same way to the requestor, the request message was transmitted to the
responding peer

every peer has to store the GUID of each request for a certain amount of time

No flooding to save resources
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-23/62
Basic Bootstrapping
●
Mostly not part of the protocol specification
●
Necessary to know at least one active participant of the network
●
Otherwise no participation at the overlay possible for a new node
●
Address (TCP) of an active node can be retrieved by different means:
►
Bootstrap cache: Try to establish one after another a connection to a node seen in a previous
session
►
Bootstrap server:

Connect to a “well known host”, which almost always participates

Ask a bootstrap server to provide a valid address of at least one active node

Realizations:
o FIFO of all node-addresses which recently used this bootstrap (a node which just connected is assumed
to be still active)
o Random pick of addresses which recently connected via this server to the overlay (+ no loops, -may be
outdated)
►
Broadcast on the IP layer

Use multicast channels

Use IP broadcasting (-limited to local network)
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-24/62
Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Central P2P:
Napster
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
Pure P2P:
Gnutella 0.4
Hybrid P2P:
Gnutella 0.6
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-25/62
Definition of centralized P2P
●
●
●
●
●
All peers are connected to central entity
Peers establish connections between each other on demand to exchange user
data (e.g. mp3 compressed data)
Central entity is necessary to provide network services
Central entity is some kind of index/group database
Central entity is lookup/routing table
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-26/62
Basic Characteristics of centralized P2P
Bootstrapping: Bootstrap-server = central server
● Central entity can be established as a server farm, but one single
entry point = single point of failure (SPOF)
● All signaling connections are directed to central entity
● Peer  central entity: P2P protocol, e.g. Napster protocol
●
●
►
To find content
►
To log on to the overlay
►
To register
►
To update the routing tables
►
To update shared content information
Peer  Peer: HTTP
►
To exchange content/data
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-27/62
Topology of Centralized P2P
Servent
Connection between
2 servents (TCP)
Connection between
router & servent
Connection between
routers (Core)
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-28/62
Napster: How Does It Work
●
●
Application-level, client-server protocol over point-to-point TCP
Partcipants:
►
Napster Hosts/peers
► Client Service
 Login
 Data-requests
 Download-requests
►
P2P Service
Napster
Host
Data
Transfer
Napster
Host
Central
Napster
Index
server
Napster
Host
 Data-transfer
►
Napster Indexserver
 Pure Server
●
Napster
Host
Five steps:
►
►
►
►
►
Connect to Napster Server
Upload your list of files (push) to server
Query Indexserver with a list of keywords to search the full list with
Select “best” of correct answers
Connect to providing host/peer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-29/62
Napster Message Structure
General Header Structure:
HEADER 4byte
<Payload Length>
2byte
Describes the
message type (e.g.
login, search,…)
PAYLOAD
<Function>
2Byte
Describes parameters
of the message (e.g.
IDs, keywords,…)
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-30/62
Napster: Initialization
Client/Server Service
1: LOGIN (Function:0x02)
<Nick>
<Password>
<Port>
<Client-Info>
<Link-type>
2: LOGIN ACK (Function: 0x03)
3: NOTIFICATION OF SHARED FILE (0x64)
„<Filename>“
Napster
Host
IP: 001
Nick: LKN
<MD5>
<Size>
<Bitrate>
<Freq>
NOTIFICATION(0x64)
LOGIN
ACK(0x03)
LOGIN(0x02)
„band
- song.mp3“
3f3a3... 5674544
128
342 „nap v0.8“ 9
lkn44100
54332 6699
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
<Time>
Central
Napster
Index
server
7-31/62
Napster: File Request Procedure
1: SEARCH (Function: 0xC8)
[FILENAME CONTAINS „Search Criteria“]
[MAX_RESULT <Max>]
[LINESPEED <Compare> <Link-Type>]
[BITRATE <Compare> “<Bitrate>”]
[FREQ <Compare> “<Freq>”]
2: SEARCH RESPONSE (Function: 0xC9)
„<Filename>“
<MD5>
<Time>
Central
Napster
Index
server
<Size>
<Nick>
<Bitrate>
<IP>
<Freq>
<Link-Type>
SEARCH(0xC8)
FILENAME CONTAINS „song“ MAX_RESULTS 100
LINESPEED „AT LEAST“ 6 BITRATE „AT LEAST“
„128“
Napster
Host
IP: 002
Nick: MIT
FREQ „EQUAL TO“ „44100“
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-32/62
Summary of Napster Signaling
Napster
Peer (Req)
Napster
Server
Napster
Peer (Prov)
Login: [0x24|0x02|…]
Login Ack: [0x00|0x03|…]
Notif: [0x46|0x64|…]
Notif: [0x46|0x64|…]
Notif: [0x46|0x64|…]
Search: [0x7E|0xC8|…]
Response: [0xC4|0xC9|…]
Response: [0xC4|0xC9|…]
HTTP: GET[Filename]
OK[data]
Sample message sequence chart for one Napster server with one requesting
and one providing peer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-33/62
Napster: Wrap-Up I
1.
Search Request
User sends out a music
file request and Napster
searches its central
data base.
0101
1001
1001
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-34/62
Napster: Wrap-Up II
2.
Search Response
The Napster Server
sends back a list of
peers that share
the file.
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-35/62
Napster: Wrap-Up II
3.
File Download
The requesting user
downloads the file directly
from the computer of
another Napster user via
HTTP.
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-36/62
Centralized P2P: Discussion
●
Disadvantages
►
►
►
►
●
Advantages
►
►
►
●
Fast and complete lookup (one hop lookup)
Central managing/trust authority
No keep alive necessary, beyond content updates
Application areas
►
►
►
●
Single Point of Failure easily attackable
Bottleneck
Potential of congestion
Central server in control of all peers
File Sharing
VoIP (SIP, H.323)
Conceptually: „Social Web‟ applications (eBay, YouTube, del.icio.us, etc.)
Systems
►
BitTorrent, Audiogalaxy, WinMX
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-37/62
Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Central P2P:
Napster
Pure P2P:
Gnutella 0.4
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
Hybrid P2P:
Gnutella 0.6
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-38/62
Definition of Pure P2P
●
●
●
Any terminal entity can be removed without loss of functionality
No central entities employed in the overlay
Peers establish connections between each other randomly
►
►
To route request and response messages
To insert request messages into the overlay
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-39/62
Model of Pure P2P Networks
Degree distribution:

cd -1.4 , 0 
p (d )
d 7
p (d ) ==
, with c 


c
d
0 , in any other case

-1
average : d = 2.2
var (d ) =1.63
According Sample Graph:
Separate sub
networks
Major
component
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-40/62
Basic Characteristics of Pure P2P
●
Bootstrapping:
►
Via bootstrap-server (host list from a web server)
► Via peer-cache (from previous sessions)
► Via well-known host
► No registration
●
Routing:
►
►
Completely decentralized
Reactive protocol: routes to content providers are only established on
demand, no content announcements
► Requests: flooding (limited by TTL and GUID)
► Responses: routed (Backward routing with help of GUID)
● Signaling connections (stable, as long as neighbors do not change):
► Based on TCP
► Keep-alive
► Content search
● Content transfer connections (temporary):
► Based on HTTP
► Out of band transmission
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-41/62
Topology of Pure P2P
Servent
Connection between
2 servents (TCP)
Connection between
router & servent
Connection between
routers (Core)
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-42/62
Gnutella 0.4: How Does It Work
●
Application-level, peer-to-peer protocol over point-to-point TCP
Partcipants:
►
►
Gnutella peers/servents
Router Service
 Flood incoming requests (regard TTL!)
o
o
Keep alive
content
G
G
G
Keep alive (PING/PONG)
Content (QUERY/QUERYHIT)
 Data-requests
 Download-requests
►
Peer/
Servent
G
G
G
 Route responses for other peers (regard GUID of message)
o
o
G
G
G
G
G
G
TCP
connection
Lookup Service
 Initialize Data requests
 Initialize keep alive requests
►
“Server”-Service
 Serve Data-requests (HTTP)
●
Five steps:
►
►
►
►
►
Connect to at least one active peer (address received from bootstrap)
Explore your neighborhood (PING/PONG)
Submit Query with a list of keywords to your neighbors (they forward it)
Select “best” of correct answers (which we receive after a while)
Connect to providing host/peer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-43/62
The Gnutella Network
Measurements taken at
the LKN in May 2002
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-44/62
Gnutella Message Structure
General Header Structure:
MESSAGEHEADER: 23Byte
GnodeID
16 Bytes
Function
Describes the
message type (e.g.
login, search,…)
1 Byte
TTL
1 Byte
Hops
1 Byte
Payload Length
4 Bytes
Describes parameters
of the message (e.g.
IDs, keywords,…)
• GnodeID: unique 128bit Id of any Hosts
• TTL(Time-To-Live): number of servents, a message may pass
before it is killed
• Hops: number of servents a message already passed
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-45/62
Gnutella Messages
PING (Function:0x00)
No Payload
PONG (Function:0x01)
Port
2 Bytes
IP Address
4 Bytes
Nb. of shared Files
4 Bytes
Nb. of Kbytes shared
4 Bytes
QUERY (Function:0x80)
Minimum Speed
2 Bytes
Search Criteria
n Bytes
QUERY HIT (Function:0x81)
Nb. of Hits
1 Byte
Port
2 Bytes
IP Address
4 Bytes
Speed
1 Byte
File Index
4 Bytes
Result Set
n Bytes
GnodeID
16 Bytes
File Name
n Bytes
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-46/62
Gnutella Routing
• Basic Routing Principle: „Enhanced“
Flooding
• Save Origin of received PINGs and QUERIEs
• Increase Hops by 1
• If Hops equals TTL, kill the message
• Flooding: Received PINGS and QUERIES must
be forwarded to all connected Gnodes
• PINGS or QUERYS with the same FUNCTION ID
and GNODE ID as previous messages are
destroyed (avoid loops)
• PONG and QUERY HIT are forwarded to the
origin of the according PING or QUERY
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-47/62
Gnutella Connection Setup
Gnode 2000 establishes a connection to 4000
17Gnutella Connect
GNODE
ID: 1000
IP: 001
GNODE
ID: 2000
IP: 002
17 19
20
18
27
28
26
25
GNODE
ID: 3000
IP: 003
22
24
GNODE
ID: 4000
IP: 004
18Gnutella OK
19PING
20PONG/IP:004
21PING
23PONG/IP:001
27PONG/IP:001
22PING
24PONG/IP:003
28PONG/IP:003
25PING
26PING
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-48/62
Summary of the Signaling in Gnutella 0.4
5
2
Sample Gnutella 0.4 network:
4
1
3
6
7
8
Sample message sequence
chart according to the sample
network:
Peer7
Peer3
Peer1
Gnu-Con
OK
Peer5
Peer2
Peer4
Peer6
Peer8
Gnu-Con
OK
Gnu-Con
OK
PING
PING
PING
PING
PING
PING
PING
PING
PING
PING
PING
PING
PONG
PONG
PONG
PONG
PONG
PONG
PONG
PONG
PONG
PONG
PONG
PONG
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-49/62
Gnutella Wrap-Up I
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• Broadcast
Query[XYZ, TTL = 3, …]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-50/62
Gnutella Wrap-Up II
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• 1. Query-Hit
• Broadcast
Query[XYZ, TTL = 2, …]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-51/62
Gnutella Wrap-Up III
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• 2. Query-Hit
• Broadcast
Query[XYZ, TTL = 1, …]
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-52/62
Gnutella Wrap-Up IV
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• 3. + 4. Query-Hit
• [TTL = 0]  no further
Broadcast
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-53/62
Gnutella Wrap-Up V
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• Establish HTTP
Connection
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-54/62
Gnutella Wrap-Up VI
Requesting Servent
Servent
Query […]
Query-Hit […]
Requested Data
• HTTP Connection
Get[XYZ, …, …]
Download Data
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-55/62
Discussion
●
Disadvantages
►
►
►
High signaling traffic, because of decentralization
Modem nodes may become bottlenecks
Overlay topology not optimal, as
 no complete view available,
 no coordinator
►
If not adapted to physical structure delay and total network load increases
 Zigzag routes
 loops
●
●
●
Advantages
►
►
►
►
No single point of failure
Can be adapted to physical network
Can provide anonymity
Can be adapted to special interest groups
►
►
File-sharing
Context based routing (see chapter about mobility)
►
Freenet, Gnutella, Gnunet
Application areas
Systems
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-56/62
Unstructured P2P Networks
●
●
●
History
Overlay Network Characteristics
Network Types
►
►
►
Central P2P:
Napster
Pure P2P:
Gnutella 0.4
Hybrid P2P:
Gnutella 0.6
1. Basic Characteristics
2. Signaling Characteristics
3. Discussion
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-57/62
Definition of Hybrid P2P
●
●
●
●
●
●
Main characteristic, compared to pure P2P: Introduction of another dynamic
hierarchical layer
Hub based network
Reduces the signaling load without reducing the reliability
Election process to select and assign Superpeers
Superpeers: high degree (degree>>20, depending on network size)
Leafnodes: connected to one or more Superpeers (degree<7)
leafnode
Superpeer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-58/62
Model of Hybrid P2P Networks
Degree distribution:
cd 1.4 , 1
d 7
-1
-1.4

p (d )
c1 -0.05, d =1
p (d ) ==
, with c 
 c
c 0.05, d =20
d

0, in any other case
Separate sub
networks
average : d = 2.8
var (d ) = 3.55
According sample graph:
Major
component
leafnode
Hub connections
(2nd hierarchy)
Superpeer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-59/62
Basic Characteristics of Hybrid P2P
●
Bootstrapping:
►
Via bootstrap-server (host list from a web server)
► Via peer-cache (from previous sessions)
► Via well-known host
► Registration of each leafnode at the Superpeer it connects to, i.e. it announces its shared files to the
Superpeer
●
Routing:
►
Partly decentralized



Leafnodes send request to a Superpeer
Superpeer distributes this request in the Superpeer layer
If a Superpeer has information about a matching file shared by one of its leafnodes, it sends this information back to the
requesting leafnode (backward routing)
►
Hybrid protocol (reactive and proactive): routes to content providers are only established on demand;
content announcements from leafnodes to their Superpeers
► Routing within Superpeer layer equal to Pure P2P
●
Signaling connections (stable, as long as neighbors do not change):
►
Based on TCP
► Keep-alive
► Content search
●
Content transfer connections (temporary):
►
Based on HTTP
► Out of band transmission (directly between leafnodes)
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-60/62
Gnutella 0.6 Network Organization
New connection/network setup
►
►
►
►
Upon connection to the network via a Superpeer, each node is a
leafnode
It announces its shared content to the Superpeer it connected to
Superpeer thus updates its routing tables
Election mechanism decides which node becomes a Superpeer or a
leafnode (depending on capabilities (storage, processing power)
network connection, the uptime of a node,…), if
 Too many nodes are connected to one Superpeer
 A Superpeer leaves the network
 To less nodes are connected to a Superpeer
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-61/62
Gnutella 0.6 Routing
●
Content requests:
►
►
►
►
►
►
●
Content responses:
►
►
►
●
Leafnode sends request to Superpeer
Superpeer looks up in its routing tables whether content is offered by one of its leafnode. In this
case the request is forwarded to this node.
Additionally the Superpeer increases the hopcounter and forwards this request to the Superpeers it
is connected to.
To enable backward routing, the peer has to store the GUID of the message connected to the
information from which peer it received the request in the previous hop
If a Superpeer receives such a request from another Superpeer, this request is handled the same
way, as if it would have received it from one of its leafnodes
After the hopcounter of the request reaches the TTL-value it is not forwarded any further (prevent
circles)
If a leafnode receives a request, it double-checks whether it shares the file (should be the case, as
long as the routing tables of the Superpeer are correct)
In case of success, the leafnode sends a content reply back to the requesting peer, by sending it
back to that node (Superpeer) it received the message from (backward routing)
Hop by hop the message can thus be routed back to the requesting node
Content exchange:
►
Directly between the leafnodes, via HTTP connections
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-62/62
Topology of Hybrid P2P
116
118
18
118
100
39
116
18
7
3, 43
43
100
7
39
3
Abstract network structure of a part of
the Gnutella network (222 nodes
Geographical view given by Figure on
the right, measured on 01.08.2002
Geographical view of a part of the Gnutella
network (222 nodes); The numbers depict the
node numbers from the abstract view (Figure on
the left, measured on 01.08.2002)
• Virtual network not matched to physical network. See path from node 118 to node 18.
• Superpeer (hub) structure clearly visible in abstract view
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-63/62
Gnutella 0.6 Messages
●
Content requests and responses
►
QUERY (defined as in Gnutella 0.4)
► QUERY_HIT (defined as in Gnutella 0.4)
●
Keep alive:
►
PING (defined as in Gnutella 0.4)
► PONG (defined as in Gnutella 0.4)
●
Announcement of shared content:
►
ROUTE_TABLE_UPDATE (0x30), Reset variant (0x0): to clear the routing table and to
set a new routing table for one leafnode
0
Variant
►
1
4
5
Infinity
Table_Length
ROUTE_TABLE_UPDATE (0x30), Patch variant(0x1): to update and set a new routing
table with a certain number of entries (e.g. new shared files)
0
Variant
1
Seq_No
2
Seq_Size
3
Compressor
4
Entry_Bits
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
5
n+4
DATA
7-64/62
Summary of the Signaling in Gnutella 0.6
L7
Sample Gnutella 0.6 network:
L6
S3
S2
S1
L1
L2
Sample message sequence
chart according to the sample
network:
L2
L3
L5
L4
4
L3
L1
S1
S3
S2
L7
L6
L5
L4
Gnu-Con
OK
PING
PING
RTU
PING
PING
PONG
PONG
PONG
PONG
PONG
QUERY
QUERY
QUERY
QUERY
QUERY
QUERY
QUERY
QUHIT
QUHIT
QUHIT
QUHIT
QUHIT
QUHIT
QUHIT
QUHIT
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-65/62
Gnutella 0.6: How Does It Work I
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-66/62
Gnutella 0.6: How Does It Work II
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-67/62
Gnutella 0.6: How Does It Work III
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-68/62
Gnutella 0.6: How Does It Work III
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-69/62
Gnutella 0.6: How Does It Work IV
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-70/62
Gnutella 0.6: How Does It Work V
Ultrapeer
Leafnode
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-71/62
Discussion
●
Disadvantages
►
Still High signaling traffic, because of decentralization
► No definitive statement possible if content is not available or not found
► Overlay topology not optimal, as
 no complete view available,
 no coordinator
►
If not adapted to physical structure delay and total network load increases
 Zigzag routes
 Loops
►
●
Difficult to adapt to physical network completely because of hub structure
Advantages
►
No single point of failure
► Can provide anonymity
► Can be adapted to special interest groups
●
Application areas
►
File-sharing
► Context based routing (see chapter about mobility)
●
Systems
►
Gnutella, eDonkey, Kazaa
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-72/62
Topology combinations
●
Each approach comes with a different set of
advantages/disadvantages
►
●
Suitability depends on application context
Combination of approaches
►
Use different techniques for different
application aspects
► Example: Skype
 Centralized P2P for Login/Account Mgmt.
o Routed by super-nodes if necessary
 Attempts to establish direct Voice over IP
connections
 Hybrid P2P to route through firewall,
between NATs, etc.
Figure from Salman A. Baset and Henning G. Schulzrinne: An Analysis
of the Skype Peer-to-Peer Internet Telephony Protocol, INFOCOM2006
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-73/62
1st and 2nd Generations of P2P
Client-Server
1. Server is the central
entity and only provider
of service and content.
 Network managed
by the Server
2. Server as the higher
performance system.
3. Clients as the lower
performance system
Example: WWW
Peer-to-Peer
1. Resources are shared between the peers
2. Resources can be accessed directly from other peers
3. Peer is provider and requestor (Servent concept)
Unstructured P2P
Structured P2P
Centralized P2P
Pure P2P
Hybrid P2P
1. All features of Peer-toPeer included
1. All features of Peer-toPeer included
1. All features of Peer-toPeer included
2. Central entity is
necessary to provide the
service
2. Any terminal entity can be
removed without loss of
functionality
2. Any terminal entity can be
removed without loss of
functionality
3. Central entity is some
kind of index/group
database
3.  No central entities
3.  dynamic central
entities
Examples: Gnutella 0.4,
Freenet
Pure P2P
Hybrid P2P
Example: Gnutella 0.6, JXTA
Example: Napster
1st Gen.
2nd Gen.
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-74/62
Outlook
●
Structured Networks
►
►
►
●
Distributed Hash Table (DHT) Basics
DHT Algorithms
DHT Dynamics
File Distribution Networks
VDBMS und P2P - Hans-Dieter Ehrich - Institut für Informationssysteme - TU Braunschweig
7-75/62