PG DynaSearch Markus Benter 31st October, 2013
Introduction, Centralized P2P-Networks, Unstructured P2P-Networks, Structured P2P-Networks, Misc
What is a Peer-to-Peer System?
  Definition
    Peer-to-peer systems are distributed systems consisting of interconnected nodes able to self-organize into network topologies with the purpose of sharing resources [...] capable of adapting to failures and accommodating transient populations of nodes while maintaining acceptable connectivity and performance, without [...] a global centralized server or authority. [Androutsellis-Theotokis and Spinellis, 2004]
What is a Peer-to-Peer System?
  Overlay over the Internet
  Peers connected such that...
    a given search task can be solved efficiently
    maintenance overhead is low
  [Figure (Kalman Graffi): overlay services (Service A, Service B, Service C) on top of an overlay network of peers identified by PeerID, on top of the underlay TCP/IP networks (firewall + NAT, relay, HTTP, TCP)]
Applications
  Application             Examples
  Content Delivery
    File Sharing          Gnutella, BitTorrent, Freenet
    Streaming Media       Joost, PPLive
  Information Retrieval
    Web Search            YaCy, FAROO
    Service Discovery     Web Service Discovery, SFB901
  Social Networks         Diaspora, LifeSocial
  Digital Currencies      Bitcoin, Peercoin
P2P Overlays
  Basically three different types of Peer-To-Peer Networks
    1. Centralized P2P-Networks
    2. Unstructured P2P-Networks
    3. Structured P2P-Networks
1. Centralized P2P-Networks
  Peers connect to server
  Data transfer in Peer-To-Peer manner
    1. Join
    2. Request
    3. Response
    4. Transfer
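The four steps can be sketched in a few lines. This is only an illustration under assumptions: the class names `CentralIndex` and `Peer`, the file name, and the in-memory "transfer" are hypothetical and not taken from any real system.

```python
class Peer:
    """A peer that stores files locally (hypothetical model)."""

    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # filename -> content

    def transfer(self, filename):
        # Step 4: data is served directly by the peer, not by the server.
        return self.files[filename]


class CentralIndex:
    """The central server: it only knows who has which file."""

    def __init__(self):
        self.holders = {}  # filename -> list of peers

    def join(self, peer):
        # Step 1: a joining peer announces its files to the server.
        for filename in peer.files:
            self.holders.setdefault(filename, []).append(peer)

    def request(self, filename):
        # Steps 2 and 3: a request is answered with the list of holders.
        return list(self.holders.get(filename, []))


index = CentralIndex()
alice = Peer("alice", {"talk.pdf": b"slides"})
index.join(alice)
source = index.request("talk.pdf")[0]
data = source.transfer("talk.pdf")
```

The server is involved only in steps 1 to 3; if it fails, no search is possible, even though all data still resides on the peers.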
2. Unstructured P2P-Networks
  Nodes connected randomly
  What would be good properties?
    Low degree (usually constant)
    Small diameter (usually logarithmic)
  Topic Assignment
    Constant degree random network with logarithmic diameter: Kathlén
Search in Unstructured P2P-Networks
  Broadcast: Flooding with some TTL (Time To Live)
  [Figure: flooding example with TTL=3]
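A minimal sketch of TTL-limited flooding, assuming the overlay is given as an adjacency dictionary. The function name `flood`, the example graph, and the simplification that a node forwards to all neighbors (including the one it received the query from) are assumptions made for illustration.

```python
from collections import deque


def flood(graph, start, ttl, holders):
    """Flood a query from `start` for at most `ttl` hops.

    Returns (set of reached nodes holding the resource, number of messages).
    """
    seen = {start}
    hits = {start} if start in holders else set()
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL expired, do not forward any further
        for neighbor in graph[node]:
            messages += 1  # every forwarded copy counts as one message
            if neighbor not in seen:
                seen.add(neighbor)
                if neighbor in holders:
                    hits.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages


# Hypothetical overlay: nodes 4 and 5 hold the resource.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
print(flood(graph, 0, 3, {4, 5}))
```

With TTL=3 the query reaches node 4 but not node 5, which is four hops away; raising the TTL finds more copies at the cost of more messages.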
Search in Unstructured P2P-Networks
  Random Walk with maximal length
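A random walk with maximal length could look as follows. This is a sketch, assuming an adjacency-dictionary overlay; the name `random_walk` and the injectable `rng` parameter are choices made here for illustration.

```python
import random


def random_walk(graph, start, max_len, holders, rng=random):
    """Forward a single query along random edges for at most `max_len` hops.

    Returns (node holding the resource or None, number of messages sent).
    """
    node = start
    hops = 0
    while True:
        if node in holders:
            return node, hops
        if hops == max_len:
            return None, hops  # give up, the walk has reached its limit
        node = rng.choice(graph[node])  # pick the next hop uniformly at random
        hops += 1


# Hypothetical overlay in which every node has exactly one outgoing link,
# so the walk is deterministic and easy to follow by hand.
cycle = {0: [1], 1: [2], 2: [0]}
print(random_walk(cycle, 0, 5, {2}))
```

The number of messages never exceeds `max_len`, which is why the overhead is predictable.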
Flooding Versus Random Walk
  Assumptions
    Some files/resources rarely available
    Some files/resources well spread
  Random Walk
    Finds rarely available resources
    Predictable overhead (#messages)
  Flooding
    Low latency
    Fault tolerant
Rendezvous Routing: BubbleStorm (1/2)
  Overlay is a random graph
  Peer publishes data by broadcasting to q other peers.
  Peer sends a query by broadcasting to p other peers.
  Define $q := c\sqrt{n}$ and $p := c\sqrt{n}$
  Probability of rendezvous: $r = 1 - e^{-c^2}$
    c    1         2         3         4
    r    63.21%    98.17%    99.99%    99.99999%
  [Figure labels: Query Bubble, Rendezvous, Data Bubble]
  [Terpstra et al., 2007]
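The rendezvous probabilities follow directly from the formula $r = 1 - e^{-c^2}$ and can be checked numerically. The function names and the network size n = 10,000 below are assumptions chosen for illustration.

```python
import math


def rendezvous_probability(c):
    """Probability that a query bubble and a data bubble intersect."""
    return 1.0 - math.exp(-c * c)


def bubble_size(c, n):
    """Number of peers to contact for certainty factor c in a network of n peers."""
    return math.ceil(c * math.sqrt(n))


for c in (1, 2, 3, 4):
    print(c, bubble_size(c, 10_000), f"{rendezvous_probability(c):.7%}")
```

For n = 10,000 peers and c = 2, each bubble covers 200 peers, i.e. 2% of the network, for a hit probability of about 98%.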
Rendezvous Routing: BubbleStorm (2/2)
  Advantages over Broadcast Routing
    Query hit w.h.p.
    Less overhead for same hit probability
    Data available after unexpected node failure
  Question
    What are the disadvantages?
3. Structured P2P-Networks
  DHT (Distributed Hash Table)
  Search: well defined routing path
  Chord [Stoica et al., 2001]
    Routing path length: O(log n)
    Degree: O(log n)
  Finger table of N8
    N8 + 1     N14
    N8 + 2     N14
    N8 + 4     N14
    N8 + 8     N21
    N8 + 16    N32
    N8 + 32    N42
  [Figure: Chord ring with nodes N4, N8, N14, N21, N32, N38, N42, N48, N51, N56 and keys K10, K24, K30, K38, K54, showing Lookup(K54) and the fingers +1, +2, +4, +8, +16, +32 of N8. From: A Survey and Comparison of Peer-to-Peer Overlay Network Schemes, Eng Keong Lua et al.]
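The finger table and the lookup path can be reproduced with a small sketch. It assumes a 6-bit identifier ring and a global, static view of the node IDs shown in the figure; a real Chord node only knows its own fingers, and the function names are chosen here for illustration.

```python
from bisect import bisect_left

M = 6                                            # identifier bits: 2**6 = 64 positions
RING = 2 ** M
NODES = [4, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs, sorted


def successor(ident):
    """First node whose ID is >= ident, wrapping around the ring."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]


def finger_table(n):
    """Finger i points to the successor of n + 2**i."""
    return [successor(n + 2 ** i) for i in range(M)]


def in_open(x, a, b):
    """x lies strictly between a and b, walking clockwise from a."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """Like in_open, but b itself is included."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(start, key):
    """Return the list of nodes visited while resolving `key` from `start`."""
    path = [start]
    n = start
    while True:
        succ = successor(n + 1)
        if in_half_open(key, n, succ):
            path.append(succ)          # the successor is responsible for the key
            return path
        nxt = succ                     # fallback: plain successor step
        for finger in reversed(finger_table(n)):
            if in_open(finger, n, key):
                nxt = finger           # closest finger preceding the key
                break
        path.append(nxt)
        n = nxt


print(finger_table(8))
print(lookup(8, 54))
```

Each hop at least halves the remaining distance to the key, which is where the O(log n) routing path length comes from.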
Degree Optimal Chord: Koorde (1/3)
  Basically De Bruijn Graph + Chord Ring
  De Bruijn graph with $n = 2^b$ nodes
    V = {000, 001, 010, 011, 100, 101, 110, 111}
    E: shift 0 and shift 1
    Example: 010 →(0) 100 and 010 →(1) 101
  Properties
    Degree: 2
    Diameter: O(log n)
  [Figure: De Bruijn graph on the nodes 000 to 111]
  [Kaashoek and Karger, 2003]
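The shift edges and the resulting routing can be sketched as follows. This covers only the complete De Bruijn graph with all $2^b$ nodes present, not Koorde itself; the function names are chosen here for illustration.

```python
def de_bruijn_neighbors(x, b):
    """The two out-neighbors of x: shift left and append 0 or 1."""
    mask = (1 << b) - 1
    return ((x << 1) & mask, ((x << 1) | 1) & mask)


def route(src, dst, b):
    """Route from src to dst by shifting in the bits of dst, highest bit first."""
    mask = (1 << b) - 1
    path = [src]
    cur = src
    for i in reversed(range(b)):
        bit = (dst >> i) & 1
        cur = ((cur << 1) | bit) & mask   # follow the shift-0 or shift-1 edge
        path.append(cur)
    return path


print([format(v, "03b") for v in de_bruijn_neighbors(0b010, 3)])
print([format(v, "03b") for v in route(0b010, 0b111, 3)])
```

After b = log n shifts every bit of the source has been replaced by a bit of the destination, so any node is reached in at most log n hops with out-degree 2.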
Degree Optimal Chord: Koorde (2/3)
  Basically De Bruijn Graph + Chord Ring
  (1)-De-Bruijn-Edge = (0)-De-Bruijn-Edge + Chord-Edge
    Can remove (1)-De-Bruijn-Edges
  Not all of the $2^b$ nodes are online
    Fill gaps: see paper!
  [Figure: De Bruijn graph on the nodes 000 to 111]
Degree Optimal Chord: Koorde (3/3)
  Koorde matches several lower bounds on routing time.
    Degree    Routing Time                   Lower Bound
    2         $O(\log n)$                    $\Omega(\log n)$
    k         $O(\log n / \log k)$           $\Omega(\log n / \log k)$
    log n     $O(\log n / \log\log n)$       $\Omega(\log n / \log\log n)$
Further Structured P2P-Networks
  Prefix routing
    Tapestry
    Pastry
  Logarithmic degree
    Chord
    Symphony
    Kademlia (bi-directional routing, XOR distance function)
  Fixed degree
    CAN (D-dim. coordinate system)
    Cycloid (cube-connected cycles)
    Koorde (De Bruijn graph)
  O(1)-hop
    EpiChord (learning by queries)
Further Structured P2P-Networks
  Topic Assignment
    Tobias will present CAN
  Topic Assignment
    Raymond will present EpiChord
Structured Versus Unstructured P2P-Networks
  Structured P2P-Networks
    Predictable and bounded routing time (often logarithmic)
    Very efficient for simple queries (e.g. key-value store)
    Simple load-balancing algorithms
    Complete results guaranteed
  Unstructured P2P-Networks
    Simple network structure and routing policies
    Efficient for widely available resources
    Churn robustness
    Robust against attacks
  Remark
    For complex queries, as in our project group, it is not obvious whether structured or unstructured P2P-Networks are the better choice.
Performance Measures
  (Degree)                The number of neighbors of a certain node.
  Hop count               Worst-case or average-case number of hops needed to get a message from u to v.
  Fault tolerance         Which fraction of nodes can fail.
  Maintenance overhead    Number of messages to be (continuously) exchanged.
  Load balance            How evenly the keys are distributed. How much load (messages) a certain node has to pass.
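Degree and hop count can be measured on a concrete overlay with a breadth-first search from every node. The sketch below assumes a connected overlay given as an adjacency dictionary; the function names and the 6-node ring are hypothetical examples.

```python
from collections import deque


def hops_from(graph, src):
    """Hop distance from src to every reachable node (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def measures(graph):
    """Return (maximum degree, worst-case hop count, average hop count)."""
    max_degree = max(len(neighbors) for neighbors in graph.values())
    all_hops = [d for u in graph
                for v, d in hops_from(graph, u).items() if v != u]
    return max_degree, max(all_hops), sum(all_hops) / len(all_hops)


# Hypothetical overlay: a bidirectional ring of 6 nodes.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(measures(ring))
```

A ring has constant degree but a hop count that grows linearly with the number of nodes, which is the trade-off the structured overlays above try to improve on.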
Information Locality (1/3)
  Definition (Information Locality)
    Information locality means that similar information is stored close together in the overlay graph.
  Example: Define close as numerically close, i.e. $|v_1 - v_2|$ defines the closeness of value $v_1$ and value $v_2$.
  What is it good for?
    Similarity queries can be solved efficiently!
    Example: Find all values 100 +/- 5
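On a single sorted list, the similarity query "find all values 100 +/- 5" reduces to two binary searches. This is only a local analogy for what locality gives the overlay; the function name `similar` and the sample values are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def similar(sorted_values, target, radius):
    """All values v with |v - target| <= radius, taken from a sorted list."""
    lo = bisect_left(sorted_values, target - radius)
    hi = bisect_right(sorted_values, target + radius)
    return sorted_values[lo:hi]   # the matches form one contiguous slice


values = [3, 94, 96, 99, 100, 105, 106, 250]
print(similar(values, 100, 5))
```

Because numerically close values are neighbors in the list, the answer is one contiguous slice; with information locality the same holds for neighboring nodes in the overlay.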
Information Locality (2/3)
  Chord does not provide information locality (hashing!)
  Chord can be adapted easily
  K54 = Key 54, V54 = Value 54
  [Figure: the Chord ring with nodes N4, N8, N14, N21, N32, N38, N42, N48, N51, N56 and Lookup(K54), shown once with the keys K10, K24, K30, K38, K54 and once with the values V10, V24, V30, V38, V54]
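One way to picture the adaptation is to place a value at the ring position given by the value itself instead of by its hash. The sketch below assumes a 6-bit ring with the node IDs from the figure, values in the range 0 to 63, and SHA-1 as the hash function; all function names are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 6                                            # 6-bit ring: positions 0..63
NODES = [4, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # node IDs, sorted


def successor(position):
    """Node responsible for a ring position."""
    i = bisect_left(NODES, position % 2 ** M)
    return NODES[i % len(NODES)]


def hashed_position(value):
    """Standard DHT placement: hash the value, ignoring its numeric order."""
    digest = hashlib.sha1(str(value).encode()).digest()
    return int.from_bytes(digest, "big") % 2 ** M


def nodes_hashed(values):
    return {successor(hashed_position(v)) for v in values}


def nodes_local(values):
    """Locality-preserving placement: the value itself is the ring position."""
    return {successor(v) for v in values}


query = range(21, 32)                            # all values 26 +/- 5
print("hashed:", sorted(nodes_hashed(query)))
print("local: ", sorted(nodes_local(query)))
```

With the value used as the position, the whole range 21 to 31 ends up on the two adjacent nodes N21 and N32, whereas hashed placement gives no such guarantee. The price is that a skewed value distribution concentrates load on few nodes.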
Information Locality (3/3)
  Disadvantage: Need explicit load balancing!
  [Figure: Chord ring with nodes N4, N8, N14, N21, N32, N38, N42, N48, N51, N56, Lookup(V26), and the values V23, V26, V30]
  Remark
    Information Locality might be an important technique to solve problems the Project Group is dealing with.
Thank you for your attention!