Peer-to-Peer Networks 14-740: Fundamentals of Computer Networks Bill Nace Material from Computer Networking: A Top Down Approach, 6th edition. J.F. Kurose and K.W. Ross
Administrivia Quiz #1 is next week (25 Sep) Covers all material up to and including Queueing Theory Web site: Study Guide, Equation Sheet HW #1 is posted (due 2 Oct) Lab #1 is posted (due 4 Oct) TAs are here to help! Ask them questions!
traceroute P2P Overview Architecture components Napster (Centralized) Gnutella (Distributed) Skype and KaZaA (Hybrid, Hierarchical) KaZaA Reverse Engineering Study
What is P2P? Client / Server interaction Client: any end-host Server: specific end-host P2P: Peer-to-peer Any end-host
Aim to leverage resources available on clients (peers) Hard drive space Bandwidth (especially upload) Computational power Anonymity (e.g. zombie botnets) Edge-ness (i.e. being distributed at network edges)
Clients are particularly fickle Users have not agreed to provide any particular level of service Users are not altruistic -- algorithm must force participation without allowing cheating Clients are not trusted Client code may be modified And yet, availability of resources must be assured
P2P History Proto-P2P systems exist DNS, Netnews/Usenet Xerox Grapevine (~1982): name, mail delivery service Kicked into high gear in 1999 Many users had always-on broadband net connections 1st Generation: Napster (music exchange) 2nd Generation: Freenet, Gnutella, KaZaA, BitTorrent More scalable, designed for anonymity, fault-tolerant 3rd Generation: Middleware -- Pastry, Chord Provide for overlay routing to place/find resources
P2P Architecture Content Directory Database of content Structured? Unstructured? Which peer has what files? Metadata: Other info about files Signaling protocol How do peers exchange coordination messages? Proprietary? Encrypted?
Architecture (2) File transfer How does a peer retrieve a file from another peer? HTTP or HTTP-like Any peer must be able to send reply messages
Overlay network is not the network Overlay networks are formed on top of network graph Connect peers via abstract links in the overlay Transport accomplished on network edges Overlay algorithms abstract particulars of the network (Figure: the P2P application and its overlay network sit in the Application layer, above Transport, Network, Data Link, and Physical; one overlay edge is carried by the layers beneath it, perhaps even built on HTTP for transport!)
Napster Original centralized design 1. When a peer connects, it informs the central server of its IP address and content 2. Marcia queries for "I Like It"; the server looks through its index and replies: Daichi has "I Like It" 3. Marcia requests the file from Daichi (Figure: peers, including Daichi and Marcia, around a centralized directory server, with arrows numbered 1, 2, 3 for the steps above)
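The three steps above can be sketched as a tiny in-memory directory. This is only an illustration of the centralized-index idea: the class name `DirectoryServer`, its methods, and the IP addresses are made up for this example and are not Napster's actual protocol.

```python
from collections import defaultdict


class DirectoryServer:
    """Hypothetical central index: maps a title to the peers that hold it."""

    def __init__(self):
        self.index = defaultdict(set)  # title -> set of peer IP addresses

    def register(self, peer_ip, titles):
        # Step 1: a connecting peer reports its IP address and content.
        for title in titles:
            self.index[title].add(peer_ip)

    def unregister(self, peer_ip):
        # A departing peer is removed from every entry it appears in.
        for holders in self.index.values():
            holders.discard(peer_ip)

    def query(self, title):
        # Step 2: the server looks through its index and replies with holders.
        return sorted(self.index.get(title, ()))


server = DirectoryServer()
server.register("192.0.2.7", ["I Like It", "Another Song"])  # Daichi (made-up IP)
server.register("192.0.2.9", ["Another Song"])               # Marcia (made-up IP)
holders = server.query("I Like It")
print(holders)  # Step 3 would be a direct transfer from one of these peers
```

Every lookup goes through the one `server` object while the transfer itself would not, which is the split between centralized search and decentralized file transfer.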
Problems? File transfer is decentralized, but locating content is highly centralized Single point of failure Performance bottleneck Single point of lawsuit Result: Napster was owned by Best Buy Now it's a rebranded Rhapsody music streaming service
Gnutella Created in response to Napster problems Fully decentralized Does not depend on central directory Participants arrange themselves in overlay Queries flood network to find file Fully anonymous Public domain protocol Various Gnutella clients
Bootstrapping 1. New peer X must find some member of the Gnutella network Use a list of candidate peers 2. X sequentially attempts to make TCP connection with peers on list until successful with peer Y 3. X sends ping message to Y; Y forwards ping message 4. All peers receiving a ping message respond to X with a pong message 5. X receives many pong messages and can set up additional TCP connections
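The bootstrapping steps can be simulated on a toy overlay. This is a minimal sketch under stated assumptions: the function `bootstrap`, the peer names, and the `alive` set are invented for illustration, real TCP connections are replaced by set membership, and the ping is forwarded without any hop limit.

```python
def bootstrap(candidates, alive, neighbors):
    """Return (first reachable peer Y, set of peers that would send a pong)."""
    # Step 2: try candidates in list order until one connection succeeds.
    first = next((p for p in candidates if p in alive), None)
    if first is None:
        return None, set()
    # Steps 3-4: Y forwards the ping; every live peer that receives it pongs.
    seen, frontier = {first}, [first]
    while frontier:
        peer = frontier.pop()
        for nxt in neighbors.get(peer, ()):
            if nxt in alive and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return first, seen


candidates = ["A", "B", "C"]          # X's list of candidate peers
alive = {"B", "C", "D", "E"}          # "A" is offline in this example
neighbors = {"A": ["B"], "B": ["C", "D"], "D": ["E"]}
first, pongs = bootstrap(candidates, alive, neighbors)
print(first, sorted(pongs))
```

Step 5 would then pick some of the peers in `pongs` as additional neighbors for X.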
Query Flooding Query messages sent over existing TCP connections Peers forward query message QueryHit messages sent over reverse path File transfer arranged over HTTP (Figure: Query messages fan out across the overlay, QueryHit messages return along the reverse path, and the file transfer runs over HTTP)
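A small simulation can make the reverse-path idea concrete. This sketch assumes an idealized overlay in which each peer remembers the neighbor it first heard the query from; the function `flood_query` and the five-peer graph are hypothetical, not part of any Gnutella client.

```python
from collections import deque


def flood_query(neighbors, origin, holders):
    """Flood a query from origin; return the QueryHit route for each holder."""
    came_from = {origin: None}   # peer -> neighbor it first heard the query from
    queue = deque([origin])
    while queue:
        peer = queue.popleft()
        for nxt in neighbors[peer]:
            if nxt not in came_from:      # duplicate copies of the query are dropped
                came_from[nxt] = peer
                queue.append(nxt)
    paths = {}
    for holder in holders:
        if holder in came_from and holder != origin:
            path, hop = [], holder
            while hop is not None:        # walk back toward the origin
                path.append(hop)
                hop = came_from[hop]
            paths[holder] = path          # QueryHit travels holder -> ... -> origin
    return paths


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
paths = flood_query(overlay, "A", {"E"})
print(paths)
```

Once the QueryHit reaches the origin, the file itself would be fetched directly from the holder over HTTP, outside the overlay.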
Limited Scope Query Flooding Original design not scalable Exponential increase in signaling traffic Solution is to limit scope of query Include peer-count field in query message, e.g. peer-count = 4 This field gets decremented by 1 at each hop Message stops propagating when peer-count hits zero (Figure: a Query forwarded with peer-count = 3, then with peer-count = 2)
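The peer-count rule can be tried out on a toy graph. This sketch assumes the field is decremented once per hop, so a query that starts with peer-count = 4 reaches peers up to four hops away; the function `limited_flood` and the chain topology are made up for illustration.

```python
def limited_flood(neighbors, origin, peer_count):
    """Return the peers (excluding origin) reached by a scope-limited query."""
    reached = set()
    seen = {origin}
    frontier = [origin]
    while frontier and peer_count > 0:
        next_frontier = []
        for peer in frontier:
            for nxt in neighbors[peer]:
                if nxt not in seen:
                    seen.add(nxt)
                    reached.add(nxt)
                    next_frontier.append(nxt)
        frontier = next_frontier
        peer_count -= 1   # the field is decremented by 1 at each hop
    return reached


# A simple chain overlay: A - B - C - D - E - F
chain = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F"],
    "F": ["E"],
}
reached_4 = limited_flood(chain, "A", 4)
reached_2 = limited_flood(chain, "A", 2)
print(sorted(reached_4), sorted(reached_2))
```

On this chain, peer F lies five hops from A and is never asked, whatever it holds. How many peers a given peer-count reaches depends entirely on the overlay's shape.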
Question If peer-count = 4 at the start, how many peers would the query message eventually reach?
More Questions Is limited scope query flooding scalable? (i.e., how does the number of nodes affect message counts?)
Even more questions Are we guaranteed to find an object? (Assume the object exists somewhere in the overlay network)
KaZaA: Exploiting Heterogeneity Each peer is either a Super Node (SN) or an Ordinary Node (ON) assigned to an SN TCP connection between ON and its SN TCP connections between some pairs of SNs SN tracks the content in all its children
KaZaA Queries Each file has a hash and a descriptor Client sends keyword query to its SN SN responds with matches: For each match: metadata, hash, IP address If SN forwards query to other SNs, they respond with matches Client then selects files for downloading HTTP requests using hash as identifier sent to peers holding desired file
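The query flow above can be sketched with a toy super node. Everything here is hypothetical: the class `SuperNode`, its method names, the record layout, and the addresses are invented for illustration, and SHA-1 is only a stand-in because the slides do not say which hash KaZaA uses.

```python
import hashlib


class SuperNode:
    """Hypothetical SN that indexes the files of its ordinary nodes (ONs)."""

    def __init__(self):
        self.records = []     # one record per (file, ON) pair
        self.neighbors = []   # other SNs reachable over SN-SN connections

    def add_child_file(self, on_ip, descriptor, content):
        # SHA-1 is an assumption; it only serves as a content identifier here.
        digest = hashlib.sha1(content).hexdigest()
        self.records.append({"descriptor": descriptor, "hash": digest, "ip": on_ip})

    def local_matches(self, keyword):
        kw = keyword.lower()
        return [r for r in self.records if kw in r["descriptor"].lower()]

    def query(self, keyword):
        # Answer from the local index, then from directly connected SNs.
        matches = self.local_matches(keyword)
        for sn in self.neighbors:
            matches.extend(sn.local_matches(keyword))
        return matches


sn1, sn2 = SuperNode(), SuperNode()
sn1.neighbors.append(sn2)
sn1.add_child_file("192.0.2.10", "I Like It (live)", b"same-audio-bytes")
sn2.add_child_file("198.51.100.4", "i like it", b"same-audio-bytes")
results = sn1.query("like")
for r in results:
    print(r["ip"], r["hash"][:8], r["descriptor"])
```

Both records carry the same hash even though their descriptors differ, so a client could request the same file from either peer by hash.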
Measurement Study Developed tools to reverse engineer KaZaA Attempt to answer the following questions: What is the ratio of SNs to ONs? What is the fraction of SNs overall? How are SNs connected, sparsely or densely? How does ON pick best SN? Random port numbers and NATs?
Structural Properties Deployed apparatus in Polytechnic campus and broadband residential network SN connects to 40-50 other SNs (dynamic) SN has 100-160 ONs at Polytechnic, 55-70 at access network Given 3 million peers, 25,000 to 40,000 SNs SN is connected to ~0.1% of other SNs
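A quick back-of-the-envelope check shows how the slide's numbers relate to one another. It uses only the figures quoted above. Dividing the peer count by the ONs-per-SN range is a simplification made for this sketch (it ignores that SNs are themselves peers) and is not the study's method.

```python
peers = 3_000_000
ons_per_sn = (55, 160)           # lowest and highest ON counts quoted on the slide
sn_degree = (40, 50)             # SN-SN connections per SN
sn_estimate = (25_000, 40_000)   # the slide's estimate of the number of SNs

# SN count implied by the ON-per-SN range: more ONs per SN means fewer SNs.
implied_sns = tuple(round(peers / n) for n in reversed(ons_per_sn))

# Fraction of all SNs that a single SN connects to, smallest and largest case.
low = sn_degree[0] / sn_estimate[1]
high = sn_degree[1] / sn_estimate[0]

print(f"implied SN count: {implied_sns[0]:,} to {implied_sns[1]:,}")
print(f"SN-SN connectivity: {low:.1%} to {high:.1%}")
```

The slide's 25,000 to 40,000 estimate falls inside the implied range, and the connectivity fraction comes out at a few tenths of a percent at most, in line with the "~0.1%" bullet.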
Unanswered Questions... Details about the residential access network? Where is it? What is it? What is the uplink/downlink bandwidth? How long was the measurement study? 6 hours on 2 days? Aug 22 '03, Oct 24 '03 How are these time periods representative samples? Where did the 3 million peers number come from? From KaZaA?
Overlay Dynamics Connection lifetimes are short Average for ON-SN is 34 mins, SN-SN is 11 mins 38% of ON-SN and 32% of SN-SN lasted < 30 secs Why so short? SN searching for other SNs with small workload Long-term connection shuffling, so larger set of SNs can be explored Exchange of SN lists
Unanswered Questions... Big jump from overlay dynamics numbers to conjectures of what SNs are doing How can we interpret these numbers better? Staircases in the cumulative distribution? Different distinct groups of connection times Compare these times to conjectures
Parent Selection Workload: Exact algorithm to calculate workload is unknown Tied to the number of connections an SN is currently supporting Locality: RTT measurements 60% of SN-SN connections < 50 msec 40% of ON-SN < 5 msecs Transatlantic traffic ~ 100 msecs Transpacific traffic ~ 180 msecs Topological closeness (Prefix matching) SNs in SN list close to ON Issues with this methodology?
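One way to read "prefix matching" is to compare how many leading bits two IP addresses share. The sketch below takes that reading as an assumption. The slide does not say how the study computed closeness, and the helper names and documentation-range addresses are made up.

```python
import ipaddress


def common_prefix_len(a, b):
    """Number of leading bits shared by two IPv4 addresses (0 to 32)."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def rank_by_closeness(on_ip, sn_ips):
    # Longest shared prefix first, as a rough proxy for topological closeness.
    return sorted(sn_ips, key=lambda sn: common_prefix_len(on_ip, sn), reverse=True)


on_ip = "192.0.2.10"
sn_list = ["203.0.113.9", "192.0.2.200", "198.51.100.4"]
ranked = rank_by_closeness(on_ip, sn_list)
for sn in ranked:
    print(sn, common_prefix_len(on_ip, sn))
```

A long shared prefix suggests, but does not prove, that two hosts sit in the same network, which is one reason to question the methodology.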
Skype P2P Voice-over-IP (VoIP) pc-to-pc, pc-to-phone, phone-to-pc also IM, video proprietary application-layer protocol (inferred via reverse engineering) Skype login server hierarchical overlay
Making a Call User starts Skype Client registers with an SN from its list of bootstrap SNs Client logs in (authenticates) with the Skype login server Call: client queries SN with callee ID SN contacts other SNs (how? unknown) to find address of callee SN returns address to client Client directly contacts callee (TCP)
Lesson Objectives Now, you should be able to: list reasons that led to the creation of P2P networks describe what an overlay network is and how it is different from the internet use historical P2P networks to describe centralized P2P networks, fully distributed P2P networks, and hierarchical P2P networks describe search techniques in the various P2P forms, and to analyze search efficiencies