Acknowledgements. Peer-peer Computing & Networking CS 699/IT 818 Fall Peer-peer computing and networking. Peer-peer network.

Similar documents
Peer to Peer Computing

Peer-peer and Application-level Networking. CS 218 Fall 2003

CS 640 Introduction to Computer Networks. Today s lecture. What is P2P? Lecture30. Peer to peer applications

CS 3516: Advanced Computer Networks

Chapter 2: Application layer

Content Search. Unstructured P2P. Jukka K. Nurminen

CS 3516: Computer Networks

Web caches (proxy server) Applications (part 3) Applications (part 3) Caching example (1) More about Web caching

Last Lecture SMTP. SUNY at Buffalo; CSE 489/589 Modern Networking Concepts; Fall 2010; Instructor: Hung Q. Ngo 1

Content Search. Unstructured P2P

P2P Computing. Nobuo Kawaguchi. Graduate School of Engineering Nagoya University. In this lecture series. Wireless Location Technologies

Peer-to-Peer Networks

Lecture 8: Application Layer P2P Applications and DHTs

Unit 8 Peer-to-Peer Networking

CMSC 332 Computer Networks P2P and Sockets

Peer-to-Peer Applications Reading: 9.4

Overlays and P2P Networks

Peer-to-Peer Internet Applications: A Review

Advanced Computer Networks

Assignment 5. Georgia Koloniari

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

Distributed Knowledge Organization and Peer-to-Peer Networks

Peer-to-Peer Systems. Chapter General Characteristics

Peer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today

Introduction to Peer-to-Peer Systems

Peer-to-Peer (P2P) Systems

EE 122: Peer-to-Peer Networks

Overlay Networks: Motivations. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Motivations (cont d) Goals.

Page 1. How Did it Start?" Model" Main Challenge" CS162 Operating Systems and Systems Programming Lecture 24. Peer-to-Peer Networks"

Making Gnutella-like P2P Systems Scalable

Peer-to-Peer Computing

EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Overlay Networks: Motivations

Telecommunication Services Engineering Lab. Roch H. Glitho

A Survey of Peer-to-Peer Content Distribution Technologies

Distributed Systems. 17. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2016

CPSC156a: The Internet Co-Evolution of Technology and Society

Special Topics: CSci 8980 Edge History

Applications & Application-Layer Protocols: The Domain Name System and Peerto-Peer

Using peer to peer. Marco Danelutto Dept. Computer Science University of Pisa

Slides for Chapter 10: Peer-to-Peer Systems

Lecture 21 P2P. Napster. Centralized Index. Napster. Gnutella. Peer-to-Peer Model March 16, Overview:

Peer-to-Peer Systems and Distributed Hash Tables

Flooded Queries (Gnutella) Centralized Lookup (Napster) Routed Queries (Freenet, Chord, etc.) Overview N 2 N 1 N 3 N 4 N 8 N 9 N N 7 N 6 N 9

Distributed Systems. peer-to-peer Johan Montelius ID2201. Distributed Systems ID2201

Scalable overlay Networks

Telematics Chapter 9: Peer-to-Peer Networks

CPS 214: Computer Networks and Distributed Systems Networked Environments: Grid and P2P systems

Content Overlays. Nick Feamster CS 7260 March 12, 2007

Goals. EECS 122: Introduction to Computer Networks Overlay Networks and P2P Networks. Solution. Overlay Networks: Motivations.

Introduction on Peer to Peer systems

P2P. 1 Introduction. 2 Napster. Alex S. 2.1 Client/Server. 2.2 Problems

EE 122: Peer-to-Peer (P2P) Networks. Ion Stoica November 27, 2002

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Architectures for Distributed Systems

CSE 486/586 Distributed Systems Peer-to-Peer Architectures

Today. Why might P2P be a win? What is a Peer-to-Peer (P2P) system? Peer-to-Peer Systems and Distributed Hash Tables

Peer-to-peer systems and overlay networks

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza

Peer-to-Peer Protocols and Systems. TA: David Murray Spring /19/2006

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

From the Editors: Peer-to-Peer Community: Looking beyond the Legacy of Napster and Gnutella

Introduction to Peer-to-Peer Networks

Advanced Distributed Systems. Peer to peer systems. Reference. Reference. What is P2P? Unstructured P2P Systems Structured P2P Systems

DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES

0x1A Great Papers in Computer Security

Peer to Peer Systems and Probabilistic Protocols

Stratos Idreos. A thesis submitted in fulfillment of the requirements for the degree of. Electronic and Computer Engineering

P2P Applications. Reti di Elaboratori Corso di Laurea in Informatica Università degli Studi di Roma La Sapienza Canale A-L Prof.ssa Chiara Petrioli

Georges Da Costa Introduction on Peer to Peer systems

CSC 4900 Computer Networks: P2P and Sockets

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen

Scaling Problem Computer Networking. Lecture 23: Peer-Peer Systems. Fall P2P System. Why p2p?

Department of Computer Science Institute for System Architecture, Chair for Computer Networks. File Sharing

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Peer-to-peer computing research a fad?

Extreme Computing. BitTorrent and incentive-based overlay networks.

Unit background and administrivia. Foundations of Peer-to- Peer Applications & Systems

Handling Churn in a DHT

Introduction to P2P Computing

PEER-TO-PEER NETWORKS, DHTS, AND CHORD

CompSci 356: Computer Network Architectures Lecture 21: Overlay Networks Chap 9.4. Xiaowei Yang

Ossification of the Internet

Scalable overlay Networks

Outline A Hierarchical P2P Architecture and an Efficient Flooding Algorithm

P2P technologies, PlanetLab, and their relevance to Grid work. Matei Ripeanu The University of Chicago

Peer-to-Peer Architectures and Signaling. Agenda

Chapter 10: Peer-to-Peer Systems

416 Distributed Systems. Mar 3, Peer-to-Peer Part 2

Peer-to-Peer Networking

6. Peer-to-peer (P2P) networks I.

CSCI-1680 P2P Rodrigo Fonseca

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

Peer to Peer I II 1 CS 138. Copyright 2015 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved.

Main Challenge. Other Challenges. How Did it Start? Napster. Model. EE 122: Peer-to-Peer Networks. Find where a particular file is stored

Web Protocols and Practice

1(11) Peer to peer networking

Course Staff Intro. Distributed Systems Intro. Logistics. Course Goals. Instructor: David Andersen TAs: Alex Katkova.

CC451 Computer Networks

CDNs and Peer-to-Peer

Transcription:

Peer-peer Computing & Networking CS 699/IT 88 Fall 2004 Sanjeev Setia http://www.cs.gmu.edu/~setia/cs699/ Acknowledgements Some of the followings slides are based on the slides made available by the authors of Computer Networking: A Top Down Approach Featuring the Internet, 2 nd edition. Jim Kurose, Keith Ross Addison-Wesley, July 2002. 2 Peer-peer computing and networking Peer-peer network Focus at the application level 3 4

Peer-to-Peer: Some Definitions A P2P computer network refers to any network that does not have fixed clients and servers, but a number of peer nodes that function as both clients and servers to other nodes on the network. Wikipedia.org The sharing of computer resources and services by direct exchange between systems Intel P2P working group The use of devices on the internet periphery in a non-client capacity Alex Weytsel, Aberdeen Group P2P is a class of applications that takes advantage of resources storage, cycles, content, human presence available at the edges of the internet. Clay Shirky, openp2p.com Peer-peer applications File sharing Napster, Gnutella, KaZaa Second generation projects Oceanstore, PAST, Freehaven Distributed Computation SETI@home, Entropia, Parabon, United Devices, Popular Power Other Applications Content Distribution (BitTorrent) Instant Messaging (Jabber), Anonymous Email Groupware (Groove) P2P Databases 5 6 Is Peer-to-peer new? P2P concept certainly not new Usenet - News groups first truly decentralized system DNS - Handles huge number of clients Basic IP - Vastly decentralized, many equivalent routers What is new? Scale: people are envisioning much larger scale Security: Systems must deal with privacy and integrity Anonymity: Protect identity and prevent censorship (In)Stability: Deal with unstable components at the edges P2P: Related Technologies Distributed computing. How is P2P different from distributed computing? Grid computing. How is the computational grid different from P2P networks? KEY DIFFERENCES: Peers are on the edges of the Internet, are autonomous, have variable connectivity, and temporary network addresses Application-level networking. Resilient overlay networks for multicast, video distribution, etc. 7 8 2

P2P: Related Technologies Wireless ad-hoc networks. Sensor networks. P2P devices/ubiquitous computing. JINI. Web services..net framework, SOAP, UDDI. Why the hype??? File Sharing: Napster (+Gnutella, KaZaa, etc) High coolness factor Served a high-demand niche: online jukebox Anonymity/Privacy/Anarchy: FreeNet, Publis, etc Libertarian dream of freedom Extremely valid concern of Censorship/Privacy In search of copyright violators, RIAA challenging rights to privacy Computing: The Grid Scavenge the numerous free cycles of the world to do work Seti@Home most visible version of this Industry/Management Looking for the next big thing A lot of interest/hype in autonomic computing /Computing as a utility 9 0 Class Logistics: a research seminar Q: Is a seminar just like a course? NO! I will present papers for the first 7 weeks Second half of the semester: student-presentations 2 presentations per class Each presentation based on -2 papers student-led discussions no well-defined body of knowledge to impart seminar is about searching for answers (and questions) great opportunity to find interesting research projects Requirements Workload: Reading assigned papers Write short reviews of papers to be discussed in class (20% of grade) Email them to me before class. Template for reviews is on class web site Participate in class discussions Make a class presentation (25%) Assist others in preparing their presentations (5%) Each presenter will be assigned a partner who will go over the presentation with them Course project (50%) Pre-requisites: previous class on networking, operating systems,algorithms, distributed systems 2 3

Presentations after first seven weeks: student-led presentations and discussions you will need to: read, think deeply about paper, topic area look for additional outside material prepare ~60 minute class presentation Go over presentation with partner, instructor lead in-class discussion of material everyone wants your presentation to be wellprepared, interesting, thoughtful Presentations (more) Preparing your presentation in advance: read documents about preparing a good talk (on web site) week in advance: meet with presentation partner Practice your talk! meet with instructor about presentation post overheads in advance of class What s in a presentation? paper contents additional material you have found critical analysis: questions, strengths, weaknesses, improvements, future work For every paper find: 3 most important points 3 weaknesses (flaws with the assumptions/methodology/etc.) 3 4 Class Project Various Options Explore an open topic Read literature, find open problem, propose a new solution, evaluate it Could be the beginning of a MS/Ph.d. thesis Re-evaluate/Validate the conclusions of a previous study do experiments that have been reported in a paper Measurement, Implementation, Simulation Programming-oriented project Implement a P2P application/service, e.g. use JXTA to develop a P2P app, do measurements, etc. Identify project by end of Sept. Discuss with instructor Make a presentation in class (5-0 minutes) Schedule First 7 weeks. Intro (Week ) 2. Structured P2P systems (DHTs) Tapestry, Chord, Pastry, CAN (Weeks 2 & 3) 3. Unstructured P2P systems (Week 4) 4. P2P File Systems (Week 5) 5. Performance & P2P Workload Issues (Week 6) 6. Security/Applications (Week 7) Next 6 weeks student presentations See reading list on class web site (soon!) Last week + Finals week Project Presentations 5 6 4

P2P Applications Taxonomy Content and File Sharing Napster, Gnutella, KaZaa, etc. Most research has focused on this class of apps Parallelizable Compute Intensive (Same task on every peer using different parameters) Componentized applications different components on each peer (not yet widely supported/recognized) Collaborative Instant messaging, groupware, games Many startups but not that much academic research P2P file sharing Example Alice runs P2P client application on her notebook computer Intermittently connects to Internet; gets new IP address for each connection Asks for Hey Jude Application displays other peers that have copy of Hey Jude. Alice chooses one of the peers, Bob. File is copied from Bob s PC to Alice s notebook: HTTP While Alice downloads, other users uploading from Alice. Alice s peer is both a Web client and a transient Web server. All peers are servers = highly scalable! 7 8 P2P Content Location & Routing P2P: centralized directory Three approaches Centralized directory (Napster) Decentralized directory + Flooding-based search (Gnutella) original Napster design ) when peer connects, it informs central server: IP address centralized directory server Bob peers Unstructured P2P systems Distributed Hash Tables (DHT) based document search and publication Structured P2P systems (Chord, CAN, Tapestry, etc) Presented in weeks 2 & 3 content 2) Alice queries for Hey Jude 3) Alice requests file from Bob 2 Alice 3 9 20 5

P2P: problems with centralized directory Single point of failure Performance bottleneck Copyright infringement file transfer is decentralized, but locating content is highly centralized 2 Napster program for sharing files over the Internet a killer application? history: 5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service 2/99: first lawsuit 3/00: 25% UWisc traffic Napster 2000: est. 60M users 2/0: US Circuit Court of Appeals: Napster knew users violating copyright laws 7/0: # simultaneous online users: Napster 60K, Gnutella: 40K, Morpheus: 300K 200: Napster shut down; Bertelsmann acquire assets, etc. Today Napster 2.0 music download service (Roxio) Also OpenNap (open source napster server) 22 Napster: how did it work Napster Application-level, client-server protocol over pointto-point TCP. File list is uploaded napster.com Four steps: Connect to Napster server Upload your list of files (push) to server. Give server keywords to search the full list with. Select best of correct answers. (pings) users 23 24 6

Napster Napster 2. User requests search at server. Request and results napster.com user 3. User pings hosts that apparently have data. Looks for best transfer rate. pings napster.com user pings 25 26 Napster Napster: architecture notes 4. User retrieves file napster.com centralized server: single logical point of failure can load balance among servers using DNS rotation potential for congestion Retrieves file no security: passwords in plain text user no authentication no anonymity 27 28 7

P2P: decentralized directory More about decentralized directory Each peer is either a group leader or assigned to a group leader. Group leader tracks the content in all its children. Peer queries group leader; group leader may query other group leaders. ordinary peer group-leader peer neighoring relationships in overlay network overlay network peers are nodes edges between peers and their group leaders edges between some pairs of group leaders virtual neighbors bootstrap node connecting peer is either assigned to a group leader or designated as leader advantages of approach no centralized directory server location service distributed over peers more difficult to shut down disadvantages of approach bootstrap node needed group leaders can get overloaded 29 30 P2P: Query flooding P2P: more on query flooding Gnutella no hierarchy use bootstrap node to learn about others join message Send query to neighbors Neighbors forward query If queried peer has object, it sends message back to querying peer Pros peers have similar responsibilities: no group leaders highly decentralized no peer maintains directory info Cons excessive query traffic query radius: may not have content when present bootstrap node join maintenance of overlay network 3 32 8

Gnutella Gnutella: how it works peer-to-peer networking: applications connect to peer applications focus: decentralized method of searching for files each application instance serves to: store selected files route queries (file searches) from and to its neighboring peers respond to queries (serve file) if file stored locally Gnutella history: 3/4/00: release by AOL, almost immediately withdrawn too late many iterations to fix poor initial design (poor design turned many people off) What we care about: How much traffic does one query generate? how many hosts can it support at once? What is the latency associated with querying? Searching by flooding: If you don t have the file you want, query 7 of your partners. If they don t have it, they contact 7 of their partners, for a maximum hop count of 0. Requests are flooded, but there is no tree structure. No looping but packets may be received twice. Reverse path forwarding Note: Play gnutella animation at: http://www.limewire.com/index.jsp/p2p Is there a bottleneck? 33 34 Flooding in Gnutella: loop prevention Seen already list: A Distributed Computing Current supercomputers are too expensive ASCI White (# in TOP500) costs more than $0 million and needed a new building Few institutions or research groups can afford this level of investment There are more than 500 million PCs around the world some as powerful as early 90s supercomputers they are idle most of the time (60% to 90%), even when being used (spreadsheet, typing, printing,...) corporations and institutions have hundreds or thousands of PCs on their networks Try to harness idle PCs on a network and use them on computationally intensive problems 35 36 9

How it works Embarrassingly parallel applications Large computation to communication ratio Master/worker model Applications can use local disk for checkpointing Provider farms out work to idle PCs across the internet PC owners volunteer idle cycles (for money or altruistic purposes) Entropia network Born in 997 to apply idle computers worldwide to problems of scientific interest In 2 years grew to more than 30,000 computers with aggregate speed of over Tflop/second Several scientific achievements, e.g. Identification of largest known prime number Gone commercial: www.entropia.com and used for applications from: Life sciences Financial services Product design, etc. Today: appears to not have succeeded as a business Business model for distributed computing not yet successful 37 38 SETI @ home project SETI = Search for Extraterrestrial Intelligence Started in 996 to enlist PCs to work on analyzing data from the Arecibo radio telescope Good mix of popular appeal and good technology Now running on more than _ million PCs delivering ~,200 CPU years per day ~ 35 Tflops/sec setiathome.ssl.berkeley.edu fastest (but special-purpose) computer in the world Folding @ home project www.stanford.edu/group/pandegroup/cosm Enlists PCs to work on the protein folding problem most important problem in modern molecular biology From genome to structure: Genome sequence of DNA specifies amino acids that make up proteins, but says little about their functions: what is needed is how a protein fold (3D structure) Protein folding is very fast (microseconds) and complex Simulation timescale is of the order of nanoseconds 0^3 gap distributed computing Currently around 20,000 users 39 40 0

Readings P2P Survey Article on Class web page For next week: Tapestry Pastry 4