P2P technologies, PlanetLab, and their relevance to Grid work
Matei Ripeanu, The University of Chicago


Why don't we build a huge supercomputer?
Top500 supercomputer list over time follows a Zipf distribution: Perf(rank) ~ rank^(-k)
[Figure: LINPACK performance in GFLOPS (log scale) versus rank (log scale), curves for 1995 to 2001]
[Figure: evolution of parameter 'k', 1995 to 2003; vertical axis from -0.68 to -0.84]

Impact
Trend: it is increasingly interesting to aggregate the capabilities of the machines in the tail of this distribution.
  A virtual machine that aggregates the last 10 machines in the Top500 would rank 32nd in '95 but 14th in '03.
Both Grid and P2P computing are results of this trend:
  Grids: focus on assembling (a relatively small number of) resources to enable controlled, secure resource sharing.
  P2P: focus on scale and deployability.
Challenge: design services that offer the best of both worlds: complex, secure services that deliver controlled QoS, are scalable, and can be easily deployed.
[Figure: evolution of parameter 'k', 1995 to 2003; vertical axis from -0.68 to -0.84]
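The ranking claim can be checked roughly against the Zipf model itself. The sketch below assumes an ideal power law, Perf(rank) = rank^(-k) with the constant factor dropped, and uses the exponent magnitudes 0.84 and 0.68, which are simply the two ends of the axis in the 'k' evolution plot, chosen for illustration. The function name is hypothetical, and the real Top500 data deviate from an ideal power law.

```python
def aggregate_rank(k, first=491, last=500):
    """Fractional rank at which a machine combining ranks first..last would sit,
    assuming Perf(rank) = rank ** -k (ideal Zipf law, constant factor dropped)."""
    total = sum(r ** -k for r in range(first, last + 1))
    # Invert Perf(rank) = total to find the equivalent rank.
    return total ** (-1.0 / k)


for k in (0.84, 0.68):
    print(f"k = {k}: aggregate of the last 10 sits near rank {aggregate_rank(k):.1f}")
```

Under these assumptions a flatter distribution (smaller k) moves the aggregate of the tail closer to the top of the list, which is the direction of the trend described on the slide.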

Roadmap
P2P
  Definitions
  Impact
  Applications
  Mechanisms
PlanetLab
  Why is this interesting for you?
  Compare resource management in PlanetLab and Globus
Wrap-Up

P2P Definition(s)
Def 1: A class of applications that take advantage of resources (e.g., storage, cycles, content) available at the edge of the Internet.
  Edge nodes are often turned off, lack permanent IP addresses, etc.
Def 2: A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.
Lots of other definitions fit in between.

P2P Impact (1)
Widespread adoption: KaZaA, with 170 million downloads (3.5M/week), is one of the most popular applications ever!
(Almost) zero-cost data distribution is forcing companies to change their business models and might impact copyright laws.
Figures per network (source: www.slyck.com, 2/4/2003):
  FastTrack       4,114,120
  iMesh           1,375,199
  eDonkey           569,097
  DirectConnect     136,552
  Blubster           97,128
  Gnutella           92,678
  Cvernet            91,750

P2P Impact (2)
Killer application for broadband to consumers.
P2P-generated traffic may be the single largest contributor to Internet traffic today.
[Figure: Internet2 traffic statistics, share of traffic (0% to 100%) by category (file sharing, data transfers, unidentified, other), Feb. '02 to July '04. Source: www.internet2.edu]

P2P Impact (3)
A huge pool of underutilized resources lies around.
Users are willing to donate these resources (Seti@Home: $1.5M/year in additional power consumed),
which can be put to work efficiently (at least for some types of applications).
                 Users       Results received   Total CPU time   Floating point operations
  Total          4,236,090   764M               1.3M years
  Last 24 hours  2,365       1.13M              1.3K years       51.4 TFLOPS
Source: Seti@Home website, Oct. 2003


Applications: Number crunching
Examples: Seti@Home, Entropia, UnitedDevices, DistributedScience, many others
Approach suitable for a particular class of problems:
  Massive parallelism
  Low bandwidth/computation ratio
  Error tolerance
Problems:
  Centralized.
  How to extend the model to problems that are not massively parallel?
Relevant to Grid space:
  Valuable resources at the network edge
  Shows ability to operate in an environment with limited trust in participating resources
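One common way to cope with limited trust in volunteer resources is to send the same work unit to several workers and accept a result only when enough of them agree. The slides do not spell this out, so the sketch below is an assumption about how such a check could look; the function name and the `quorum` parameter are hypothetical.

```python
from collections import Counter


def accept_result(replies, quorum=2):
    """Return the value reported by at least `quorum` workers, else None.

    `replies` holds the results returned by different workers for the
    same work unit; a lone faulty or malicious reply is outvoted.
    """
    if not replies:
        return None
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes >= quorum else None


print(accept_result([42, 42, 7]))   # two workers agree on 42
print(accept_result([1, 2, 3]))     # no agreement, the unit must be reissued
```

Replication of this kind trades extra computation for error tolerance, which fits problems with a low bandwidth/computation ratio.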

Applications: File sharing
The killer application to date.
Too many to list them all: Napster, FastTrack (KaZaA, iMesh), Gnutella (LimeWire, Morpheus, BearShare)
Relevant to Grid space: building a reliable, high-performance (?) data-delivery service using a large set of unreliable components.

Applications: Content Streaming
Streaming: the user plays the data as it arrives.
Possible solution:
  The first few users get the stream from the server.
  New users get the stream from the server or from users who are already receiving the stream.
[Figure: P2P approach versus client/server approach to streaming from a source ("Oh, I am exhausted!")]
Relevant to Grid space: offload part of the server load to consumers to improve scalability.
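The possible solution above can be sketched as a simple tree-building rule: each arriving user attaches to the earliest node (the server first, then earlier users) that still has spare upload capacity. The uniform `fanout` limit, the function name, and the first-come ordering are assumptions made for illustration; real systems also handle departures and bandwidth differences.

```python
from collections import deque


def build_stream_tree(clients, fanout=2, source="server"):
    """Assign each arriving client a parent that still has a free upload slot.

    Returns a dict mapping client -> parent. `fanout` is an assumed uniform
    number of streams any node (server or client) can upload; it must be >= 1.
    """
    parent = {}
    children = {source: 0}
    free = deque([source])            # nodes with spare capacity, oldest first
    for client in clients:
        p = free[0]
        parent[client] = p
        children[p] += 1
        if children[p] == fanout:     # parent is now full
            free.popleft()
        children[client] = 0          # the new client can serve later arrivals
        free.append(client)
    return parent


tree = build_stream_tree(["a", "b", "c", "d", "e"], fanout=2)
for client, p in tree.items():
    print(f"{client} receives the stream from {p}")
```

With this rule the server uploads at most `fanout` copies no matter how many users join, which is the load offloading the slide refers to.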

Applications: Performance benchmarking
Problem: evaluate the performance of your Web site from the end-user perspective.
  Multiple views on your site performance
Generate Internet statistics:
  Connectivity statistics
  Routing errors, routing maps
Relevance to Grid space:
  Grid clients are heterogeneous and geographically dispersed.
  How do I benchmark my services for this set of consumers?

End-to-end Performance Benchmarking
[Figure: the path covered by end-to-end Web performance testing: back-end infrastructure (database, app server, Web server, 3rd-party content, firewall), network landscape (backbones, regional network, ISPs, major and enterprise providers), and the last-mile blind spot (local ISP, T1, corporate network, corporate user, consumer user). Other labels: component testing, datacenter monitoring. Slide source: www.porivo.com]

Many other P2P applications
Backup storage (HiveNet, OceanStore)
Collaborative environments (Groove Networks)
Instant messaging (Yahoo, AOL)
Web serving communities (uServ)
Spam filtering
Anonymous email
Censorship-resistant publishing systems (Eternity, Freenet)

Mechanisms
To obtain a resilient system:
  integrate multiple components with uncorrelated failure curves, and
  use data and service replication.
To improve delivered QoS:
  move service delivery closer to the user, and
  integrate multiple providers with uncorrelated demand curves (reduces over-provisioning for peak loads).
To generate meaningful statistics and to detect anomalies:
  provide views from multiple vantage points.
To provide anonymity:
  use a large number of nodes, hide in the crowd, and
  make search costly (or impossible).
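The resilience argument rests on simple arithmetic: if replicas fail independently, each with probability p, then all n of them are down at once with probability p^n. The sketch below works that out; independence of failures is exactly the "uncorrelated failure curves" assumption, and the function names and the sample failure probability are hypothetical.

```python
def availability(p_fail, replicas):
    """Probability that at least one of `replicas` independent copies is up."""
    return 1.0 - p_fail ** replicas


def replicas_needed(p_fail, target):
    """Smallest replica count whose availability reaches `target`.

    Assumes 0 < p_fail < 1 and target < 1, otherwise the loop would not end.
    """
    n = 1
    while availability(p_fail, n) < target:
        n += 1
    return n


# Unreliable components: each copy is unavailable half of the time (assumed).
for n in (1, 2, 4, 8):
    print(f"{n} replicas -> availability {availability(0.5, n):.4f}")
print("replicas needed for 99%:", replicas_needed(0.5, 0.99))
```

If failures are correlated (for example, all replicas sit behind the same network link), the gain is smaller than this estimate suggests.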


PlanetLab
Testbed to experiment with your P2P/Grid applications.
  >400 nodes, >150 sites (452 nodes, 162 sites)
  450 research projects
  Funding: Intel, HP, Princeton University
View presented to users: a distributed set of VMs.
Allocation unit: a slice = a set of virtual machines (VMs), one VM at each node.
[Figure: slice K spanning four nodes, one VM on top of the OS of each node]

PlanetLab usage examples
Stress-test the Globus Replica Location Service
GSLab: a playground for experimenting with grid services
  users have N machines already configured with Globus one mouse-click away
Better-than-Internet services:
  multipath TCP (mtcp)
  Resilient Overlays
  Multicast Overlays

Why should you find PlanetLab interesting?
1. Open, large-scale testbed to test your P2P applications or your Grid services
2. Solves a similar problem to Grids/Globus: building virtual organizations (or resource federations)
  Grids: testbeds (deployments of hardware and software) to solve computational problems.
  PlanetLab: testbed to play with CS applications.
  Main problem for both: enable resource sharing across multiple administrative domains.

But
Roadmap:
  Attempt to trace differences back to starting assumptions on: user communities, applications, resources.
  Look at mechanisms to build VOs.

Assumptions: User communities
PlanetLab: users are CS scientists who experiment with and deploy infrastructure services.
Globus: users come from a more diverse pool of scientists who are interested in running their (end-user) applications efficiently.
Implication: functionality offered
[Figure: layered stacks. Globus side: User applications, Globus, OS. PlanetLab side: User applications, PL Services, PlanetLab, OS.]

Assumptions: Application characteristics
Different views on geographical resource distribution:
  PlanetLab services: «distribution is a goal»
    leverage multiple vantage points for network measurements,
    exploit uncorrelated failure rates in large sets of components.
  Grid applications: «distribution is a fact of life»
    resource distribution is generally a result of how the VO was assembled (due to administrative constraints).
Implication: mechanism design for resource allocation

Assumptions: Resources
PlanetLab's mission as a testbed for a new class of networked services allows for little HW/SW heterogeneity.
Globus is supported on a large set of architectures and at sites with multiple security requirements.
Implications: complexity, development speed

Assumptions: Resource ownership
Goal: individual sites retain control over their resources.
PlanetLab limits the autonomy of individual sites in a number of ways:
  VO admins: root access, remote power button
  Sites: limited choice of OS and security infrastructure
Globus imposes fewer limits on site autonomy:
  Requires fewer privileges (can also run in user space)
PlanetLab emphasizes global coordination over local autonomy to a greater degree than Globus.
Implications: ease of managing and evolving the testbed
[Figure: spectrum from control at VO level to individual site autonomy, showing where PlanetLab and Globus sit]

Building Virtual Organizations
Individual node/site functionality
Mechanisms at the aggregate level:
  Security infrastructure
  Delegation mechanisms
  Resource allocation and scheduling
  Resource discovery, monitoring, and selection

Delegation mechanisms: Identity Delegation
Broker/scheduler usage scenario:
  User A submits a job to S1 which, in turn, submits to S2.
  S2 needs to do accounting or make authorization decisions based on the identity that originated the job (A).
Globus: implementation based on delegated X.509 proxies
PlanetLab: none
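The scenario above can be pictured as a chain of delegation records that S2 walks back to the originator. The sketch below is a toy data-structure model only: it has no cryptography, whereas real X.509 proxy certificates are signed at every step, and all names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    issuer: str      # identity granting the right to act on its behalf
    subject: str     # identity receiving that right
    expires: float   # absolute expiry time, in seconds


def originator(chain, presenter, now):
    """Return the identity that started the chain, or None if it is invalid.

    The chain is valid when it ends at `presenter`, every link is issued by
    the subject of the previous link, and no link has expired.
    """
    if not chain or chain[-1].subject != presenter:
        return None
    for prev, link in zip(chain, chain[1:]):
        if link.issuer != prev.subject:
            return None
    if any(link.expires <= now for link in chain):
        return None
    return chain[0].issuer


# User A delegates to S1; S1 presents the chain when it submits to S2.
chain = [Delegation(issuer="A", subject="S1", expires=3600.0)]
print(originator(chain, presenter="S1", now=10.0))
```

S2 can then base accounting or authorization on the returned identity rather than on S1, the party it actually talks to.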

Delegation mechanisms: Delegating the Right to Resources
Broker/scheduler usage scenario:
  User A acquires capabilities from various brokers, then submits the job.
Globus/GGF: WS-Agreement protocols
  Represent contracts between providers and consumers.
  The local enforcement mechanism is not specified.
PlanetLab: individual node managers hand out capabilities
  Capabilities are akin to time-limited reservations.
  Capabilities can be traded.
  An extra layer is needed to provide secure transfer, prevent double spending, and offer an external representation.
The two efforts are complementary!
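The capability model can be illustrated with a toy ledger kept by a node manager: a capability is issued with an expiry time, can change owner, and can be redeemed only once. This is a minimal sketch under those assumptions, not PlanetLab's actual interface; the class and method names are hypothetical, and secure transfer between parties is left out.

```python
import itertools


class NodeManager:
    """Toy node manager handing out tradable, time-limited capabilities."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._caps = {}   # cap_id -> {"owner", "expires", "spent"}

    def issue(self, owner, now, lifetime):
        """Create a capability valid until now + lifetime and return its id."""
        cap_id = next(self._ids)
        self._caps[cap_id] = {"owner": owner, "expires": now + lifetime, "spent": False}
        return cap_id

    def transfer(self, cap_id, seller, buyer):
        """Move an unspent capability from seller to buyer; False if refused."""
        cap = self._caps.get(cap_id)
        if cap is None or cap["owner"] != seller or cap["spent"]:
            return False
        cap["owner"] = buyer
        return True

    def redeem(self, cap_id, holder, now):
        """Consume the capability once, if held by `holder` and not expired."""
        cap = self._caps.get(cap_id)
        if cap is None or cap["owner"] != holder or cap["spent"] or now >= cap["expires"]:
            return False
        cap["spent"] = True   # prevents double spending
        return True


nm = NodeManager()
cap = nm.issue("broker", now=0, lifetime=60)
nm.transfer(cap, "broker", "alice")
print(nm.redeem(cap, "alice", now=10), nm.redeem(cap, "alice", now=11))
```

Keeping the ledger at the issuing node is the simplest way to rule out double spending; trading capabilities without contacting the issuer would need the extra layer the slide mentions.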

Global resource allocation and scheduling
[Figure: users and application managers, brokers/agents, node managers and nodes, shown side by side for Globus and PlanetLab]
  Globus: identity delegation; sends job descriptions
  PlanetLab: resource usage delegation; sends capabilities (leases)

Wrap-up
[Figure: Grids and P2P placed on two axes, "Functionality & infrastructure" and "Scale & volatility", with a convergent environment between them]
Convergent environment: large, dynamic, self-configuring Grids
  Large scale:
    Weak trust assumptions
    Ease of integration
  Intermittent resource participation
  Little technical support
  Diversity in shared resources
  Build infrastructures to support diverse applications

What can Grids learn from P2P?
Scalability
Light-weight implementations
Aggregate desktops and other small resources
Intermittent operation, highly dynamic connectivity
Autonomy

More information
Links to papers, tech-reports, slides:
  http://dsl.cs.uchicago.edu
  http://www.cs.uchicago.edu/~matei/p2p/
Thank you.