Distributed Meta-data Servers: Architecture and Design. Sarah Sharafkandi David H.C. Du DISC

Distributed Meta-data Servers: Architecture and Design
Sarah Sharafkandi, David H.C. Du
DISC, 5/22/07

Outline
  - Meta-Data Server (MDS) functions
  - Why a distributed and global architecture?
  - Problem description
  - Current distributed solutions
  - Proposed architecture
  - Challenges for the proposed architecture

Metadata server responsibilities
  - Map a Globally Unique Identifier (GUID) to each object based on its local directory and user
  - Given the GUID of an object, find a nearby copy
  - Maintain object deletion, migration, versioning, and locking
  - Enforce access permissions (at the MDS level)
  - Answer queries from search applications
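The first two responsibilities can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' design: the SHA-1 based `make_guid`, the `MetadataServer` class, its method names, the site names, and the pairwise distance table are all hypothetical.

```python
import hashlib


def make_guid(user: str, local_path: str) -> str:
    """Derive a GUID from the user and the object's local directory path.

    Hashing with SHA-1 is an assumption; the slides do not say how GUIDs are built.
    """
    return hashlib.sha1(f"{user}:{local_path}".encode("utf-8")).hexdigest()


class MetadataServer:
    """Toy MDS that tracks which sites hold a copy of each object."""

    def __init__(self, distance):
        # distance: dict mapping frozenset({site_a, site_b}) -> cost (hypothetical metric)
        self.distance = distance
        self.locations = {}  # guid -> set of sites holding a copy

    def register(self, user, local_path, site):
        """Assign a GUID to an object and record its first copy."""
        guid = make_guid(user, local_path)
        self.locations.setdefault(guid, set()).add(site)
        return guid

    def add_replica(self, guid, site):
        """Record an additional copy of an already registered object."""
        self.locations[guid].add(site)

    def _cost(self, a, b):
        return 0 if a == b else self.distance[frozenset((a, b))]

    def nearest_copy(self, guid, client_site):
        """Return the site with the lowest cost from the client, or None if unknown."""
        sites = self.locations.get(guid)
        if not sites:
            return None
        # Ties are broken by site name so the result is deterministic.
        return min(sites, key=lambda s: (self._cost(client_site, s), s))


distance = {
    frozenset(("mpls", "nyc")): 20,
    frozenset(("mpls", "sf")): 30,
    frozenset(("nyc", "sf")): 45,
}
mds = MetadataServer(distance)
guid = mds.register("alice", "/home/alice/report.doc", "nyc")
mds.add_replica(guid, "sf")
print(mds.nearest_copy(guid, "mpls"))
```

A client at the hypothetical site `mpls` is directed to whichever registered copy has the lowest entry in the distance table; a real MDS would also have to handle the deletion, versioning, locking, and permission duties listed above.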

Why a distributed and global infrastructure?
  - Scalability: the number of objects in enterprises is growing fast, and a centralized solution simply does not scale well
  - Accessibility: objects must be accessible by clients from anywhere, with several copies of an object available (global infrastructure)
  - Policy-based support: redundant copies, backup policies, fault-tolerance policies, QoS requirements, security policy

Problem description
Design a scalable and efficient Distributed MDS (DMDS) structure that provides lookup services at extremely large scale and preserves:
  - Locality (finding a nearby copy)
  - Object replication and migration
  - Data consistency
  - Load balancing
  - Versioning

Current distributed solutions
Distributed Hash Table (DHT):
  - Application: peer-to-peer (P2P) networks (Chord, CAN), where the structure is very dynamic (nodes join and leave frequently)
  - Design: both object IDs and node IDs are randomly hashed into the same ID space
  - Advantage: scalable (but not at the scale of a global file system)
  - Disadvantages:
      1. Does not preserve locality
      2. Load balancing is usually not considered, so hot spots can occur
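The DHT design point above (object IDs and node IDs hashed into the same ID space) can be sketched as a successor lookup on a ring, in the style of consistent hashing. This is a simplified illustration, not Chord or CAN themselves: there are no finger tables or joins, and `ID_BITS`, `to_id`, `HashRing`, and the node names are assumptions.

```python
import bisect
import hashlib

ID_BITS = 16  # small ID space chosen for readability (assumption)


def to_id(name: str) -> int:
    """Hash a node name or object name into the shared ID space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


class HashRing:
    """Assigns each object to the first node whose ID is >= the object's ID."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._ring = sorted((to_id(n), n) for n in nodes)
        self._ids = [node_id for node_id, _ in self._ring]

    def successor(self, key: str) -> str:
        """Return the node responsible for key, wrapping around the ring."""
        idx = bisect.bisect_left(self._ids, to_id(key))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
for obj in ("report.doc", "budget.xls", "photo.jpg"):
    print(obj, "->", ring.successor(obj))
```

Because placement depends only on the hash values, two objects from the same directory or site can land on unrelated nodes, which is the locality problem listed under the disadvantages.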

Current distributed solutions (cont.)
OceanStore:
  - Application: global file system
  - Design: uses a P2P solution (Tapestry) for lookup
  - Advantage: versioning and update support for objects
  - Disadvantages:
      - High cost of publishing the pointer for each object
      - Locality is preserved, but at high cost

How about the Internet?
Domain Name System (DNS):
  - Application: Internet
  - Design: a hierarchical structure that resolves names to IP addresses
  - Advantage: simple
  - Disadvantage: file naming depends on the location of the file; if the file migrates, the address is no longer valid

DMDS versus P2P

     P2P                                                        DMDS
  1. Dynamic: nodes join and leave the network frequently      More rigid
  2. No data consistency                                        Data consistency
  3. Locality may not be maintained                             Locality is maintained
  4. No knowledge about user roles and groups                   Knowledge about user roles and groups is available

Proposed architecture for DMDS
Topology: a two-layer topology
  - First layer: hierarchical structure at the low levels, based on:
      - Proximity
      - User groups
  - Second layer: an overlay structure

Proposed topology
[Figure: two-layer architecture. Layer 1: hierarchical structure. Layer 2: overlay structure.]

Two-layer topology (cont.)
  - Layer 1: hierarchical structure based on:
      - Proximity: geographical location
      - User groups: users are members of groups based on their file requests and their role in the company
          - Each user may be in multiple groups, and each group may span different sites
          - Roles and groups may change dynamically as users' demands change
  - Layer 2: an overlay infrastructure
      - P2P techniques can be used for search
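One way to picture a lookup in this two-layer topology is sketched below, under stated assumptions. The slides do not specify the lookup procedure, so the `MdsNode` class, the rule that a published GUID is recorded at every ancestor, and the linear scan over overlay nodes (a stand-in for whatever P2P search technique layer 2 would use) are all hypothetical.

```python
class MdsNode:
    """A layer-1 metadata server in a site or group hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.index = {}  # guid -> location of a copy

    def publish(self, guid, location):
        """Record the object here and at every ancestor (assumed policy)."""
        node = self
        while node is not None:
            node.index[guid] = location
            node = node.parent


def lookup(start, guid, overlay):
    """Climb layer 1 from start; on a miss, ask the other layer-2 overlay nodes.

    Returns (location or None, list of node names visited).
    """
    path = []
    top = None
    node = start
    while node is not None:
        path.append(node.name)
        if guid in node.index:
            return node.index[guid], path
        top = node
        node = node.parent
    # Layer 2: a plain scan stands in for a real P2P search.
    for root in overlay:
        if root is top:
            continue
        path.append(root.name)
        if guid in root.index:
            return root.index[guid], path
    return None, path


eu = MdsNode("eu-root")
us = MdsNode("us-root")
paris = MdsNode("paris", eu)
paris_eng = MdsNode("paris-eng", paris)
ny = MdsNode("ny", us)
overlay = [eu, us]

paris_eng.publish("g-local", "paris-store")
ny.publish("g-remote", "ny-store")

print(lookup(paris_eng, "g-local", overlay))
print(lookup(paris_eng, "g-remote", overlay))
```

In this sketch a request for an object held by the user's own group is answered at the first node, while a request for a remote object climbs to the top of the local hierarchy and then crosses the overlay once instead of descending through another tree.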

Advantages of the two-layer design
  - Scalable
  - Flexible structure
  - Locality awareness at layer 1
  - Efficient search through user grouping
  - The overlay structure avoids long search paths
  - Popular objects can be replicated at the overlay layer

Challenges
  - What is the optimal number of levels at layer 1?
  - How many overlay nodes are needed?
  - How can data consistency be preserved at low cost?
  - How can security be maintained?
  - How can a good balance be struck between proximity and user groups?
  - How should data be migrated based on object popularity?

Thank you!