Networking Recap Storage Intro. CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Similar documents
Cloud Computing. Lecture 2: Leaf-Spine and PortLand Networks

HDFS Architecture. Gregory Kesden, CSE-291 (Storage Systems) Fall 2017

Utilizing Datacenter Networks: Centralized or Distributed Solutions?

15-744: Computer Networking. Data Center Networking II

Lecture 7: Data Center Networks

Introduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era

The Google File System

Computer Systems Laboratory Sungkyunkwan University

Google File System. Arun Sundaram Operating Systems

Lecture 9: Bridging & Switching"

Some portions courtesy Srini Seshan or David Wetherall

Lecture 9: Bridging. CSE 123: Computer Networks Alex C. Snoeren

BGP IN THE DATA CENTER

CSE 123: Computer Networks Alex C. Snoeren. HW 2 due Thursday 10/21!

Migrating Oracle Databases To Cassandra

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017

Chapter 17: Distributed Systems (DS)

The Google File System

Next Steps Spring 2011 Lecture #18. Multi-hop Networks. Network Reliability. Have: digital point-to-point. Want: many interconnected points

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including

Windows Azure Services - At Different Levels

Ceph: A Scalable, High-Performance Distributed File System

The Google File System

B.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2

Data Center Network Topologies II

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Google File System. By Dinesh Amatya

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Principles behind data link layer services:

Principles behind data link layer services:

Themes. The Network 1. Energy in the DC: ~15% network? Energy by Technology

Distributed File Systems II

Authors : Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung Presentation by: Vijay Kumar Chalasani

MongoDB Schema Design for. David Murphy MongoDB Practice Manager - Percona

Distributed Data Store

Routing Algorithms. Review

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Chapter 6. Storage and Other I/O Topics

Aerospike Scales with Google Cloud Platform

Storage. Hwansoo Han

EECS 498 Introduction to Distributed Systems

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

The Google File System (GFS)

Computer Networks. Sándor Laki ELTE-Ericsson Communication Networks Laboratory

Dynamic Metadata Management for Petabyte-scale File Systems

A Guide to Architecting the Active/Active Data Center

Principles behind data link layer services

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Network Design Considerations for Grid Computing

Example Networks on chip Freescale: MPC Telematics chip

Changing Requirements for Distributed File Systems in Cloud Storage

CSC 5930/9010 Cloud S & P: Virtualization

Building Consistent Transactions with Inconsistent Replication

Forget IOPS: A Proper Way to Characterize & Test Storage Performance Peter Murray SwiftTest

measurement goals why traffic measurement of Internet is so hard? measurement needs combined skills diverse traffic massive volume of traffic

Disk Scheduling COMPSCI 386

SQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden

Distributed Filesystem

Advanced Computer Networks Exercise Session 7. Qin Yin Spring Semester 2013

Design Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013

6.033 Spring 2015 Lecture #11: Transport Layer Congestion Control Hari Balakrishnan Scribed by Qian Long

Operating Systems. Lecture File system implementation. Master of Computer Science PUF - Hồ Chí Minh 2016/2017

4/9/2018 Week 13-A Sangmi Lee Pallickara. CS435 Introduction to Big Data Spring 2018 Colorado State University. FAQs. Architecture of GFS

Datacenter Backbone Enterprise Cellular Wireless

TDT Appendix E Interconnection Networks

CA485 Ray Walshe NoSQL

Brocade Ethernet Fabrics

CA464 Distributed Programming

Enabling High Performance Data Centre Solutions and Cloud Services Through Novel Optical DC Architectures. Dimitra Simeonidou

More on Link Layer. Recap of Last Class. Interconnecting Nodes in LAN (Local-Area Network) Interconnecting with Hubs. Computer Networks 9/21/2009

Chapter 11: File System Implementation. Objectives

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Outline. Definition of a Distributed System Goals of a Distributed System Types of Distributed Systems

Network Layer (Routing)

Data Center Interconnect Solution Overview

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

The Google File System

Veritas Storage Foundation for Windows by Symantec

PowerVault MD3 SSD Cache Overview

CSE 123A Computer Networks

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre

CSE 451: Operating Systems Winter Redundant Arrays of Inexpensive Disks (RAID) and OS structure. Gary Kimura

10. Replication. Motivation

Architecting the High Performance Storage Network

Lecture 27 Programming parallel hardware" Suggested reading:" (see next slide)"

11/5/2018 Week 12-A Sangmi Lee Pallickara. CS435 Introduction to Big Data FALL 2018 Colorado State University

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Scalable Cache Coherence

Distributed File Systems

Medium Access Protocols

Assignment 5. Georgia Koloniari

Mellanox Virtual Modular Switch

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

Congestion Management in Lossless Interconnects: Challenges and Benefits

Data Center Fundamentals: The Datacenter as a Computer

OceanStor 9000 InfiniBand Technical White Paper. Issue V1.01 Date HUAWEI TECHNOLOGIES CO., LTD.

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Transcription:

Networking Recap Storage Intro CSE-291 (Cloud Computing), Fall 2016 Gregory Kesden

Networking Recap Storage Intro

Long Haul/Global Networking Speed of light is limiting; Latency has a lower bound (.) Throughput is a cost, but not otherwise a limitation Running more fiber optic is technologically reasonable, within reason Demand will drive growth

Vs. Networking Within A Data Center Speed of light not limiting, short distance Serialization can be limiting, as bits clocked into and off of media Parallelization helps, but quickly limited in practice Bandwidth (Throughput) needs to be managed, both by distributing load and relieving bottlenecks Task drives access pattern, drives hot spots

Traditional 3-Tier Network Convenient for distribution All links same width More links at lower layers = More bandwidth lower layers Works if strong locality at leaf (access layer) Challenged if leaf-to-leaf up-and-down patterns Challenged if request distributed and wait reply (inflow) Maximizing work within time budget means replies at same time collide

Fat Tree Topology Multi-Rooted Tree = All paths same width Only useful if load distributed Doesn t work if all pick same spanning tree Can distribute via Layer-1,-2, or -3 techniques

Utilizing Multiple Paths Layer-2 Forwarding tables; Consider PortLand Hard to be adaptive Layer-3 ECMP; Can do via routing Suffer collision of paths under load Slow to adapt, slow implementation Handles failure slowly Layer-4 Schedule flows Not all flows are equal Can change Big vs small can t be balance DCTCP; Multiplex flows Positive Feedback to balance, not based upon interpreting drops. Adaptive, balanceable

PortLand Uses FatTree Topology Provides way of using multiple paths Separates Identifier (IP address) from Locator (PMAC) Facilitates migration (Benefit over other solutions) Fabric Manager enables failure management, mapping, etc Link Discovery enables Auto-Configuration

Clos Networks Obvious benefit to flatter, all-to-all Connectivity Presently enables 2-Layer topologies in many cases Fabric has limits. Only possible to a point. Scale may more commonly outgrow fabric in near future leading to migration b ack to 3-Tier Manage scale limitations by using locally within pods, among cores, etc. Provides local density where needed, without blowing up interconnectivity scale globally

Software Defined Networks (SDNs) Separate Control Plane (Management) from Data Plane (Forwarding) Enable central management of entire control plane Essentially makes a programmable network Enables management applications with appropriate abstraction for use case (enterprise, public cloud, etc) and management goal (traffic management, isolation, etc) Dramatically reduces management overhead, correctness, and security This is a killer app, by itself, at scale Critically important for virtualization Enable on-demand management, and real-time adaptation to failure. Replace multiple types of devices with single, generic, programmable devices

Networking Recap Storage Intro

Types of Storage General Purpose File System Special Purpose File System NoSQL Database (Key-Value Store, Column-oriented, Etc) Relational Databases, i.e. SQL In Memory (Critical)

General Purpose File System Emulate traditional file systems at larger scale and/or distributing near users E.g., Manage named user data with hierarchical access and protections, Andrew File System (AFS) is classical global example, many more like NFS approximate at DC or campus scale Friendly for end users, but how useful? Commonness of global use case Can be very distant with reasonable latency. Disk = 10mS; Network = 10mS/1,000mi (very approximate) High cost Synchronization, concurrency, etc

Special Purpose File System Relax constraints to limit complexity Accessing programmatically? Use index, drop directory hierarchy. Upload-based creation? Version and don t support edits. Limits concurrency problem to version number Logging? Append-only writes. Limits concurrency problem to allocation. Single application? Access model limited to roles, let app manage permissions Stale okay? Easier replication. Etc. Relieving concurrency problems Improves caching, replication Decreases latency

NoSQL Databases Break up limitations associated with table (relation) CAP: Trade consistency for accessibility and partition tolerance. Speed at scale Not likely ACID Can do some SQL-like things, but some are hard, e.g. generalized join Many kinds, e.g. key-value, column-oriented, etc

In-Memory When reacting to human users in real time, time budget is limited Trip from client to service, trip from front-end to back-ends, processing, trip from back-ends to frontend, processing, trip back to client, etc. There isn t time to constantly hit disk in many ways Would need a huge amount of disk time to be able to sustain throughout at scale What is reasonable response time for human user? 0.5 second (500 ms)? 75mS round trip to east coast The rest is data center latency, queuing for services, and actual work Best solution is to have the answer waiting Or, at least the components of the answer Sadly, sometimes block for slowest component