Cluster-Level Storage @ Google: How we use Colossus to improve storage efficiency
Transcription
1 Cluster-Level Storage @ Google: How we use Colossus to improve storage efficiency. Denis Serenyi, Senior Staff Software Engineer. November 13, 2017. Keynote at the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS'17).
2
3 What do you call a few PB of free space?
4 What do you call a few PB of free space? An emergency low disk space condition
5 Typical cluster: 10s of thousands of machines; PBs of distributed HDD; optional multi-TB local SSD; 10 GB/s bisection bandwidth.
6 Part 1: Transition From GFS to Colossus
7 GFS architectural problems. GFS master: one machine not large enough for a large FS; a single bottleneck for metadata operations; fault tolerant, but not HA. Predictable performance: no guarantees of latency.
8 Some obvious GFSv2 goals: Bigger! Faster! More predictable tail latency. GFS master replaced by Colossus; GFS chunkserver replaced by D.
9 Solve an easier problem: a file system for Bigtable. Append-only; single-writer (multi-reader); no snapshot / rename; directories unnecessary. Where to put metadata?
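These constraints shrink the file contract to almost nothing. Here is a minimal sketch of that contract, using a toy in-memory stand-in (illustrative only, not Colossus's real interface): a single writer appends and finalizes, and any number of readers read.

```python
# Minimal sketch of an append-only, single-writer, multi-reader file.
# An in-memory stand-in for illustration, not Colossus's actual API.
class AppendOnlyFile:
    def __init__(self, path: str):
        self.path = path
        self._data = bytearray()
        self._finalized = False

    def append(self, record: bytes) -> None:
        # Only the single writer calls this: no seeks, no overwrites.
        if self._finalized:
            raise IOError("file is finalized")
        self._data.extend(record)

    def finalize(self) -> None:
        # After this the file is immutable; no snapshot or rename needed.
        self._finalized = True

    def read(self, offset: int, length: int) -> bytes:
        # Any number of concurrent readers may call this.
        return bytes(self._data[offset:offset + length])
```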
10 Storage options back then: GFS; sharded MySQL with local disk & replication (Ads databases); local key-value store with Paxos replication (Chubby); Bigtable (sorted key-value store on GFS).
11 Storage options back then: GFS lacks useful database features; sharded MySQL has poor load balancing and is complicated; a local key-value store doesn't scale; Bigtable... hmmmmmm.
12 Why Bigtable? Bigtable solves many of the hard problems: automatically shards data across tablets; locates tablets via metadata lookups; easy-to-use semantics; efficient point lookups and scans. File system metadata is kept in an in-memory locality group.
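The fit becomes concrete once file paths are row keys in a sorted map: resolving a file is a point lookup, and listing a "directory" is a prefix scan, so no separate directory structure is needed. A hedged sketch of that idea (the sorted store below is a stand-in, not Bigtable's API):

```python
# One row per file, keyed by full path, held in a sorted structure.
import bisect

class SortedMetadataStore:
    def __init__(self):
        self._keys = []   # sorted file paths (row keys)
        self._rows = {}   # path -> metadata dict

    def put(self, path, metadata):
        if path not in self._rows:
            bisect.insort(self._keys, path)
        self._rows[path] = metadata

    def lookup(self, path):
        # Efficient point lookup: one row read resolves a file.
        return self._rows.get(path)

    def scan_prefix(self, prefix):
        # A "directory listing" is just a scan over a key range.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._rows[self._keys[i]]
            i += 1

store = SortedMetadataStore()
store.put("/cfs/ex-d/home/denis/myfile", {"mtime": 1510531200, "encoding": "r=3.2"})
print(list(store.scan_prefix("/cfs/ex-d/home/denis/")))
```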
13 Metadata in Bigtable (!?!?) [Architecture diagram: applications talk to a large Bigtable (XX,XXX tabletservers) whose data lives on XX,XXX D chunkservers; that Bigtable's METADATA is stored as CFS files served by a much smaller Bigtable (XXX tabletservers); the small Bigtable's metadata in turn lives in a GFS (GFS master, FS META, XXX GFS chunkservers).] Note: GFS still present, storing file system metadata.
14 GFS master -> CFS. CFS curators run in Bigtable tablet servers. A Bigtable row corresponds to a single file, e.g. /cfs/ex-d/home/denis/myfile: file attributes (is-finalized?, mtime, ctime, ..., encoding r=3.2) followed by one entry per stripe. Stripes are replication groups with states open, closed, finalized; each stripe records a checksum, a length, and its chunks (chunk0, chunk1, chunk2), and the last stripe of a file under write is OPEN.
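For illustration, the row layout the slide describes might look like the following sketch; the field names are read off the slide, but the types and structure are my assumptions, not CFS's actual schema:

```python
# Illustrative per-file metadata row: file attributes plus a list of
# stripes, each stripe being a replication group of chunks.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class StripeState(Enum):
    OPEN = "open"            # still accepting appends
    CLOSED = "closed"        # writes stopped, not yet finalized
    FINALIZED = "finalized"  # immutable

@dataclass
class Stripe:
    state: StripeState
    checksum: Optional[int] = None  # recorded once the stripe closes
    length: Optional[int] = None
    chunks: List[str] = field(default_factory=list)  # e.g. ["chunk0", "chunk1", "chunk2"]

@dataclass
class FileRow:
    path: str                # the Bigtable row key, e.g. "/cfs/ex-d/home/denis/myfile"
    is_finalized: bool = False
    mtime: float = 0.0
    ctime: float = 0.0
    encoding: str = "r=3.2"  # replication/encoding policy from the slide
    stripes: List[Stripe] = field(default_factory=list)
```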
15 Colossus for metadata? Metadata is ~1/10,000 the size of data. So if we host a Colossus on Colossus: 100 PB data -> 10 TB metadata; 10 TB metadata -> 1 GB metametadata; 1 GB metametadata -> 100 KB meta... And now we can put it into Chubby!
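The recursion converges quickly because each layer is four orders of magnitude smaller than the one below it. A quick check of the slide's arithmetic (the "fits in Chubby" threshold below is my assumption):

```python
# Each metadata layer is ~1/10,000 the size of the layer below it.
size = 100e15  # 100 PB of file data, in bytes
layer = 0
while size > 1e6:  # assumed threshold for "small enough for Chubby"
    size /= 10_000
    layer += 1
    print(f"metadata layer {layer}: {size:,.0f} bytes")
# layer 1: 10 TB, layer 2: 1 GB, layer 3: 100 KB -> into Chubby it goes
```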
16 Part 2: Colossus and Efficient Storage
17 Themes: Colossus enables scale and declustering; complementary applications -> cheaper storage; placement of data and IO balance is hard.
18 What's a cluster look like? [Diagram: Machine 1 through Machine XX000, each running a mix of workloads (YouTube Serving, Ads MapReduce, GMail, Bigtable, CFS Bigtable, ...) alongside a D server on every machine.]
19 Let's talk about money: Total Cost of Ownership. TCO encompasses much more than the retail price of a disk. A denser disk might sell at a premium $/GB but still be cheaper to deploy (power, connection overhead, repairs).
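A toy comparison makes the point; every number below is a made-up assumption for illustration, not Google data:

```python
# Deployed $/GB: amortize a fixed per-slot cost (power, connection,
# repairs) over the drive's capacity, on top of the retail price.
def deployed_cost_per_gb(capacity_gb, price_per_gb, slot_overhead):
    return (capacity_gb * price_per_gb + slot_overhead) / capacity_gb

small = deployed_cost_per_gb(4_000, 0.020, 60.0)   # 4 TB @ $0.020/GB retail
dense = deployed_cost_per_gb(14_000, 0.022, 60.0)  # 14 TB @ $0.022/GB (premium)
print(f"small disk: ${small:.4f}/GB, dense disk: ${dense:.4f}/GB")
```

With these numbers the small disk deploys at $0.0350/GB and the dense one at $0.0263/GB, so the disk with the premium retail $/GB is still the cheaper one to own.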
20 The ingredients of storage TCO. Most importantly, we care about storage TCO, not disk TCO. Storage TCO is the cost of data durability and availability, plus the cost of serving the data. We minimize total storage TCO by keeping disks full and busy.
21 Which disks should I buy? We'll have a mix because we're growing. We have an overall goal for IOPS and capacity, and we select disks to bring the cluster and the fleet closer to that overall mix.
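One way to make "bring the fleet closer to our overall mix" concrete is a greedy pick: score each candidate disk by how close the fleet's IOPS-per-GB ratio would land to the target after buying it. The policy and the numbers below are my simplification, not Google's actual planner:

```python
# Greedy disk selection toward a target IOPS/GB ratio (illustrative).
disks = {
    "8TB_7200rpm": {"gb": 8_000, "iops": 120},
    "14TB_7200rpm": {"gb": 14_000, "iops": 120},
}

def best_purchase(fleet_gb, fleet_iops, target_iops_per_gb, candidates):
    def distance_after(d):
        # Fleet's IOPS/GB ratio if we add this disk.
        ratio = (fleet_iops + d["iops"]) / (fleet_gb + d["gb"])
        return abs(ratio - target_iops_per_gb)
    return min(candidates, key=lambda name: distance_after(candidates[name]))

# This fleet is capacity-rich relative to its IOPS target, so the
# smaller, IOPS-denser disk wins.
print(best_purchase(fleet_gb=10_000_000, fleet_iops=100_000,
                    target_iops_per_gb=0.012, candidates=disks))
```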
22 What we want. [Diagram: a small disk and a big disk, each holding the same amount of hot data, with the remainder of each filled by cold data.] Equal amounts of hot data per disk (every spindle is busy); the rest of each disk filled with cold data (disks are full).
23 How we get it: Colossus rebalances old, cold data... and distributes newly written data evenly across disks.
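A cartoon of those two mechanisms, with placement policies that are my illustration rather than Colossus's actual rebalancer:

```python
# Two cooperating policies: spread new writes to the emptiest disk, and
# migrate cold bytes off full disks in the background.
class Disk:
    def __init__(self, name: str, capacity: int):
        self.name, self.capacity = name, capacity
        self.hot, self.cold = 0, 0

    def used(self) -> int:
        return self.hot + self.cold

def place_new_write(disks, size, hot=True):
    # Newly written data goes to the least-full disk.
    target = min(disks, key=Disk.used)
    if hot:
        target.hot += size
    else:
        target.cold += size
    return target

def rebalance_cold(disks, step):
    # Move old, cold bytes from the fullest disk to the emptiest, making
    # room on busy disks so new (hot) writes can keep spreading evenly.
    fullest = max(disks, key=lambda d: d.used() / d.capacity)
    emptiest = min(disks, key=lambda d: d.used() / d.capacity)
    moved = min(step, fullest.cold, emptiest.capacity - emptiest.used())
    fullest.cold -= moved
    emptiest.cold += moved
    return moved
```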
24 When stuff works well. [Visualization: each box is a D server, sized by disk capacity and colored by spindle utilization.]
25 Rough scheme: buy flash for caching to bring IOPS/GB into disk range; buy disks for capacity and fill them up; hope the disks are busy (otherwise we bought too much flash), but not too busy. If we buy disks for IOPS, byte improvements don't help. If cold bytes grow without bound, we have lots of spare IO capacity.
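The flash-vs-disk split can be sized with back-of-envelope math: flash must absorb enough of the read load that the remainder fits on the spindles bought for capacity. All numbers below are assumptions for illustration:

```python
# Size the flash cache (via its hit rate) so residual reads fit on disk.
def residual_disk_iops(total_iops: float, cache_hit_rate: float) -> float:
    # Reads served from flash never reach the disks.
    return total_iops * (1.0 - cache_hit_rate)

workload_iops = 2_000_000  # demand from applications (assumed)
spindle_iops = 300_000     # what the capacity-driven disk fleet can serve (assumed)

# Smallest hit rate (a proxy for how much flash to buy) that works:
needed_hit_rate = 1.0 - spindle_iops / workload_iops
print(f"flash must absorb {needed_hit_rate:.0%} of reads; "
      f"residual disk load: {residual_disk_iops(workload_iops, needed_hit_rate):,.0f} IOPS")
```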
26 Filling up disks is hard. The filesystem doesn't work well when 100% full. We can't remove capacity for upgrades and repairs without empty space. Individual groups don't want to run near 100% of quota. Administrators are uncomfortable with statistical overcommit. And there is supply chain uncertainty.
27 Applications must change. Unlike almost anything else in our datacenters, disk I/O cost is going up. Applications that want more accesses than HDDs offer probably need to think about making their hot data hotter (so flash works well) and their cold data colder. An application written X years ago might cause us to buy smaller disks, increasing storage costs.
28 Conclusion. Colossus has been extremely useful for optimizing our storage efficiency. Metadata scaling enables declustering of resources. The ability to combine disks of various sizes and workloads of varying types is very powerful. Looking forward, I/O cost trends will require both applications and storage systems to evolve.
29 Thank you! Denis Serenyi