A Case for Packing and Indexing in Cloud File Systems
|
|
- Vanessa O’Connor’
- 5 years ago
- Views:
Transcription
1 A Case for Packing and Indexing in Cloud File Systems Saurabh Kadekodi, Bin Fan*, Adit Madan*, Garth Gibson and Greg Ganger University, *Alluxio Inc.
2 In a nutshell Cloud object stores have a per-operation pricing structure Small object workloads have poor performance and high cost Packing Performance Price
3 Cloud file systems Client S3, Azure, GCS AWS, GCE, etc. Object store Natural approach is 1:1 file to object mapping; bad for small files Packing (batching or coalescing) small files improves local FS performance How much does packing help in cloud file systems?
4 Motivation Performance Price
5 Performance Motivation Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 r4.4xlarge r4.4xlarge r4.4xlarge r4.4xlarge 4KiB objects 1 object per 1 file to S3 3.2M objects of 4KiB each = 12.5GiB
6 Performance Motivation Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 r4.4xlarge r4.4xlarge r4.4xlarge r4.4xlarge 4MiB objects 1 object per 1K files to S3 3.2K objects of 4MiB each = 12.5GiB
7 Performance Motivation Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 r4.4xlarge r4.4xlarge r4.4xlarge r4.4xlarge 40MiB objects 1 object per 10K files to S3 320 objects of 40MiB each = 12.5GiB
8 Performance Motivation Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 r4.4xlarge r4.4xlarge r4.4xlarge r4.4xlarge 400MiB objects 1 object per 100K files to S3 32 objects of 400MiB each = 12.5GiB
9 Throughput By Object Size Throughput (MB / s) KB (1:1) 4MB (1K:1) 40MB (10K:1) 400MB (100K:1) Object Size
10 S3 Throttling in Action PUT req / sec PUT Req / min S3 Exceptions / min S3 Exceptions / sec 02:45 02:50 02:55 03:00 03:05 03:10 03:15 03:20 03:25 03:30 03:35 03:40 03:45 03:50 03:55 04:00 04:05 04:10 04:15 04:20 Time (hh:mm) Request rate throttling by S3 for 4KiB writes S3 warns routinely requesting >100 PUT req / s subject to throttling NW bandwidth of r4.4xlarge instances ~ 10 Gb / sec minimum 13MB writes to avoid being throttled packing
11 Motivation Performance Price
12 Price Motivation S3 PUT, COPY, POST GET Data Retrieval Data Storage Standard $0.05 / 10K req $0.04 / 100K req Free (for certain data center locations) $0.023 / GB-month 1400 Operation cost Storage cost Say we want to store 1TB per month as 4KiB objects, approx 100 req/s for a month Price ($) Percentage of operation costs 98% Nil 16% <0.001% 0 S3 EFS DynamoDB Packing Data ingest methodology
13 Price Motivation S3 PUT, COPY, POST GET Data Retrieval Data Storage Standard $0.05 / 10K req $0.04 / 100K req Free (for certain data center locations) $0.023 / GB-month 1400 Operation cost Storage cost Say we want to store 1TB per month as 4KiB objects, approx 100 req/s for a month Price ($) Percentage of data storage costs 2% 100% 84% >99.999% 0 S3 EFS DynamoDB Packing Data ingest methodology
14 Price Motivation S3 PUT, COPY, POST GET Data Retrieval Data Storage Standard $0.05 / 10K req $0.04 / 100K req Free (for certain data center locations) $0.023 / GB-month 1400 Operation cost Storage cost Say we want to store 1TB per month as 4KiB objects, approx 100 req/s for a month Price ($) Percentage of data storage costs Packing cost split almost equal to EFS, but EFS storage cost >10x S3 storage costs 2% 100% 84% >99.999% 0 S3 EFS DynamoDB Packing Data ingest methodology
15 Packing in Alluxio Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 to S3
16 Packing in Alluxio AWS Master Packed blob Alluxio Worker 1 Worker 2 Worker 3 Buffer for packing Worker 4 Index to S3
17 Packing in Alluxio Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 to S3
18 Packing in Alluxio Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 to S3
19 Packing in Alluxio Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 to S3
20 A Packed Blob Packing policy determines what to pack Triggered by dirty bytes & timeout Embedded Index blob-extent 1 blob-extent 2 blob-extent k Blob Extent: alluxio-path:logical-offset:physical-offset:length
21 Evaluation
22 Configuration Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4
23 Configuration Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4
24 Configuration AWS Master r4.4xlarge EC2 instance Alluxio Worker 1 Worker 2 Worker 3 Worker 4
25 Configuration AWS Master r4.4xlarge EC2 instance Alluxio 8 workload generators Worker 1 Worker 2 Worker 3 Worker 4
26 Configuration AWS Master r4.4xlarge EC2 instance Alluxio 8 workload generators Worker 1 Worker 2 Worker 3 Worker 4 100K files; 8KB each
27 Configuration Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4
28 Configuration Alluxio AWS Master Worker 1 Worker 2 Worker 3 Worker 4 800K files 800K files 800K files 800K files 3.2M files of 8KiB = 24.4GB
29 Packing Configuration Max blob size: 1 GB Packing interval: 5 sec # Packing threads: 16 Index # Master threads: 16 Backup interval: 1 min
30 Motivation Revisited Performance Price
31 Write Performance Comparison Direct Packed x Data Metadata 0.9 MB / s 10 60x Seconds GB Throughput Comparison 1 Runtime Comparison 0 Direct Packing
32 Motivation Revisited Performance Price
33 S3 Data Ingest Price x Direct Packed Data Ingest Cost 1 Rate-Limited Retries Request rate is throttled much more than data rate
34 Research questions (feedback) 1 Price with packing = price without packing Will fine-tuning help? By how much? Are there workload specific tuning opportunities / challenges? Garbage collection of packed blobs Different from LSM Trees, LFS, TableFS, etc. Cost-driven GC policies? Interference with foreground workload?
35 Conclusion Packing Performance Price Price reduction because of: Matching application write sizes to storage system write sizes Elimination of retries due to aggressive rate-limiting imposed by S3-like services
A Case for Packing and Indexing in Cloud File Systems
A Case for Packing and Indexing in Cloud File Systems Saurabh Kadekodi 1, Bin Fan 2, Adit Madan 2, and Garth A. Gibson 1 1 Carnegie Mellon University 2 Alluxio Inc. CMU-PDL-17-105 October 2017 Parallel
More informationMATE-EC2: A Middleware for Processing Data with Amazon Web Services
MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering
More informationDatabase Workload. from additional misses in this already memory-intensive databases? interference could be a problem) Key question:
Database Workload + Low throughput (0.8 IPC on an 8-wide superscalar. 1/4 of SPEC) + Naturally threaded (and widely used) application - Already high cache miss rates on a single-threaded machine (destructive
More informationAmazon Elastic File System
Amazon Elastic File System Choosing Between the Different Throughput & Performance Modes July 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided
More informationAmazon Aurora Deep Dive
Amazon Aurora Deep Dive Anurag Gupta VP, Big Data Amazon Web Services April, 2016 Up Buffer Quorum 100K to Less Proactive 1/10 15 caches Custom, Shared 6-way Peer than read writes/second Automated Pay
More informationBig Data solution benchmark
Big Data solution benchmark Introduction In the last few years, Big Data Analytics have gained a very fair amount of success. The trend is expected to grow rapidly with further advancement in the coming
More informationBlobPorter Documentation. Jesus Aguilar
Jesus Aguilar May 16, 2018 Contents: 1 Getting Started 3 1.1 Linux................................................... 3 1.2 Windows................................................. 3 1.3 Command Options............................................
More informationZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency
ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationPebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees
PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees Pandian Raju 1, Rohan Kadekodi 1, Vijay Chidambaram 1,2, Ittai Abraham 2 1 The University of Texas at Austin 2 VMware Research
More informationAzure-persistence MARTIN MUDRA
Azure-persistence MARTIN MUDRA Storage service access Blobs Queues Tables Storage service Horizontally scalable Zone Redundancy Accounts Based on Uri Pricing Calculator Azure table storage Storage Account
More informationPocket: Elastic Ephemeral Storage for Serverless Analytics
Pocket: Elastic Ephemeral Storage for Serverless Analytics Ana Klimovic*, Yawen Wang*, Patrick Stuedi +, Animesh Trivedi +, Jonas Pfefferle +, Christos Kozyrakis* *Stanford University, + IBM Research 1
More informationAWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect
AWS Storage Gateway Not your father s hybrid storage University of Arizona IT Summit 2017 Jay Vagalatos, AWS Solutions Architect October 23, 2017 The AWS Storage Portfolio Amazon EBS (persistent) Block
More informationYCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores
YCSB++ Benchmarking Tool Performance Debugging Advanced Features of Scalable Table Stores Swapnil Patil Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie
More information4 Myths about in-memory databases busted
4 Myths about in-memory databases busted Yiftach Shoolman Co-Founder & CTO @ Redis Labs @yiftachsh, @redislabsinc Background - Redis Created by Salvatore Sanfilippo (@antirez) OSS, in-memory NoSQL k/v
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO Albis Pocket
More informationAmazon Aurora Deep Dive
Amazon Aurora Deep Dive Enterprise-class database for the cloud Damián Arregui, Solutions Architect, AWS October 27 th, 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Enterprise
More informationCloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018
Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster
More informationAnalyzing the High Performance Parallel I/O on LRZ HPC systems. Sandra Méndez. HPC Group, LRZ. June 23, 2016
Analyzing the High Performance Parallel I/O on LRZ HPC systems Sandra Méndez. HPC Group, LRZ. June 23, 2016 Outline SuperMUC supercomputer User Projects Monitoring Tool I/O Software Stack I/O Analysis
More informationHow to go serverless with AWS Lambda
How to go serverless with AWS Lambda Roman Plessl, nine (AWS Partner) Zürich, AWSomeDay 12. September 2018 About myself and nine Roman Plessl Working for nine as a Solution Architect, Consultant and Leader.
More informationSMORE: A Cold Data Object Store for SMR Drives
SMORE: A Cold Data Object Store for SMR Drives Peter Macko, Xiongzi Ge, John Haskins Jr.*, James Kelley, David Slik, Keith A. Smith, and Maxim G. Smith Advanced Technology Group NetApp, Inc. * Qualcomm
More informationExperiences with Serverless Big Data
Experiences with Serverless Big Data AWS Meetup Munich 2016 Markus Schmidberger, Head of Data Service Munich, 17.10.16 Key Components of our Data Service Real-Time Monitoring Enable our development teams
More informationDASH COPY GUIDE. Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31
DASH COPY GUIDE Published On: 11/19/2013 V10 Service Pack 4A Page 1 of 31 DASH Copy Guide TABLE OF CONTENTS OVERVIEW GETTING STARTED ADVANCED BEST PRACTICES FAQ TROUBLESHOOTING DASH COPY PERFORMANCE TUNING
More informationAmazon Aurora Deep Dive
Amazon Aurora Deep Dive Kevin Jernigan, Sr. Product Manager Amazon Aurora PostgreSQL Amazon RDS for PostgreSQL May 18, 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda
More informationAtlas: Baidu s Key-value Storage System for Cloud Data
Atlas: Baidu s Key-value Storage System for Cloud Data Song Jiang Chunbo Lai Shiding Lin Liqiong Yang Guangyu Sun Jason Cong Wayne State University Zhenyu Hou Can Cui Peking University University of California
More informationGet ready to be what s next.
Get ready to be what s next. Jared Shockley http://jaredontech.com Senior Service Engineer Prior Experience @jshoq Primary Experience Areas Agenda What is Microsoft Azure? Provider-hosted Apps Hosting
More informationStrata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson
A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements
More informationSHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device
SHRD: Improving Spatial Locality in Flash Storage Accesses by Sequentializing in Host and Randomizing in Device Hyukjoong Kim 1, Dongkun Shin 1, Yun Ho Jeong 2 and Kyung Ho Kim 2 1 Samsung Electronics
More informationData Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research
Data Processing at the Speed of 100 Gbps using Apache Crail Patrick Stuedi IBM Research The CRAIL Project: Overview Data Processing Framework (e.g., Spark, TensorFlow, λ Compute) Spark-IO FS Albis Streaming
More informationCumulus: Filesystem Backup to the Cloud
Cumulus: Filesystem Backup to the Cloud 7th USENIX Conference on File and Storage Technologies (FAST 09) Michael Vrable Stefan Savage Geoffrey M. Voelker University of California, San Diego February 26,
More informationGeorgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong
Georgia Institute of Technology ECE6102 4/20/2009 David Colvin, Jimmy Vuong Relatively recent; still applicable today GFS: Google s storage platform for the generation and processing of data used by services
More informationFlash-Conscious Cache Population for Enterprise Database Workloads
IBM Research ADMS 214 1 st September 214 Flash-Conscious Cache Population for Enterprise Database Workloads Hyojun Kim, Ioannis Koltsidas, Nikolas Ioannou, Sangeetha Seshadri, Paul Muench, Clem Dickey,
More informationADAPTIVE STREAMING AT. Justin Ruggles Lead Engineer, Transcoding & Delivery
ADAPTIVE STREAMING AT Justin Ruggles Lead Engineer, Transcoding & Delivery justinr@vimeo.com ABOUT VIMEO Video hosting platform, founded in 2004, that allows creators to share their content in high quality,
More informationAWS: Basic Architecture Session SUNEY SHARMA Solutions Architect: AWS
AWS: Basic Architecture Session SUNEY SHARMA Solutions Architect: AWS suneys@amazon.com AWS Core Infrastructure and Services Traditional Infrastructure Amazon Web Services Security Security Firewalls ACLs
More informationThe Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput
Solution Brief The Impact of SSD Selection on SQL Server Performance Understanding the differences in NVMe and SATA SSD throughput 2018, Cloud Evolutions Data gathered by Cloud Evolutions. All product
More informationSQL SERVER Lubo Goryl Solution Professional Microsoft Slovakia
SERVER 2012 Lubo Goryl Solution Professional Microsoft Slovakia AGENDA Prehlaď noviniek v 2012 Nové technológie pre HA & DR Výkonové zlepšenia a možnosti škálovateľnosti Nové technológie pre DWH (FastTrack,
More informationData Center TCP (DCTCP)
Data Center Packet Transport Data Center TCP (DCTCP) Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, Murari Sridharan Cloud computing
More informationDynamic Vertical Memory Scalability for OpenJDK Cloud Applications
Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications Rodrigo Bruno, Paulo Ferreira: INESC-ID / Instituto Superior Técnico, University of Lisbon Ruslan Synytsky, Tetiana Fydorenchyk: Jelastic
More informationYCSB++ benchmarking tool Performance debugging advanced features of scalable table stores
YCSB++ benchmarking tool Performance debugging advanced features of scalable table stores Swapnil Patil M. Polte, W. Tantisiriroj, K. Ren, L.Xiao, J. Lopez, G.Gibson, A. Fuchs *, B. Rinaldi * Carnegie
More informationHashKV: Enabling Efficient Updates in KV Storage via Hashing
HashKV: Enabling Efficient Updates in KV Storage via Hashing Helen H. W. Chan, Yongkun Li, Patrick P. C. Lee, Yinlong Xu The Chinese University of Hong Kong University of Science and Technology of China
More informationGoogle File System. Arun Sundaram Operating Systems
Arun Sundaram Operating Systems 1 Assumptions GFS built with commodity hardware GFS stores a modest number of large files A few million files, each typically 100MB or larger (Multi-GB files are common)
More informationMONITORING SERVERLESS ARCHITECTURES
MONITORING SERVERLESS ARCHITECTURES CAN YOU HELP WITH SOME PRODUCTION PROBLEMS? Your Manager (CC) Rachel Gardner Rafal Gancarz Lead Consultant @ OpenCredo WHAT IS SERVERLESS? (CC) theaucitron Cloud-native
More informationFlashBlox: Achieving Both Performance Isolation and Uniform Lifetime for Virtualized SSDs
FlashBlox: Achieving Both Performance Isolation and Uniform Lifetime for Virtualized SSDs Jian Huang Anirudh Badam Laura Caulfield Suman Nath Sudipta Sengupta Bikash Sharma Moinuddin K. Qureshi Flash Has
More informationOS-caused Long JVM Pauses - Deep Dive and Solutions
OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction
More informationAgenda. Introduction Storage Primer Block Storage Shared File Systems Object Store On-Premises Storage Integration
Storage on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationPOSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN. PostgresConf US
POSTGRESQL ON AWS: TIPS & TRICKS (AND HORROR STORIES) ALEXANDER KUKUSHKIN PostgresConf US 2018 2018-04-20 ABOUT ME Alexander Kukushkin Database Engineer @ZalandoTech Email: alexander.kukushkin@zalando.de
More informationSparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality
Sparse Indexing: Large-Scale, Inline Deduplication Using Sampling and Locality Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble Work done at Hewlett-Packard
More informationImproving throughput for small disk requests with proximal I/O
Improving throughput for small disk requests with proximal I/O Jiri Schindler with Sandip Shete & Keith A. Smith Advanced Technology Group 2/16/2011 v.1.3 Important Workload in Datacenters Serial reads
More informationVlad Vinogradsky
Vlad Vinogradsky vladvino@microsoft.com http://twitter.com/vladvino Commercially available cloud platform offering Billing starts on 02/01/2010 A set of cloud computing services Services can be used together
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems GFS (The Google File System) 1 Filesystems
More informationMapReduce. U of Toronto, 2014
MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in
More informationAgenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache
Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationCA485 Ray Walshe Google File System
Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage
More informationScalability and Performance of Dell EMC OpenManage Essentials (OME) Version 2.3
Scalability and Performance of Dell EMC OpenManage Essentials (OME) Version 2.3 Dell EMC Engineering August 2017 A Dell EMC Technical White Paper Revisions Date January 2013 August 2017 Description Initial
More informationSSD Architecture for Consistent Enterprise Performance
SSD Architecture for Consistent Enterprise Performance Gary Tressler and Tom Griffin IBM Corporation August 21, 212 1 SSD Architecture for Consistent Enterprise Performance - Overview Background: Client
More informationUsing Alluxio to Improve the Performance and Consistency of HDFS Clusters
ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve
More informationLive Migration: Even faster, now with a dedicated thread!
Live Migration: Even faster, now with a dedicated thread! Juan Quintela Orit Wasserman Vinod Chegu KVM Forum 2012 Agenda Introduction Migration
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More information10/26/2017 Universal Java GC analysis tool - Java Garbage collection log analysis made easy
Analysis Report GC log le: atlassian-jira-gc-2017-10-26_0012.log.0.current Duration: 14 hrs 59 min 51 sec System Time greater than User Time In 25 GC event(s), 'sys' time is greater than 'usr' time. It's
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More informationSingle-pass restore after a media failure. Caetano Sauer, Goetz Graefe, Theo Härder
Single-pass restore after a media failure Caetano Sauer, Goetz Graefe, Theo Härder 20% of drives fail after 4 years High failure rate on first year (factory defects) Expectation of 50% for 6 years https://www.backblaze.com/blog/how-long-do-disk-drives-last/
More informationIBM Education Assistance for z/os V2R2
IBM Education Assistance for z/os V2R2 Item: RSM Scalability Element/Component: Real Storage Manager Material current as of May 2015 IBM Presentation Template Full Version Agenda Trademarks Presentation
More informationWhite Paper: Bulk Data Migration in the Cloud. Migration in the Cloud
Bulk Data Migration in the Cloud Executive Summary Cloud storage has the potential to transform IT operations for organizations struggling to deliver a consistent level of service to a distributed user
More informationReshaping Text Data for Efficient Processing on Amazon EC2. Gabriela Turcu, Ian Foster, Svetlozar Nestorov
Reshaping Text Data for Efficient Processing on Amazon EC2 Gabriela Turcu, Ian Foster, Svetlozar Nestorov Outline Motivation Goals: Determine empirically simple application performance model Statically
More informationErik Riedel Hewlett-Packard Labs
Erik Riedel Hewlett-Packard Labs Greg Ganger, Christos Faloutsos, Dave Nagle Carnegie Mellon University Outline Motivation Freeblock Scheduling Scheduling Trade-Offs Performance Details Applications Related
More informationParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Flash Devices
ParaFS: A Log-Structured File System to Exploit the Internal Parallelism of Devices Jiacheng Zhang, Jiwu Shu, Youyou Lu Tsinghua University 1 Outline Background and Motivation ParaFS Design Evaluation
More informationand data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed
5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid
More informationPerformance Characterization of ONTAP Cloud in Amazon Web Services with Application Workloads
Technical Report Performance Characterization of ONTAP Cloud in Amazon Web Services with Application Workloads NetApp Data Fabric Group, NetApp March 2018 TR-4383 Abstract This technical report examines
More informationUsing Transparent Compression to Improve SSD-based I/O Caches
Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationPYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads
PYTHIA: Improving Datacenter Utilization via Precise Contention Prediction for Multiple Co-located Workloads Ran Xu (Purdue), Subrata Mitra (Adobe Research), Jason Rahman (Facebook), Peter Bai (Purdue),
More informationAMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION?
AMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION? Mayur Palankar and Adriana Iamnitchi University of South Florida Matei Ripeanu University of British Columbia Simson Garfinkel Harvard University Amazon
More informationCloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014
Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014 Karthik Krishnan Page 1 of 20 Table of Contents Table of Contents... 2 Abstract... 3 What
More informationCloud scaling of Visual Weather
Cloud scaling of Visual Weather Jozef Matula CTO 1 EGOWS Reading, United Kingdom, 15 th -17 th October 2018 Cloud - Separation of responsibilities 2 https://blogs.technet.microsoft.com/yungchou/2010/11/15/cloud-computing-primer-for-it-pros/
More informationGearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction
GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction Ting Yao 1,2, Jiguang Wan 1, Ping Huang 2, Yiwen Zhang 1, Zhiwen Liu 1 Changsheng Xie 1, and Xubin He 2 1 Huazhong University of
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationNPTEL Course Jan K. Gopinath Indian Institute of Science
Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,
More informationHIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS
HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS Proven Companies and Products Fusion-io Leader in PCIe enterprise flash platforms Accelerates mission-critical applications
More informationLoosely coupled: asynchronous processing, decoupling of tiers/components Fan-out the application tiers to support the workload Use cache for data and content Reduce number of requests if possible Batch
More informationPredictable Time-Sharing for DryadLINQ Cluster. Sang-Min Park and Marty Humphrey Dept. of Computer Science University of Virginia
Predictable Time-Sharing for DryadLINQ Cluster Sang-Min Park and Marty Humphrey Dept. of Computer Science University of Virginia 1 DryadLINQ What is DryadLINQ? LINQ: Data processing language and run-time
More informationvrealize Business for Cloud User Guide
vrealize Business for Cloud Standard 7.2 and vrealize Business for Cloud Advanced 7.2 vrealize Business 7.2 vrealize Business for Cloud 7.2 You can find the most up-to-date technical documentation on the
More informationLecture 15: Datacenter TCP"
Lecture 15: Datacenter TCP" CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Mohammad Alizadeh Lecture 15 Overview" Datacenter workload discussion DC-TCP Overview 2 Datacenter Review"
More informationBuilding High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL
Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high
More informationServerless Computing: Design, Implementation, and Performance. Garrett McGrath and Paul R. Brenner
Serverless Computing: Design, Implementation, and Performance Garrett McGrath and Paul R. Brenner Introduction Serverless Computing Explosion in popularity over the past 3 years Offerings from all leading
More informationD E N A L I S T O R A G E I N T E R F A C E. Laura Caulfield Senior Software Engineer. Arie van der Hoeven Principal Program Manager
1 T HE D E N A L I N E X T - G E N E R A T I O N H I G H - D E N S I T Y S T O R A G E I N T E R F A C E Laura Caulfield Senior Software Engineer Arie van der Hoeven Principal Program Manager Outline Technology
More informationAppendix B. Standards-Track TCP Evaluation
215 Appendix B Standards-Track TCP Evaluation In this appendix, I present the results of a study of standards-track TCP error recovery and queue management mechanisms. I consider standards-track TCP error
More informationDiagnosis via monitoring & tracing
Diagnosis via monitoring & tracing Greg Ganger, Garth Gibson, Majd Sakr adapted from Raja Sambasivan 15-719: Advanced Cloud Computing Spring 2017 1 Problem diagnosis is difficult For developers of clouds
More informationVirtual SQL Servers. Actual Performance. 2016
@kleegeek davidklee.net heraflux.com linkedin.com/in/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture Health
More informationCS252 S05. CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2. I/O performance measures. I/O performance measures
CMSC 411 Computer Systems Architecture Lecture 18 Storage Systems 2 I/O performance measures I/O performance measures diversity: which I/O devices can connect to the system? capacity: how many I/O devices
More informationName: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 25 Feb 2009 Spring 2010 Exam 1
CMU 18-746/15-746 Storage Systems 25 Feb 2009 Spring 2010 Exam 1 Instructions Name: There are four (4) questions on the exam. You may find questions that could have several answers and require an explanation
More informationFile Systems Management and Examples
File Systems Management and Examples Today! Efficiency, performance, recovery! Examples Next! Distributed systems Disk space management! Once decided to store a file as sequence of blocks What s the size
More informationOperating Systems. File Systems. Thomas Ropars.
1 Operating Systems File Systems Thomas Ropars thomas.ropars@univ-grenoble-alpes.fr 2017 2 References The content of these lectures is inspired by: The lecture notes of Prof. David Mazières. Operating
More informationCloud e Datacenter Networking
Cloud e Datacenter Networking Università degli Studi di Napoli Federico II Dipartimento di Ingegneria Elettrica e delle Tecnologie dell Informazione DIETI Laurea Magistrale in Ingegneria Informatica Prof.
More informationEECS750: Advanced Operating Systems. 2/24/2014 Heechul Yun
EECS750: Advanced Operating Systems 2/24/2014 Heechul Yun 1 Administrative Project Feedback of your proposal will be sent by Wednesday Midterm report due on Apr. 2 3 pages: include intro, related work,
More informationLessons from Post-processing Climate Data on Modern Flash-based HPC Systems
Lessons from Post-processing Climate Data on Modern Flash-based HPC Systems Adnan Haider 1, Sheri Mickelson 2, John Dennis 2 1 Illinois Institute of Technology, USA; 2 National Center of Atmospheric Research,
More informationExam4Tests. Latest exam questions & answers help you to pass IT exam test easily
Exam4Tests http://www.exam4tests.com Latest exam questions & answers help you to pass IT exam test easily Exam : 070-765 Title : Provisioning SQL Databases Vendor : Microsoft Version : DEMO Get Latest
More informationpblk the OCSSD FTL Linux FAST Summit 18 Javier González Copyright 2018 CNEX Labs
pblk the OCSSD FTL Linux FAST Summit 18 Javier González Read Latency Read Latency with 0% Writes Random Read 4K Percentiles 2 Read Latency Read Latency with 20% Writes Random Read 4K + Random Write 4K
More informationData Centers and Cloud Computing
Data Centers and Cloud Computing CS677 Guest Lecture Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet
More informationData Centers and Cloud Computing. Slides courtesy of Tim Wood
Data Centers and Cloud Computing Slides courtesy of Tim Wood 1 Data Centers Large server and storage farms 1000s of servers Many TBs or PBs of data Used by Enterprises for server applications Internet
More informationdata within a computer system are stored in one of 2 physical states (hence the use of binary digits)
Binary Digits (bits) data within a computer system are stored in one of 2 physical states (hence the use of binary digits) 0V and 5V charge / NO charge on a transistor gate ferrite core magnetised clockwise
More informationAmazon Aurora Relational databases reimagined.
Amazon Aurora Relational databases reimagined. Ronan Guilfoyle, Solutions Architect, AWS Brian Scanlan, Engineer, Intercom 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Current
More informationDeep Dive on Amazon Elastic File System
Deep Dive on Amazon Elastic File System Yong S. Kim AWS Business Development Manager, Amazon EFS Paul Moran Technical Account Manager, Enterprise Support 28 th of June 2017 2015, Amazon Web Services, Inc.
More information