Cloud Storage and Parallel File Systems
SNIA Storage Developer Conference (SDC09), Sept 2009
Garth Gibson, Carnegie Mellon University and Panasas Inc.

Birth of RAID
- Member of the 4th Berkeley RISC CPU design team (SPUR: 84-89)
- CPU design is a solved problem
- SYSTEM PERFORMANCE depends on data storage
  - IBM 3380 disk is 4 arms in a 7.5 GB washing-machine box
  - SLED: Single Large Expensive Disk
- New PC industry demands cost-effective 100 MB 3.5" disks
  - Enabled by the new SCSI embedded controller architecture
- Use many PC disks for parallelism: ACM SIGMOD 1988 RAID paper
(Garth Gibson, Sept 15, 2009)

Object Storage (CMU NASD, 95-99)
- Before NASD there was store-and-forward Server-Attached Disks (SAD)
- Move access control, consistency, and cache decisions out-of-band
- Raise the storage abstraction: encapsulate layout, offload data access
- Now an ANSI T10 SCSI command set standard (v2 emerging soon)

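Fine-Grain Access Enforcement
- State of the art is a VPN of all out-of-band clients, all sharable data and metadata
  - Accident-prone and vulnerable to a subverted client; analogy to single-address-space computing
- Object storage uses digitally signed, object-specific capabilities on each request
Protocol (figure): the file manager and the NASD share a secret key; the NASD provides integrity/privacy.
1: Client → file manager: request for access
2: File manager → client (private communication): CapArgs, CapKey, where $\mathrm{CapArgs} = (\mathrm{ObjID}, \mathrm{Version}, \mathrm{Rights}, \mathrm{Expiry}, \ldots)$ and $\mathrm{CapKey} = \mathrm{MAC}_{\mathrm{SecretKey}}(\mathrm{CapArgs})$
3: Client → NASD: CapArgs, Req, NonceIn, ReqMAC, where $\mathrm{ReqMAC} = \mathrm{MAC}_{\mathrm{CapKey}}(\mathrm{Req}, \mathrm{NonceIn})$
4: NASD → client: Reply, NonceOut, ReplyMAC, where $\mathrm{ReplyMAC} = \mathrm{MAC}_{\mathrm{CapKey}}(\mathrm{Reply}, \mathrm{NonceOut})$
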
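The capability exchange above can be sketched in Python. This is a minimal illustration under stated assumptions, not the NASD implementation: HMAC-SHA256 stands in for the slide's MAC, CapArgs is treated as an opaque byte string, and the helper names (`mac`, `issue_capability`, `sign_request`, `drive_verify`, `sign_reply`) are hypothetical. A real drive would also check the rights, version, and expiry inside CapArgs against the request and reject reused nonces; that is omitted here.

```python
import hashlib
import hmac


def mac(key: bytes, *fields: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed fields; stands in for the slide's MAC (assumption)."""
    h = hmac.new(key, digestmod=hashlib.sha256)
    for field in fields:
        h.update(len(field).to_bytes(4, "big"))  # length prefix keeps field boundaries unambiguous
        h.update(field)
    return h.digest()


def issue_capability(secret_key: bytes, cap_args: bytes) -> bytes:
    """File manager (step 2): CapKey = MAC_SecretKey(CapArgs), sent privately to the client."""
    return mac(secret_key, cap_args)


def sign_request(cap_key: bytes, req: bytes, nonce_in: bytes) -> bytes:
    """Client (step 3): ReqMAC = MAC_CapKey(Req, NonceIn); the client never sees SecretKey."""
    return mac(cap_key, req, nonce_in)


def drive_verify(secret_key: bytes, cap_args: bytes, req: bytes,
                 nonce_in: bytes, req_mac: bytes) -> bool:
    """Drive: recompute CapKey from CapArgs and its own secret, then check ReqMAC."""
    cap_key = mac(secret_key, cap_args)
    return hmac.compare_digest(mac(cap_key, req, nonce_in), req_mac)


def sign_reply(cap_key: bytes, reply: bytes, nonce_out: bytes) -> bytes:
    """Drive (step 4): ReplyMAC = MAC_CapKey(Reply, NonceOut)."""
    return mac(cap_key, reply, nonce_out)
```

Because the drive can recompute CapKey from CapArgs and its own secret, it needs no per-client state, and a client that alters CapArgs (for example, to upgrade its rights) produces a MAC the drive rejects. In practice the shared secret would come from a source such as `os.urandom(32)`.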
Panasas Inc. Spins Out (1999)
- Storage that accelerates the world's highest-performance and most data-intensive applications
  - 10,000+ clients, 50+ GB/s, 1,000+ storage nodes
  - Primary storage on the first and fastest computer (Los Alamos)
- Founded 1999, shipping solutions since
- Software innovation, packaged with industry-standard HW
  - Scalable RAID over storage nodes, end-to-end check codes
  - Integrated SSD, extensive HA, snapshot, async mirroring
- Record Q2 growth despite the economy
  - 50% growth in revenue YoY vs. -18% WW
  - Strong growth with partners, international, new accounts
  - Strong balance sheet

Panasas Storage Cluster
Figure: the shelf front holds 1 DirectorBlade (DB) and 10 StorageBlades (SB); the shelf rear holds an integrated 10GE switch and a battery module (2 power units); the midplane routes GE and power.

Leaders in HPC Choose Panasas (customer logos)

World's Fastest Computer: Los Alamos RoadRunner (Panasas Performance Scales)
- The world's first Linpack sustained 1.0 petaflops system
- #1 on TOP500: achieved a petaflops on May 25, 2008
- #3 on Green500
- Time Magazine's 10th Top Innovation for 2008
- Been in development since 2002
- Open science phase in progress now
- More than 4 PB of Panasas
- Greater than 50 GB/s to apps
- Computational chemistry at 369 Tflops
Ref:

SciDAC Petascale Data Storage Institute
Eight organizations on the team:
- Carnegie Mellon University, Garth Gibson, PI
- U. of California, Santa Cruz, Darrell Long
- U. of Michigan, Ann Arbor, Peter Honeyman
- Lawrence Berkeley Nat. Lab, John Shalf
- Oak Ridge National Lab, Phil Roth
- Pacific Northwest National Lab, Evan Felix
- Los Alamos National Lab, Gary Grider
- Sandia National Lab, Lee Ward
(Garth Gibson, 11/21/2008)

The Future Is Data-Led
Figure: BLEU scores from the 2005 NIST Arabic-English machine translation competition (translate 100 articles), on a scale running from "useless" through "topic identification", "human-editable translation", and "usable translation" up to "expert human translator"; entrants include Google, ISI, IBM+CMU, UMD, JHU+CU, Edinburgh, Systran, Mitre, and FSC.
- 2005 outcome: Google wins! Qualitatively better on its 1st entry
- Brute-force statistics with more data & compute!!
  - 200M words from UN translations
  - 1 billion words of English grammar
  - 1000-processor cluster

Science of Many Types Is Data-Led
Contact | Field | Comments
--- | --- | ---
J Lopez, CSD | Astrophysics | SDSS digital sky survey including spectroscopy, 50TB
T Di Matteo, Physics | Astrophysics | BigBen BHCosmo hydrodynamics (1B particles simulated), 30TB
F Gilman, Physics | Astrophysics | Large Synoptic Survey Telescope, LSST (2012) digital sky survey, 15TB/day
C Langmead, CSD | Biology | X-ray, NMR, CryoEM images; simulated molecular dynamics trajectories
J Bielak, CE | Earth sciences | USGS sensor images; simulated 4D earthquake wavefields, >10TB/run
D Brumley, ECE | Cyber security | Worldwide Malware Archive; 2TB doubling each year
O Mutlu, ECE | Genomics | 50GB per compressed genome sequencing; expands to TBs to process
B Yu, ECE | Neuroscience | Neural recordings (electrodes, optical) for prosthetics; GB each
J Callan, LTI | Info Retrieval | ClueWeb09, 25TB, 1B high-rank web pages, 10 languages
T Mitchell, MLD | Machine Learning | English sentences of ClueWeb for continuous automated reading (5TB)
M Hebert, RI | Image Understanding | Flickr archive (>4TB); broadcast TV archive; street video; soldier video
Y Sheikh, RI | Virtual Reality | Terascale VR sensor, 1000 cameras + 200 microphones, up to 5TB/sec
C Guestrin, CSD | Machine Learning | Blog update archives, 2TB now + 2.7TB/yr (about 500K blogs/day)
C Faloutsos, CSD | Data Mining | Wikipedia change archive (1TB), fly embryo images (1.5TB), links from Yahoo web
S Vogel, LTI | Machine Translation | Pre-filtered N-gram language model based on statistics on word alignment, 100TB
J Baker, LTI | Machine Translation | Spoken language recording archive, many languages, many sources, up to 1PB
Becker, RI | Computer Vision | Social network image/video archive for training computer vision systems, 1-5TB

Cloud as Home for Massive Data
- Moving massive data is not a good idea
  - Pan-STARRS: 1 TB/day over nationally funded networks
  - Large Synoptic Survey Telescope: 15 TB/day planned
  - South African firm: a homing pigeon was faster than the net
  - Seismic survey firms use trucks and helicopters
- Build processing for massive data and share it
  - Share the cost of storage, buy your own processing
  - Private clouds shared among units, college departments
- What semantics for storage in the cloud?
  - R. Wolski (Eucalyptus): "we haven't seen the cloud storage model yet"

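A little arithmetic shows why the data rates above argue against moving data. The sketch assumes decimal units (1 TB = 10^12 bytes) and a link that is fully and continuously used, which real transfers rarely achieve; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_gbps(bytes_per_day: float) -> float:
    """Average link rate (Gbit/s) needed to ship bytes_per_day every 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e9


def days_to_move(total_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to move total_bytes over a link_gbps link at the given utilization."""
    seconds = total_bytes * 8 / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


# LSST's planned 15 TB/day needs this much bandwidth around the clock:
print(round(sustained_gbps(15e12), 2))  # 1.39 (Gbit/s)
# One petabyte over a perfectly utilized 1 Gbit/s link:
print(round(days_to_move(1e15, 1.0), 1))  # 92.6 (days)
```

Any shortfall in utilization, or any day when the link is shared, pushes the backlog further out, which is why processing next to the data is attractive.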
Why Do I Care About Common Semantics?
- Programmer productivity and ease of deployment
  - HPC file systems are more mature, with a wider feature set
  - In the comfort zone of programmers (vs. cloud file systems)
  - Highly concurrent reads and writes
- Wide support, adoption, acceptance possible
  - pNFS (NFS v4.1) working to be equivalent
  - NetApp, EMC, Sun, IBM, Panasas, ...
- Reuse standard data management tools
  - Backup, disaster recovery, tiering, ...

Cloud HDFS & HPC PVFS
- Metadata servers
  - Store all file system metadata
  - Handle all metadata operations
- Data servers
  - Store actual file system data
  - Handle all read and write operations
- Files are divided into chunks (objects)
  - Chunks of a file are distributed across servers
- Designed for colocation of disks and compute?
  - HDFS is; PVFS is not (Amazon-like), but can do it

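The chunk-to-server mapping can be illustrated with round-robin striping, the layout the talk attributes to PVFS. This is a simplified sketch: the `locate` helper, the fixed chunk size, and the `first_server` starting point are assumptions for illustration, not PVFS's actual distribution API, and HDFS places blocks differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkLocation:
    chunk_index: int      # which chunk of the file holds the byte
    server: int           # which data server stores that chunk
    offset_in_chunk: int  # byte position inside the chunk


def locate(offset: int, chunk_size: int, num_servers: int,
           first_server: int = 0) -> ChunkLocation:
    """Round-robin striping: chunk k lives on server (first_server + k) mod num_servers."""
    chunk_index = offset // chunk_size
    server = (first_server + chunk_index) % num_servers
    return ChunkLocation(chunk_index, server, offset % chunk_size)


# Byte 323 of a file striped in 64-byte chunks over 4 servers:
print(locate(323, 64, 4))  # ChunkLocation(chunk_index=5, server=1, offset_in_chunk=3)
```

Because the mapping is a pure function of the offset, a client can compute where any byte lives without asking the metadata server on every access.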
HPC PVFS Shim Under Hadoop
Figure: Hadoop applications run on the Hadoop framework, which calls an extensible file system API. Beneath it is either the HDFS client library talking to HDFS servers, or a PVFS shim layer (~1,700 lines of code, providing a readahead buffer, file layout info, and replication) over the unmodified PVFS client library (C) talking to unmodified PVFS servers.

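The shim's readahead buffer can be pictured as a wrapper that turns many small sequential reads into fewer large fetches. In this hypothetical sketch, `fetch(offset, length)` stands in for a read through the PVFS client library, and the 4 MiB default is an arbitrary choice. Caching like this suits Hadoop because its files are not modified after they are written; a general-purpose, cache-coherent PVFS client cannot assume that.

```python
from typing import Callable


class ReadaheadReader:
    """Serve small sequential reads from one large fetch (hypothetical shim helper)."""

    def __init__(self, fetch: Callable[[int, int], bytes], readahead: int = 4 << 20):
        self.fetch = fetch          # fetch(offset, length) -> bytes from the file system
        self.readahead = readahead  # minimum bytes requested per underlying fetch
        self.buf_start = 0
        self.buf = b""

    def read(self, offset: int, length: int) -> bytes:
        end = offset + length
        buf_end = self.buf_start + len(self.buf)
        if not (self.buf_start <= offset and end <= buf_end):
            # Miss: refill the buffer starting at the requested offset.
            self.buf = self.fetch(offset, max(length, self.readahead))
            self.buf_start = offset
        lo = offset - self.buf_start
        return self.buf[lo:lo + length]  # may be short at end of file
```

With a 4 KiB readahead, three consecutive 100-byte reads cost one underlying fetch instead of three; a read that runs past the buffered range triggers a refill.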
It Takes a Little Work, but PVFS Gets There
- No changes in PVFS, just a stdio-class shim library
- Out of the box, a big difference
  - PVFS is cache coherent by not prefetching; HDFS files are immutable
  - Prefetching in the shim is pretty much trivial, but not enough
- HDFS exposes the IP of each block's data server
  - Hadoop schedules reads on the best node
  - Exposing layout in PVFS is simple, given an API
- Architecture is the issue: where are the disks? RAID?
  - In the nodes (HDFS) or not (S3)
  - Or not using all nodes
  - Rack awareness is similar, but with smaller differences

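Exposing layout lets the framework run each read where the data lives. The greedy function below is a simplified stand-in for that idea, not Hadoop's scheduler: it assumes the file system can report which hosts hold each block (as HDFS does, and as the shim could for PVFS through an API) and assigns each block to a free node holding a copy, falling back to any free node. The function name and data shapes are illustrative.

```python
from typing import Dict, List, Set


def assign_reads(block_hosts: Dict[str, List[str]], free_nodes: List[str]) -> Dict[str, str]:
    """Greedy locality-aware assignment: prefer a free node that stores the block."""
    assignment: Dict[str, str] = {}
    available: Set[str] = set(free_nodes)
    for block, hosts in block_hosts.items():
        local = [h for h in hosts if h in available]
        if local:
            node = local[0]              # local read: no network transfer needed
        elif available:
            node = sorted(available)[0]  # remote read on any free node
        else:
            break                        # no free nodes left this round
        assignment[block] = node
        available.discard(node)
    return assignment


print(assign_reads({"b0": ["n1", "n2"], "b1": ["n2"], "b2": ["n9"]}, ["n1", "n2", "n3"]))
# {'b0': 'n1', 'b1': 'n2', 'b2': 'n3'}
```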
A Closer Look: Reading a Single File
Figure: (left) aggregate read throughput (MB/s) vs. number of clients for PVFS and HDFS (no replication); (right) completion time (sec) for Read (16GB, 16 nodes), PVFS vs. HDFS.
- N clients, each reading 1/N of a single file (left)
  - Round-robin file layout in PVFS avoids the contention that occurs with random allocation in HDFS
- But if Hadoop is given all the work at once (right)
  - Scheduling lots of work overcomes contention

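A toy model of the contention argument: count how many chunks of one file land on each server under round-robin versus uniformly random placement. This looks only at static placement, not timing, and uniform random placement is a simplification of HDFS's block allocation; the helper names are illustrative.

```python
import random
from collections import Counter


def round_robin_load(num_chunks: int, num_servers: int) -> Counter:
    """Chunks per server when chunk k goes to server k mod num_servers."""
    return Counter(k % num_servers for k in range(num_chunks))


def random_load(num_chunks: int, num_servers: int, seed: int = 0) -> Counter:
    """Chunks per server when each chunk picks a server uniformly at random."""
    rng = random.Random(seed)
    return Counter(rng.randrange(num_servers) for _ in range(num_chunks))


def imbalance(load: Counter, num_servers: int) -> float:
    """Busiest server's chunk count divided by the ideal even share (1.0 = perfectly even)."""
    total = sum(load.values())
    return max(load.values()) / (total / num_servers)


print(imbalance(round_robin_load(256, 16), 16))  # 1.0
print(imbalance(random_load(256, 16), 16))       # typically well above 1.0
```

Round-robin spreads the chunks exactly evenly, while random placement usually leaves a few servers holding noticeably more than their share; those servers become hot spots when many clients read the same file at once.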
Writing Has Larger Differences
Figure: (left) aggregate write throughput (MB/s) vs. number of clients for PVFS and HDFS (no replication); (right) completion time (sec) for Parallel Copy (16GB, 16 nodes), PVFS (16 writers) vs. HDFS (1 writer).
- Writing: N clients to N files (left)
  - HDFS writes locally first, so its throughput scales linearly
  - PVFS writes over the network, shifting work to idle clients
- Writing: N clients to one file (right)
  - HDFS does not support multiple concurrent writers

OpenCirrus/Yahoo M45 Tests
Figure: completion time (sec) for Grep (100GB, 50 nodes) and Sort (100GB, 50 nodes), and network traffic (GB) for Sort (100GB, 50 nodes); series are PVFS, HDFS, and PVFS (2 copies).
- PVFS and HDFS are similar for reading, as before
- HDFS's non-striped local copy saves network traffic in the network-limited, write-intensive sort
- How fundamental is being network-limited in the cloud?
  - "Cloud surge protection": data is rarely local to small compute

CMU Testbeds: 10 GE Networking
- Combined: 3 TF, 2.2 TB, 142 nodes, 1.1K cores, ½ PB
- 1 GE vs. 10 GE: network vs. disk bottleneck
  - OpenCloud writes remote as fast as local
- EtherCLOS networks at SIGCOMM
  - No need for the datacenter network to be a big limit

Revisit HDFS Triplication
- GFS & HDFS triplicate every data block
  - Triplication: one local + two remote copies
  - 200% space overhead
- But RAID 5 *is* simple?
  - Can be done at scale: Panasas does it with Object RAID over servers (>1PF, >50GB/s, >10K clients)
  - But synchronous error handling is hard
- GFS & HDFS defer repair
  - A background task repairs copies
  - Notably less scary to developers

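The space comparison behind "But RAID 5 is simple?" reduces to two ratios. The sketch below expresses overhead as extra storage per unit of user data; the function names are illustrative, and the parity-group sizes in the examples (8 and 16) are simply the sizes used in the later evaluation.

```python
def replication_overhead(copies: int) -> float:
    """Extra space as a fraction of user data for n-way replication."""
    return float(copies - 1)


def parity_overhead(data_blocks: int, parity_blocks: int = 1) -> float:
    """Extra space for a RAID-style group with parity_blocks check blocks per data_blocks."""
    return parity_blocks / data_blocks


print(replication_overhead(3))  # 2.0   -> 200% for triplication
print(parity_overhead(8))       # 0.125 -> 12.5% for a RAID 5 group of 8 data blocks
print(parity_overhead(16))      # 0.0625
```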
DiskReduce: Background Repair for RAID
- Start the same: triplicate every data block
  - Triplication: one local + two remote copies
  - 200% space overhead
- Background encoding
  - In coding terms: data is A, B; check is A, B, $f(A,B) = A + B$
  - Standard single-failure recovery
  - Double-failure recovery uses parity and related data blocks, then applies standard single-failure recovery

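The background encoding step can be illustrated with RAID 5 style XOR parity, reading the slide's $A + B$ as addition over GF(2), that is, bytewise XOR. This sketch covers the coding idea only; how DiskReduce schedules encoding and which replicas it deletes are not modeled, and the function names are hypothetical.

```python
from functools import reduce
from typing import List


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (RAID 5 parity)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need one or more blocks of equal length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def parity_of(data_blocks: List[bytes]) -> bytes:
    """f(A, B, ...) = A + B + ... over GF(2)."""
    return xor_blocks(*data_blocks)


def recover_one(surviving: List[bytes], parity: bytes) -> bytes:
    """Single-failure recovery: the lost block is the XOR of the parity and all survivors."""
    return xor_blocks(parity, *surviving)


a, b = b"\x01\x02", b"\x10\x20"
p = parity_of([a, b])                # A + B = b"\x11\x22"
assert recover_one([b], p) == a      # lose A, rebuild it from B and the parity
```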
Preliminary Evaluation
- Pre-OpenCirrus testbed
  - 16 nodes, Pentium D dual-core 3.00GHz
  - 4GB memory, 7200 rpm SATA 160GB disk
  - Gigabit Ethernet
- Implementation specification: Hadoop/HDFS version
- Test conditions
  - File size distribution from a Yahoo! M45 sample
  - Benchmarks modeled on the Google FS paper
  - Benchmark input after all parity groups are encoded
  - Benchmark output has encoding in the background
  - No failures during tests

The Obvious: Little Degradation
Figure: completion time for Sort with 7.5GB data and Grep with 45GB data, comparing original Hadoop (3 replicas) with parity group sizes 8 and 16.

DiskReduce v2.0
- Only 33% saving with v1.0
- Stopped work on v1.0 because encoding didn't scale
- Started the v2.0 model based on codeword selection
  - Select a RAID group before creating data
  - Retain random distribution
- Worry more about cleaning after block deletion
  - Return of the small-write bottleneck of RAID 5 & 6
  - Select blocks in a group to promote co-deletion
- We'll get to performance degradation
  - Backup jobs like having a second copy

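The small-write bottleneck comes from the RAID 5 parity update: changing one data block means reading the old data and old parity, then writing the new data and new parity, four I/Os for one logical write. The sketch below shows the standard read-modify-write update, $P' = P \oplus D_{\text{old}} \oplus D_{\text{new}}$; the function name is illustrative, and RAID 6, which must also update a second check block, is not shown.

```python
def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """RAID 5 read-modify-write: P' = P xor D_old xor D_new, computed bytewise."""
    if not (len(old_parity) == len(old_data) == len(new_data)):
        raise ValueError("blocks must have equal length")
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


# Group of three one-byte blocks: 1, 2, 4 -> parity 7. Overwrite the 2 with an 8.
print(small_write_parity(bytes([7]), bytes([2]), bytes([8])))  # b'\r' (13 == 1 ^ 8 ^ 4)
```

The shortcut avoids reading the rest of the group, but it still costs two reads and two writes, which is why deletions that leave groups half-empty make this bottleneck reappear.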
Closing: Clouds & Parallel File Systems
- Big similarity between cloud and parallel file systems
  - Similar scale: 1000s of nodes
  - Two-layer implementation, object-based
  - PVFS can match HDFS performance
- The datacenter network bottleneck and colocation are overplayed
  - 10GE and fat trees are coming; surge protection means data is non-local
- Differences stem from maturity and simplicity
  - Simpler semantics work sooner
  - But users want familiar semantics: writing, directories, concurrency, low overhead, ...
  - Integrates into broader infrastructure seamlessly
- My view: layer Parallel NFS under cloud APIs
  - Exposing it as customers ask for richer semantics

Thank you

INCAST: TCP Throughput Collapse
Cluster environment: 1Gbps Ethernet, 100us delay, 200ms RTO, S50 switch, 1MB block size
- [Nagle04] called this Incast; used an app-level and switch-selection workaround
- [Fast08]: cause of collapse is 200ms TCP timeouts on loss of a whole window
- Is reducing the timeout effective, practical, and safe in the real world?

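A back-of-the-envelope calculation shows why a single 200ms timeout is so damaging here. The sketch assumes 1MB = 10^6 bytes, ignores protocol overhead, and charges one full RTO stall to the block, roughly what happens when a synchronized read cannot finish until the stalled server retransmits; the helper name is illustrative.

```python
def goodput_bps(block_bytes: float, link_bps: float, stall_s: float = 0.0) -> float:
    """Average goodput for one block when the transfer also waits out stall_s seconds."""
    transfer_s = block_bytes * 8 / link_bps
    return block_bytes * 8 / (transfer_s + stall_s)


for stall in (0.0, 0.005, 0.2):
    print(f"stall {stall * 1000:.0f} ms -> {goodput_bps(1_000_000, 1e9, stall) / 1e6:.0f} Mbit/s")
```

A 1MB block takes only about 8ms at 1Gbps, so one 200ms stall drops goodput to roughly 38 Mbit/s, while a 5ms stall still leaves roughly 615 Mbit/s.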
Mitigating Incast
Cluster environment: 1Gbps Ethernet, 100us delay, S50 switch, 1MB block size
- Eliminating the RTO bound: 5ms timeouts (clock granularity limit)
  - A single-line, server-only change is effective at medium scales
- Eliminating the RTO bound + microsecond TCP timeouts avoids Incast
  - Exploits high-resolution timers to avoid high interrupt overhead

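The RTO bound and the clock-granularity limit both appear in the standard retransmission-timer calculation. The class below follows the RFC 6298 estimator (SRTT and RTTVAR with α = 1/8, β = 1/4, K = 4) and adds a configurable floor; the 200ms floor mirrors the cluster setting above, whereas RFC 6298 itself recommends a 1 second minimum. Class and parameter names are illustrative, and real TCP stacks add details (exponential backoff, Karn's algorithm) that are omitted here.

```python
from typing import Optional


class RtoEstimator:
    """RFC 6298 retransmission-timeout estimator with a configurable lower bound."""

    ALPHA = 1 / 8
    BETA = 1 / 4
    K = 4

    def __init__(self, min_rto: float, granularity: float):
        self.min_rto = min_rto          # floor in seconds (e.g. 0.2, the bound to remove)
        self.granularity = granularity  # clock granularity G in seconds
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def sample(self, rtt: float) -> float:
        """Feed one RTT measurement (seconds) and return the resulting RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RFC 6298 updates RTTVAR with the old SRTT before updating SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        return max(self.min_rto, rto)
```

With a 100us RTT, a 200ms floor dominates completely. Removing the floor gives an RTO of about 300us with a microsecond clock, or about 5.1ms with a 5ms clock tick, matching the granularity limit noted on the slide.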
The Need for Microsecond Timeouts
Simulation environment: 10Gbps Ethernet, 20us delay, 40MB block size
- Future datacenters: more bandwidth, less delay, more servers
- Retransmission timeouts should not be bounded below
Accepted into SIGCOMM
More informationDistributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju
Distributed Data Infrastructures, Fall 2017, Chapter 2 Jussi Kangasharju Chapter Outline Warehouse-scale computing overview Workloads and software infrastructure Failures and repairs Note: Term Warehouse-scale
More informationIntroducing Panasas ActiveStor 14
Introducing Panasas ActiveStor 14 SUPERIOR PERFORMANCE FOR MIXED FILE SIZE ENVIRONMENTS DEREK BURKE, PANASAS EUROPE INTRODUCTION TO PANASAS Storage that accelerates the world s highest performance and
More informationAUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT
AUTOMATING IBM SPECTRUM SCALE CLUSTER BUILDS IN AWS PROOF OF CONCEPT By Joshua Kwedar Sr. Systems Engineer By Steve Horan Cloud Architect ATS Innovation Center, Malvern, PA Dates: Oct December 2017 INTRODUCTION
More informationOracle NoSQL Database and Cisco- Collaboration that produces results. 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved.
Oracle NoSQL Database and Cisco- Collaboration that produces results 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. What is Big Data? SOCIAL BLOG SMART METER VOLUME VELOCITY VARIETY
More informationBrent Welch. Director, Architecture. Panasas Technology. HPC Advisory Council Lugano, March 2011
Brent Welch Director, Architecture Panasas Technology HPC Advisory Council Lugano, March 2011 Panasas Background Technology based on Object Storage concepts invented by company founder, Garth Gibson, a
More informationThe Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION
The Data-Protection Playbook for All-flash Storage KEY CONSIDERATIONS FOR FLASH-OPTIMIZED DATA PROTECTION The future of storage is flash The all-flash datacenter is a viable alternative You ve heard it
More informationHow to host and manage enterprise customers on AWS: TOYOTA, Nippon Television, UNIQLO use cases
How to host and manage enterprise customers on AWS: TOYOTA, Nippon Television, UNIQLO use cases Kazutaka Goto - Evangelist, cloudpack Ken Tamagawa - Sr. Manager, Solutions Architecture, Amazon Web Services
More informationpnfs Update A standard for parallel file systems HPC Advisory Council Lugano, March 2011
pnfs Update A standard for parallel file systems HPC Advisory Council Lugano, March 2011 Brent Welch welch@panasas.com Panasas, Inc. 1 Why a Standard for Parallel I/O? NFS is the only network file system
More informationTECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1
TECHNICAL OVERVIEW OF NEW AND IMPROVED FEATURES OF EMC ISILON ONEFS 7.1.1 ABSTRACT This introductory white paper provides a technical overview of the new and improved enterprise grade features introduced
More informationEvaluating Cloud Storage Strategies. James Bottomley; CTO, Server Virtualization
Evaluating Cloud Storage Strategies James Bottomley; CTO, Server Virtualization Introduction to Storage Attachments: - Local (Direct cheap) SAS, SATA - Remote (SAN, NAS expensive) FC net Types - Block
More informationSão Paulo. August,
São Paulo August, 28 2018 A Modernização das Soluções de Armazeamento e Proteção de Dados DellEMC Mateus Pereira Systems Engineer, DellEMC mateus.pereira@dell.com Need for Transformation 81% of customers
More informationScaleArc for SQL Server
Solution Brief ScaleArc for SQL Server Overview Organizations around the world depend on SQL Server for their revenuegenerating, customer-facing applications, running their most business-critical operations
More informationYuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013
Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords
More informationDatabase Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu
Database Architecture 2 & Storage Instructor: Matei Zaharia cs245.stanford.edu Summary from Last Time System R mostly matched the architecture of a modern RDBMS» SQL» Many storage & access methods» Cost-based
More informationRUNNING PETABYTE-SIZED CLUSTERS
1 RUNNING PETABYTE-SIZED CLUSTERS CASE STUDIES FROM THE REAL WORLD 2 GROWTH OF UNSTRUCTURED DATA 80% 74% 67% 2013 2015 2017 37 EB Total Capacity Shipped, Worldwide 71 EB 133 EB Source: IDC Unstructured
More informationMarket Report. Scale-out 2.0: Simple, Scalable, Services- Oriented Storage. Scale-out Storage Meets the Enterprise. June 2010.
Market Report Scale-out 2.0: Simple, Scalable, Services- Oriented Storage Scale-out Storage Meets the Enterprise By Terri McClure June 2010 Market Report: Scale-out 2.0: Simple, Scalable, Services-Oriented
More informationStore Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete
Store Process Analyze Collaborate Archive Cloud The HPC Storage Leader Invent Discover Compete 1 DDN Who We Are 2 We Design, Deploy and Optimize Storage Systems Which Solve HPC, Big Data and Cloud Business
More informationExtreme computing Infrastructure
Outline Extreme computing School of Informatics University of Edinburgh Replication and fault tolerance Virtualisation Parallelism and parallel/concurrent programming Services So, you want to build a cloud
More informationISILON X-SERIES. Isilon X210. Isilon X410 ARCHITECTURE SPECIFICATION SHEET Dell Inc. or its subsidiaries.
SPECIFICATION SHEET Isilon X410 Isilon X210 ISILON X-SERIES The Dell EMC Isilon X-Series, powered by the Isilon OneFS operating system, uses a highly versatile yet simple scale-out storage architecture
More informationDeduplication File System & Course Review
Deduplication File System & Course Review Kai Li 12/13/13 Topics u Deduplication File System u Review 12/13/13 2 Storage Tiers of A Tradi/onal Data Center $$$$ Mirrored storage $$$ Dedicated Fibre Clients
More informationDATA PROTECTION IN A ROBO ENVIRONMENT
Reference Architecture DATA PROTECTION IN A ROBO ENVIRONMENT EMC VNX Series EMC VNXe Series EMC Solutions Group April 2012 Copyright 2012 EMC Corporation. All Rights Reserved. EMC believes the information
More informationThe Google File System
The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google SOSP 03, October 19 22, 2003, New York, USA Hyeon-Gyu Lee, and Yeong-Jae Woo Memory & Storage Architecture Lab. School
More informationFLASHARRAY//M Smart Storage for Cloud IT
FLASHARRAY//M Smart Storage for Cloud IT //M AT A GLANCE PURPOSE-BUILT to power your business: Transactional and analytic databases Virtualization and private cloud Business critical applications Virtual
More informationAdvanced Computer Networks. Datacenter TCP
Advanced Computer Networks 263 3501 00 Datacenter TCP Spring Semester 2017 1 Oriana Riva, Department of Computer Science ETH Zürich Today Problems with TCP in the Data Center TCP Incast TPC timeouts Improvements
More informationGenomics on Cisco Metacloud + SwiftStack
Genomics on Cisco Metacloud + SwiftStack Technology is a large component of driving discovery in both research and providing timely answers for clinical treatments. Advances in genomic sequencing have
More informationSSDs that Think. Noam Mizrahi Vice President, Technology and Architecture CTO Office, Marvell
SSDs that Think Intelligent SSDs Can Handle a Larger Computing Load at the Edge Noam Mizrahi Vice President, Technology and Architecture CTO Office, Marvell People have been mining forever 18xx 19xx Gold
More informationIBM System Storage DCS3700
IBM System Storage DCS3700 Maximize performance, scalability and storage density at an affordable price Highlights Gain fast, highly dense storage capabilities at an affordable price Deliver simplified
More informationSoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility
Control Any Data. Any Cloud. Anywhere. SoftNAS Cloud Data Management Products for AWS Add Breakthrough NAS Performance, Protection, Flexibility Understanding SoftNAS Cloud SoftNAS, Inc. is the #1 software-defined
More information