BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE
1 BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014
2 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big. Scale out will become Big. Greatest Benefit: Data development velocity. Reciprocal Impact: Faster application development.
3 WHAT WE'RE NOT LOOKING AT TODAY Streaming Technologies; In-memory Technologies
4 HADOOP 1.0 SYSTEMS ARCHITECTURE ASSUMPTIONS
5 HADOOP 1.0 SYSTEMS ARCHITECTURE ASSUMPTIONS Map/Reduce: Abstracts storage, concurrency, execution. HDFS: Distributed, fault-tolerant filesystem; primarily designed for cost/scale; not POSIX compliant; works on commodity hardware; files are large (GBs to TBs) and append-only; access is large and sequential; hardware failure is common. Fault tolerance baked in: Replicate data 3x; incrementally re-execute computation; avoid single points of failure.
6 THE HADOOP SYSTEMS ARCHITECTURE PROBLEM
7 THE HADOOP PROBLEM - SYSTEMS ARCHITECTURE VIEW Technical View: Hadoop is a giant I/O platform. I/O access has fallen behind CPU/memory density. Strategy to address the I/O vs. processing divergence: read/write to as many drives in parallel as possible. Related variable: an increase in spindle count drives additional network traffic (between nodes). Bounded by latency from read/write to disk (in addition to bandwidth).
8 THE HADOOP PROBLEM - SYSTEMS ARCHITECTURE VIEW Technical View (cont.): An increased number of disk reads/writes has a reciprocal impact on network bandwidth. Teragen is a method for synthetic testing of network capacity; it generates 3-9x the network load over normal operations. There is a direct relationship between the number of drives per node and the number of MapReduce slots for that node. Business View: The greater the spindle count, the lower the cost per TB. Generally, more average nodes are better than super nodes. Treat data protection as an additional consideration.
9 THE HADOOP PROBLEM - CPU CPU Performance: Typically, CPU clock speed does not impact processing times. Typically, CPU is not a performance bottleneck (there are exceptions). Heuristics on CPU: No negative impact from running more and/or higher-quality CPUs. Price and power consumption become the primary boundary values for optimal ROI. A single task typically uses one thread at a time. Typically, investing in more cores does not see a linear return. Typically, investing in more performant CPUs does not see a linear return. Typically, threads experience a large amount of idle time while waiting for I/O response.
10 THE HADOOP PROBLEM - MEMORY Memory Performance: Memory capacity does not have a significant impact on processing times. Heuristics on Memory: No negative impact from running more and/or higher-quality memory. Price becomes the primary boundary value for optimal ROI. Typically, memory capacity does not have a significant impact on processing times. Additional memory will support MapReduce in the sorting process.
11 THE HADOOP PROBLEM - DRIVES Drive Density: Popular drive sizes: 1, 2, 3, 4TB drives. Heuristics on drives: The larger the drive, the cheaper the $/TB = optimal ROI. Larger drives create an opportunity for replication storms: disk rebuild can take longer and has the potential to saturate the network, impacting cluster performance. Typically, drive size and latency have little impact on cluster performance (there are exceptions). Typically, a less optimal ROI is achieved by using faster drives. MapReduce is designed for long sequential reads and writes, so there is less value in addressing disk latency.
12 THE HADOOP PROBLEM - NETWORK Network Performance: Typically, 1GbE is not enough bandwidth for production Hadoop clusters. Network Heuristics: Networking is a critical area for Hadoop clusters. Production clusters have 10GbE, sometimes 2GbE. Compression can drastically improve network performance. Bandwidth beyond 10GbE is rarely a necessity. Note on Networking: Differences between bandwidth and latency: higher bandwidth can lead to higher volume at a given latency; lower-latency fabrics can lead to higher volume and higher response (improved environment performance).
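To put the bandwidth heuristics in perspective, the following minimal sketch estimates how long it would take to move one drive's worth of data across a single link at theoretical line rate. The 4 TB figure is the largest drive size listed in the deck; the function name, the `efficiency` parameter, and the assumption of a single uncontended link are illustrative and not taken from the slides. Real re-replication is spread across many nodes and competes with job traffic, so treat the numbers as rough bounds only.

```python
def transfer_seconds(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Seconds needed to move size_tb (decimal terabytes) over one link.

    efficiency is a hypothetical knob (0 < efficiency <= 1) for protocol
    overhead; 1.0 means theoretical line rate.
    """
    bits = size_tb * 1e12 * 8                      # decimal TB -> bits
    return bits / (link_gbps * 1e9 * efficiency)   # bits / (bits per second)


for gbps in (1, 10):
    secs = transfer_seconds(4, gbps)
    print(f"4 TB over {gbps} GbE: {secs:,.0f} s (~{secs / 3600:.1f} h)")
```

At line rate this gives roughly 8.9 hours on 1GbE versus under an hour on 10GbE, which is consistent with the slide's point that 1GbE is usually too little for production clusters.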
13 THE HADOOP PROBLEM - POWER Power Considerations: Availability versus cost is the primary consideration. Value tapers with size of cluster, for instance: 10-node production cluster for a smaller organization; larger than 20 nodes, the value tapers off. If using a single power supply: Consider MTBF at node level and network impact for rebuild. Exception - Master Nodes: Dual power supplies are recommended.
14 HADOOP COST CONSIDERATIONS Price per Node Performance per Node Capacity per Node Space, Power, Cooling Supportability - FTE Resiliency: Availability Fragmentation Failure Impact (risk)
15 THE HADOOP SYSTEMS ARCHITECTURE PROBLEM Architecture: 3x full-copy replication; no compression; no data de-duplication; near-linear scalability (95%). Performance Profile: Primary bottleneck: I/O; secondary bottleneck: internode traffic (100s of nodes); CPU/memory under-utilized per chassis. Configuration: Backup solution: prod-sized cluster; fixed disk sizing at the chassis level.
16 WHY ZFS? Performance, Compression, Block Size, Analytics, Backup/Recovery, Cost
17 ZFS HYPOTHESIS ZFS advantages for Hadoop: DRAM: faster processing; larger block size (128k-1MB): faster processing; compression: reduced footprint; encryption (slipped to Fall 2014). Expected Outcome: Equivalent/near-equivalent processing; economical backup solution; reduced disk footprint; right-size disk allocation to server.
18 WE GET A DISRUPTIVE WIN IF: We drive Hadoop from being I/O bound to being CPU/memory bound. We significantly reduce disk footprint. Huge implications if we drive all load to CPU.
19 ZFS TESTING SYS ARCH - LOCAL CLUSTER Hadoop: Cloudera; Name Node; 5 Data Nodes. Servers: (6) X4-2Ls; OL 6.3 (upgraded to OL 6.5); (2) Intel Xeon E v2 10-core 3.0 GHz procs; 128GB memory (DDR3-1600); (12) 4TB 7200 rpm 3.5-inch SAS-2 HDD local disk. Storage: 240TB total local disk.
20 ZFS TESTING SYS ARCH - ARRAY CLUSTER Hadoop: Cloudera; Name Node; 5 Data Nodes. Servers: (6) X4-2Ls; OL 6.3 (upgraded to OL 6.5); (2) Intel Xeon E v2 10-core 3.0 GHz procs; 128GB memory (DDR3-1600); (12) 4TB 7200 rpm 3.5-inch SAS-2 HDD local disk. Storage: ZS3-4 (Clustered); 2TB DRAM; 6 shelves; 900GB 10K RPM HDD; 108 TB.
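The 240TB figure for the local cluster follows from the node and drive counts on this slide. A minimal sketch of the arithmetic is shown below; the function names are illustrative, and the usable-capacity line assumes the default 3x HDFS replication described earlier in the deck, with no compression and no space reserved for the operating system or temporary data.

```python
def raw_capacity_tb(data_nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw disk capacity across all data nodes, in TB."""
    return data_nodes * drives_per_node * drive_tb


def usable_capacity_tb(raw_tb: float, replication: int) -> float:
    """Unique data that fits when every block is stored `replication` times."""
    return raw_tb / replication


raw = raw_capacity_tb(data_nodes=5, drives_per_node=12, drive_tb=4)
print(f"raw: {raw:.0f} TB")                                   # 5 * 12 * 4
print(f"usable at 3x replication: {usable_capacity_tb(raw, 3):.0f} TB")
```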
21 ZFS STORAGE REFERENCE ARCHITECTURE
22 BENCHMARK APPROACH Cluster Type: Local Cluster, Array Cluster. Terasort: 10GB, 100GB, 1TB. TestDFSIO: 100GB, 1TB, 10TB.
23 DATA TESTING APPROACH Cluster Type: Local Cluster, Array Cluster. Types of Jobs: 3 types written in Hive: Simple (4x), Medium Complexity (4x), High Complexity/Inefficient Process (4x). Job Size: 400GB, 800GB, 1.6TB.
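TeraGen, which produces the input for Terasort, is sized by row count rather than by bytes, and each row it writes is 100 bytes. The sketch below converts the three Terasort sizes on this slide into row counts. Treating GB and TB as decimal units is an assumption, since the slides do not say whether binary units were used, and the helper name is illustrative.

```python
TERAGEN_ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows

# Decimal sizes are an assumption; the slides do not specify GB vs GiB.
TERASORT_SIZES = {
    "10GB": 10 * 10**9,
    "100GB": 100 * 10**9,
    "1TB": 10**12,
}


def teragen_rows(target_bytes: int) -> int:
    """Number of 100-byte rows needed to reach the target data size."""
    return target_bytes // TERAGEN_ROW_BYTES


for label, size in TERASORT_SIZES.items():
    print(f"{label}: {teragen_rows(size):,} rows")
```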
24 DATA TESTING FINDINGS LOCAL CLUSTER 1.6TB Simple (s) Medium (s) Complex (s) ARRAY CLUSTER 1.6TB Simple (s) Medium (s) Complex (s) *128K block
25 HADOOP AND ZFS TEST RESULTS SUMMARY Hadoop Operations: Completion of jobs approx. 280% faster; larger jobs trend in a near 1:1 linear fashion. Compression: Compression of x achieved on lowest setting.
26 BENEFITS OF RUNNING ZFS ON HADOOP Reduced cluster overhead with replication factor of 2x. Reduced storage with replication factor of 2x. Increased protection: number of copies of data to 4x. Added compression of > 3x (for compressible data). Added caching, decreasing I/O response times. Added data protection (RAID 1) with no overhead. Added fault tolerance via clustered heads.
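One way to read the replication and protection figures together is sketched below. It assumes that the "4x copies" result comes from HDFS replication of 2 multiplied by a two-way RAID 1 mirror on the array; that interpretation, the function names, and the 100 TB example dataset are illustrative. The 3.0 compression ratio is the lower bound quoted on the slide for compressible data and will not hold for data that is already compressed.

```python
def physical_copies(hdfs_replication: int, mirror_copies: int) -> int:
    """Total block copies on disk: HDFS replicas times mirror sides."""
    return hdfs_replication * mirror_copies


def on_disk_tb(logical_tb: float, hdfs_replication: int,
               mirror_copies: int, compression_ratio: float) -> float:
    """Disk consumed by logical_tb of data after replication, mirroring, compression."""
    return logical_tb * hdfs_replication * mirror_copies / compression_ratio


logical = 100.0  # hypothetical dataset size in TB

# Plain Hadoop as described earlier: 3x replication, no mirror, no compression.
baseline = on_disk_tb(logical, hdfs_replication=3, mirror_copies=1, compression_ratio=1.0)

# Array-backed reading: 2x replication on RAID 1 with 3x compression.
array = on_disk_tb(logical, hdfs_replication=2, mirror_copies=2, compression_ratio=3.0)

print(f"copies on array: {physical_copies(2, 2)}")
print(f"baseline footprint: {baseline:.1f} TB, array footprint: {array:.1f} TB")
```

Under these assumptions the array holds more copies of each block while using less than half the disk of the baseline, which is the trade the slide is pointing at.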
27 PROCESSING IMPLICATIONS TYPE STORAGE CAPACITY PROCESSING (SERVERS) 24 HOURS (PB) ANNUAL (PB) Server Array
28 IMPACT OF YARN AND SPARK Reduced Map/Reduce Ratio Management for mixed workloads Greater flexibility on coding choices Lower latency for request to completion = faster QOS by job/process opportunities Greater flexibility on archiving/storing data Possibility of using higher levels of compression for data segments Increased complexity of process/library management
29 EXABYTE PLATFORM CONSIDERATIONS Compression Access Tiered Data Encryption Capacity Network Speed Workload Segmentation Data Fragmentation Block rebuild/disk rebuild process
30 MINE IS BIG HOW BIG IS YOURS? Global Data Census: zettabytes 2020: 50+ zettabytes (est) Data Scale: KB: 1,000 B; MB: 1,000,000 B; GB: 1,000,000,000 B; TB: 1,000,000,000,000 B; PB: 1,000,000,000,000,000 B; EB: 1,000,000,000,000,000,000 B; ZB: 1,000,000,000,000,000,000,000 B
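The data scale is easiest to check programmatically, since each decimal unit is exactly 1,000 times the previous one (so PB is 10^15 B, EB is 10^18 B, and ZB is 10^21 B). The helper names below are illustrative.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def unit_bytes(unit: str) -> int:
    """Bytes in one decimal (SI) unit, e.g. unit_bytes("TB") is 10**12."""
    return 1000 ** UNITS.index(unit)


def humanize(num_bytes: float) -> str:
    """Format a byte count using the largest decimal unit that fits."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000


for unit in UNITS[1:]:
    print(f"{unit}: {unit_bytes(unit):,} B")

print(humanize(50 * unit_bytes("ZB")))  # the 2020 estimate quoted on the slide
```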
31 MINE IS BIG HOW BIG IS YOURS? GraySort Benchmark: 2009: TB/Min - Yahoo, 3452 Nodes (2x, 8GB, 4 SATA); 2011: TB/Min - UC San Diego, 52 Nodes (2 CPU, 24GB, 16x 500GB); 2013: 1.42 TB/Min - Yahoo, 2100 Nodes (2 CPU, 64GB, 12x 3TB). Yahoo: 2012: 42,000 nodes, 200PB, 20 Prod Clusters (largest is 4000 nodes). Facebook: 2010: 2000 nodes, 21PB. Spotify: 2014: 694 heterogeneous nodes, 14.25PB (12k jobs/day).
32 HADOOP ON ZFS TECHNICAL WHITEPAPER Technical Whitepaper Published Follow for notification of link
33 Contact Information: Brett Weninger, Managing Director
TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop
More informationRIGHTNOW A C E
RIGHTNOW A C E 2 0 1 4 2014 Aras 1 A C E 2 0 1 4 Scalability Test Projects Understanding the results 2014 Aras Overview Original Use Case Scalability vs Performance Scale to? Scaling the Database Server
More informationUsing Transparent Compression to Improve SSD-based I/O Caches
Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationNew Approach to Unstructured Data
Innovations in All-Flash Storage Deliver a New Approach to Unstructured Data Table of Contents Developing a new approach to unstructured data...2 Designing a new storage architecture...2 Understanding
More informationDecentralized Distributed Storage System for Big Data
Decentralized Distributed Storage System for Big Presenter: Wei Xie -Intensive Scalable Computing Laboratory(DISCL) Computer Science Department Texas Tech University Outline Trends in Big and Cloud Storage
More informationThe Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput
Solution Brief The Impact of SSD Selection on SQL Server Performance Understanding the differences in NVMe and SATA SSD throughput 2018, Cloud Evolutions Data gathered by Cloud Evolutions. All product
More informationDataON and Intel Select Hyper-Converged Infrastructure (HCI) Maximizes IOPS Performance for Windows Server Software-Defined Storage
Solution Brief DataON and Intel Select Hyper-Converged Infrastructure (HCI) Maximizes IOPS Performance for Windows Server Software-Defined Storage DataON Next-Generation All NVMe SSD Flash-Based Hyper-Converged
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationCan Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?
Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda Network- Based Compu2ng Laboratory Department of Computer
More informationTable 1 The Elastic Stack use cases Use case Industry or vertical market Operational log analytics: Gain real-time operational insight, reduce Mean Ti
Solution Overview Cisco UCS Integrated Infrastructure for Big Data with the Elastic Stack Cisco and Elastic deliver a powerful, scalable, and programmable IT operations and security analytics platform
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Storage Innovation at the Core of the Enterprise Robert Klusman Sr. Director Storage North America 2 The following is intended to outline our general product direction. It is intended for information
More informationDynamically unify your data center Dell Compellent: Self-optimized, intelligently tiered storage
Dell Fluid Data architecture Dynamically unify your data center Dell Compellent: Self-optimized, intelligently tiered storage Dell believes that storage should help you spend less while giving you the
More information