I/O Challenges: Todays I/O Challenges for Big Data Analysis. Henry Newman CEO/CTO Instrumental, Inc. April 30, 2013

Similar documents
Backup, Recovery and Archive in the Clouds May 2, 2013

Technology Challenges for Clouds. Henry Newman CTO Instrumental Inc

Solid Access Technologies, LLC

Coming Changes in Storage Technology. Be Ready or Be Left Behind

Insight: that s for NSA Decision making: that s for Google, Facebook. so they find the best way to push out adds and products

New Approach to Unstructured Data

Tape Data Storage in Practice Minnesota Supercomputing Institute

@Pentaho #BigDataWebSeries

DELL POWERVAULT MD FAMILY MODULAR STORAGE THE DELL POWERVAULT MD STORAGE FAMILY

HPE Converged Data Solutions

Niagara Regional Broadband Network Ltd. (NRBN) Company Overview

Dell EMC Isilon All-Flash

Global Headquarters: 5 Speen Street Framingham, MA USA P F

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

Data Movement & Tiering with DMF 7

THE EMC ISILON STORY. Big Data In The Enterprise. Deya Bassiouni Isilon Regional Sales Manager Emerging Africa, Egypt & Lebanon.

Data Movement & Storage Using the Data Capacitor Filesystem

Hybrid Storage Architecture Marries Performance and Efficiency

Organizational Update: December 2015

Lustre* is designed to achieve the maximum performance and scalability for POSIX applications that need outstanding streamed I/O.

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

e BOOK Do you feel trapped by your database vendor? What you can do to take back control of your database (and its associated costs!

Mainframe Storage Best Practices Utilizing Oracle s Virtual Tape Technology

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

A Ten Year ( ) Storage Landscape LTO Tape Media, HDD, NAND

GPFS Experiences from the Argonne Leadership Computing Facility (ALCF) William (Bill) E. Allcock ALCF Director of Operations

Reducing Costs in the Data Center Comparing Costs and Benefits of Leading Data Protection Technologies

Get More Out of Storage with Data Domain Deduplication Storage Systems

Simplifying Collaboration in the Cloud

Tech Talk on HPC. Ken Claffey. VP, Cloud Systems. May 2016

Disk-to-Disk backup. customer experience. Harald Burose. Architect Hewlett-Packard

Tiered File System without Tiers. Laura Shepard, Isilon

Server & Storage Offering Guide for SME

THE STATE OF CLOUD & DATA PROTECTION 2018

Pushing the Limits. ADSM Symposium Sheelagh Treweek September 1999 Oxford University Computing Services 1

The Future of Data Archive

Disks and Files. Storage Structures Introduction Chapter 8 (3 rd edition) Why Not Store Everything in Main Memory?

Dell PowerVault MD Family. Modular storage. The Dell PowerVault MD storage family

i-dhsm Dynamic Hierarchical Storage Manager

NetVault Backup Client and Server Sizing Guide 3.0

Storage for HPC, HPDA and Machine Learning (ML)

Cluster-Level Google How we use Colossus to improve storage efficiency

DDN About Us Solving Large Enterprise and Web Scale Challenges

The Best of Times/The Worst of Times

HPC Growing Pains. IT Lessons Learned from the Biomedical Data Deluge

IBM TS4300 with IBM Spectrum Storage - The Perfect Match -

Accelerate the Journey to 100% Virtualization with EMC Backup and Recovery. Copyright 2010 EMC Corporation. All rights reserved.

VSTOR Vault Mass Storage at its Best Reliable Mass Storage Solutions Easy to Use, Modular, Scalable, and Affordable

BLOCKBUSTER BACKUP REDESIGN

Cat Herding. Why It s Time for a Millennial Approach to Storage. Cloud Expo East Western Digital Corporation All rights reserved 01/25/2016

Veritas NetBackup on Cisco UCS S3260 Storage Server

High Performance Computing Data Management. Philippe Trautmann BDM High Performance Computing Global Research

Unit 14 plan installation and maintenance of hardware in a technology system

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

First Financial Bank. Highly available, centralized, tiered storage brings simplicity, reliability, and significant cost advantages to operations

5 Fundamental Strategies for Building a Data-centered Data Center

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

Balakrishnan Nair. Senior Technology Consultant Back Up & Recovery Systems South Gulf. Copyright 2011 EMC Corporation. All rights reserved.

Table 9. ASCI Data Storage Requirements

irods at TACC: Secure Infrastructure for Open Science Chris Jordan

Data Centers and Cloud Computing. Data Centers

Information Infrastructure Forum

Shared Storage Solutions. for Collaborative Media Production Networks

TechTour. Winning Technology Roadmaps. Sig Knapstad. Cutting Edge

AMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION?

Actifio Test Data Management

Architecting Storage for Semiconductor Design: Manufacturing Preparation

Overcoming Obstacles to Petabyte Archives

Universal Storage. Innovation to Break Decades of Tradeoffs VASTDATA.COM

Disks and their Cloudy Future

Refining and redefining HPC storage

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Storage on the Lunatic Fringe. Thomas M. Ruwart University of Minnesota Digital Technology Center Intelligent Storage Consortium

Technology Trend : Green IT and Virtualizaiton. Education and Research Sun Microsystems(Thailand)

Network Support for Data Intensive Science

Petabytes of Preservation on Tape Jason Pierson Oct 2012

Beyond Petascale. Roger Haskin Manager, Parallel File Systems IBM Almaden Research Center

IBM Storage Software Strategy

6,000 Cameras in Time Square 210 million Cameras worldwide

Hyper-converged Secondary Storage for Backup with Deduplication Q & A. The impact of data deduplication on the backup process

Data, Data, Everywhere. We are now in the Big Data Era.

NetVault Backup Client and Server Sizing Guide 2.1

Provisioning with SUSE Enterprise Storage. Nyers Gábor Trainer &

The Fastest Scale-Out NAS

HIGH PERFORMANCE STORAGE SOLUTION PRESENTATION All rights reserved RAIDIX

Backup and Disaster Recovery: DIY or Buy? Presented by: Stanley Louissaint

Your cloud solution for EO Data access and processing

WHITE PAPER HYBRID CLOUD: FLEXIBLE, SCALABLE, AND COST-EFFICIENT UK: US: HK:

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

User profiles inside out Helge Klein,

WHITE PAPER QUANTUM S XCELLIS SCALE-OUT NAS. Industry-leading IP Performance for 4K, 8K and Beyond

HPSS Treefrog Introduction.

The Benefits of Solid State in Enterprise Storage Systems. David Dale, NetApp

Field Update Expanded Deduplication Sizing Guidelines. Oct 2015

Web Object Scaler. WOS and IRODS Data Grid Dave Fellinger

APPLYING THE POWER OF AI TO YOUR VIDEO PRODUCTION STORAGE

Nicman Group Test Data Management 2.0 Leveraging Copy Data Virtualization Technology in QA for SQuAD. November 2016

24th Annual ROTH Conference

Building on Strengths Learning From Differences. Baron Schwartz O'Reilly MySQL Conference & Expo 2011

Transcription:

I/O Challenges: Todays I/O Challenges for Big Data Analysis Henry Newman CEO/CTO Instrumental, Inc. April 30, 2013

The Challenge is Archives Big data in HPC means archive and archive translates to a tape archive at least today Cost per PB per year What about reliability? For the foreseeable future, tape is the archive medium of choice

Archive challenges Networking challenges Cost challenges RAS challenges Scalability challenges I/O performance challenges Metadata challenges Final thoughts

Networking Challenges Those who own the archive own the big data solutions as you cannot move data around

Cannot move archive data Networking data rates are not increasing as fast as data generation In a perfect world: So if you wanted to read the whole archive to search for new information even with an OC-192 channel, it may take years

Cost Challenges or Business Opportunity Those who control the archive own the big data solutions and the resulting benefits

Cost Justifying keeping data you might or might not need today is difficult In the 80s for example the DNA between chromosomes was called junk DNA We know today it is not junk Do we know what we need to keep? We do not know what information we will find in the data we collect today Some industries have a policy of keeping everything forever Geosciences is a good example Satellite imagery is another These industries build in the cost of archives into the operational costs These costs are well defined and well known based on industry trends

RAS Challenges Those who own the archive own the big data solutions, as it is difficult to maintain data integrity and availability over long periods of time, will benefit

RAS is not easy Though tapes might last 30 years: Interfaces, OSs and hardware do not Try to find a supported fibre channel 1 Gbit interface Try to find an operating system where this hardware is supported How many motherboards still support PCI-X? What about things like silent corruption? Shameless plug for tutorial I am giving on this topic at IEEE Mass Storage Conference next week http://www.storageconference.org/2013/presentations.html» Slides will be up in a few weeks What about the time to migration to new technology? Architecting for availability and meeting SLAs is difficult when migrating Becomes a bigger issue when namespaces are broken up into smaller chunks

Scalability Challenges Those who control the archive own the big data solutions, and as more data is collected and the size of the archive grows, and the organization will continue to win

Archives are hard Today s largest single namespace archives have over 55 PB Scaling for archives technologies choices How many namespaces do you want to manage? 1, 5, 10, 100, 1000? Large namespaces are important for more than just searching for your file You can always put in a DB interface to search for a file Large namespace size is important for management efficiency Namespace load leveling is labor intensive

I/O Performance Challenges Those who understand technology direction will be able to use the archive effectively and will be successful

I/O performance is not scaling Performance is not increasing with density growth for any technology (SSD, disk, tape) This trend is not expected to change This impacts archives in many different ways Migration to new technology Access time for data Architecture and design must take this into account

MB/sec per GB of storage Instrumental Proprietary

Metadata Challenges Those who own the archive own the big data solutions as reindexing metadata remotely at future scale will be impossible

Metadata Bringing back a large archive over the network and indexing for new information is impossible If someone has a new idea for how to use data and the archive is owned by someone else, you would not want to run the re-indexing on their machines You would need to bring the database back to your location There seems an industry goal of outsourcing archives This could have a long term negative financial impact on those companies

Final Thoughts Archives are and will be the central point for big data analysis To be successful with archives, planning for costs, reliability, scalability and how the archive will be used is critical to the success Those that have successful archives, like the geosciences industry, do all of the above Archive performance must be matched with data usage requirements Do you need to use 1 TB per day or 100 TB per day? Do you need to ingest 10 TB a day or 500 TB per day? Archive management and design requires careful planning Spend the time and money to do it right the 1 st time as doing the second time is far more costly

Last Thought Without owning the archive, you are dependent on a 3rd party for your future That 3rd party might or might not be a competitor Ownership of the raw data is critical to your business and potential to the future of a nation

Thank you Thanks for listening Copyright 2013, Instrumental, Inc.