Data Centric Computing

Piyush Chaudhary, HPC Solutions Development <piyushc@us.ibm.com>
Data Centric Computing
SPXXL/SCICOMP Summer 2011

Agenda
- What is Data Centric Computing?
- What is Driving Data Centric Computing?
- Puzzle vs. Mystery
- Characteristics of Data Centric Computing Workloads
- Hardware Changes Needed to Handle Data Centric Computing Workloads
5/11/2011

What is Data Centric Computing?
- Data-centric computing concerns the acquisition, processing, analysis, storage, and query of data sets and streams. (M. Gokhale et al.)
- Not a new concept, but the growth in data has brought renewed focus and specialization
- A rose by any other name: Data Intensive Computing, Data Mining, Data Warehousing, Analytics, Deep Q&A, Big Data
How is Data Centric Computing different from traditional Compute Centric Computing?
- In the traditional model, data is moved through the storage hierarchy to the compute resources as needed; in the data centric model, computation is done where the data lives
- Operations per byte are typically low in a data centric model compared to a compute centric model
- Data Centric Computing has a shallow and persistent storage hierarchy
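The "operations per byte" comparison above can be made concrete with a rough sketch. The two workloads below (a sum over an array and a dense matrix multiply) and their operation counts are illustrative assumptions, not figures from the slides, and cache effects are ignored.

```python
def ops_per_byte(operations: int, bytes_moved: int) -> float:
    """Arithmetic operations performed per byte of data moved."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return operations / bytes_moved


def scan_sum_intensity(n: int, item_bytes: int = 8) -> float:
    # Data centric style: summing n values costs about n additions
    # while reading n * item_bytes bytes.
    return ops_per_byte(n, n * item_bytes)


def matmul_intensity(n: int, item_bytes: int = 8) -> float:
    # Compute centric style: a dense n x n matrix multiply costs about
    # 2 * n**3 operations over three n x n matrices (two read, one written).
    return ops_per_byte(2 * n**3, 3 * n * n * item_bytes)


if __name__ == "__main__":
    print("scan   ops/byte:", scan_sum_intensity(1_000_000))
    print("matmul ops/byte:", matmul_intensity(1200))
```

Under these assumptions the scan stays at a fixed fraction of an operation per byte however large the data gets, while the matrix multiply does more work per byte as n grows.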

What is Driving Data Centric Computing?
- Business data is doubling every 1.2 years*
- Companies that adopt data driven decision making achieve 5-6% higher productivity than can be explained by other factors; this difference is enough to separate winners from losers in most industries
  - Based on research by Erik Brynjolfsson (Sloan School of Management, MIT), Lorin Hitt (Wharton School, University of Pennsylvania) and Heekyung Kim (MIT)
  - First quantitative evidence to back up anecdotes of productivity growth
- Data Centric Computing has been instrumental to the success of companies like Google, Facebook and many more
- To help companies find meaningful patterns by sifting through business data, vendors such as IBM, Oracle, SAP and Microsoft have collectively spent over $25B buying specialist companies in the field
  - IBM alone has spent $14B on 25 companies that focus on data analytics
  - IBM employs over 8000 consultants and 200 mathematicians to focus on analytics
  - IBM expects this business to grow to $16B by 2015
* Lohr, S., "When there's no such thing as too much information," The New York Times, April 23, 2011

Puzzle vs. Mystery*
Puzzle:
- A critical piece of data is missing
- Need to add to data collection
- Need to develop systems capable of ingesting, summarizing and correlating large amounts of data
Mystery:
- All the data is available; in fact there may be too much of it
- Requires judgment and assessment of uncertainty
- Need to develop expert systems to analyze, categorize, rank and correlate the available information
* Gladwell, M., "Open Secrets: Enron, intelligence and the peril of too much information," The New Yorker, January 8, 2007. URL: http://www.gladwell.com/2007/2007_01_08_a_secrets.html

Characteristics of Data Centric Computing Workloads
Characteristics:
- Pattern Matching in Unstructured Data
  - Real Time or Forensic
  - Exact or Approximate Matching
  - Text, Video, Speech, Web, mixed
- Record Processing in Structured Data
  - Database requirements
- Analysis / Computation
  - Graph Assembly and Analysis
  - Correlation and Scoring
  - Sorting
  - Optimization
Requirements:
- Pattern Matching in Unstructured Data
  - Low ops per byte
  - Random access
  - Integer dominated
- Record Processing in Structured Data
  - Coherency
  - Locking
- Analysis / Computation (relative to HPC)
  - Mixed Integer / Floating point
  - Computational load per byte low
  - Parallelism in algorithms and data difficult to identify
  - Locality in algorithms and data low
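As an illustration of why exact pattern matching is integer dominated with few operations per byte, here is a minimal Rabin-Karp sketch. The slides do not name any algorithm; Rabin-Karp, the `base` and `mod` constants, and the function name are choices made here for illustration only.

```python
def rabin_karp(text: bytes, pattern: bytes,
               base: int = 257, mod: int = 1_000_000_007) -> list[int]:
    """Return every start offset where pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading byte in a window of length m.
    high = pow(base, m - 1, mod)

    # Hash the pattern and the first window using integer arithmetic only.
    p_hash = 0
    w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + pattern[i]) % mod
        w_hash = (w_hash * base + text[i]) % mod

    matches = []
    for start in range(n - m + 1):
        # Compare bytes only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Slide the window: a handful of integer operations per input byte.
            w_hash = (w_hash - text[start] * high) % mod
            w_hash = (w_hash * base + text[start + m]) % mod
    return matches


if __name__ == "__main__":
    print(rabin_karp(b"abracadabra", b"abra"))
```

Each input byte costs a few multiplies, adds and modulo operations and no floating point work, which is the profile the requirements list describes for pattern matching.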

Hardware Changes Required to Handle Data Centric Computing Workloads
Note: For a class of data centric computing workloads, the current trajectory of systems will be sufficient
- Processor
  - Fast integer operations
  - Efficient vector integer operations
- Memory
  - Low latency and high bandwidth for small random accesses
  - Intelligent prefetching, explicit pattern following
  - Partial cache line fetch
- Network
  - Low latency and high bandwidth for small random messages
  - Resilience in the face of multiple link failures (availability and performance)
  - Need similar features for external connectivity
- Storage
  - Biggest area of concern; needs the most innovation
  - Storage class memory will be key to meeting the challenges
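The case for partial cache line fetch can be sketched with simple arithmetic: for a small random access, most of a full line is fetched and never used. The access and line sizes below are illustrative assumptions, not values from the slides, and the access is assumed to be aligned to the fetch granularity.

```python
def useful_fraction(access_bytes: int, fetch_bytes: int) -> float:
    """Fraction of fetched bytes that the program actually uses.

    Assumes the access is aligned to the fetch granularity, so it touches
    the minimum possible number of fetch units.
    """
    if access_bytes <= 0 or fetch_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: number of fetch units touched.
    units = (access_bytes + fetch_bytes - 1) // fetch_bytes
    return access_bytes / (units * fetch_bytes)


if __name__ == "__main__":
    # An 8-byte random access with hypothetical fetch granularities.
    for fetch in (128, 64, 8):
        print(f"fetch {fetch:3d} B -> useful fraction {useful_fraction(8, fetch)}")
```

With a low-locality workload, the unused remainder of each line is bandwidth spent on data that is never touched, which is why a finer fetch granularity helps.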

System Architecture
- Moving data through storage, memory and cache hierarchies is very inefficient for data intensive workloads, since the typical operations per byte are low
- Disk latency and bandwidth trajectories are not on track to support data intensive computing workloads
- Need to rethink how systems are built to support these workloads
  - Need to embed compute resources with storage/memory
  - 3D packaging with memory and compute units in the same stack
  - Need to provide near line storage based on SCM
  - Need to balance power and performance to match workload specific needs
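A rough cost model shows why moving data to the computation is inefficient when operations per byte are low. Every number in the example (data size, link bandwidth, compute rates, result size) is a hypothetical value chosen for illustration, and the model assumes transfer and compute do not overlap.

```python
def time_move_data_to_compute(data_bytes: float, link_bytes_per_s: float,
                              ops_per_byte: float,
                              compute_ops_per_s: float) -> float:
    """Seconds to ship all data to a fast processor and then process it."""
    transfer = data_bytes / link_bytes_per_s
    compute = data_bytes * ops_per_byte / compute_ops_per_s
    return transfer + compute


def time_compute_near_data(data_bytes: float, ops_per_byte: float,
                           near_ops_per_s: float, result_bytes: float,
                           link_bytes_per_s: float) -> float:
    """Seconds to process data in place on a slower unit and ship the result."""
    compute = data_bytes * ops_per_byte / near_ops_per_s
    transfer = result_bytes / link_bytes_per_s
    return compute + transfer


if __name__ == "__main__":
    # Hypothetical scenario: 1 TB of data, 1 GB/s link, 0.125 ops per byte.
    move = time_move_data_to_compute(1e12, 1e9, 0.125, 1e11)
    near = time_compute_near_data(1e12, 0.125, 1e10, 1e6, 1e9)
    print(f"move data to compute: {move:.3f} s")
    print(f"compute near data:    {near:.3f} s")
```

In this scenario the near-data unit is ten times slower at arithmetic yet finishes far sooner, because transfer time dominates when there is little work to do per byte.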

A Note on Storage Class Memory (SCM*)
- The gap between disk latency and the performance of the rest of the system, already six orders of magnitude, continues to widen
- The areal density of disk platters continues to improve, albeit at a slower rate, but the bit error rate per gigabyte is not keeping up and is in fact getting worse
- Similarly, IOP rates are not increasing in proportion to areal density
- The cost of disks continues to fall, but also at a slower rate
- Power reduction for spinning disks is approaching its limit, and the power consumption of the storage subsystem, as a percentage of the total system, is increasing
- SCM technologies promise to address all these issues by creating compact, robust storage systems with greatly improved cost/performance ratios compared to today's state of the art systems
* Freitas, R. F.; Wilcke, W. W., "Storage-class memory: The next storage system technology," IBM Journal of Research and Development, vol. 52, no. 4.5, pp. 439-447, July 2008. doi: 10.1147/rd.524.0439 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5388608&isnumber=5388602
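Two of the points above reduce to simple arithmetic, sketched here. The latencies, IOPS figure and capacities are round illustrative values assumed for this example; they are not taken from the slides or from the cited paper.

```python
import math


def orders_of_magnitude(slow_seconds: float, fast_seconds: float) -> float:
    """Base-10 logarithm of the ratio between two latencies."""
    return math.log10(slow_seconds / fast_seconds)


def iops_per_gb(iops: float, capacity_gb: float) -> float:
    """Random I/O operations per second available per gigabyte stored."""
    return iops / capacity_gb


if __name__ == "__main__":
    # Assumed: about 1 ms for a disk access versus about 1 ns per processor cycle.
    print("latency gap (orders of magnitude):",
          round(orders_of_magnitude(1e-3, 1e-9)))

    # Assumed: a drive that keeps the same 200 IOPS while capacity grows tenfold.
    print("IOPS per GB at  100 GB:", iops_per_gb(200, 100))
    print("IOPS per GB at 1000 GB:", iops_per_gb(200, 1000))
```

If access rates stay flat while capacity grows, the access rate available per stored gigabyte falls, which is the effect of IOP rates not tracking areal density.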