Parallel, In Situ Indexing for Data-intensive Computing


Transcription:

Parallel, In Situ Indexing for Data-intensive Computing
LDAV, October 24, 2011
Jinoh Kim, Hasan Abbasi, Luis Chacon, Ciprian Docan, Scott Klasky, Qing Liu, Norbert Podhorszki, Arie Shoshani, John Wu

Introduction
- Many scientific applications produce large outputs. For example, GTC generates 26 GB of data per 2 sec.
- But only a relatively small fraction of the data is interesting, e.g., blobs and clumps in fusion, magnetic nulls in magnetohydrodynamic models.
- Challenge: accessing data on disk is slow, and disk is getting slower relative to computing power.
- We explore the performance impact of parallelism and of in situ indexing for large data.

ADIOS
- Adaptable I/O System (ADIOS), developed by ORNL.
- Proven read/write performance.
- Widely adopted as middleware for data-intensive scientific computing.
- Provides good architectural merits for in situ processing by decoupling compute nodes from staging nodes; the staging nodes take full charge of writing the data.
- Example: statistics (min, max, average, standard deviation) computed as the data is generated.
- http://www.olcf.ornl.gov/center-projects/adios/

Data Staging
Why asynchronous I/O?
- Eliminates the performance linkage between the I/O subsystem and the application.
- Decouples file system performance variations and limitations from the application run time.
- Enables optimizations based on a dynamic number of writers.
- Provides high-bandwidth data extraction from the application.
- Scalable data movement with shared resources requires us to manage the transfers; scheduling them properly can greatly reduce the impact of I/O. A minimal sketch of the decoupling idea appears below.
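The following is a small, self-contained C++ sketch of the asynchronous-staging idea only: the "simulation" hands buffers to a background writer and keeps computing, so application time is decoupled from file-system speed. It is not ADIOS code; the Buffer type and the output file name are invented for illustration.

    // Minimal sketch of asynchronous, staged writing; not ADIOS code.
    #include <condition_variable>
    #include <fstream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Buffer { int step; std::vector<double> data; };

    std::queue<Buffer> pending;        // buffers waiting to be written
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void writerThread() {              // plays the role of the staging side
        std::ofstream out("output.bin", std::ios::binary);
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return !pending.empty() || done; });
            if (pending.empty() && done) break;
            Buffer b = std::move(pending.front());
            pending.pop();
            lk.unlock();               // write without blocking the producer
            out.write(reinterpret_cast<const char*>(b.data.data()),
                      static_cast<std::streamsize>(b.data.size() * sizeof(double)));
        }
    }

    int main() {
        std::thread writer(writerThread);
        for (int step = 0; step < 10; ++step) {
            Buffer b{step, std::vector<double>(1 << 20, step)};  // "simulation" output
            { std::lock_guard<std::mutex> lk(m); pending.push(std::move(b)); }
            cv.notify_one();           // hand off, then immediately resume computing
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        writer.join();
    }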

In Situ Processing
- The cost of data movement, both from the application to storage and from storage to analysis or visualization, is a deterrent to effective use of the data.
- The output costs increase the overall application running time and often force the user to reduce the total volume of data produced by outputting data less frequently.
- Input costs, especially for visualization, can make up to 8% of the total run time.
- Solution: perform analysis operations in situ or in place.

FastQuery Challenges & Approaches
(1) Mismatch between the array model used by scientific data and the relational model assumed by database indexing technology
    -> Map array data to a relational table structure on the fly.
(2) Arbitrary hierarchical data layouts
    -> Deploy a flexible yet simple variable naming scheme based on regular expressions.
(3) Diverse scientific data formats
    -> Define a unified array I/O interface (see the sketch below).
(4) High index building cost
    -> Use a parallel I/O strategy and system design to reduce the index building time.
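To illustrate item (3), here is a rough C++ sketch of what a unified array I/O interface with per-format drivers might look like. The class and method names are invented for illustration; they are not the actual FastQuery interfaces.

    // Hypothetical unified array I/O interface; not the real FastQuery classes.
    #include <cstdint>
    #include <string>
    #include <vector>

    class ArrayIODriver {                       // one driver per file format
    public:
        virtual ~ArrayIODriver() = default;
        // Read an n-dimensional hyperslab of a named variable into buf.
        virtual bool readData(const std::string& variable,
                              const std::vector<uint64_t>& offset,
                              const std::vector<uint64_t>& count,
                              std::vector<double>& buf) = 0;
        // Store index data (bitmaps, keys, offsets) alongside the variable.
        virtual bool writeIndex(const std::string& variable,
                                const std::vector<char>& indexBytes) = 0;
    };

    // Each supported format implements the same interface, e.g.:
    class HDF5Driver : public ArrayIODriver {
        bool readData(const std::string&, const std::vector<uint64_t>&,
                      const std::vector<uint64_t>&, std::vector<double>&) override
        { /* call the HDF5 library here */ return false; }
        bool writeIndex(const std::string&, const std::vector<char>&) override
        { /* call the HDF5 library here */ return false; }
    };
    // NetCDFDriver and ADIOSDriver would follow the same pattern, so the
    // index builder and query processor never depend on a specific format.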

Mapping between FastBit & Array Data
- Each variable associated with a query is mapped to a column of a relational table on the fly.
- Elements of a multidimensional array are linearized.
- An arbitrary number of arrays or subarrays can be placed into one logical table, as long as they have the same array dimensions.
- Example: getnumhits( x[:2,:2] > 3 && y[2:4,2:4] > 3 )
  [Figure: a 3x4 array X and a 4x4 array Y shown next to the linearized FastBit view, a table with columns RID, X, and Y; the query returns the number of hits and their coordinates.]
  (A code sketch of this linearization appears after the naming-schema slide below.)

Flexible Naming Schema
- Naive option: use the full path, e.g., getnumhits( /test/space/test2/temperature > ... )
- Can we do better? getnumhits( x > 3 )
  [Figure: an example file hierarchy rooted at /, with groups such as test, exp, time, space, test2, and time2, and variables x, y, z, a, b, c, energy, and temperature.]
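To make the mapping concrete, the following minimal C++ sketch (not FastQuery code) linearizes a 2-D subarray into row IDs and counts the elements satisfying a range condition, in the spirit of the getnumhits example above. The array contents are invented for illustration.

    // Minimal sketch of linearizing a 2-D subarray and counting hits;
    // not the FastQuery implementation. Values below are made up.
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        const uint64_t ncols = 4;                      // a 3x4 array "x"
        std::vector<double> x = { 4, 3, 1, 2,
                                  8, 0, 3, 1,
                                  5, 7, 2, 6 };

        // Subarray x[0:2, 0:2] and condition x > 3, as in the slide's query.
        uint64_t hits = 0;
        std::vector<uint64_t> rids;                    // linearized row IDs
        for (uint64_t i = 0; i < 2; ++i) {
            for (uint64_t j = 0; j < 2; ++j) {
                uint64_t rid = i * ncols + j;          // row-major linearization
                if (x[rid] > 3) { ++hits; rids.push_back(rid); }
            }
        }
        std::cout << "NumHits = " << hits << "\n";
        for (uint64_t rid : rids)                      // coordinates from the RID
            std::cout << "(" << rid / ncols << "," << rid % ncols << ")\n";
    }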

Flexible Naming Schema
- Separate the variable name from its path: a variable is specified as a tuple (varname, varpath) and identified by the rule */varpath/*/varname.
- Examples:
  ( temperature > ..., ) -> /test/space/test2/temperature > ...
  ( x > 3, test ) -> /test/time/x > 3
  ( x > 3, time ) -> /exp/time/x > 3
- Advantages: simplifies the query string and decouples the user's specification from the file layout.

FastQuery System Architecture
- FastQuery API on top of a Query Processor and an Index Builder.
- Indexing and query technology: FastBit.
- A parser for the naming and query schema, backed by a variable table (variable, data, index).
- An array I/O interface with HDF5, NetCDF, and ADIOS drivers, used to read data, load indexes, and write indexes.
- Supported data formats: HDF5, NetCDF, ADIOS, ...
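Here is a small, hypothetical C++ sketch of how the */varpath/*/varname rule could be applied with a regular expression to resolve a (varname, varpath) tuple against the full paths stored in a file. The path list and the resolve helper are made up for illustration; this is not the FastQuery parser.

    // Hypothetical resolution of (varname, varpath) via the rule
    // */varpath/*/varname, using a regular expression; not FastQuery code.
    #include <iostream>
    #include <regex>
    #include <string>
    #include <vector>

    // Return every full path matching */varpath/*/varname. An empty varpath
    // means "any path", mirroring the ( temperature, ) example on the slide.
    std::vector<std::string> resolve(const std::vector<std::string>& allPaths,
                                     const std::string& varname,
                                     const std::string& varpath) {
        const std::string pattern = varpath.empty()
            ? ".*/" + varname + "$"
            : ".*/" + varpath + "(/.*)?/" + varname + "$";
        const std::regex re(pattern);
        std::vector<std::string> matches;
        for (const auto& p : allPaths)
            if (std::regex_match(p, re)) matches.push_back(p);
        return matches;
    }

    int main() {
        // Paths loosely based on the slide's example hierarchy.
        const std::vector<std::string> paths = {
            "/test/space/test2/temperature", "/test/time/x",
            "/exp/time/x", "/exp/time/y" };
        for (const auto& p : resolve(paths, "x", "time"))
            std::cout << p << "\n";   // prints /test/time/x and /exp/time/x
    }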

Basic Bitmap Index
[Figure: an example column of data values and the corresponding bitmaps b0 through b5, one bitmap per distinct value 0..5; the query A < 2 is answered with b0 OR b1, and 2 < A with b3 OR b4 OR b5.]
- First commercial version: Model 204, P. O'Neil, 1987.
- Easy to build: faster than building B-trees.
- Efficient for querying: only bitwise logical operations are needed, e.g., A < 2 -> b0 OR b1; A > 2 -> b3 OR b4 OR b5.
- Efficient for multi-dimensional queries: bitwise operations combine the partial results.
- Size: one bit per distinct value per row. Definition: cardinality = number of distinct values. Compact for low-cardinality attributes; in the worst case (cardinality = N, the number of rows) the index size is N*N bits.

FastBit Compression
- Main idea: use run-length encoding, but partition the bits into 31-bit groups (not 32-bit groups) on 32-bit machines, then merge neighboring groups with identical bits and encode each group using one 32-bit word.
- Example: 2015 bits become [31 literal bits] [31-bit count, count = 63] [31 literal bits], i.e., three 32-bit words, because the identical middle groups are merged into one count word.
- Name: Word-Aligned Hybrid (WAH) code (US patent).
- Key features: WAH is compute-efficient. It uses simple run-length encoding, allows operations directly on compressed bitmaps, and never breaks any words into smaller pieces during operations.
- Worst-case index size: 4N words, not the N*N bits of the uncompressed case. [Wu, Otoo, and Shoshani 2006]
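As a toy illustration of the basic (uncompressed) bitmap index, the following C++ sketch builds one bitmap per distinct value of a small integer column and answers A < 2 with bitwise ORs. It is not FastBit code and omits WAH compression; the column values are invented.

    // Toy uncompressed bitmap index; not FastBit, no WAH compression.
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <vector>

    int main() {
        const std::vector<int> A = {1, 5, 3, 1, 2, 0, 4, 1};   // example column

        // One bitmap per distinct value, one bit per row.
        const std::size_t words = (A.size() + 63) / 64;
        std::map<int, std::vector<uint64_t>> b;
        for (std::size_t row = 0; row < A.size(); ++row) {
            auto& bm = b[A[row]];
            bm.resize(words, 0);
            bm[row / 64] |= uint64_t(1) << (row % 64);
        }

        // A < 2  ->  b0 OR b1: only bitwise operations are needed.
        std::vector<uint64_t> result(words, 0);
        for (int v : {0, 1})
            if (b.count(v))
                for (std::size_t w = 0; w < words; ++w) result[w] |= b[v][w];

        std::size_t hits = 0;
        for (uint64_t w : result) hits += __builtin_popcountll(w);  // GCC/Clang builtin
        std::cout << "rows with A < 2: " << hits << "\n";           // prints 4
    }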

Multi-Dimensional Query Performance
- Queries use 5 of the most popular variables from STAR (2.2 million records).
- Average attribute cardinality (number of distinct values): 222,...
- FastBit uses WAH compression; the DBMS uses BBC compression.
- FastBit is >10X faster than the DBMS on these 5-dimensional queries, and FastBit indexes are 30% of the raw data sizes. [Wu, Otoo, and Shoshani 2002]

Experimental Evaluation
- Questions: the impact of indexing, parallel index building, and in situ index building.
- Measurements collected on Franklin at NERSC: 8 cores and 8 GB of memory per node, Lustre file system.
- Test problem sizes: Small 3.6 GB, Medium 27 GB, Large 28 GB, Large2 73 GB.

Why Indexing?
[Figure: query processing time with and without indexes for several hit ratios (99%, 2%, ...); the speed-up with indexing ranges from 3x to 99x.]

But Challenges Remain
- Index construction time: 3 min for 3.6 GB, 23 min for 27 GB, 3 hr for 28 GB, and over 2 hr for .7 TB.
- Solution: build the indexes in parallel!

Parallel Index Construction
- The data resides on the disk system as blocks D1, D2, D3, ..., Dn, with corresponding indexes I1, I2, I3, ..., In.
- Split the data and assign the blocks to multiple processors (a sketch of this block decomposition follows).

Performance with Parallelism
- Hopper2 at LBNL: ~6,000 nodes, 24 cores and 32-64 GB of memory per node, Lustre file system.
- Parallelism improves performance, but why does the benefit disappear beyond a certain parallelism factor?
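A minimal MPI sketch of the block decomposition described above, assuming each rank indexes an equal share of N rows. The readBlock, buildBitmapIndex, and writeIndexBlock helpers are hypothetical stand-ins, not FastQuery functions.

    // Minimal sketch of parallel index construction by block decomposition.
    // The three helpers are hypothetical placeholders, not FastQuery calls.
    #include <mpi.h>
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    static std::vector<double> readBlock(uint64_t /*offset*/, uint64_t count) {
        return std::vector<double>(count, 0.0);      // stand-in for real file reads
    }
    static std::vector<char> buildBitmapIndex(const std::vector<double>& data) {
        return std::vector<char>(data.size() / 8);   // stand-in for FastBit indexing
    }
    static void writeIndexBlock(int /*rank*/, const std::vector<char>&) {
        // stand-in for writing the index block (collectively or independently)
    }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank = 0, nprocs = 1;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const uint64_t N = 100000000ULL;             // total number of rows
        const uint64_t chunk = (N + nprocs - 1) / nprocs;
        const uint64_t begin = std::min<uint64_t>(N, uint64_t(rank) * chunk);
        const uint64_t end   = std::min<uint64_t>(N, begin + chunk);

        // Each processor reads its block D_i and builds its index I_i.
        const std::vector<double> data  = readBlock(begin, end - begin);
        const std::vector<char>   index = buildBitmapIndex(data);
        writeIndexBlock(rank, index);

        MPI_Finalize();
        return 0;
    }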

Index Construction Time Breakdown
[Figure: time in seconds (log scale), broken down into ReadData, ComputeIndex, and WriteIndex, for 2 to 512 cores.]
- The write time shows little improvement. Why? Collective writes -> synchronization overhead.

Optimization: Delayed Writes
- Reduce the number of synchronizations by delaying index writes whenever possible: retain the created indexes in memory, then write them out together (a sketch of this buffering appears below).
[Figure: index creation time in seconds for the classic and delayed-write approaches, for 2 to 512 cores.]
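A small sketch of the delayed-write idea: instead of a (synchronizing) write after every index block, the blocks are kept in memory and flushed in one batch at the end. The DelayedWriter class and its writeIndexBlock helper are invented for illustration.

    // Sketch of the delayed-write optimization: buffer index blocks in memory
    // and write them in one batch instead of one collective write per block.
    #include <utility>
    #include <vector>

    struct IndexBlock { int variable; std::vector<char> bytes; };

    class DelayedWriter {
    public:
        void add(IndexBlock blk) { buffered_.push_back(std::move(blk)); }
        // One write phase (and thus one synchronization point) for all blocks.
        void flush() {
            for (const IndexBlock& blk : buffered_) writeIndexBlock(blk);
            buffered_.clear();
        }
    private:
        static void writeIndexBlock(const IndexBlock&) { /* real I/O goes here */ }
        std::vector<IndexBlock> buffered_;           // indexes retained in memory
    };

    int main() {
        DelayedWriter writer;
        for (int v = 0; v < 16; ++v) {               // one index block per variable
            writer.add(IndexBlock{v, std::vector<char>(1024, 0)});
            // the classic scheme would perform a collective write here
        }
        writer.flush();                              // single batched write at the end
    }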

Cluster with Dedicated Staging Nodes
[Figure: a compute-node cluster writes data to an I/O staging node cluster, which performs the in situ work (statistics, indexing, visualization) and writes the data to the storage system.]

Experiments for In Situ Indexing
- Data is split by timestep: the compute nodes generate data periodically, and a DART server distributes the timesteps to the staging nodes. Staging node S1 receives timesteps T1, T(N+1), ... and builds indexes I1, I(N+1), ...; S2 receives T2, T(N+2), ...; and so on up to S_N, which receives T_N, T_2N, ...
- Each staging node builds the index directly from the data it is given and writes to the storage system (a sketch of this staging loop follows).
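A rough, hypothetical outline of one staging node's loop; receiveTimestep stands in for the DART/ADIOS transport, and buildBitmapIndex and writeToStorage stand in for the indexing and output calls, none of which are shown here.

    // Hypothetical outline of a staging node's in situ indexing loop.
    #include <cstdint>
    #include <vector>

    struct Timestep { uint64_t step; std::vector<double> data; };

    // Placeholders for the transport from the compute nodes, the index
    // construction, and the write to the storage system.
    bool receiveTimestep(Timestep& ts);
    std::vector<char> buildBitmapIndex(const std::vector<double>& data);
    void writeToStorage(const Timestep& ts, const std::vector<char>& index);

    void stagingNodeLoop() {
        Timestep ts;
        while (receiveTimestep(ts)) {                 // T_i, T_{N+i}, ...
            // The index is built from the in-memory data, so the staging node
            // never has to read the timestep back from disk.
            const std::vector<char> index = buildBitmapIndex(ts.data);
            writeToStorage(ts, index);                // data plus index I_i
        }
    }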

Reading Time
[Figure: data reading times on Franklin (Lustre file system; 8 cores and 8 GB of memory per node) for the Small (3.6 GB), Medium (27 GB), Large (28 GB), and Large2 (73 GB) problems.]
- Getting the data from another processor (in situ) is faster than getting the data from disk.

Summary
- Indexing dramatically reduces query time, but building the index is expensive: 2+ hours for TB-scale data.
- Parallelism improves index-building performance, but collective writes cause random delays; the delayed-write optimization can mitigate them.
- In situ indexing improves performance by significantly reducing the data read time.

Lessons Learned
- Avoid synchronization: a single delayed processor causes a severe delay in writing. It is fine to delay writing the index blocks if the base data is already safely stored.
- Choose a moderate number of processors: the performance benefit is not linear, and finding the sweet spot may be interesting (perhaps GLEAN could help).
- Tune the file system parameters: for example, the striping count has a direct performance impact, to some extent.

John Wu, John.Wu@nersc.gov
FastBit: http://sdm.lbl.gov/fastbit/
FastQuery: http://portal.nersc.gov/svn/fq/
ADIOS: http://www.olcf.ornl.gov/center-projects/adios/
QUESTIONS?