SDS: A Framework for Scientific Data Services

SDS: A Framework for Scientific Data Services Bin Dong, Suren Byna*, John Wu Scientific Data Management Group Lawrence Berkeley National Laboratory

Finding Newspaper Articles of Interest Finding news articles of your interest in the old era of print newspaper Turn pages Read headlines Searching for specific news was time consuming Online newspapers Search box Limited to one newspaper Customized News Articles are organized based on reader s interest Search numerous online newspapers 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 2

Finding Newspaper Articles of Interest Finding news articles of your interest in the old era of print newspaper Turn pages Read headlines Searching for specific news was time consuming Online newspapers Search box Limited to one newspaper Customized News Articles are organized based on reader s interest à Better organization, faster access to news Search numerous online newspapers 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 5

Scientific Discovery needs Fast Data Analysis Modern scientific discoveries are driven by massive data Scientific simulations and experiments in many domains produce data in the range of terabytes and petabytes LSST image simulator LHC: Higgs Boson search Fast data analysis is an important requirement for discovery Data is typically stored as files on disks that are managed by file systems High data read performance from storage is critical 11/18/2013 Light source experiments at LCLS, ALS, SNS, etc. NCAR s CESM data Image Credits http://www.science.tamu.edu/articles/911 http://www.lsst.org/lsst/gallery/data http://nsidc.org/icelights/category/research-2/ Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 6

A Generic Data Analysis Use Case Simulation Site Exascale Simulation Machine + analysis Parallel Storage Perform some data analysis on exascale machine (e.g. in situ pattern identification) Experiment/observation Site Experiment/observation Processing Machine (Parallel) Storage Archive Archive Analysis Sites Need to reduce EBs and PBs of data, and move only TBs from simulation sites Analysis Analysis Analysis Machine Machines Shared Shared Shared storage storage storage 11/18/2013 7 Reduce and prepare data for further exploratory Analysis (e.g., Data mining)

Current Practice: Immutable Data in File Systems Data stored on file systems is immutable After data producers (simulations and experiments) store data, the file format/ organization is unchanged over time Data stored in files, which is treated by the system as a sequence of bytes User code (and libraries) has to impose meaning of the file layout and conduct analyses Burden of optimizing read performance falls on application developer However, optimizations are complex due to several layers of parallel I/O stack Applications High-level I/O libraries and formats High-level I/O libraries and formats I/O Optimization Layers Parallel File System Storage Hardware Parallel I/O Software Stack 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 8

Changing layout of data on file systems is helpful Separation of logical and physical data organization through reorganization Example Access all variables where 1.15 < Energy < 1.4 Full scan of data or random accesses to non-contiguous data locations results in poor performance Sorting the data and accessing contiguous chunks of data leads to good performance Energy X Y Z 1.1692 165.834-28.0448-2.76863 2.5364 166.249-27.9321-2.87165 1.4364 166.538-27.9735-3.16969 1.0862 166.663-27.9769-2.79716 1.2862 166.621-27.9046-2.78382 1.5862 167.001-27.9366-3.09605 1.3173 167.222-27.9551-3.22406 Several studies showing reorganization can help Listed some in the related work section 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 9

Changing layout of data on file systems is helpful Separation of logical and physical data organization through reorganization Example Access all variables where 1.15 < Energy < 1.4 Full scan of data or random accesses to non-contiguous data locations results in poor performance Sorting the data and accessing contiguous chunks of data leads to good performance Several studies showing reorganization can help Listed some in the related work section Energy X Y Z 1.1692 165.834-28.0448-2.76863 2.5364 166.249-27.9321-2.87165 1.4364 166.538-27.9735-3.16969 1.0862 166.663-27.9769-2.79716 1.2862 166.621-27.9046-2.78382 1.5862 167.001-27.9366-3.09605 1.3173 167.222-27.9551-3.22406 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 10

Changing layout of data on file systems is helpful Separation of logical and physical data organization through reorganization Example Access all variables where 1.15 < Energy < 1.4 Full scan of data or random accesses to non-contiguous data locations results in poor performance Sorting the data and accessing contiguous chunks of data leads to good performance Energy X Y Z 1.0862 166.663-27.9769-2.79716 1.1692 165.834-28.0448-2.76863 1.2862 166.621-27.9046-2.78382 1.3173 167.222-27.9551-3.22406 1.5862 167.001-27.9366-3.09605 1.4364 166.538-27.9735-3.16969 2.5364 166.249-27.9321-2.87165 Several studies showing reorganization can help Listed some in the related work section 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 11

Database Systems for Scientific Data Management Database management systems perform various optimizations without user involvement Many efforts from database community reach out to manage scientific data Objectivity/DB, Array QL, SciQL, SciDB, etc. SciDB A database system for scientific applications Key features: Array-oriented data model Append-only storage First-class support for user-defined functions Massively parallel computations 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 12

More DB Optimizations, but Database systems use various optimization tricks and perform them automatically Cache frequently accessed data Use indexes Store materialized views However, preparing and loading of scientific data to fit into database schemas is cumbersome Scientific data management requirements are different from DBMS Established data formats (HDF5, NetCDF, ROOT, FITS, ADIOS-BP, etc.) Arrays are first class citizens Highly concurrent applications 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 13

Our Solution: Scientific Data Services (SDS) Preserving scientific data formats and adding database optimizations as services A framework for bringing the merits of database technologies to scientific data Examples of services Indexing Sorting Reorganization to improve data locality Compression Remote and selective data movement 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 14

First step: Automatic Data Reorganization Finding an optimal data organization strategy Capture common read patterns Estimate cost with various data organization strategies Rank data organizations for each pattern and select the best for a pattern Performing data reorganization Similar to database management systems, perform data reorganization transparently Resolving permissions to the original data Preserve the original permissions when data is reorganized Using an optimally reorganized dataset transparently Based on a read pattern, recognize the best available organization of data Redirect to read the selected organization Perform any needed post processing, such as decompression, transposition 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 15

The Design of the SDS Framework Initial implementation of the SDS framework Client-server approach SDS Client library for each MPI process SDS Server to find the best dataset from available reorganized datasets Supporting HDF5 Read API Added SDS Query API for answering range queries MPI(Process(0( Applica4on( MPI(Process(N( Applica4on( HDF5(API( SDS(Query(API( HDF5(API( SDS(Query(API( SDS(Client( SDS(Client( Network( SDS(Server( Parallel&File&System&& Database( Batch(System( 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 16

SDS Client Implementation HDF5 API Requested& Data& Post& Processor& SDS&Query& Interface& File&Handle& and& Query&String&& ApplicaAon& Parser& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Server& Connector& SDS&&Server& The HDF5 Virtual Object Layer (VOL) feature allows capturing HDF5 calls Developed a VOL plugin for SDS for capturing file open, read, and close functions Data& Reader& Original&File&Metadata&or&& Reorganized&File&Metadata& Parallel&File&System& 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 17

SDS Client Implementation Requested& Data& Post& Processor& SDS&Query& Interface& File&Handle& and& Query&String&& ApplicaAon& Parser& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Server& Connector& SDS&&Server& SDS Query Interface An interface to perform SQL-style queries on arrays Function-based API that can be used from C/C++ applications Data& Reader& Original&File&Metadata&or&& Reorganized&File&Metadata& Parallel&File&System& 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 18

SDS Client Implementation ApplicaAon& Requested& Data& Post& Processor& SDS&Query& Interface& File&Handle& and& Query&String&& Parser& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Server& Connector& SDS&&Server& Parser Checks the conditions in a query Verifies the validity of files Data& Reader& Original&File&Metadata&or&& Reorganized&File&Metadata& Parallel&File&System& 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 19

SDS Client Implementation ApplicaAon& Requested& Data& Post& Processor& Data& SDS&Query& Interface& File&Handle& and& Query&String&& Reader& Parser& Parallel&File&System& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Original&File&Metadata&or&& Reorganized&File&Metadata& Server& Connector& SDS&&Server& Server Connector Packages a query or HDF5 read call information and sends to the SDS server Using protocol buffers for communication MPI Rank 0 communicates with the server and then informs the remaining MPI processes 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 20

SDS Client Implementation ApplicaAon& Requested& Data& Post& Processor& Data& SDS&Query& Interface& File&Handle& and& Query&String&& Reader& Parser& Parallel&File&System& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Original&File&Metadata&or&& Reorganized&File&Metadata& Server& Connector& SDS&&Server& Reader Reads data from the dataset location returned by the SDS Server Makes use of the native HDF5 read calls 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 21

SDS Client Implementation ApplicaAon& Requested& Data& Post& Processor& Data& SDS&Query& Interface& File&Handle& and& Query&String&& Reader& Parser& Parallel&File&System& SDS&&Client& HDF5&API& File&Handle&and&Read&Parameters& File&Handle& and& Final&Query&String& Original&File&Metadata&or&& Reorganized&File&Metadata& Server& Connector& SDS&&Server& Post Processor Performs any postprocessing needed before returning the data to the application memory Eg. Decompression, transposition, etc. 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 22

SDS Server Implementation Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Reorganiza6on## Request# Organiza6on#List# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# SDS#Admin#Interface# SDS#Server# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# Job#Script# Job#Results# Reorganiza6on##Job##Running# Read#Sta6s6cs# Manager# Frequently#Read#File# Handle#and#its#SDS# Metadata# ACribute#of##Reorganized#file# Original#File## Reorganized#File# File#Data# Parallel#File#System# 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 23

SDS Server Implementation Request Dispatcher Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Read#Sta6s6cs# Reorganiza6on## Request# Organiza6on#List# Manager# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# Frequently#Read#File# Handle#and#its#SDS# Metadata# SDS#Admin#Interface# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# ACribute#of##Reorganized#file# SDS#Server# Job#Script# Job#Results# Original#File## Reorganiza6on##Job##Running# Reorganized#File# Receives SDS client requests and SDS Admin interface SDS Admin interface issues reorganization commands Based on the request, dispatcher passes on the request to Query Evaluator and Reorganization Evaluator File#Data# Parallel#File#System# 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 24

SDS Server Implementation Query Evaluator Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Read#Sta6s6cs# Reorganiza6on## Request# Organiza6on#List# Manager# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# Frequently#Read#File# Handle#and#its#SDS# Metadata# SDS#Admin#Interface# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# ACribute#of##Reorganized#file# SDS#Server# Job#Script# Job#Results# Original#File## Reorganiza6on##Job##Running# Reorganized#File# Looks up SDS Metadata for finding available reorganized datasets and their locations for a given dataset SDS Metadata File name, HDF5 dataset info, permissions File#Data# Parallel#File#System# 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 25

SDS Server Implementation Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Read#Sta6s6cs# Reorganiza6on## Request# Organiza6on#List# Manager# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# Frequently#Read#File# Handle#and#its#SDS# Metadata# SDS#Admin#Interface# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# ACribute#of##Reorganized#file# SDS#Server# Job#Script# Job#Results# Original#File## Reorganiza6on##Job##Running# Reorganized#File# Reorganization Evaluator Decides whether to reorganize based on the frequency of read accesses Takes commands from the Admin interface Instructs Data Reorganizer to create a reorganization job script File#Data# Parallel#File#System# 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 26

SDS Server Implementation Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Read#Sta6s6cs# Reorganiza6on## Request# Organiza6on#List# Manager# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# Frequently#Read#File# Handle#and#its#SDS# Metadata# SDS#Admin#Interface# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# ACribute#of##Reorganized#file# SDS#Server# Job#Script# Job#Results# Original#File## Reorganiza6on##Job##Running# Reorganized#File# Data Organization Recommender Identifies optimal data reorganization Informs the Reorganization Evaluator with the selected strategy File#Data# Parallel#File#System# 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 27

SDS Server Implementation File#Data# Client#Request# Query#Request# Query#Response# SDS#Client# Request# Dispatcher# Query# Evaluator# User#Reorganiza6on#Request# Periodic#SelfJstart#Request# Read#Sta6s6cs# Reorganiza6on## Request# Organiza6on#List# Manager# Reorganiza6on# Evaluator# Organiza6on# List# Data#Organiza6on# Recommender# Frequently#Read#File# Handle#and#its#SDS# Metadata# File#Handle## and## Reorganiza6on#Type# Data# Reorganizer# ACribute#of##Reorganized#file# Parallel#File#System# SDS#Admin#Interface# SDS#Server# Job#Script# Job#Results# Original#File## Reorganiza6on##Job##Running# Reorganized#File# Data Reorganizer Locates reorganization code, such as sorting, indexing algorithms Decides on the number of cores to use Prepares a batch job script Monitors the job execution After reorganization, stores the new data location in the SDS Metadata Manager 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 28

Performance Evaluation: Setup NERSC Hopper supercomputer 6,384 compute nodes 2 twelve-core AMD 'MagnyCours' 2.1-GHz processors per node 32 GB DDR3 1333-MHz memory per node Employs the Gemini interconnect with a 3D torus topology Lustre parallel file system with 156 OSTs at a peak BW of 35 GB/s Software Cray s MPI library HDF5 Virtual Object Layer code base Data Particle data from a plasma physics simulation (VPIC), simulating magnetic reconnection phenomenon in space weather 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 29

Performance Evaluation: Overhead of SDS SDS Metadata Manager uses Berkeley DB for storing metadata of reorganized data Measured response time of multiple concurrent SDS Clients requesting metadata from one SDS Server Low overhead of < 0.5 seconds with 240 concurrent clients 1" 0.8" HDF5%Open+Close% SDS%Metadata%Read% Time"(sec)" 0.6" 0.4" 0.2" 0" 40" 80" 120" 160" 200" 240" 280" 320" Number"of"Concurent"Clients" 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 30

Performance Evaluation: Benefits of SDS Time%(sec)% Ran various queries on particle energy conditions Performance benefit over full scan varies based on selectivity of data 20X to 50X speedup when data selectivity less than 5% Full%Data%Scan% Read%Reorganized%Data% 140.00% 120.00% 100.00% 80.00% 60.00% 40.00% 20.00% 0.00% 1.2X% 2.3X% 8.1X% 18.9X% 25.7X% 36.3X% 42.0X% 44.8X% 51.2X% E>1.10% E>1.15% E>1.20% E>1.25% E>1.30% E>1.35% E>1.40% E>1.45% E>1.50% Selectivity: 79% 32% 11% 4.6% 2.2% 1.2% 0.75% 0.5% 0.33% Query% 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 31

Conclusion and Future Work File systems on current supercomputers are analogous to print newspapers The SDS framework is vehicle for applying various optimizations Live customization of data organization based on data usage (in progress) To work with existing scientific data formats Status: Platform for performing various optimizations is ready Low overhead and high performance benefits Ongoing and future work: Dynamic recognition of read patterns Model-driven data organization optimizations In-memory query execution and query plan optimization FastBit indexing to support querying Support for more data formats 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 32

Thanks! Contact: Suren Byna [SByna@lbl.gov] Thanks to: 11/18/2013 Computational Research Division Lawrence Berkeley National Laboratory Department of Energy 33