FastForward I/O and Storage: ACG 8.6 Demonstration

Kyle Ambert, Jaewook Yu, Arnab Paul
Intel Labs
June, 2014

NOTICE: THIS MANUSCRIPT HAS BEEN AUTHORED BY INTEL UNDER ITS SUBCONTRACT WITH LAWRENCE LIVERMORE NATIONAL SECURITY, LLC WHO IS THE OPERATOR AND MANAGER OF LAWRENCE LIVERMORE NATIONAL LABORATORY UNDER CONTRACT NO. DE-AC52-07NA27344 WITH THE U.S. DEPARTMENT OF ENERGY. THE UNITED STATES GOVERNMENT RETAINS AND THE PUBLISHER, BY ACCEPTING THE ARTICLE OF PUBLICATION, ACKNOWLEDGES THAT THE UNITED STATES GOVERNMENT RETAINS A NON-EXCLUSIVE, PAID-UP, IRREVOCABLE, WORLD-WIDE LICENSE TO PUBLISH OR REPRODUCE THE PUBLISHED FORM OF THIS MANUSCRIPT, OR ALLOW OTHERS TO DO SO, FOR UNITED STATES GOVERNMENT PURPOSES. THE VIEWS AND OPINIONS OF AUTHORS EXPRESSED HEREIN DO NOT NECESSARILY REFLECT THOSE OF THE UNITED STATES GOVERNMENT OR LAWRENCE LIVERMORE NATIONAL SECURITY, LLC.

Overview
- Demonstration objectives and background
- Demo environment
- Demo: LDA
- Evaluation: methods & results
- Conclusion and learnings

Objectives
From the SOW:
- Create a full analytics pipeline using the EFF stack
- Use Graphlab for graph computation
- Use the HDF5 Adaptation Layer (HAL) for graph computation
- Demonstrate functional capability
- Demonstrate load balancing
- Demonstrate I/O efficiency

Background: the Analytics Pipeline
[Diagram labels: Raw Data, Big Data, ACG Ingress on a Hadoop Cluster, Bridge, HPC nodes, LDA, Results]
- We compared the I/O times of our approach with those of an approach that uses only Hadoop as the data store.
- Our HDFS/MPI bridge transfers data from HDFS to HDF5 via MPI.
- HAL converts data from ingress and loads graph partitions, together with network information, into the computational kernel.
[Diagram labels: ACG Ingress Processing and Computation Kernel, each on top of the HDF5 Adaptation Layer and HDF5; graph (partitions) and network information represented in HDF5]

Background: The Data & Topic Modeling
- We used documents from the Medline data set, each of which is tagged with MeSH terms.
- The MeSH ontology has a natural hierarchical layout.
- Running a topic modeling algorithm on the entire set would be uninformative.
- Examining documents tagged with terms deep in the hierarchy, however, can be interesting.
- For this work, we looked at documents within the lower levels of a selected path in the Psychology branch of MeSH.

Background: The Data & Topic Modeling
[Diagram: bipartite graph linking Documents and Words]
- A bipartite graph was constructed for topic modeling with LDA.
- Topics are associated with each token in the corpus; the topics are not known in advance.
- LDA assigns a distribution over topics to each document and a distribution over tokens to each topic.
- Solving for topic assignments is done in Graphlab using a parallel collapsed Gibbs sampler.
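The sampler described on this slide can be sketched in a few lines. The following is a minimal, single-threaded illustration of collapsed Gibbs sampling for LDA, not the parallel Graphlab implementation used in the demonstration. The function names, the hyperparameters `alpha` and `beta`, and the toy corpus in the usage note are all assumptions made for illustration.

```python
import random


def build_bipartite_edges(docs):
    """Return (doc_id, word, count) edges of the document-word bipartite graph."""
    edges = []
    for doc_id, tokens in enumerate(docs):
        counts = {}
        for word in tokens:
            counts[word] = counts.get(word, 0) + 1
        edges.extend((doc_id, word, n) for word, n in sorted(counts.items()))
    return edges


def collapsed_gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Serial collapsed Gibbs sampler (hypothetical helper, illustration only).

    Returns per-token topic assignments, document-topic counts,
    and topic-word counts.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(vocab)
    topics = list(range(num_topics))

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens of word w in topic k
    topic_total = [0] * num_topics                              # tokens in topic k

    # Start from a random topic for every token.
    assignments = []
    for d, tokens in enumerate(docs):
        row = []
        for word in tokens:
            k = rng.choice(topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][vocab[word]] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(sweeps):
        for d, tokens in enumerate(docs):
            for i, word in enumerate(tokens):
                w = vocab[word]
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return assignments, doc_topic, topic_word
```

For example, `collapsed_gibbs_lda([["memory", "recall", "memory"], ["recall", "stress"]], num_topics=2)` returns one topic index per token. Normalizing each row of the document-topic counts gives that document's topic distribution, and normalizing each row of the topic-word counts gives that topic's distribution over tokens.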

ACG Cluster Specs
- ACG: 16 nodes (8 CNs, 8 IONs)
- HDFS & DAOS
- POSIX-based EFF stack
- IONs: DAOS, Lustre; each with 4 Intel 910-series 400GB SSDs (burst buffer)
- CNs: Graphlab, Hadoop, HDFS, HAL, HDF5; each equipped with six 4TB HDDs for local storage & HDFS

Demonstration: Topic Modeling with LDA

Demonstration: Topic Modeling
- Edge lists with network information on HDFS are loaded into the EFF stack via the bridge.
- HDFS- or EFF-based data are loaded into Graphlab for analysis, balanced across CNs.
- Read statistics highlight the benefit of the object store.

Performance Results: Reads
[Figure (label: HDFS location): read comparison, HDFS vs. EFF stack, for a randomly selected 100-element subset]
- EFF read time is nearly constant, whereas HDFS read time is quite variable and always greater.
- Writes are consistently worse on EFF.

Performance Results: Load Balancing
[Figure: CPU load (%) over time for machines M1, M2, and M3; second panel zoomed out]
- Measurements were taken over a 15-minute compute interval (x-axis).
- The loads vary slightly but eventually converge.
- This behavior is purely a function of the application and the partitioning (an NP-hard problem).
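Because balanced partitioning is NP-hard, practical systems fall back on heuristics. The sketch below shows one common heuristic, greedy "heaviest first" assignment, using vertex degree as a stand-in for work. It is an illustration only, not the partitioner Graphlab applied in this demonstration; the function name and the degree-as-load assumption are hypothetical.

```python
import heapq


def greedy_partition(vertex_degrees, num_parts):
    """Assign each vertex to the currently lightest partition, heaviest vertices first.

    vertex_degrees maps a vertex id to its degree, used here as a proxy for load.
    Returns (assignment, loads), where loads[p] is the total degree placed on p.
    """
    # Min-heap of (current_load, partition_id); the lightest partition is on top.
    heap = [(0, p) for p in range(num_parts)]
    heapq.heapify(heap)

    assignment = {}
    # Sort by descending degree, breaking ties by vertex id for determinism.
    for vertex, degree in sorted(vertex_degrees.items(), key=lambda kv: (-kv[1], kv[0])):
        load, part = heapq.heappop(heap)
        assignment[vertex] = part
        heapq.heappush(heap, (load + degree, part))

    loads = [0] * num_parts
    for vertex, part in assignment.items():
        loads[part] += vertex_degrees[vertex]
    return assignment, loads
```

The heuristic gives no optimality guarantee, and it ignores edges cut between partitions, which also affect runtime in a graph-parallel engine.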

Conclusions & Learnings

Summary (of what we did)
- We created a set of graph-computation-specific APIs to model graph data in the HDF5 world: the HDF5 Adaptation Layer (HAL).
- Using the HAL, we developed a bridge between the Hadoop and HPC worlds for data ingest.
- We ingested both graph and network information into the EFF stack using this bridge.
- We created full-scale, real-life graph analytics application(s) for testing I/O on the EFF stack.
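To make "modeling graph data" as flat arrays concrete, the sketch below converts an edge list into a compressed sparse row (CSR) layout: one offsets array and one targets array, the kind of one-dimensional data that maps naturally onto array datasets. This is an assumption for illustration only. The slides do not describe HAL's actual on-disk layout or API, and the function names here are hypothetical.

```python
def edges_to_csr(edges, num_vertices):
    """Convert (src, dst) edges into CSR arrays (offsets, targets).

    The neighbors of vertex v are targets[offsets[v]:offsets[v + 1]], so
    vertices with very different degrees share one compact array.
    """
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # Prefix sums of the degrees give each vertex's start position.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbors(offsets, targets, vertex):
    """Return the out-neighbors of a vertex from the CSR arrays."""
    return targets[offsets[vertex]:offsets[vertex + 1]]
```

A layout like this handles skewed degree distributions without padding. Appending edges is harder, because it means rewriting or extending both arrays.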

Summary (of what we learnt)
- The fluid nature of Hadoop's speculative execution does not mix well with the more stable MPI world (a subject for future research?).
- To support real-life raw data ingest, incremental I/O, especially incremental writes, is very important. Dynamic extensibility of structures is key.
- Variable-length data structures are critical for supporting power-law-like graphs, which are ubiquitous. Much work remains to be done in this space.
- Object-store-based file systems are good for graphs as well, as evidenced by the recent surge in specialized graph databases.
- The transactional I/O semantics of the current EFF stack require a shift in mental model, but prove quite useful in the end.

Wishes for the next phase of research
- True out-of-core graph computing: transactions could become the key to supporting out-of-core Gather-Apply-Scatter graph-parallel apps.
- Pushing computation close to the data (new ML techniques such as deep learning can greatly benefit from this).
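For readers unfamiliar with the Gather-Apply-Scatter (GAS) pattern mentioned above, the sketch below runs a synchronous, in-memory GAS loop. PageRank is used purely as a familiar example of a vertex program; it is not part of the demonstration, and the function name and parameters are hypothetical. Dangling vertices are not handled specially here.

```python
def pagerank_gas(out_edges, damping=0.85, sweeps=20):
    """Synchronous Gather-Apply-Scatter sweeps computing PageRank.

    out_edges maps each vertex to a list of its out-neighbors.
    """
    vertices = sorted(set(out_edges) | {v for vs in out_edges.values() for v in vs})
    n = len(vertices)

    # Invert the adjacency so each vertex can gather from its in-neighbors.
    in_edges = {v: [] for v in vertices}
    for src, dsts in out_edges.items():
        for dst in dsts:
            in_edges[dst].append(src)
    out_degree = {v: len(out_edges.get(v, [])) for v in vertices}

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(sweeps):
        # Gather: each vertex sums the contributions of its in-neighbors.
        gathered = {
            v: sum(rank[u] / out_degree[u] for u in in_edges[v]) for v in vertices
        }
        # Apply: each vertex updates its own value from the gathered sum.
        rank = {v: (1.0 - damping) / n + damping * gathered[v] for v in vertices}
        # Scatter: this synchronous sketch simply re-runs every vertex on the
        # next sweep; a real engine would signal only affected neighbors.
    return rank
```

In an out-of-core setting, the gather step would read vertex and edge data from storage and the apply step would write updated values back, which is where transactional I/O could help keep each sweep consistent.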
