Data Centric Computing

1 Research at Scalable Computing Software Laboratory
Data Centric Computing
Xian-He Sun
Department of Computer Science, Illinois Institute of Technology

The Scalable Computing Software Lab www.cs.iit.

2 The Scalable Computing Software Lab
- Personnel: 2 faculty, 2 post-docs, 2 pre-docs, 2 visitors, 10 Ph.D. students, 4 MS students
- Facility: clusters, parallel machine, Grid
- Distributed Optical Testbed (Grid): I-WIRE, OMNI
[Figure: testbed map with sites labeled NU-C, NU-E, UIC, Star Tap, ANL, IIT, U of C, NCSA/UIUC]

3 High Performance Computing at SCS
- Data-intensive computing
- Fault tolerance
- Energy saving
- Reducing data access delay
- Scalable supercomputer
- Cloud computing
Current support: NSF (5), DoE (SciDAC), Microsoft, Argonne, Fermi

4 Processor-memory performance gap
- Processor performance increases rapidly
  - Uni-processor: ~52% per year until 2004, ~25% per year since then
  - New trend: multi-core/many-core architecture (Intel TeraFlops chip, 2007)
  - Aggregate processor performance is much higher
- Memory: ~9% per year
- The processor-memory speed gap keeps increasing
[Figure: performance (log scale, 1 to 100,000) versus year for multi-core/many-core processors, uni-processors, and memory; growth-rate labels 60%, 52%, 20%, and 9%. Sources: Intel, OCZ]
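The widening gap follows from compounding the two growth rates quoted on this slide. The sketch below is only an illustration of that arithmetic: the function name `relative_gap` and the assumption that both curves start from the same baseline are ours, not from the slides.

```python
def relative_gap(cpu_rate: float, mem_rate: float, years: int) -> float:
    """Ratio of processor to memory performance after `years` of compounding.

    Assumes (for illustration) that both start at the same baseline of 1.0
    and that each improves by a fixed fraction per year.
    """
    return ((1.0 + cpu_rate) / (1.0 + mem_rate)) ** years


if __name__ == "__main__":
    # Rates from the slide: ~52% (uni-processor, until 2004), ~25% (since then),
    # and ~9% (memory).
    for years in (5, 10, 20):
        fast = relative_gap(0.52, 0.09, years)
        slow = relative_gap(0.25, 0.09, years)
        print(f"{years:>2} years: gap at 52% = {fast:.1f}x, gap at 25% = {slow:.1f}x")
```

Even at the slower 25% rate the ratio keeps growing every year, which is the point of the slide: as long as processors improve faster than memory, the gap compounds.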

5 Data-Centric Computing
- Data access is the bottleneck and needs attention
- System design needs to be rethought to reflect this fact
- Our solutions
  - Data prefetching
  - Data layout
  - Data-centric scheduling
  - Data-centric architecture
- Integrated optimization
  - Understanding the memory system
  - Understanding design trade-offs
[Figure: memory wall diagram with labels L1, L2, DF; caption: Dynamic Application-aware Data-Centric Optimization]
10/9/2011, Scalable Computing Software Lab, Illinois Institute of Technology

6 Prefetching
- Prefetch data as close as possible to the processor in the memory hierarchy
- Key to prefetching
  - What data should be prefetched?
  - When should prefetching occur?
- Limitations of current prefetching
  - Conservative and limited to static prediction strategies
  - Only works for simple access patterns with locality
[Figure label: Computing]

7 Hybrid Adaptive Prefetching Architecture
[Figure: architecture diagram. Labeled components: cores, each with an L1 cache; data access histories; hybrid adaptive prefetching; demand requests; prediction (sequential, stride, Markov); pre-execution; hints (programmer, pre-compiler); prefetch generator; prefetch queue; access scheduler; memory; disk]
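As a rough illustration of how stride and Markov prediction can be combined, the following is a minimal sketch, not the architecture from the slide. The class names, the "stride first, Markov as fallback" policy, and the two-in-a-row stride confirmation rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


class StridePredictor:
    """Predicts last + stride once the same non-zero stride is seen twice in a row."""

    def __init__(self):
        self.last = None
        self.stride = None
        self.confirmed = False

    def observe(self, addr):
        if self.last is not None:
            new_stride = addr - self.last
            self.confirmed = new_stride == self.stride and new_stride != 0
            self.stride = new_stride
        self.last = addr

    def predict(self):
        return self.last + self.stride if self.confirmed else None


class MarkovPredictor:
    """First-order Markov model: predicts the most frequent successor of the last address."""

    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last = None

    def observe(self, addr):
        if self.last is not None:
            self.successors[self.last][addr] += 1
        self.last = addr

    def predict(self):
        seen = self.successors.get(self.last)
        return seen.most_common(1)[0][0] if seen else None


class HybridPrefetcher:
    """Hypothetical policy: use the stride guess if confirmed, else the Markov guess."""

    def __init__(self):
        self.stride = StridePredictor()
        self.markov = MarkovPredictor()

    def access(self, addr):
        self.stride.observe(addr)
        self.markov.observe(addr)
        guess = self.stride.predict()
        return guess if guess is not None else self.markov.predict()


if __name__ == "__main__":
    regular = HybridPrefetcher()
    print([regular.access(a) for a in (100, 108, 116)])    # regular stride of 8
    irregular = HybridPrefetcher()
    print([irregular.access(a) for a in (10, 50, 20, 10)])  # repeating, non-strided
```

The stride predictor covers regular array-style accesses, while the Markov table can catch repeating but irregular sequences that have no constant stride. A real design would also bound the table sizes and weigh predictor accuracy, which this sketch leaves out.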

8 What Data: Data Access History Cache and Dynamic Hardware Prefetch
[Figure: block diagram. Labeled components: DAHC (tag, data); L1 cache; L1 data prefetcher; L2 cache; SQ, MT, MK, and ST counters; Comp; prefetch counter; timer]

9 When: Timing in Multi-streaming Prefetching
- The L2 cache miss stream is fed to the prefetcher.
- The global stream is localized into local streams by PC (program counter).
- The local streams are chained according to their last accessed time.
- Time information is added to the local streams.
[Figure: timeline of streams, with intervals labeled T1 to T3 and t1 to t8]
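The localization and chaining steps on this slide can be sketched as follows. This is a simplified illustration under our own assumptions: the miss stream is modeled as `(time, pc, addr)` tuples already in time order, and the function name `localize` and the most-recent-first ordering of the chain are hypothetical choices.

```python
from collections import defaultdict


def localize(miss_stream):
    """Split a global miss stream into per-PC local streams.

    miss_stream: iterable of (time, pc, addr) tuples, assumed sorted by time.
    Returns (local, chain) where
      local maps pc -> list of (time, addr), keeping the time information, and
      chain lists the PCs ordered by last accessed time, most recent first.
    """
    local = defaultdict(list)
    for time, pc, addr in miss_stream:
        local[pc].append((time, addr))
    chain = sorted(local, key=lambda pc: local[pc][-1][0], reverse=True)
    return dict(local), chain


if __name__ == "__main__":
    stream = [
        (1, 0xA, 100),
        (2, 0xB, 500),
        (3, 0xA, 104),
        (4, 0xC, 900),
        (5, 0xB, 508),
    ]
    local, chain = localize(stream)
    for pc in chain:
        print(hex(pc), local[pc])
```

Keeping the timestamps in each local stream is what allows a prefetcher to reason about when the next miss in a stream is likely, and not only which address it will touch.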

10 Software Solution: Server Push I/O Architecture
- Dynamic I/O architecture
  - Optimize I/O architecture for each application
- Use a dedicated data server for
  - Finding data access signature
  - Data prefetching
  - Data layout
  - Optimization and coordination
- Carry the data access service via
  - Enhanced parallel I/O file system
  - Specially designed parallel cache system
- Explore various strategies and adaptive support
  - Combine merits of prediction, post analysis, and pre-execution
[Figure label: Computing]

11 Result: Two approaches for I/O prefetching

12 Result: Smart Data Layout
- Round-robin may not be the best for parallel I/O
- Smart three-dimensional data layout
[Figure: file servers under one-to-many mapping and many-to-many mapping]
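For reference, conventional round-robin striping places consecutive fixed-size stripes on consecutive file servers. The sketch below shows only that baseline mapping, not the smart layout the slide refers to; the function names and parameters are illustrative assumptions.

```python
def round_robin_server(offset: int, stripe_size: int, num_servers: int) -> int:
    """Index of the file server holding the byte at `offset` under round-robin striping."""
    return (offset // stripe_size) % num_servers


def servers_touched(offset: int, length: int, stripe_size: int, num_servers: int) -> list:
    """Sorted list of servers that a request of `length` > 0 bytes at `offset` must contact."""
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({stripe % num_servers for stripe in range(first, last + 1)})


if __name__ == "__main__":
    # A small unaligned request straddles two servers; a large one touches all of them.
    print(servers_touched(offset=32, length=64, stripe_size=64, num_servers=4))
    print(servers_touched(offset=0, length=1024, stripe_size=64, num_servers=4))
```

Because the mapping depends only on the offset, it ignores how the application actually accesses the file, which is one reason a fixed round-robin layout may not suit every parallel I/O pattern.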

13 Optimization: Coordination & Load Balance
- Stripe size also affects I/O workload on multiple file servers
- Scheduling issues
[Figure: four cases labeled Balanced, Imbalance, Imbalance, Balanced]
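To see why stripe size changes the load on each file server, the sketch below tallies how many bytes each server handles for a set of requests under round-robin striping. The workload (four 64-byte requests, 256 bytes apart) and the name `bytes_per_server` are made up for illustration and do not come from the slides.

```python
def bytes_per_server(requests, stripe_size: int, num_servers: int) -> list:
    """Bytes served by each file server for (offset, length) requests.

    Assumes round-robin striping: stripe k lives on server k % num_servers.
    """
    load = [0] * num_servers
    for offset, length in requests:
        pos, end = offset, offset + length
        while pos < end:
            stripe = pos // stripe_size
            stripe_end = (stripe + 1) * stripe_size
            chunk = min(end, stripe_end) - pos  # bytes of this request inside the stripe
            load[stripe % num_servers] += chunk
            pos += chunk
    return load


if __name__ == "__main__":
    # Hypothetical strided workload: four requests of 64 bytes, spaced 256 bytes apart.
    requests = [(i * 256, 64) for i in range(4)]
    for stripe_size in (64, 256):
        print(stripe_size, bytes_per_server(requests, stripe_size, num_servers=4))
```

With a 64-byte stripe every request in this workload lands on the same server, while a 256-byte stripe spreads the same requests evenly, so the same access pattern can be balanced or imbalanced depending on the stripe size alone.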

14 Core-aware memory scheduling

15 Integrated Optimization: A system approach
Illinois Institute of Technology & Argonne National Laboratory

16 Data-Centric Computing
- Data access is a complex matter
  - Dynamic, application-aware
  - System and algorithm re-design
- Big Data, Big Deal, Big Opportunity
[Figure: layers of parallel I/O (Application, MPI forwarding, PFS); operation of the memory hierarchy]

The Sluice Gate Theory: Have we found a solution for memory wall?
Xian-He Sun
Illinois Institute of Technology, Chicago, Illinois
sun@iit.edu
Keynote, HPC China, Nov. 2, 205
Scalable Computing Software