Implementing Asynchronous Checkpoint/Restart for CnC. Nick Vrvilo, Vivek Sarkar (Rice University); Kath Knobe, Frank Schlimbach (Intel). September 24, 2013

Transcription

1 Implementing Asynchronous Checkpoint/Restart for CnC. Nick Vrvilo, Vivek Sarkar (Rice University); Kath Knobe, Frank Schlimbach (Intel). September 24, 2013

2 Motivating CnC Checkpoint/Restart (C/R)
"What's good for simplifying parallelism is good for simplifying resilience" *
Pushing CnC for extreme-scale programming.
Resilience is a must for large-scale applications.
CnC has many interesting properties: execution graphs, execution frontiers, single-assignment data, hierarchy, side-effect-free computation steps.
How best to approach C/R for CnC?
* Inter-Agency Workshop on HPC Resilience at Extreme Scale

3 Current C/R Solution Obstacles
Problem -> Symptom
Requires global coordination -> Synchronization overhead
Bursty communication pattern -> Poor network utilization
Saving non-essential data -> Large checkpoint files
Explicit checkpoint in code -> Invasive to programmer
Periodic global checkpoints -> User required to set frequency

4 Opportunities for Optimization *
CnC features: computation graphs, single-assignment data, monotonically growing state, discrete computation steps, side-effect-free step code.
Potential properties: transparent, asynchronous, continuous, resumable.
* Alex Nelson (HP) identified some of these traits for a TStreams C/R prototype in

5 Implementation for CnC on Habanero-C

6 C/R for CnC on Habanero-C
Why Habanero-C (HC)? It is the Rice implementation, so we have access to the source code, and we anticipate a distributed implementation.
CnC-HC and control tags: control tags are conceptual in CnC-HC; steps are prescribed directly.

7 Transparent to the Programmer
The HC-CnC runtime manages: item collection put/get operations, step prescription, step teardown.
The runtime encapsulates all C/R operations, which gives C/R transparency.
E.g., the CNC_PUT operation also handles checkpointing the produced data item.

8 Continuously Checkpointing
Stream CnC state changes into the checkpoint, which mirrors the CnC state of data items and steps.
The checkpoint grows with the execution frontier (XF): it mirrors the leading edge of the XF; the trailing edge of the XF is handled separately.
Place hooks in the CnC-HC functions that alter the execution frontier: CNC_PUT, CNC_PRESCRIBE, step teardown (support code).

9 Asynchrony
Don't stop/stall computation to checkpoint.
Enqueue state changes for checkpointing.
A dedicated thread handles checkpoint writing (parallels the dedicated worker in some Rice runtimes).
Single-assignment data guarantees that queued data won't change before it is checkpointed.
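The enqueue-then-write pattern on slide 9 can be sketched in a few lines. This is a minimal Python illustration, not the CnC-HC code: the AsyncCheckpointer class, its record method, and the in-memory log that stands in for the checkpoint file are all assumptions made for the example.

```python
import queue
import threading


class AsyncCheckpointer:
    """Hypothetical helper: a dedicated thread drains queued state changes."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self):
        self.log = []                      # stands in for the checkpoint file
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def _drain(self):
        while True:
            record = self._pending.get()
            if record is self._STOP:
                break
            self.log.append(record)        # a real writer would serialize to disk

    def record(self, kind, key, value=None):
        # Called from the put/prescribe hooks; it only enqueues, so the
        # computation never waits for checkpoint I/O.
        self._pending.put((kind, key, value))

    def close(self):
        self._pending.put(self._STOP)
        self._writer.join()


def demo():
    ckpt = AsyncCheckpointer()
    items = {}

    def put(key, value):
        # Single assignment: the queued value cannot change after this point.
        assert key not in items, "an item may be put only once"
        items[key] = value
        ckpt.record("put", key, value)

    put(("tri", 0, 0), 1)
    ckpt.record("prescribe", ("edgestep", 1, 0))
    put(("tri", 1, 0), 1)
    ckpt.close()
    return ckpt.log
```

Because items are written once, the writer thread can read a queued value at any later time without copying or locking it, which is the property the slide relies on.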

10 Resuming from a Checkpoint
Implement C/R at CnC step granularity.
Monotonic state: the state is always sane.
CnC restart: skip initial environment operations, re-prescribe live steps, re-put live data.
The restart process is order-agnostic: graph dependencies manage execution order, so the state could be restored in parallel.
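To illustrate why restart can be order-agnostic, here is a small Python sketch under stated assumptions: the checkpoint layout (a dict of live items plus a list of live steps) and the toy "double" step, which reads item x<i-1> and puts x<i>, are invented for the example and are not part of CnC-HC.

```python
import random


def restart(checkpoint, seed):
    """Restore a checkpoint in arbitrary order, then run to completion."""
    rng = random.Random(seed)
    entries = [("item", key, value) for key, value in checkpoint["items"].items()]
    entries += [("step", step, None) for step in checkpoint["steps"]]
    rng.shuffle(entries)                 # restore order is deliberately arbitrary

    items, waiting = {}, []
    for kind, key, value in entries:
        if kind == "item":
            items[key] = value           # re-put live data
        else:
            waiting.append(key)          # re-prescribe live step

    # Data-driven execution: a step runs only once its input item exists,
    # so the dependencies, not the restore order, decide what runs when.
    while waiting:
        ready = [step for step in waiting if ("x", step[1] - 1) in items]
        if not ready:
            raise RuntimeError("checkpoint is missing a live item")
        for name, i in ready:
            items[("x", i)] = 2 * items[("x", i - 1)]
            waiting.remove((name, i))
    return items


checkpoint = {
    "items": {("x", 0): 1},
    "steps": [("double", 1), ("double", 2), ("double", 3)],
}
```

Calling restart(checkpoint, seed) with different seeds restores the entries in different orders, yet every run ends with the same item collection.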

11 Checkpoint's Trailing Edge
The checkpoint state's trailing edge must be computed independently of the execution state.
Computation: Start StepA, Put ItemX, Finish StepA
Checkpoint: Start StepA, Finish StepA, Put ItemX

12 State of CnC-HC C/R Implementation
Only supports scalar integer data (user-defined serialization functions are needed otherwise).
Only supports integer-tuple tags/keys.
Writes checkpoint data to disk.
Separate process to maintain the checkpoint's XF.
Manual restart after failure.

13 Application Assumptions
Seeding from the environment is checkpointed.
Get counts for data items: there is an extra C/R dependency, e.g. an item only dies after it's checkpointed, so each get count includes one extra get.
Observing CnC contracts: produces / consumes / tag functions.
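One way to read the "one extra get" rule is sketched below in Python. The ItemCollection class and its checkpointed method are hypothetical names for illustration; the sketch only shows how reserving one additional get keeps an item alive until the checkpointer has stored it.

```python
class ItemCollection:
    """Hypothetical item collection with get-count based reclamation."""

    def __init__(self):
        self.values = {}
        self.remaining = {}

    def put(self, key, value, get_count):
        # Reserve one extra get for the checkpointer on top of the
        # application's declared get count.
        self.values[key] = value
        self.remaining[key] = get_count + 1

    def get(self, key):
        value = self.values[key]
        self._release(key)
        return value

    def checkpointed(self, key):
        # The checkpointer's "get": called once the item is safely stored.
        self._release(key)

    def _release(self, key):
        self.remaining[key] -= 1
        if self.remaining[key] == 0:     # last reader gone: the item dies
            del self.values[key]
            del self.remaining[key]


tri = ItemCollection()
tri.put((1, 1), 1, get_count=2)
tri.get((1, 1))
tri.get((1, 1))
alive_before_checkpoint = (1, 1) in tri.values   # still alive here
tri.checkpointed((1, 1))
alive_after_checkpoint = (1, 1) in tri.values    # now reclaimed
```

The same mechanism works if the checkpointer finishes first: the item then stays alive until the application's last get.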

14 Example App: Pascal's Triangle

15 nCk with Pascal's Triangle
[Figure: Pascal's triangle, labeled N and K]

16 Graph Spec for nCk App: Collections
Tags: edgetag <int row, int col>, innertag <int row, int col>
Steps: edgestep <int row, int col>, innerstep <int row, int col>
Data: triangle [int] <int row, int col>

17 Graph Spec for nCk App: Prescription Relations
edgetag <r, c> prescribes edgestep <r, c>
innertag <r, c> prescribes innerstep <r, c>

18 Graph Spec for nCk App: Prescription Relations
[Diagram labels: edgetag <r+1, c>; edgestep <r, c>; edgetag <r+1, c+1>; innertag <r+1, c>; innerstep <r, c>; innertag <r, c>]

19 Graph Spec for nCk App: Prescription Relations
[Diagram labels: N; K; edgestep <r, c>; edgetag <r+1, c>; edgetag <r+1, c+1>; innertag <r+1, c>; Left; Right; innerstep <r, c>; innertag <r, c>]

20 Graph Spec for nCk App: Input/Output Relations
edgestep <r, c> produces triangle <r, c>.
innerstep <r, c> consumes triangle <r-1, c-1> and triangle <r-1, c>, and produces triangle <r, c>.
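The graph can be exercised with a small sequential simulation. The Python sketch below is an assumption-laden illustration, not the CnC-HC program: it uses a work queue in place of the runtime's scheduler, retries an inner step whose inputs are not yet available, and computes every row up to n rather than bounding the graph by K. The prescription pattern (left edge continues down, right edge continues down and starts an inner column, inner steps continue down their column) follows the worked C/R example in the slides.

```python
from collections import deque


def n_choose_k(n, k):
    """Compute C(n, k) by running edge and inner steps over rows 0..n."""
    tri = {}                          # item collection: (row, col) -> value
    work = deque([("edge", 0, 0)])    # the environment prescribes edgestep<0,0>
    while work:
        kind, r, c = work.popleft()
        if kind == "edge":
            tri[(r, c)] = 1           # every edge entry of the triangle is 1
            if r < n:
                if c == 0:            # left edge continues downward
                    work.append(("edge", r + 1, 0))
                if c == r:            # right edge continues downward
                    work.append(("edge", r + 1, r + 1))
                    if r >= 1:        # and starts the inner column below it
                        work.append(("inner", r + 1, c))
        else:
            left, right = (r - 1, c - 1), (r - 1, c)
            if left not in tri or right not in tri:
                work.append((kind, r, c))   # inputs missing: retry later
                continue
            tri[(r, c)] = tri[left] + tri[right]
            if r < n:                 # inner column continues downward
                work.append(("inner", r + 1, c))
    return tri[(n, k)]
```

Each inner step applies Pascal's rule, C(r, c) = C(r-1, c-1) + C(r-1, c), so the order in which ready steps run does not affect the result.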

21 C/R with Pascal's Triangle

22 nCk C/R Example
Staging: (empty)
Checkpoint: (empty)

23 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1); pres(edgestep,<1,0>); pres(edgestep,<1,1>)
Staging: (empty)
Checkpoint: es<0,0>

24 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1); pres(edgestep,<1,0>); pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] }
Checkpoint: es<0,0>

25 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1); » pres(edgestep,<1,0>); pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1]; pres: es<1,0> }
Checkpoint: es<0,0>

26 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1); pres(edgestep,<1,0>); » pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1]; pres: es<1,0>, es<1,1> }
Checkpoint: es<0,0>

27 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1); pres(edgestep,<1,0>); pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] (count: 1); pres: es<1,0>, es<1,1> (count: 2); gets: (none) }
Checkpoint: es<0,0>

28 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1); pres(edgestep,<1,0>); pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] (count: 1); pres: es<1,0>, es<1,1> (count: 2); gets: (none) }
Checkpoint: es<0,0>

29 nCk C/R Example
Staging: (empty)
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

30 nCk C/R Example
edgestep<1,0>: put(tri,<1,0>,1); pres(edgestep,<2,0>)
Staging: (empty)
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

31 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1); pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] }
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

32 nCk C/R Example
edgestep<1,0>: put(tri,<1,0>,1); » pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1]; pres: es<2,0> }
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

33 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1); pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] (count: 1); pres: es<2,0> (count: 1); gets: (none) }
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

34 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1); pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] (count: 1); pres: es<2,0> (count: 1); gets: (none) }
Checkpoint: tri<0,0>[1], es<1,0>, es<1,1>

35 nCk C/R Example
Staging: (empty)
Checkpoint: tri<0,0>[1], es<1,1>, tri<1,0>[1], es<2,0>

36 nCk C/R Example
edgestep<1,1>: put(tri,<1,1>,1); pres(innerstep,<2,1>); pres(edgestep,<2,2>)
Staging: (empty)
Checkpoint: tri<0,0>[1], es<1,1>, tri<1,0>[1], es<2,0>

37 nCk C/R Example
edgestep<1,1>: » put(tri,<1,1>,1); pres(innerstep,<2,1>); pres(edgestep,<2,2>)
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1], es<1,1>, tri<1,0>[1], es<2,0>

38 nCk C/R Example
edgestep<1,1>: » put(tri,<1,1>,1); pres(innerstep,<2,1>); pres(edgestep,<2,2>)
CRASH!
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1], es<1,1>, tri<1,0>[1], es<2,0>

39 nCk C/R Example
Restart:
1. Throw out Staging
2. Re-add items/steps from checkpoint
3. Continue as normal
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1], es<1,1>, tri<1,0>[1], es<2,0>
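The staging and restart behavior walked through in slides 22 to 39 can be replayed with a short Python sketch. The Checkpoint class and its method names are assumptions for illustration; get counts and the trailing edge are left out, so only the commit-on-finish and discard-on-restart behavior is modeled.

```python
class Checkpoint:
    """Hypothetical model: effects are staged per step, committed on finish."""

    def __init__(self, initial_steps):
        self.items = {}                    # committed live items
        self.steps = set(initial_steps)    # committed live (unfinished) steps
        self.staging = {}                  # in-flight effects, keyed by step

    def begin(self, step):
        self.staging[step] = {"puts": {}, "pres": []}

    def put(self, step, key, value):
        self.staging[step]["puts"][key] = value

    def prescribe(self, step, new_step):
        self.staging[step]["pres"].append(new_step)

    def finish(self, step):
        # Commit the whole step at once: its outputs become live and the
        # step itself stops being live.
        record = self.staging.pop(step)
        self.items.update(record["puts"])
        self.steps.update(record["pres"])
        self.steps.discard(step)

    def restart(self):
        self.staging.clear()               # 1. throw out staging
        return dict(self.items), set(self.steps)   # 2. re-add items/steps


def demo():
    ck = Checkpoint(["es<0,0>"])
    ck.begin("es<0,0>")
    ck.put("es<0,0>", "tri<0,0>", 1)
    ck.prescribe("es<0,0>", "es<1,0>")
    ck.prescribe("es<0,0>", "es<1,1>")
    ck.finish("es<0,0>")
    ck.begin("es<1,0>")
    ck.put("es<1,0>", "tri<1,0>", 1)
    ck.prescribe("es<1,0>", "es<2,0>")
    ck.finish("es<1,0>")
    ck.begin("es<1,1>")
    ck.put("es<1,1>", "tri<1,1>", 1)
    # Crash here: es<1,1> never finishes, so its staged put is discarded.
    return ck.restart()
```

After the simulated crash, demo() returns the items tri<0,0> and tri<1,0> and the live steps es<1,1> and es<2,0>, matching the checkpoint contents shown on slide 39; es<1,1> simply runs again from the start.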

40 nCk C/R Example
innerstep<2,1>: » put(tri,<2,1>,2); pres(innerstep,<3,1>)
Staging: is<2,1> { puts: tri<2,1>[1] (count: 1); pres: is<3,1> (count: 1); gets: tri<1,0>[1], tri<1,1>[1] }
Checkpoint: tri<1,0>[1;gc=1], tri<1,1>[1;gc=2], is<2,1>, es<2,2>

41 nCk C/R Example
Staging: (empty)
Checkpoint: tri<1,1>[1;gc=1], es<2,2>, tri<2,1>[1], is<3,1>

42 Conclusion

43 Summary
Working prototype of C/R on CnC-HC.
Write-once data + monotonic state -> more asynchrony.
Still have many more options to explore.
Transparent, Continuous, Asynchronous, Resumable.

44 Future Work
More benchmarks and empirical data
Granularity of checkpoint data
Distributed CnC C/R
Checkpoint/Continue
Integration with hierarchy in CnC
Multi-level checkpoints
Transparent Checkpoint/Restart
CnC offline debugger

Asynchronous Checkpoint/Restart for the Concurrent Collections Model

Asynchronous Checkpoint/Restart for the Concurrent Collections Model RICE UNIVERSITY Asynchronous Checkpoint/Restart for the Concurrent Collections Model by Nick Vrvilo A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science Approved,

More information

Resilient Distributed Concurrent Collections. Cédric Bassem Promotor: Prof. Dr. Wolfgang De Meuter Advisor: Dr. Yves Vandriessche

Resilient Distributed Concurrent Collections. Cédric Bassem Promotor: Prof. Dr. Wolfgang De Meuter Advisor: Dr. Yves Vandriessche Resilient Distributed Concurrent Collections Cédric Bassem Promotor: Prof. Dr. Wolfgang De Meuter Advisor: Dr. Yves Vandriessche 1 Evolution of Performance in High Performance Computing Exascale = 10 18

More information

The Concurrent Collections (CnC) Parallel Programming Model. Kathleen Knobe Intel.

The Concurrent Collections (CnC) Parallel Programming Model. Kathleen Knobe Intel. The Concurrent Collections (CnC) Parallel Programming Model Kathleen Knobe Intel kath.knobe@intel.com *Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are

More information

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. Hung- chih Yang, Ali Dasdan Yahoo! Ruey- Lung Hsiao, D.

Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. Hung- chih Yang, Ali Dasdan Yahoo! Ruey- Lung Hsiao, D. Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters Hung- chih Yang, Ali Dasdan Yahoo! Ruey- Lung Hsiao, D. Sto; Parker UCLA Outline 1. IntroducCon 2. Map- Reduce 3. Map- Reduce-

More information

CnC for high performance computing. Kath Knobe Intel SSG

CnC for high performance computing. Kath Knobe Intel SSG CnC for high performance computing Kath Knobe Intel SSG Thanks Frank Schlimbach Intel Vivek Sarkar and his group Rice University DARPA UHPC (Runnemede) DOE X-Stack (Trilacka Glacier) 2 Outline Intro Checkpoint/restart

More information

Lessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016

Lessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016 Lessons in Building a Distributed Query Planner Ozgun Erdogan PGCon 2016 Talk Outline 1. IntroducCon 2. Key insight in distributed planning 3. Distributed logical plans 4. Distributed physical plans 5.

More information

CnC for Tuning Hints on OCR. Nick Vrvilo, Rice University The 7 th Annual CnC Workshop September 8, 2015

CnC for Tuning Hints on OCR. Nick Vrvilo, Rice University The 7 th Annual CnC Workshop September 8, 2015 CnC for Tuning Hints on OCR Nick Vrvilo, Rice University The 7 th Annual CnC Workshop September 8, 2015 Acknowledgements This work was done as part of my internship with the OCR team, part of Intel Federal,

More information

RPC and Threads. Jinyang Li. These slides are based on lecture notes of 6.824

RPC and Threads. Jinyang Li. These slides are based on lecture notes of 6.824 RPC and Threads Jinyang Li These slides are based on lecture notes of 6.824 Labs are based on Go language Why Golang? (as opposed to the popular alternacve: C++) good support for concurrency good support

More information

Reducer Hyperobjects

Reducer Hyperobjects Reducer Hyperobjects int compute(const X& v); int main() { const int n = 1000000; extern X myarray[n]; // Summing Example } int result = 0; for (int i = 0; i < n; ++i) { result += compute(myarray[i]);

More information

CnC-HC. a programming model for CPU-GPU hybrid parallelism. Alina Sbîrlea, Zoran Budimlic, Vivek Sarkar Rice University

CnC-HC. a programming model for CPU-GPU hybrid parallelism. Alina Sbîrlea, Zoran Budimlic, Vivek Sarkar Rice University CnC-HC a programming model for CPU-GPU hybrid parallelism Alina Sbîrlea, Zoran Budimlic, Vivek Sarkar Rice University Acknowledgements CnC-CUDA: Declarative Programming for GPUs, Max Grossman, Alina Simion-Sbirlea,

More information

CSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum Part 1 SC13

CSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum Part 1 SC13 CSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum Part 1 SC13 Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel Adams, Calvin College Workshop

More information

Dynamic Accommodation of Performance, Power, and Reliability Tradeoffs

Dynamic Accommodation of Performance, Power, and Reliability Tradeoffs Dynamic Accommodation of Performance, Power, and Reliability Tradeoffs Special ACK to: Vivek Sarkar (co-pi) & team, Rice Kath Knobe, Intel U.S. Dept. of Defense Intel Labs Fifth Annual Concurrent Collections

More information

NoSQL Databases. Vincent Leroy

NoSQL Databases. Vincent Leroy NoSQL Databases Vincent Leroy 1 Database Large-scale data processing First 2 classes: Hadoop, Spark Perform some computacon/transformacon over a full dataset Process all data SelecCve query Access a specific

More information

Structured Languages. Rahul Deodhar

Structured Languages. Rahul Deodhar Structured Languages Rahul Deodhar You already know Basics of computer Database FoxPro / Oracle DBMS / RDBMS OperaCng System DOS / Novel/Unix ApplicaCons (Spreadsheets / Word processor) Basics of programming

More information

Wrangling Your IOT Data Into Splunk

Wrangling Your IOT Data Into Splunk Copyright 2016 Splunk Inc. Wrangling Your IOT Data Into Splunk Damien Dallimore IOT Dreamcatcher, Splunk Disclaimer During the course of this presentacon, we may make forward looking statements regarding

More information

Smart Phone. Computer. Core. Logic Gates

Smart Phone. Computer. Core. Logic Gates Guest Lecturer Sagar Karandikar inst.eecs.berkeley.edu/~cs61c!!! UCB CS61C : Machine Structures Lecture 18 RLP, MapReduce 03-05-2014 Review of Last Lecture Warehouse Scale CompuCng Example of parallel

More information

Op#miza#on and Tuning Collec#ves in MVAPICH2

Op#miza#on and Tuning Collec#ves in MVAPICH2 Op#miza#on and Tuning Collec#ves in MVAPICH2 MVAPICH2 User Group (MUG) Mee#ng by Hari Subramoni The Ohio State University E- mail: subramon@cse.ohio- state.edu h

More information

May 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to

May 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to A Sleep-based Our Akram Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) May 1, 2011 Our 1 2 Our 3 4 5 6 Our Efficiency in Back-end Processing Efficiency in back-end

More information

An introduction to checkpointing. for scientifc applications

An introduction to checkpointing. for scientifc applications damien.francois@uclouvain.be UCL/CISM An introduction to checkpointing for scientifc applications November 2016 CISM/CÉCI training session What is checkpointing? Without checkpointing: $./count 1 2 3^C

More information

Parallelism and Concurrency (Part II) COS 326 David Walker Princeton University

Parallelism and Concurrency (Part II) COS 326 David Walker Princeton University Parallelism and Concurrency (Part II) COS 326 David Walker Princeton University let x = 1 + 2 in 3 + x Visualizing ComputaConal Costs Visualizing ComputaConal Costs x = 1 + 2 let x = 1 + 2 in 3 + x 3 +

More information

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System Dynamic Task Parallelism with a GPU Work-Stealing Runtime System Sanjay Chatterjee, Max Grossman, Alina Sbîrlea, and Vivek Sarkar Department of Computer Science Rice University Background As parallel programming

More information

Application Fault Tolerance Using Continuous Checkpoint/Restart

Application Fault Tolerance Using Continuous Checkpoint/Restart Application Fault Tolerance Using Continuous Checkpoint/Restart Tomoki Sekiyama Linux Technology Center Yokohama Research Laboratory Hitachi Ltd. Outline 1. Overview of Application Fault Tolerance and

More information

Scalable and Robust DDoS Detection via Universal Monitoring

Scalable and Robust DDoS Detection via Universal Monitoring Scalable and Robust DDoS Detection via Universal Monitoring Vyas Sekar Joint work with: Alan Liu, Vladimir Braverman JHU Hun Namkung, Antonis Manousis, CMU DDoS a&acks are ge-ng worse Increasing in number

More information

Informa(on Retrieval

Informa(on Retrieval Introduc)on to Informa)on Retrieval CS3245 Informa(on Retrieval Lecture 12: Crawling and Link Analysis 2 1 Ch. 11-12 Last Time Chapter 11 1. ProbabilisCc Approach to Retrieval / Basic Probability Theory

More information

How many ways to make 50 cents? first-denomination Solution. CS61A Lecture 5. count-change. cc base cases. How many have you figured out?

How many ways to make 50 cents? first-denomination Solution. CS61A Lecture 5. count-change. cc base cases. How many have you figured out? 6/6/ CS6A Lecture -6-7 Colleen Lewis How many ways to make cents? first-denomination Solution (define (first-denomination kinds-of-coins) ((= kinds-of-coins ) ) ((= kinds-of-coins ) ) ((= kinds-of-coins

More information

Autosave for Research Where to Start with Checkpoint/Restart

Autosave for Research Where to Start with Checkpoint/Restart Autosave for Research Where to Start with Checkpoint/Restart Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC) brandon.barker@cornell.edu Workshop: High Performance

More information

CSE373: Data Structures & Algorithms Lecture 6: Hash Tables

CSE373: Data Structures & Algorithms Lecture 6: Hash Tables Lecture 6: Hash Tables Hunter Zahn Summer 2016 Summer 2016 1 MoCvaCng Hash Tables For a dic$onary with n key, value pairs insert find delete Unsorted linked- list O(1) O(n) O(n) Unsorted array O(1) O(n)

More information

CS 101 Computer Science I. CS1101 Computer Science I. Today 1/27/16. Spring Robert Muller Boston College. What this course is about.

CS 101 Computer Science I. CS1101 Computer Science I. Today 1/27/16. Spring Robert Muller Boston College. What this course is about. 1/27/16 CS 101 CS1101 Spring 2016 Robert Muller Boston College Today What this course is about LogisCcs Course administracon 1 Super TA Staff (03 OCaml) Nick Denari Lab 03 Higgins 280 Tuesdays 4PM Meagan

More information

Main Points. File System Reliability (part 2) Last Time: File System Reliability. Reliability Approach #1: Careful Ordering 11/26/12

Main Points. File System Reliability (part 2) Last Time: File System Reliability. Reliability Approach #1: Careful Ordering 11/26/12 Main Points File System Reliability (part 2) Approaches to reliability Careful sequencing of file system operacons Copy- on- write (WAFL, ZFS) Journalling (NTFS, linux ext4) Log structure (flash storage)

More information

Advanced Memory Management

Advanced Memory Management Advanced Memory Management Main Points Applications of memory management What can we do with ability to trap on memory references to individual pages? File systems and persistent storage Goals Abstractions

More information

Brian W. Barre7 Scalable System So?ware Sandia NaConal Laboratories December 3, 2012 SAND Number: P

Brian W. Barre7 Scalable System So?ware Sandia NaConal Laboratories December 3, 2012 SAND Number: P Open MPI Data Transfer Brian W. Barre7 Scalable System So?ware Sandia NaConal Laboratories bwbarre@sandia.gov December 3, 2012 SAND Number: 2012-10326P Sandia National Laboratories is a multi-program laboratory

More information

Parallel Streaming Computation on Error-Prone Processors. Yavuz Yetim, Margaret Martonosi, Sharad Malik

Parallel Streaming Computation on Error-Prone Processors. Yavuz Yetim, Margaret Martonosi, Sharad Malik Parallel Streaming Computation on Error-Prone Processors Yavuz Yetim, Margaret Martonosi, Sharad Malik Upsets/B muons/mb Average Number of Dopant Atoms Hardware Errors on the Rise Soft Errors Due to Cosmic

More information

Techniques to improve the scalability of Checkpoint-Restart

Techniques to improve the scalability of Checkpoint-Restart Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart

More information

Distributed Filesystem

Distributed Filesystem Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the

More information

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer

More information

HJ- OpenCL: Reducing the Gap Between the JVM and Accelerators

HJ- OpenCL: Reducing the Gap Between the JVM and Accelerators HJ- OpenCL: Reducing the Gap Between the JVM and Accelerators Max Grossman, Shams Imam, Vivek Sarkar Habanero Extreme Scale So8ware Research Group Rice University JVM: A Portable AbstracCon JVM: placorm-

More information

Open Ag Data Alliance

Open Ag Data Alliance Open Ag Data Alliance! An open source project designed to bring farmer- focused interoperability, security, and privacy to agricultural data. Aaron Ault OADA Project Lead h:p://engineering.purdue.edu/oatsgroup/

More information

GIN in 9.4 and further

GIN in 9.4 and further GIN in 9.4 and further Heikki Linnakangas, Alexander Korotkov, Oleg Bartunov May 23, 2014 Two major improvements 1. Compressed posting lists Makes GIN indexes smaller. Smaller is better. 2. When combining

More information

CS 5150 So(ware Engineering 12. System Architecture

CS 5150 So(ware Engineering 12. System Architecture Cornell University Compu1ng and Informa1on Science CS 5150 So(ware Engineering 12. System Architecture William Y. Arms Design The requirements describe the funccon of a system as seen by the client. For

More information

10/25/09. In the Beginning... Steve Mann. Wearable CompuCng

10/25/09. In the Beginning... Steve Mann. Wearable CompuCng Wearable CompuCng In the Beginning... Steve Mann 1970s, pre- laptop, early computer era. Building computers he could wear. Inventor of wearable compucng. 1 Steve Mann 1991: Started the Wearable CompuCng

More information

Concurrent Collections

Concurrent Collections Concurrent Collections Zoran Budimlić 1 Michael Burke 1 Vincent Cavé 1 Kathleen Knobe 2 Geoff Lowney 2 Ryan Newton 2 Jens Palsberg 3 David Peixotto 1 Vivek Sarkar 1 Frank Schlimbach 2 Sağnak Taşırlar 1

More information

A RESTful Java Framework for Asynchronous High-Speed Ingest

A RESTful Java Framework for Asynchronous High-Speed Ingest A RESTful Java Framework for Asynchronous High-Speed Ingest Pablo Silberkasten Jean De Lavarene Kuassi Mensah JDBC Product Development October 5, 2017 3 Safe Harbor Statement The following is intended

More information

Checkpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University

Checkpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University MVAPICH Users Group 2016 Kapil Arya Checkpointing with DMTCP and MVAPICH2 for Supercomputing Kapil Arya Mesosphere, Inc. & Northeastern University DMTCP Developer Apache Mesos Committer kapil@mesosphere.io

More information

Remote Procedure Call

Remote Procedure Call Remote Procedure Call Remote Procedure Call Integrate network communication with programming language Procedure call is well understood implementation use Control transfer Data transfer Goals Easy make

More information

Ed D Azevedo Oak Ridge National Laboratory Piotr Luszczek University of Tennessee

Ed D Azevedo Oak Ridge National Laboratory Piotr Luszczek University of Tennessee A Framework for Check-Pointed Fault-Tolerant Out-of-Core Linear Algebra Ed D Azevedo (e6d@ornl.gov) Oak Ridge National Laboratory Piotr Luszczek (luszczek@cs.utk.edu) University of Tennessee Acknowledgement

More information

Floodless in SEATTLE A Scalable Ethernet Architecture for Large Enterprises By Changhoon Kim, Ma/hew Caesar, and Jennifer Rexford

Floodless in SEATTLE A Scalable Ethernet Architecture for Large Enterprises By Changhoon Kim, Ma/hew Caesar, and Jennifer Rexford Floodless in SEATTLE A Scalable Ethernet Architecture for Large Enterprises By Changhoon Kim, Ma/hew Caesar, and Jennifer Rexford Presented by: Charndeep Grewal Department of Electrical Engineering MoCvaCon

More information

PC to HPC. Xiaoge Wang ICER Jan 27, 2016

PC to HPC. Xiaoge Wang ICER Jan 27, 2016 PC to HPC Xiaoge Wang ICER Jan 27, 2016 About This Series Format: talk + discussion Focus: fundamentals of parallel compucng (i) parcconing: data parccon and task parccon; (ii) communicacon: data sharing

More information

Map-Reduce. Marco Mura 2010 March, 31th

Map-Reduce. Marco Mura 2010 March, 31th Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of

More information

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering

More information

Progress Report Toward a Thread-Parallel Geant4

Progress Report Toward a Thread-Parallel Geant4 Progress Report Toward a Thread-Parallel Geant4 Gene Cooperman and Xin Dong High Performance Computing Lab College of Computer and Information Science Northeastern University Boston, Massachusetts 02115

More information

Andrew Gabriel Cucumber Technology Ltd 17 th June 2015

Andrew Gabriel Cucumber Technology Ltd 17 th June 2015 Andrew Gabriel Cucumber Technology Ltd andrew@cucumber.me.uk 17 th June 2015 What is ZFS? New file system developed by Sun Microsystems, starcng development in 2001, open sourced 2005, released 2006. Built-

More information

New Tools for STEM, Cyber, and Makers

New Tools for STEM, Cyber, and Makers New Tools for STEM, Cyber, and Makers www.lockelabs.net 18 April 2017 1 Overview MoCvaCon and RaConale Execute Java source code directly on hardware Concept inspired in part by modular features of Java

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 15/ Prof. Andrea Corradini Department of Computer Science, Pisa Monads in Haskell The IO Monad Lesson 27! 1 Pros of FuncConal

More information

A Pluggable Framework for Composable HPC Scheduling Libraries

A Pluggable Framework for Composable HPC Scheduling Libraries A Pluggable Framework for Composable HPC Scheduling Libraries Max Grossman 1, Vivek Kumar 2, Nick Vrvilo 1, Zoran Budimlic 1, Vivek Sarkar 1 1 Habanero Extreme Scale So=ware Research Group, Rice University

More information

Quick Start Guide for Flex2SQL

Quick Start Guide for Flex2SQL ! Quick Start Guide for Flex2SQL Overview Thank you for trying Mertech s Flex2SQ product, a database conneccvity solucon that allows an exiscng applicacon currently working exclusively with DataFlex transacconal

More information

CSE 392/CS 378: High-performance Computing - Principles and Practice

CSE 392/CS 378: High-performance Computing - Principles and Practice CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer

More information

Database Design & Deployment

Database Design & Deployment ICS 321 Data Storage & Retrieval High Level Database Models Prof. Lipyeow Lim InformaCon & Computer Science Department University of Hawaii at Manoa Lipyeow Lim - - University of Hawaii at Manoa 1 Database

More information

INTERTWinE workshop. Decoupling data computation from processing to support high performance data analytics. Nick Brown, EPCC

INTERTWinE workshop. Decoupling data computation from processing to support high performance data analytics. Nick Brown, EPCC INTERTWinE workshop Decoupling data computation from processing to support high performance data analytics Nick Brown, EPCC n.brown@epcc.ed.ac.uk Met Office NERC Cloud model (MONC) Uses Large Eddy Simulation

More information

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.

High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Warm Standby...2 The Business Problem...2 Section II:

More information

CSEP 524: Parallel Computa3on (week 4) Brad Chamberlain Tuesdays 6:30 9:20 MGH 231

CSEP 524: Parallel Computa3on (week 4) Brad Chamberlain Tuesdays 6:30 9:20 MGH 231 CSEP 524: Parallel Computa3on (week 4) Brad Chamberlain Tuesdays 6:30 9:20 MGH 231 Pthreads vs. Chapel Categorizing Pthreads and Chapel (Generated Dynamically in- class) C+Pthreads Chapel degree of voodoo

More information

DisplayPort Technology Update. Jim Choate VESA Compliance Program Manager June 11, 2015
