Implementing Asynchronous Checkpoint/Restart for CnC. Nick Vrvilo, Vivek Sarkar (Rice University); Kath Knobe, Frank Schlimbach (Intel). September 24, 2013
Transcription
2 Motivating CnC Checkpoint/Restart (C/R)
"What's good for simplifying parallelism is good for simplifying resilience" *
- Pushing CnC for extreme-scale programming
- Resilience is a must for large-scale applications
- CnC has many interesting properties:
  - Execution graphs
  - Execution frontiers
  - Single-assignment data
  - Hierarchy
  - Side-effect-free computation steps
- How best to approach C/R for CnC?
* Inter-Agency Workshop on HPC Resilience at Extreme Scale
3 Current C/R Solution Obstacles
Problem: Symptom
- Requires global coordination: synchronization overhead
- Bursty communication pattern: poor network utilization
- Saving non-essential data: large checkpoint files
- Explicit checkpoint in code: invasive to programmer
- Periodic global checkpoints: user required to set frequency
4 Opportunities for Optimization *
CnC features:
- Computation graphs
- Single-assignment data
- Monotonically growing state
- Discrete computation steps
- Side-effect-free step code
Potential properties:
- Transparent
- Asynchronous
- Continuous
- Resumable
* Alex Nelson (HP) identified some of these traits for a TStreams C/R prototype in
5 Implementation for CnC on Habanero-C
6 C/R for CnC on Habanero-C
- Why Habanero-C (HC)?
  - Rice implementation: access to source code
  - Anticipation for distributed implementation
- CnC-HC and control tags
  - Control tags are conceptual in CnC-HC
  - Steps are prescribed directly
7 Transparent to the Programmer
- The HC-CnC runtime manages:
  - Item collection put/get operations
  - Step prescription
  - Step teardown
- The runtime encapsulates all C/R operations, which gives C/R transparency
  - E.g., the CNC_PUT operation also handles checkpointing the produced data item
8 Continuously Checkpointing
- Stream CnC state changes into the checkpoint
  - Mirrors the CnC state of data items and steps
- The checkpoint grows with the execution frontier (XF)
  - Mirroring the leading edge of the XF
  - The trailing edge of the XF is handled separately
- Place hooks in the CnC-HC functions that alter the execution frontier:
  - CNC_PUT
  - CNC_PRESCRIBE
  - Step teardown (support code)
9 Asynchrony
- Don't stop or stall computation to checkpoint
- Enqueue state changes for checkpointing
- A dedicated thread handles checkpoint writing (parallels the dedicated worker in some Rice runtimes)
- Single-assignment data guarantees that queued data won't change before it is checkpointed
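The enqueue-and-write pattern on this slide can be sketched in a few lines. This is a minimal Python illustration, not the CnC-HC implementation: the `AsyncCheckpointer` class, the `cnc_put` function, and the in-memory `log` standing in for the checkpoint file are all hypothetical names chosen for the example.

```python
import queue
import threading


class AsyncCheckpointer:
    """Hypothetical helper: queues state changes and writes them on a dedicated thread."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self):
        self._queue = queue.Queue()
        self.log = []  # stands in for the on-disk checkpoint (assumption)
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def record(self, change):
        # Called from compute threads; only enqueues, never waits on I/O.
        self._queue.put(change)

    def _drain(self):
        # Runs on the dedicated writer thread.
        while True:
            change = self._queue.get()
            if change is self._STOP:
                break
            self.log.append(change)

    def close(self):
        # Flush everything queued so far, then stop the writer.
        self._queue.put(self._STOP)
        self._writer.join()


def cnc_put(store, ckpt, key, value):
    """Hypothetical put hook: enforce single assignment, then enqueue the change."""
    if key in store:
        raise ValueError(f"single-assignment violation for {key}")
    store[key] = value
    ckpt.record(("put", key, value))


store = {}
ckpt = AsyncCheckpointer()
cnc_put(store, ckpt, ("tri", 0, 0), 1)
cnc_put(store, ckpt, ("tri", 1, 0), 1)
ckpt.close()
```

Because an item can never be overwritten once put, the writer thread can serialize a queued value at any later time without copying it or locking the item collection.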
10 Resuming from a Checkpoint
- Implement C/R at CnC step granularity
  - The state is monotonic, so it is always sane
- CnC restart:
  - Skip initial environment operations
  - Re-prescribe live steps
  - Re-put live data
- The restart process is order-agnostic
  - Graph dependencies manage execution order
  - Could restore in parallel
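A small sketch can make the order-agnostic claim concrete. It assumes a hypothetical checkpoint holding two live items and one live step, restores them in a shuffled order, and lets data dependencies decide when the step may run. The `restore` and `run` functions and the tuple layout are illustrative assumptions, not the CnC-HC format.

```python
import random

# Hypothetical checkpoint: live items plus live (prescribed but unfinished) steps.
checkpoint = [
    ("item", (1, 0), 1),
    ("item", (1, 1), 1),
    ("step", (2, 1), None),
]


def restore(entries, seed):
    """Re-put live items and re-prescribe live steps in an arbitrary order."""
    entries = list(entries)
    random.Random(seed).shuffle(entries)  # the restore order is deliberately random
    items, live_steps = {}, set()
    for kind, key, value in entries:
        if kind == "item":
            items[key] = value      # re-put live data
        else:
            live_steps.add(key)     # re-prescribe live step
    return items, live_steps


def run(items, live_steps):
    """Run every step whose inputs exist; step <r, c> sums <r-1, c-1> and <r-1, c>."""
    while live_steps:
        ready = [(r, c) for (r, c) in live_steps
                 if (r - 1, c - 1) in items and (r - 1, c) in items]
        if not ready:
            break  # remaining steps are blocked on data that was never produced
        for r, c in ready:
            items[(r, c)] = items[(r - 1, c - 1)] + items[(r - 1, c)]
            live_steps.discard((r, c))
    return items


results = [run(*restore(checkpoint, seed))[(2, 1)] for seed in range(5)]
```

Every shuffled restore order yields the same value for item `(2, 1)`, because the step only fires once both of its inputs are present.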
11 Checkpoint's Trailing Edge
- The checkpoint state's trailing edge must be computed independently of the execution state
- Computation order: Start StepA, Put ItemX, Finish StepA
- Checkpoint order: Start StepA, Finish StepA, Put ItemX
12 State of CnC-HC C/R Implementation
- Only supports scalar integer data (user-defined serialization functions are needed otherwise)
- Only supports integer-tuple tags/keys
- Writes checkpoint data to disk
- A separate process maintains the checkpoint's XF
- Manual restart after failure
13 Application Assumptions
- Seeding from the environment is checkpointed
- Get counts for data items
  - Extra C/R dependency
  - E.g., an item only dies after it's checkpointed
  - Get counts: one extra get
- Observing CnC contracts
  - Produces / consumes / tag functions
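The "one extra get" rule can be illustrated with a short sketch. The `CountedItem` class below is a hypothetical stand-in for a CnC item with a get count; it reserves one additional get for the checkpointer so that the item cannot die before it has been saved.

```python
class CountedItem:
    """Hypothetical item with a get count; one extra get is reserved for C/R."""

    def __init__(self, value, consumer_gets):
        self.value = value
        # The declared consumer gets plus one extra get for the checkpointer.
        self.remaining = consumer_gets + 1

    def get(self):
        if self.remaining == 0:
            raise RuntimeError("item is already dead")
        self.remaining -= 1
        return self.value

    @property
    def dead(self):
        # The item may be reclaimed only once every get has happened.
        return self.remaining == 0


item = CountedItem(value=1, consumer_gets=2)
item.get()                                # first consumer step
item.get()                                # second consumer step
alive_before_checkpoint = not item.dead   # still alive: the checkpointer has not read it
saved = item.get()                        # the checkpointer performs the extra get
dead_after_checkpoint = item.dead
```

Treating the checkpointer as one more consumer means no separate liveness rule is needed: the existing get-count mechanism already delays reclamation.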
14 Example App: Pascal's Triangle
15 nCk with Pascal's Triangle
[Figure: Pascal's triangle with axes N and K]
16 Graph Spec for nCk App: Collections
- Tags: edgetag <int row, int col>, innertag <int row, int col>
- Steps: edgestep <int row, int col>, innerstep <int row, int col>
- Data: triangle [int] <int row, int col>
17 Graph Spec for nCk App: Prescription Relations
- edgetag <r, c> prescribes edgestep <r, c>
- innertag <r, c> prescribes innerstep <r, c>
18 Graph Spec for nCk App: Prescription Relations
[Diagram nodes: edgetag <r+1, c>; edgestep <r, c>; edgetag <r+1, c+1>; innertag <r+1, c>; innerstep <r, c>; innertag <r, c>]
19 Graph Spec for nCk App: Prescription Relations
[Diagram with axes N and K and Left/Right labels; nodes: edgestep <r, c>; edgetag <r+1, c>; edgetag <r+1, c+1>; innertag <r+1, c>; innerstep <r, c>; innertag <r, c>]
20 Graph Spec for nCk App: Input/Output Relations
[Diagram nodes: edgestep <r, c>; triangle <r-1, c-1>; triangle <r-1, c>; innerstep <r, c>; triangle <r, c>; innerstep <r, c>; triangle <r, c>]
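To make the graph spec easier to follow, here is a sequential Python sketch of the nCk app. It is not CnC-HC code. The prescription rules are inferred from the step bodies shown in the C/R walkthrough slides and should be read as assumptions: a left-edge step prescribes the next left-edge step, a right-edge step prescribes the next right-edge step plus the inner step below it, and an inner step prescribes the inner step below it. The `pascal` function and its work queue are hypothetical.

```python
from collections import deque


def pascal(n_rows):
    """Compute the first n_rows rows of Pascal's triangle with edge and inner steps."""
    triangle = {}                     # item collection: (row, col) -> value
    ready = deque([("edge", 0, 0)])   # the environment prescribes edgestep<0,0>
    while ready:
        kind, r, c = ready.popleft()
        if kind == "inner":
            left, right = (r - 1, c - 1), (r - 1, c)
            if left not in triangle or right not in triangle:
                ready.append((kind, r, c))  # inputs missing: retry later
                continue
            value = triangle[left] + triangle[right]
        else:
            value = 1                 # edge entries are always 1
        assert (r, c) not in triangle  # single assignment
        triangle[(r, c)] = value
        if r + 1 >= n_rows:
            continue                  # last row: prescribe nothing further
        if kind == "edge":
            if c == 0:                # left edge
                ready.append(("edge", r + 1, 0))
            if c == r:                # right edge
                ready.append(("edge", r + 1, c + 1))
                if r >= 1:
                    ready.append(("inner", r + 1, c))
        else:
            ready.append(("inner", r + 1, c))
    return triangle


tri = pascal(5)
row4 = [tri[(4, k)] for k in range(5)]
```

The retry branch plays the role of a blocked get: an inner step that runs before both parents exist is simply put back on the queue.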
21 C/R with Pascal's Triangle
22 nCk C/R Example
Staging: (empty)
Checkpoint: (empty)
23 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1) pres(edgestep,<1,0>) pres(edgestep,<1,1>)
Staging: (empty)
Checkpoint: es<0,0>
24 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1) pres(edgestep,<1,0>) pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] }
Checkpoint: es<0,0>
25 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1) » pres(edgestep,<1,0>) pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] pres: es<1,0> }
Checkpoint: es<0,0>
26 nCk C/R Example
edgestep<0,0>: put(tri,<0,0>,1) pres(edgestep,<1,0>) » pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] pres: es<1,0> es<1,1> }
Checkpoint: es<0,0>
27 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1) pres(edgestep,<1,0>) pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] count:1 pres: es<1,0> es<1,1> count:2 gets: }
Checkpoint: es<0,0>
28 nCk C/R Example
edgestep<0,0>: » put(tri,<0,0>,1) pres(edgestep,<1,0>) pres(edgestep,<1,1>)
Staging: es<0,0> { puts: tri<0,0>[1] count:1 pres: es<1,0> es<1,1> count:2 gets: }
Checkpoint: es<0,0>
29 nCk C/R Example
Staging: (empty)
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
30 nCk C/R Example
edgestep<1,0>: put(tri,<1,0>,1) pres(edgestep,<2,0>)
Staging: (empty)
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
31 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1) pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] }
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
32 nCk C/R Example
edgestep<1,0>: put(tri,<1,0>,1) » pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] pres: es<2,0> }
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
33 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1) pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] count:1 pres: es<2,0> count:1 gets: }
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
34 nCk C/R Example
edgestep<1,0>: » put(tri,<1,0>,1) pres(edgestep,<2,0>)
Staging: es<1,0> { puts: tri<1,0>[1] count:1 pres: es<2,0> count:1 gets: }
Checkpoint: tri<0,0>[1] es<1,0> es<1,1>
35 nCk C/R Example
Staging: (empty)
Checkpoint: tri<0,0>[1] es<1,1> tri<1,0>[1] es<2,0>
36 nCk C/R Example
edgestep<1,1>: put(tri,<1,1>,1) pres(innerstep,<2,1>) pres(edgestep,<2,2>)
Staging: (empty)
Checkpoint: tri<0,0>[1] es<1,1> tri<1,0>[1] es<2,0>
37 nCk C/R Example
edgestep<1,1>: » put(tri,<1,1>,1) pres(innerstep,<2,1>) pres(edgestep,<2,2>)
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1] es<1,1> tri<1,0>[1] es<2,0>
38 nCk C/R Example
edgestep<1,1>: » put(tri,<1,1>,1) pres(innerstep,<2,1>) pres(edgestep,<2,2>)
CRASH!
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1] es<1,1> tri<1,0>[1] es<2,0>
39 nCk C/R Example
Restart:
1. Throw out Staging
2. Re-add items/steps from the checkpoint
3. Continue as normal
Staging: es<1,1> { puts: tri<1,1>[1] }
Checkpoint: tri<0,0>[1] es<1,1> tri<1,0>[1] es<2,0>
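The staging, commit, and restart behavior walked through on slides 22 to 39 can be replayed with a short sketch. The `StagedCheckpoint` class and its method names are hypothetical. The model assumes that a step's puts and prescriptions are staged while it runs and are merged into the checkpoint only when the step finishes, which is how the slides appear to behave.

```python
class StagedCheckpoint:
    """Hypothetical model: per-step staging records committed at step finish."""

    def __init__(self, seed_steps):
        self.items = {}                # checkpointed data items
        self.steps = set(seed_steps)   # checkpointed live steps
        self.staging = {}              # in-flight records, one per running step

    def start(self, step):
        self.staging[step] = {"puts": {}, "pres": []}

    def put(self, step, key, value):
        self.staging[step]["puts"][key] = value

    def prescribe(self, step, new_step):
        self.staging[step]["pres"].append(new_step)

    def finish(self, step):
        # Commit: retire the finished step, publish its puts and prescriptions.
        record = self.staging.pop(step)
        self.steps.discard(step)
        self.items.update(record["puts"])
        self.steps.update(record["pres"])

    def restart(self):
        # Throw out staging; what remains is re-put and re-prescribed.
        self.staging.clear()
        return dict(self.items), set(self.steps)


ckpt = StagedCheckpoint(seed_steps=[("es", 0, 0)])

ckpt.start(("es", 0, 0))
ckpt.put(("es", 0, 0), ("tri", 0, 0), 1)
ckpt.prescribe(("es", 0, 0), ("es", 1, 0))
ckpt.prescribe(("es", 0, 0), ("es", 1, 1))
ckpt.finish(("es", 0, 0))

ckpt.start(("es", 1, 0))
ckpt.put(("es", 1, 0), ("tri", 1, 0), 1)
ckpt.prescribe(("es", 1, 0), ("es", 2, 0))
ckpt.finish(("es", 1, 0))

ckpt.start(("es", 1, 1))
ckpt.put(("es", 1, 1), ("tri", 1, 1), 1)
# Simulated crash: es<1,1> never finishes, so its staged put is lost.
items, steps = ckpt.restart()
```

After the restart, `es<1,1>` is still a live step in the checkpoint, so it simply runs again from the beginning. Re-running is safe because steps are side-effect-free and their uncommitted output was discarded with the staging area.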
40 nCk C/R Example
innerstep<2,1>: » put(tri,<2,1>,2) pres(innerstep,<3,1>)
Staging: is<2,1> { puts: tri<2,1>[2] count:1 pres: is<3,1> count:1 gets: tri<1,0>[1] tri<1,1>[1] }
Checkpoint: tri<1,0>[1;gc=1] tri<1,1>[1;gc=2] is<2,1> es<2,2>
41 nCk C/R Example
Staging: (empty)
Checkpoint: tri<1,1>[1;gc=1] es<2,2> tri<2,1>[2] is<3,1>
42 Conclusion
43 Summary
- Working prototype of C/R on CnC-HC
- Write-once data plus monotonic state allows more asynchrony
- Still have many more options to explore
- Properties: Transparent, Continuous, Asynchronous, Resumable
44 Future Work
- More benchmarks and empirical data
- Granularity of checkpoint data
- Distributed CnC C/R
- Checkpoint/Continue
- Integration with hierarchy in CnC
- Multi-level checkpoints
- Transparent Checkpoint/Restart
- CnC offline debugger
Asynchronous Checkpoint/Restart for the Concurrent Collections Model
RICE UNIVERSITY Asynchronous Checkpoint/Restart for the Concurrent Collections Model by Nick Vrvilo A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science Approved,
More informationResilient Distributed Concurrent Collections. Cédric Bassem Promotor: Prof. Dr. Wolfgang De Meuter Advisor: Dr. Yves Vandriessche
Resilient Distributed Concurrent Collections Cédric Bassem Promotor: Prof. Dr. Wolfgang De Meuter Advisor: Dr. Yves Vandriessche 1 Evolution of Performance in High Performance Computing Exascale = 10 18
More informationThe Concurrent Collections (CnC) Parallel Programming Model. Kathleen Knobe Intel.
The Concurrent Collections (CnC) Parallel Programming Model Kathleen Knobe Intel kath.knobe@intel.com *Intel and the Intel logo are registered trademarks of Intel Corporation. Other brands and names are
More informationMap-Reduce-Merge: Simplified Relational Data Processing on Large Clusters. Hung- chih Yang, Ali Dasdan Yahoo! Ruey- Lung Hsiao, D.
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters Hung- chih Yang, Ali Dasdan Yahoo! Ruey- Lung Hsiao, D. Sto; Parker UCLA Outline 1. IntroducCon 2. Map- Reduce 3. Map- Reduce-
More informationCnC for high performance computing. Kath Knobe Intel SSG
CnC for high performance computing Kath Knobe Intel SSG Thanks Frank Schlimbach Intel Vivek Sarkar and his group Rice University DARPA UHPC (Runnemede) DOE X-Stack (Trilacka Glacier) 2 Outline Intro Checkpoint/restart
More informationLessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016
Lessons in Building a Distributed Query Planner Ozgun Erdogan PGCon 2016 Talk Outline 1. IntroducCon 2. Key insight in distributed planning 3. Distributed logical plans 4. Distributed physical plans 5.
More informationCnC for Tuning Hints on OCR. Nick Vrvilo, Rice University The 7 th Annual CnC Workshop September 8, 2015
CnC for Tuning Hints on OCR Nick Vrvilo, Rice University The 7 th Annual CnC Workshop September 8, 2015 Acknowledgements This work was done as part of my internship with the OCR team, part of Intel Federal,
More informationRPC and Threads. Jinyang Li. These slides are based on lecture notes of 6.824
RPC and Threads Jinyang Li These slides are based on lecture notes of 6.824 Labs are based on Go language Why Golang? (as opposed to the popular alternacve: C++) good support for concurrency good support
More informationReducer Hyperobjects
Reducer Hyperobjects int compute(const X& v); int main() { const int n = 1000000; extern X myarray[n]; // Summing Example } int result = 0; for (int i = 0; i < n; ++i) { result += compute(myarray[i]);
More informationCnC-HC. a programming model for CPU-GPU hybrid parallelism. Alina Sbîrlea, Zoran Budimlic, Vivek Sarkar Rice University
CnC-HC a programming model for CPU-GPU hybrid parallelism Alina Sbîrlea, Zoran Budimlic, Vivek Sarkar Rice University Acknowledgements CnC-CUDA: Declarative Programming for GPUs, Max Grossman, Alina Simion-Sbirlea,
More informationCSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum Part 1 SC13
CSinParallel: Using Map-Reduce to Teach Parallel Programming Concepts Across the CS Curriculum Part 1 SC13 Dick Brown, St. Olaf College Libby Shoop, Macalester College Joel Adams, Calvin College Workshop
More informationDynamic Accommodation of Performance, Power, and Reliability Tradeoffs
Dynamic Accommodation of Performance, Power, and Reliability Tradeoffs Special ACK to: Vivek Sarkar (co-pi) & team, Rice Kath Knobe, Intel U.S. Dept. of Defense Intel Labs Fifth Annual Concurrent Collections
More informationNoSQL Databases. Vincent Leroy
NoSQL Databases Vincent Leroy 1 Database Large-scale data processing First 2 classes: Hadoop, Spark Perform some computacon/transformacon over a full dataset Process all data SelecCve query Access a specific
More informationStructured Languages. Rahul Deodhar
Structured Languages Rahul Deodhar You already know Basics of computer Database FoxPro / Oracle DBMS / RDBMS OperaCng System DOS / Novel/Unix ApplicaCons (Spreadsheets / Word processor) Basics of programming
More informationWrangling Your IOT Data Into Splunk
Copyright 2016 Splunk Inc. Wrangling Your IOT Data Into Splunk Damien Dallimore IOT Dreamcatcher, Splunk Disclaimer During the course of this presentacon, we may make forward looking statements regarding
More informationSmart Phone. Computer. Core. Logic Gates
Guest Lecturer Sagar Karandikar inst.eecs.berkeley.edu/~cs61c!!! UCB CS61C : Machine Structures Lecture 18 RLP, MapReduce 03-05-2014 Review of Last Lecture Warehouse Scale CompuCng Example of parallel
More informationOp#miza#on and Tuning Collec#ves in MVAPICH2
Op#miza#on and Tuning Collec#ves in MVAPICH2 MVAPICH2 User Group (MUG) Mee#ng by Hari Subramoni The Ohio State University E- mail: subramon@cse.ohio- state.edu h
More informationMay 1, Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) A Sleep-based Communication Mechanism to
A Sleep-based Our Akram Foundation for Research and Technology - Hellas (FORTH) Institute of Computer Science (ICS) May 1, 2011 Our 1 2 Our 3 4 5 6 Our Efficiency in Back-end Processing Efficiency in back-end
More informationAn introduction to checkpointing. for scientifc applications
damien.francois@uclouvain.be UCL/CISM An introduction to checkpointing for scientifc applications November 2016 CISM/CÉCI training session What is checkpointing? Without checkpointing: $./count 1 2 3^C
More informationParallelism and Concurrency (Part II) COS 326 David Walker Princeton University
Parallelism and Concurrency (Part II) COS 326 David Walker Princeton University let x = 1 + 2 in 3 + x Visualizing ComputaConal Costs Visualizing ComputaConal Costs x = 1 + 2 let x = 1 + 2 in 3 + x 3 +
More informationDynamic Task Parallelism with a GPU Work-Stealing Runtime System
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System Sanjay Chatterjee, Max Grossman, Alina Sbîrlea, and Vivek Sarkar Department of Computer Science Rice University Background As parallel programming
More informationApplication Fault Tolerance Using Continuous Checkpoint/Restart
Application Fault Tolerance Using Continuous Checkpoint/Restart Tomoki Sekiyama Linux Technology Center Yokohama Research Laboratory Hitachi Ltd. Outline 1. Overview of Application Fault Tolerance and
More informationScalable and Robust DDoS Detection via Universal Monitoring
Scalable and Robust DDoS Detection via Universal Monitoring Vyas Sekar Joint work with: Alan Liu, Vladimir Braverman JHU Hun Namkung, Antonis Manousis, CMU DDoS a&acks are ge-ng worse Increasing in number
More informationInforma(on Retrieval
Introduc)on to Informa)on Retrieval CS3245 Informa(on Retrieval Lecture 12: Crawling and Link Analysis 2 1 Ch. 11-12 Last Time Chapter 11 1. ProbabilisCc Approach to Retrieval / Basic Probability Theory
More informationHow many ways to make 50 cents? first-denomination Solution. CS61A Lecture 5. count-change. cc base cases. How many have you figured out?
6/6/ CS6A Lecture -6-7 Colleen Lewis How many ways to make cents? first-denomination Solution (define (first-denomination kinds-of-coins) ((= kinds-of-coins ) ) ((= kinds-of-coins ) ) ((= kinds-of-coins
More informationAutosave for Research Where to Start with Checkpoint/Restart
Autosave for Research Where to Start with Checkpoint/Restart Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC) brandon.barker@cornell.edu Workshop: High Performance
More informationCSE373: Data Structures & Algorithms Lecture 6: Hash Tables
Lecture 6: Hash Tables Hunter Zahn Summer 2016 Summer 2016 1 MoCvaCng Hash Tables For a dic$onary with n key, value pairs insert find delete Unsorted linked- list O(1) O(n) O(n) Unsorted array O(1) O(n)
More informationCS 101 Computer Science I. CS1101 Computer Science I. Today 1/27/16. Spring Robert Muller Boston College. What this course is about.
1/27/16 CS 101 CS1101 Spring 2016 Robert Muller Boston College Today What this course is about LogisCcs Course administracon 1 Super TA Staff (03 OCaml) Nick Denari Lab 03 Higgins 280 Tuesdays 4PM Meagan
More informationMain Points. File System Reliability (part 2) Last Time: File System Reliability. Reliability Approach #1: Careful Ordering 11/26/12
Main Points File System Reliability (part 2) Approaches to reliability Careful sequencing of file system operacons Copy- on- write (WAFL, ZFS) Journalling (NTFS, linux ext4) Log structure (flash storage)
More informationAdvanced Memory Management
Advanced Memory Management Main Points Applications of memory management What can we do with ability to trap on memory references to individual pages? File systems and persistent storage Goals Abstractions
More informationBrian W. Barre7 Scalable System So?ware Sandia NaConal Laboratories December 3, 2012 SAND Number: P
Open MPI Data Transfer Brian W. Barre7 Scalable System So?ware Sandia NaConal Laboratories bwbarre@sandia.gov December 3, 2012 SAND Number: 2012-10326P Sandia National Laboratories is a multi-program laboratory
More informationParallel Streaming Computation on Error-Prone Processors. Yavuz Yetim, Margaret Martonosi, Sharad Malik
Parallel Streaming Computation on Error-Prone Processors Yavuz Yetim, Margaret Martonosi, Sharad Malik Upsets/B muons/mb Average Number of Dopant Atoms Hardware Errors on the Rise Soft Errors Due to Cosmic
More informationTechniques to improve the scalability of Checkpoint-Restart
Techniques to improve the scalability of Checkpoint-Restart Bogdan Nicolae Exascale Systems Group IBM Research Ireland 1 Outline A few words about the lab and team Challenges of Exascale A case for Checkpoint-Restart
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationCRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer
More informationHJ- OpenCL: Reducing the Gap Between the JVM and Accelerators
HJ- OpenCL: Reducing the Gap Between the JVM and Accelerators Max Grossman, Shams Imam, Vivek Sarkar Habanero Extreme Scale So8ware Research Group Rice University JVM: A Portable AbstracCon JVM: placorm-
More informationOpen Ag Data Alliance
Open Ag Data Alliance! An open source project designed to bring farmer- focused interoperability, security, and privacy to agricultural data. Aaron Ault OADA Project Lead h:p://engineering.purdue.edu/oatsgroup/
More informationGIN in 9.4 and further
GIN in 9.4 and further Heikki Linnakangas, Alexander Korotkov, Oleg Bartunov May 23, 2014 Two major improvements 1. Compressed posting lists Makes GIN indexes smaller. Smaller is better. 2. When combining
More informationCS 5150 So(ware Engineering 12. System Architecture
Cornell University Compu1ng and Informa1on Science CS 5150 So(ware Engineering 12. System Architecture William Y. Arms Design The requirements describe the funccon of a system as seen by the client. For
More information10/25/09. In the Beginning... Steve Mann. Wearable CompuCng
Wearable CompuCng In the Beginning... Steve Mann 1970s, pre- laptop, early computer era. Building computers he could wear. Inventor of wearable compucng. 1 Steve Mann 1991: Started the Wearable CompuCng
More informationConcurrent Collections
Concurrent Collections Zoran Budimlić 1 Michael Burke 1 Vincent Cavé 1 Kathleen Knobe 2 Geoff Lowney 2 Ryan Newton 2 Jens Palsberg 3 David Peixotto 1 Vivek Sarkar 1 Frank Schlimbach 2 Sağnak Taşırlar 1
More informationA RESTful Java Framework for Asynchronous High-Speed Ingest
A RESTful Java Framework for Asynchronous High-Speed Ingest Pablo Silberkasten Jean De Lavarene Kuassi Mensah JDBC Product Development October 5, 2017 3 Safe Harbor Statement The following is intended
More informationCheckpointing with DMTCP and MVAPICH2 for Supercomputing. Kapil Arya. Mesosphere, Inc. & Northeastern University
MVAPICH Users Group 2016 Kapil Arya Checkpointing with DMTCP and MVAPICH2 for Supercomputing Kapil Arya Mesosphere, Inc. & Northeastern University DMTCP Developer Apache Mesos Committer kapil@mesosphere.io
More informationRemote Procedure Call
Remote Procedure Call Remote Procedure Call Integrate network communication with programming language Procedure call is well understood implementation use Control transfer Data transfer Goals Easy make
More informationEd D Azevedo Oak Ridge National Laboratory Piotr Luszczek University of Tennessee
A Framework for Check-Pointed Fault-Tolerant Out-of-Core Linear Algebra Ed D Azevedo (e6d@ornl.gov) Oak Ridge National Laboratory Piotr Luszczek (luszczek@cs.utk.edu) University of Tennessee Acknowledgement
More informationFloodless in SEATTLE A Scalable Ethernet Architecture for Large Enterprises By Changhoon Kim, Ma/hew Caesar, and Jennifer Rexford
Floodless in SEATTLE A Scalable Ethernet Architecture for Large Enterprises By Changhoon Kim, Ma/hew Caesar, and Jennifer Rexford Presented by: Charndeep Grewal Department of Electrical Engineering MoCvaCon
More informationPC to HPC. Xiaoge Wang ICER Jan 27, 2016
PC to HPC Xiaoge Wang ICER Jan 27, 2016 About This Series Format: talk + discussion Focus: fundamentals of parallel compucng (i) parcconing: data parccon and task parccon; (ii) communicacon: data sharing
More informationMap-Reduce. Marco Mura 2010 March, 31th
Map-Reduce Marco Mura (mura@di.unipi.it) 2010 March, 31th This paper is a note from the 2009-2010 course Strumenti di programmazione per sistemi paralleli e distribuiti and it s based by the lessons of
More informationApplication-Transparent Checkpoint/Restart for MPI Programs over InfiniBand
Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering
More informationProgress Report Toward a Thread-Parallel Geant4
Progress Report Toward a Thread-Parallel Geant4 Gene Cooperman and Xin Dong High Performance Computing Lab College of Computer and Information Science Northeastern University Boston, Massachusetts 02115
More informationAndrew Gabriel Cucumber Technology Ltd 17 th June 2015
Andrew Gabriel Cucumber Technology Ltd andrew@cucumber.me.uk 17 th June 2015 What is ZFS? New file system developed by Sun Microsystems, starcng development in 2001, open sourced 2005, released 2006. Built-
More informationNew Tools for STEM, Cyber, and Makers
New Tools for STEM, Cyber, and Makers www.lockelabs.net 18 April 2017 1 Overview MoCvaCon and RaConale Execute Java source code directly on hardware Concept inspired in part by modular features of Java
More informationPrinciples of Programming Languages
Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 15/ Prof. Andrea Corradini Department of Computer Science, Pisa Monads in Haskell The IO Monad Lesson 27! 1 Pros of FuncConal
More informationA Pluggable Framework for Composable HPC Scheduling Libraries
A Pluggable Framework for Composable HPC Scheduling Libraries Max Grossman 1, Vivek Kumar 2, Nick Vrvilo 1, Zoran Budimlic 1, Vivek Sarkar 1 1 Habanero Extreme Scale So=ware Research Group, Rice University
More informationQuick Start Guide for Flex2SQL
! Quick Start Guide for Flex2SQL Overview Thank you for trying Mertech s Flex2SQ product, a database conneccvity solucon that allows an exiscng applicacon currently working exclusively with DataFlex transacconal
More informationCSE 392/CS 378: High-performance Computing - Principles and Practice
CSE 392/CS 378: High-performance Computing - Principles and Practice Parallel Computer Architectures A Conceptual Introduction for Software Developers Jim Browne browne@cs.utexas.edu Parallel Computer
More informationDatabase Design & Deployment
ICS 321 Data Storage & Retrieval High Level Database Models Prof. Lipyeow Lim InformaCon & Computer Science Department University of Hawaii at Manoa Lipyeow Lim - - University of Hawaii at Manoa 1 Database
More informationINTERTWinE workshop. Decoupling data computation from processing to support high performance data analytics. Nick Brown, EPCC
INTERTWinE workshop Decoupling data computation from processing to support high performance data analytics Nick Brown, EPCC n.brown@epcc.ed.ac.uk Met Office NERC Cloud model (MONC) Uses Large Eddy Simulation
More informationHigh Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc.
High Availability through Warm-Standby Support in Sybase Replication Server A Whitepaper from Sybase, Inc. Table of Contents Section I: The Need for Warm Standby...2 The Business Problem...2 Section II:
More informationCSEP 524: Parallel Computa3on (week 4) Brad Chamberlain Tuesdays 6:30 9:20 MGH 231
CSEP 524: Parallel Computa3on (week 4) Brad Chamberlain Tuesdays 6:30 9:20 MGH 231 Pthreads vs. Chapel Categorizing Pthreads and Chapel (Generated Dynamically in- class) C+Pthreads Chapel degree of voodoo
More informationDisplayPort Technology Update. Jim Choate VESA Compliance Program Manager June 11, 2015
DisplayPort Technology Update Jim Choate VESA Compliance Program Manager June 11, 2015 Agenda VESA Overview DisplayPort 1.3 DisplayPort TM over USB- C TM Overview Product implementacon solucons (Parade)
More informationComputer Systems and Networks
University of the Pacific LECTURE 7: PERFORMANCE MEASUREMENT 13 TH FEB, 2018 Computer Systems and Networks Dr. Pallipuram (vpallipuramkrishnamani@pacific.edu) Lab Schedule Today Lab 5 Performance Measurement
More informationWhy Generic Types? Structured Code Reuse. Why Generic Types? Structured Code Reuse. Type parameters (generics) 10/17/14. Type parameter. Awkward.
CS 230, Fall 2014 WELLESLEY CS CS 230, Fall 2014 WELLESLEY CS Why Generic Types? Structured Code Reuse class Pair { Object x, y; public Pair(Object x, Object y) { this.x = x; this.y = y; public Object
More informationInvestigating Resilient HPRC with Minimally-Invasive System Monitoring
Investigating Resilient HPRC with Minimally-Invasive System Monitoring Bin Huang Andrew G. Schmidt Ashwin A. Mendon Ron Sass Reconfigurable Computing Systems Lab UNC Charlotte Agenda Exascale systems are
More informationArtPro+ 16. What s New. Frank Woltering Product Manager PDF Editors July 2016
ArtPro+ 16 What s New Frank Woltering Product Manager PDF Editors July 2016 Tool Switcher u Tool Switcher u Breadcrumbs & Inspectors u Layers u Shapes u Edit Graphics u Text u Images u New document u Barcodes
More information=tg= Thomas H. Grohser, bwin
=tg= Thomas H. Grohser, bwin select * from =tg= @@Version Remark SQL 4.21 First SQL Server ever used (1994) SQL 6.0 First Log Shipping with failover SQL 6.5 First SQL Server Cluster (NT4.0 + Wolfpack)
More informationEnterprise Backup and Restore technology and solutions
Enterprise Backup and Restore technology and solutions LESSON VII Veselin Petrunov Backup and Restore team / Deep Technical Support HP Bulgaria Global Delivery Hub Global Operations Center November, 2013
More informationThe Concurrent Collections (CnC) Parallel Programming Model Tutorial. Kathleen Knobe Intel Vivek Sarkar Rice University
The Concurrent Collections (CnC) Parallel Programming Model Tutorial Kathleen Knobe Intel Vivek Sarkar Rice University kath.knobe@intel.com, vsarkar@rice.edu Tutorial CnC 09 July 23, 2009 *Intel and the
More informationpc++/streams: a Library for I/O on Complex Distributed Data-Structures
pc++/streams: a Library for I/O on Complex Distributed Data-Structures Jacob Gotwals Suresh Srinivas Dennis Gannon Department of Computer Science, Lindley Hall 215, Indiana University, Bloomington, IN
More informationCOMP 322: Fundamentals of Parallel Programming. Lecture 37: Distributed Computing, Apache Spark
COMP 322: Fundamentals of Parallel Programming Lecture 37: Distributed Computing, Apache Spark Vivek Sarkar, Shams Imam Department of Computer Science, Rice University vsarkar@rice.edu, shams@rice.edu
More informationCurrent Topics in OS Research. So, what s hot?
Current Topics in OS Research COMP7840 OSDI Current OS Research 0 So, what s hot? Operating systems have been around for a long time in many forms for different types of devices It is normally general
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs
More informationFault Tolerant Runtime ANL. Wesley Bland Joint Lab for Petascale Compu9ng Workshop November 26, 2013
Fault Tolerant Runtime Research @ ANL Wesley Bland Joint Lab for Petascale Compu9ng Workshop November 26, 2013 Brief History of FT Checkpoint/Restart (C/R) has been around for quite a while Guards against
More informationScalable In-memory Checkpoint with Automatic Restart on Failures
Scalable In-memory Checkpoint with Automatic Restart on Failures Xiang Ni, Esteban Meneses, Laxmikant V. Kalé Parallel Programming Laboratory University of Illinois at Urbana-Champaign November, 2012 8th
More informationAdvanced Security AnalyCcs
Copyright 2014 Splunk Inc. Advanced Security AnalyCcs Alex Loffler Security Architect, TELUS CommunicaCons Disclaimer During the course of this presentacon, we may make forward- looking statements regarding
More informationOpenDIEL Supported by The National Science Foundation. Tristin Baker, Jordan Scott, and Zachary Trzil Mentor: Dr. Kwai Wong
OpenDIEL Supported by The National Science Foundation Tristin Baker, Jordan Scott, and Zachary Trzil Mentor: Dr. Kwai Wong Introduction What is OpenDIEL? Lightweight workflow framework for HPC s to run
More informationProgramming with Python 4. Python for non- programmers Babar Ali
Programming with Python 4 Python for non- programmers Babar Ali 1 Topics Input from text files Output to text files and screen. Try, except blocks and error handling FuncCons & Libraries 2 INPUT 3 Files
More informationSlide 6-1. Processes. Operating Systems: A Modern Perspective, Chapter 6. Copyright 2004 Pearson Education, Inc.
Slide 6-1 6 es Announcements Slide 6-2 Extension til Friday 11 am for HW #1 Previous lectures online Program Assignment #1 online later today, due 2 weeks from today Homework Set #2 online later today,
More informationArtPro+ 16. What s New. Frank Woltering Product Manager PDF Editors September 2016
ArtPro+ 16 What s New Frank Woltering Product Manager PDF Editors September 2016 What s New in 16.0.0 Frank Woltering Product Manager PDF Editors September 2016 Tool Switcher u Tool Switcher u Breadcrumbs
More informationThe Concurrent Collections Programming Model
The Concurrent Collections Programming Model Michael G. Burke Rice University Houston, Texas Kathleen Knobe Intel Corporation Hudson, Massachusetts Ryan Newton Intel Corporation Hudson, Massachusetts Vivek