GNU libstdc++ parallel mode: Algorithms

1 Introduction Library Overview Algorithms Conclusion 1/40 The GNU libstdc++ parallel mode: Algorithms Institute for Theoretical Computer Science University of Karlsruhe

2 Talk Outline
- Introduction
- Library Overview
- Algorithms
- Conclusion

3 Motivation
How to Benefit from Multi-Core Systems?
- automatic parallelization is not sufficient
- manual/explicit parallelization is needed, but it is expensive and beyond the qualification of most programmers
Our Approach
- provide a parallelized library of basic algorithms for shared-memory systems
- provide implementations of all worthwhile STL algorithms
- included with GCC under the name libstdc++ parallel mode, formerly known as the Multi-Core Standard Template Library (MCSTL)
- make the usage of (data-)parallel algorithms very easy
- actual parallelism is not visible to the user, but encapsulated
- use an established base
- bring multi-core performance to the end user in every program

4 Basic Approach
Make the usage of (data-)parallel algorithms as easy as winking.
Starting Point
- provide the functionality of the C++ Standard Template Library
- run the algorithms in parallel
Why STL?
- many useful algorithms and data structures included
- simple interface, very well known among developers
- recompilation of existing programs may suffice
- C++ is an accepted, efficient, and standardized language

5 Goals
Ease of Use
- easy to use
- no new language, no language extension
- no (binary) library to be installed on the target system
- just a few compiler options
Good Performance
- some speedup already for small inputs (scale down)
- full speedup for larger inputs (scale up)
- co-exist with other forms of parallelization
- respect machine load (dynamic load-balancing)

6 Competitors
STAPL
- abstracts from memory model/communication
- must incorporate distributed-memory issues
- no code publicly available
- interface only similar to STL
Intel Threading Building Blocks
- mostly on a more abstract level (parallel programming framework)
- the only combinatorial generic algorithm is a parallel sorter

7 Technical Foundations
- based on OpenMP (fork-join parallelism)
- parallelism can be switched on and off both at compile time and at run time
[Figure: layer diagram with the labels Applications; STL Interface; Serial STL Algorithms; OpenMP; Extensions; Parallel STL Algorithms; Atomic Ops; MCSTL; OS Thread Scheduling; Multi-Core Hardware]

9 Atomic Operations
A few operations are executed atomically, i.e. without any chance of interference.
fetch_and_add(x, i)
    t := x; r := x; r := r + i; x := r; return t;
- allows concurrent iteration over a sequence
compare_and_swap(x, c, r)
    if (x = c) { x := r; return c [true]; } else { return r [false]; }
- secure state transition
- can emulate fetch_and_add and others when used in a loop
- slower than a usual operation, in particular under concurrent access
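The remark that compare_and_swap can emulate fetch_and_add "by using in a loop" can be sketched with the C++ standard atomics. This is a minimal illustration, not the library's own primitive: the names `fetch_and_add_via_cas` and `atomic_demo` are hypothetical, and `std::atomic<int>` is assumed as the shared variable type.

```cpp
#include <atomic>
#include <cassert>

// Emulates fetch_and_add with a compare-and-swap loop (hypothetical helper).
// Returns the value x held immediately before the addition.
inline int fetch_and_add_via_cas(std::atomic<int>& x, int i) {
  int seen = x.load();
  // On failure, compare_exchange_weak stores the current value of x in
  // `seen`, so the next attempt works with fresh data.
  while (!x.compare_exchange_weak(seen, seen + i)) {
  }
  return seen;
}

// Small self-check comparing the emulation with the built-in operation.
inline void atomic_demo() {
  std::atomic<int> x(5);
  assert(fetch_and_add_via_cas(x, 3) == 5);  // the old value is returned
  assert(x.fetch_add(2) == 8);               // built-in fetch-and-add
  assert(x.load() == 10);
}
```

The retry loop is also why the emulation tends to be slower under contention: every failed exchange means another thread changed the variable first.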

10 Overview of Important Algorithms
Strictly STL (mostly <algorithm>)
- for_each and friends (embarrassingly parallel)
- find
- partial_sum (prefix sum)
- partition, partial_sort
- merge
- sort
- random_shuffle
- bulk construction and bulk insert for set and map (1)
Extension to STL
- multiway_merge
- multiseq_partition (helper)
(1) Parallelization of Bulk Operations for STL Dictionaries, in HPPC [1]

11 MCSTL Development Status
Algorithm Class | Function Call(s) | Status | w/ lb | w/o lb
Embarrassingly Parallel | for_each, generate(_n), fill(_n), count(_if), transform, replace(_if), min_element, max_element, adjacent_difference, unique_copy | impl | yes | yes
Find | find(_if), find_first_of, adjacent_find, mismatch, equal, lexicographical_compare | impl | yes | not worthwhile
Search | search(_n) | impl | yes | not worthwhile
Numerical Algorithms | accumulate, partial_sum, inner_product | impl | planned | yes
Partition | partition, stable_partition | impl | yes | not worthwhile
Merge | merge, multiway_merge, inplace_merge | impl | planned | yes
Partial Sort | nth_element, partial_sort | impl | yes | planned
Sort | sort, stable_sort | impl | yes | yes
Random Permutation | random_shuffle | impl | yes | not worthwhile
Dictionaries | (multi_)map/set bulk op | | yes | yes
Complex Set Operations | set_union, set_intersection, set_(symmetric_)difference, ... | impl | no | yes
Vector Arithmetic | valarray operations | worked on | yes | yes
Heap Construction | make_heap, sort_heap | planned | |
Priority Queues | amortized update operations | planned | |

12 for_each: Problem Definition
- execute a certain function on a range of elements
- many similar functions, like transform and generate
- parallelization is easy only for uniform execution time per element on an exclusive machine

13 for_each: Implementation
Multiple implementations, depending on the purpose.
Static Load-Balancing
- divide the work into parts of almost equal size
- used for accumulate, since the ends of chunks can easily be spliced (the operation is not necessarily commutative)
Dynamic Load-Balancing
- initially divide the work into parts of almost equal size
- allow unemployed threads to take work from others (work-stealing)
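The static scheme for accumulate can be sketched as follows: each contiguous chunk is reduced on its own, and the partial results are spliced together in chunk order, so the operation only needs to be associative. This is an illustrative sketch under assumptions, not the MCSTL code: the name `accumulate_static` and the parameter `p` (number of chunks) are hypothetical, and the OpenMP pragma only takes effect when the file is compiled with OpenMP enabled (otherwise the loop runs sequentially with the same result).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Folds v from left to right with an associative operation `op`
// (hypothetical helper). Chunks are reduced independently and combined in
// chunk order, so `op` does not have to be commutative.
template <typename T, typename Op>
T accumulate_static(const std::vector<T>& v, T init, Op op, std::size_t p) {
  assert(p > 0);
  const std::size_t n = v.size();
  if (n == 0) return init;
  p = std::min(p, n);  // every chunk gets at least one element
  std::vector<T> partial(p, init);
#pragma omp parallel for
  for (long j = 0; j < static_cast<long>(p); ++j) {
    const std::size_t c = static_cast<std::size_t>(j);
    // Chunk c covers [begin, end); chunk sizes differ by at most one.
    const std::size_t begin = n / p * c + std::min(c, n % p);
    const std::size_t end = n / p * (c + 1) + std::min(c + 1, n % p);
    T acc = v[begin];
    for (std::size_t i = begin + 1; i < end; ++i) acc = op(acc, v[i]);
    partial[c] = acc;
  }
  // Splice the chunk results together in their original order.
  T result = init;
  for (std::size_t c = 0; c < p; ++c) result = op(result, partial[c]);
  return result;
}
```

Concatenation is a convenient test operation here, because any reordering of the chunks would show up directly in the result.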

14 for_each: Work-Stealing
- additional synchronization is done only by threads that are out of work
- steal from a random victim
- steal half of the remaining jobs
- the atomic operation fetch_and_add is used to reserve job(s); it is efficiently supported by today's hardware
- chunk size C allows a compromise between the two worst cases: uniformly distributed, little work and skewedly distributed, much work
- maximal slowdown 10 for C = 1 and no work, neutral for C = 10 and no work
- full speedup for hard work, no matter what the distribution
- a logarithmic number of steals suffices with high probability
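The role of fetch_and_add and of the chunk size C can be illustrated with a deliberately simpler scheme than work-stealing: all threads reserve chunks of C jobs from one shared counter. This is not the stealing algorithm described on the slide, only a sketch of the reservation step; `for_each_chunked` and `chunk` are hypothetical names, and the pragma assumes compilation with OpenMP (without it, a single thread processes all chunks).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Applies f to every element of v (hypothetical helper). Threads reserve
// `chunk` consecutive elements at a time from a shared counter.
template <typename T, typename F>
void for_each_chunked(std::vector<T>& v, F f, std::size_t chunk) {
  assert(chunk > 0);
  const std::size_t n = v.size();
  std::atomic<std::size_t> next(0);
#pragma omp parallel
  {
    for (;;) {
      // Reserve the next chunk; fetch_add returns the old counter value.
      const std::size_t begin = next.fetch_add(chunk);
      if (begin >= n) break;
      const std::size_t end = (n - begin < chunk) ? n : begin + chunk;
      for (std::size_t i = begin; i < end; ++i) f(v[i]);
    }
  }
}
```

A small `chunk` balances skewed work well but pays one atomic operation per few elements; a large `chunk` does the opposite, which is the compromise the slide attributes to C.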

15 for_each: Performance Results
[Figure: speedup versus number of pixels for Mandelbrot on a 4-way Opteron, at most 1000 iterations per pixel; curves: seq, and 2, 3 and 4 threads, each balanced (bal.) and unbalanced (unbal.)]

16 find: Problem Definition
- find the first position in a sequence that matches/satisfies a predicate
- find_if also immediately covers find, adjacent_find, mismatch, equal, lexicographical_compare
Considerations
- the sequential algorithm needs O(m) time if the first hit is at position m
- the naïve parallel algorithm needs Ω(n/p) = Ω(m) time if m = n/p − 1 (worst case)
- parallelization overhead makes the situation even worse for small m

17 find: Algorithm
Solution
- scan sequentially up to position m_0
- only then start assigning blocks to the p threads dynamically, load-balancing with the fetch_and_add primitive
- the first thread that is successful signals success to all others by grabbing the remaining part
Tradeoff
- small blocks lower the termination latency, but increase the overhead
- solution: grow the block size exponentially from a starting size to a final size
[Figure: the sequence is split into a sequential part up to m_0 and a parallel part whose blocks are assigned to threads p_0, p_1, p_2, ...]
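A possible shape of this scheme, as a sketch under assumptions rather than the library's implementation: the first m0 positions are scanned sequentially, then threads claim block indices with fetch_add, and each block is as long as everything before it (so block sizes double). The doubling rule, the precomputed `start` table, the shared `first_hit` minimum used for early termination, and the name `find_if_growing` are all choices made for this illustration. The pragma assumes compilation with OpenMP; without it the code runs sequentially and returns the same result.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element satisfying pred, or v.size() if
// there is none (hypothetical helper).
template <typename T, typename Pred>
std::size_t find_if_growing(const std::vector<T>& v, Pred pred,
                            std::size_t m0) {
  assert(m0 > 0);
  const std::size_t n = v.size();
  // Phase 1: plain sequential scan of the first m0 positions.
  const std::size_t seq_end = std::min(m0, n);
  for (std::size_t i = 0; i < seq_end; ++i)
    if (pred(v[i])) return i;

  // Block k covers [start[k], start[k + 1]); each block is as long as the
  // whole prefix before it, so sizes grow exponentially.
  std::vector<std::size_t> start;
  for (std::size_t b = seq_end; b < n; b += std::min(b, n - b))
    start.push_back(b);
  start.push_back(n);

  std::atomic<std::size_t> next_block(0);
  std::atomic<std::size_t> first_hit(n);  // smallest hit found so far
#pragma omp parallel
  {
    for (;;) {
      const std::size_t k = next_block.fetch_add(1);  // claim a block
      if (k + 1 >= start.size()) break;               // no blocks left
      if (start[k] >= first_hit.load()) break;        // earlier hit known
      for (std::size_t i = start[k];
           i < start[k + 1] && i < first_hit.load(); ++i) {
        if (pred(v[i])) {
          // Atomic minimum: keep the smallest hit position.
          std::size_t cur = first_hit.load();
          while (i < cur && !first_hit.compare_exchange_weak(cur, i)) {
          }
          break;
        }
      }
    }
  }
  return first_hit.load();
}
```

Because blocks are claimed in increasing order, every block that starts before the best known hit is scanned by some thread, which is what guarantees that the smallest matching position is returned.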

18 find: Performance Results
[Figure: speedup versus length of sequence; curves: sequential, and 2, 4, 8, 16 and 32 threads (th) for each of the variants gb, fsb and naive]

19 merge, multiway_merge: Problem Definitions
- merge: combine two sorted sequences into one sorted sequence
- multiway_merge: combine k > 2 sorted sequences into one sorted sequence
- important for (external memory) sorting
How to divide the problem?
- find slabs, i.e. consistent sets of ranges from the sequences
- two possibilities: (randomized) splitting by sampling, or exact partitioning into slabs of equal size (using multi-sequence selection)

20 merge, multiway_merge: Sequence Partitioning
Exact Splitting vs. Sampling
- performance guarantee, no bad inputs such as long sequences of equal elements
- complicated algorithm for multi-sequence selection [4]
- first generic implementation provided; it explicitly handles degenerate inputs
Algorithm
- divide the sequences into p slabs of almost equal size
- determine the target positions
- in parallel, merge the slabs to the target positions
- total running time $O\left(\frac{m}{p}\log k + k \log k \cdot \log \max_j |S_j|\right)$, where $m = \sum_j |S_j|$ is the accumulated length of all sequences
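The log k factor per element comes from the k-way merge that each thread runs on its slab. A sequential sketch of such a merge with a binary heap is shown below; the slab partitioning by multi-sequence selection is not shown, and the heap is a swapped-in textbook technique, not necessarily what the library uses. `multiway_merge_seq`, `Head` and `HeadGreater` are hypothetical names, and `int` elements are assumed for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Current front element of one input sequence.
struct Head {
  int value;
  std::size_t seq;  // index of the sequence the value comes from
  std::size_t pos;  // position inside that sequence
};

// Orders the heap so that the smallest value is on top; ties go to the
// sequence with the smaller index, which keeps the merge stable.
struct HeadGreater {
  bool operator()(const Head& a, const Head& b) const {
    return a.value != b.value ? a.value > b.value : a.seq > b.seq;
  }
};

// Merges k sorted sequences of total length m in O(m log k) time.
inline std::vector<int> multiway_merge_seq(
    const std::vector<std::vector<int>>& seqs) {
  std::priority_queue<Head, std::vector<Head>, HeadGreater> heap;
  std::size_t total = 0;
  for (std::size_t s = 0; s < seqs.size(); ++s) {
    assert(std::is_sorted(seqs[s].begin(), seqs[s].end()));
    total += seqs[s].size();
    if (!seqs[s].empty()) heap.push(Head{seqs[s][0], s, 0});
  }
  std::vector<int> out;
  out.reserve(total);
  while (!heap.empty()) {
    const Head h = heap.top();
    heap.pop();
    out.push_back(h.value);
    // Replace the consumed element by its successor, if there is one.
    if (h.pos + 1 < seqs[h.seq].size())
      heap.push(Head{seqs[h.seq][h.pos + 1], h.seq, h.pos + 1});
  }
  return out;
}
```

With p slabs of about m/p elements each, running this routine on every slab in parallel would account for the first term of the running time above.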

21 merge, multiway_merge: Diagram
[Figure: k sorted sequences divided into slabs that are merged by threads t_0, t_1, t_2, t_3]

22 sort, stable_sort
Parallel Multiway Mergesort
+ less communication necessary
+ stable variant easy to derive
− needs twice the space
Parallel Load-Balanced Quicksort
+ in-place
± dynamic load-balancing to compensate for unequal splitting
− concurrent access to memory
− not stable
Both variants are implemented in the MCSTL; the choice is left to the user.

23 Parallel Multiway Mergesort
Procedure
1. divide the sequence into p parts of equal size
2. in parallel, sort the parts locally
3. use parallel p-way merging to compute the final sequence
4. copy the result back to the original position
[Figure: the parts are sorted and merged by threads t_0, t_1, t_2, t_3]
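Steps 1 and 2 of the procedure can be sketched directly; for step 3 the sketch below substitutes a chain of sequential `std::inplace_merge` calls for the parallel p-way merge, so it shows the structure of the algorithm but not its final-phase parallelism or its extra buffer (step 4). `mergesort_parts` is a hypothetical name, `int` elements are assumed, and the pragma only has an effect when compiled with OpenMP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts a by sorting p parts independently and then merging them
// (hypothetical helper; sequential merging is a stand-in for step 3).
inline void mergesort_parts(std::vector<int>& a, std::size_t p) {
  assert(p > 0);
  const std::size_t n = a.size();
  // Step 1: boundaries of p parts whose sizes differ by at most one.
  std::vector<std::size_t> bound(p + 1);
  for (std::size_t j = 0; j <= p; ++j)
    bound[j] = n / p * j + std::min(j, n % p);
  auto at = [&a](std::size_t i) {
    return a.begin() + static_cast<std::ptrdiff_t>(i);
  };
  // Step 2: sort the parts locally, one part per loop iteration.
#pragma omp parallel for
  for (long j = 0; j < static_cast<long>(p); ++j) {
    const std::size_t c = static_cast<std::size_t>(j);
    std::sort(at(bound[c]), at(bound[c + 1]));
  }
  // Step 3 (stand-in): merge part j into the already merged prefix.
  for (std::size_t j = 1; j < p; ++j)
    std::inplace_merge(at(0), at(bound[j]), at(bound[j + 1]));
}
```

Replacing the final loop by a p-way merge into a second buffer, followed by a copy back, would correspond to steps 3 and 4 and explains why the mergesort variant needs twice the space.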

24 partition
[Figure: sequence divided into a part < pivot and a part > pivot]
Sequential Algorithm
- scan from both ends
- swap to the desired order when a pair is in the contrary order
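The sequential two-ended scan can be written out as follows. This is a textbook-style sketch with `int` elements; the name `partition_two_ends` and the convention that elements equal to the pivot go to the right part are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Rearranges a so that all elements < pivot precede all elements >= pivot.
// Returns the index of the first element of the ">= pivot" part.
inline std::size_t partition_two_ends(std::vector<int>& a, int pivot) {
  std::size_t left = 0;
  std::size_t right = a.size();
  for (;;) {
    assert(left <= right);
    // Skip elements that are already on the correct side.
    while (left < right && a[left] < pivot) ++left;
    while (left < right && !(a[right - 1] < pivot)) --right;
    if (left == right) return left;
    // a[left] and a[right - 1] are both on the wrong side: swap them.
    std::swap(a[left], a[right - 1]);
    ++left;
    --right;
  }
}
```

The parallel variants work on whole blocks instead of single elements, but each block pair is processed with the same scan-and-swap idea.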

25 Parallel Partitioning [Tsigas Zhang 2003]
1. scan blocks of size B from both ends
   1.1 claim new blocks when running out of data
2. swap the unfinished blocks to the middle
3. recurse on the middle
[Figure: input processed by p_0, p_1, p_2; swap in parallel; the rest is handled recursively or sequentially]
Time complexity: O(n/p + B log p)

26 partition: Example
3 processors, B = 3, pivot 50, no special cases
[Figure: example run with processors p_0, p_1, ...]

27 Partitioning of 32-bit integers on Sun T1
[Figure: speedup versus n; curves: sequential, and 1, 2, 3, 4, 8, 16 and 32 threads]

28 Parallel Balanced Quicksort
Procedure [3]
1. split the sequence using parallel partition, descend recursively with the appropriate number of threads
2. as soon as there is only one processor left per partition, start local sorting
3. each processor sorts locally and pushes the parts to process later into a lock-free deque
4. other processors can steal parts when out of work
[Figure: the input is partitioned in parallel twice by p_0, p_1, p_2; sequential sorting follows, with idle processors stealing parts]

29 Sorting Performance Results
[Figure: sorting pairs of 64-bit integers on the Sun T1; speedup versus number of elements; curves: sequential, and 2, 4, 8, 16 and 32 threads (th) for each of mwms and bqs]

30 Sorting Performance Results
[Figure: multiway mergesort for 32-bit integers on 2 quad-core Xeons; speedup versus input size; curves: sequential, and 1 to 8 threads]

31 Random Permutation (random_shuffle)
Standard Sequential Algorithm (e.g. STL)
    for 0 ≤ i < n: swap(a[i], a[rand(i, n − 1)])
Cache-efficient (parallel) algorithm
1. distribute randomly to (local) buckets
1b. (copy local buckets to global buckets)
2. permute buckets
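Both variants can be sketched sequentially. The first function is the standard element-by-element shuffle, with the partner index drawn uniformly from i to n − 1; the second follows steps 1 and 2 of the bucket scheme on a single thread, so it shows the data movement but not the parallel distribution or the local/global bucket copy of step 1b. `shuffle_sequential`, `shuffle_buckets` and `num_buckets` are hypothetical names, and any standard random engine is assumed for `Rng`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Standard sequential shuffle: position i receives a random element from
// the not yet fixed range [i, n - 1].
template <typename T, typename Rng>
void shuffle_sequential(std::vector<T>& a, Rng& rng) {
  for (std::size_t i = 0; i + 1 < a.size(); ++i) {
    std::uniform_int_distribution<std::size_t> pick(i, a.size() - 1);
    std::swap(a[i], a[pick(rng)]);
  }
}

// Bucket variant, written sequentially for illustration.
template <typename T, typename Rng>
void shuffle_buckets(std::vector<T>& a, std::size_t num_buckets, Rng& rng) {
  assert(num_buckets > 0);
  // Step 1: send every element to a uniformly random bucket.
  std::vector<std::vector<T>> bucket(num_buckets);
  std::uniform_int_distribution<std::size_t> pick(0, num_buckets - 1);
  for (const T& x : a) bucket[pick(rng)].push_back(x);
  // Step 2: permute each bucket and write the buckets back in order.
  std::size_t out = 0;
  for (std::vector<T>& b : bucket) {
    shuffle_sequential(b, rng);
    for (const T& x : b) a[out++] = x;
  }
}
```

The bucket variant touches memory in a few sequential streams and then shuffles small buckets that fit into cache, which is the point of the cache-efficient formulation.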

32 Random Permutation (random_shuffle)
- time complexity O(n/p + p), global communication volume n
- cache efficiency is very important (factor 2)
[Figure: cache-aware random shuffling of integers on a 4-way Opteron; speedup versus n; curves: sequential, and 1, 2, 3 and 4 threads]

33 Dictionary Bulk Operations
Algorithmic Problem
- construct/insert into a red-black tree
- complicated splitting and balancing of work
- the bulk algorithm already brings sequential speedup
- not yet in parallel mode, but only in the MCSTL
Memory Management
- memory allocation takes a considerable share of the time
- C++ does not allow asymmetric allocation/deallocation, i.e. allocating several nodes at once and later deallocating them one by one

34 Dictionary Bulk Operations: Performance
[Figure: insertion, n = 0.1k, 2-way quad-core Xeon; speedup versus number of inserted elements (k); curves: seq, 1 th to 7 th, and one further thread-count curve]

35 Effect of Core Mapping (Tree Construction)
[Figure: speedup versus number of inserted elements (k); curves: threads, different sockets; 2 threads, same socket, different dice; 2 threads, same die; 1 thread; sequential]

36 Usage
Example Code
    #include <algorithm>
    #include <vector>
    std::vector<double> v;
    std::random_shuffle(v.begin(), v.end());
Applications
- in combination with STXXL (External Memory STL): sort, multiway_merge
- suffix array construction: additional integer sort routine, merge, for_each
Release to Public Domain
- integration into libstdc++ started
- will ship as a part of GCC
- GPL with runtime exception
- open source, everybody can contribute
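The point of the usage example is that ordinary STL code needs no source changes. The sketch below uses `std::sort` as the example call; `sorted_copy` is a hypothetical helper name. According to the libstdc++ manual, compiling such code with GCC using `-D_GLIBCXX_PARALLEL -fopenmp` makes standard algorithm calls resolve to their parallel-mode counterparts where available; whether a given call actually runs in parallel depends on the library version and on input size, so treat this as an assumption to check against the manual for your compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain STL code with no parallel-specific calls (hypothetical helper).
// Built normally, std::sort is the sequential algorithm; built with the
// parallel-mode options named above, the same call may run in parallel.
inline std::vector<double> sorted_copy(std::vector<double> v) {
  std::sort(v.begin(), v.end());
  assert(std::is_sorted(v.begin(), v.end()));
  return v;
}
```

A call such as `sorted_copy(samples)` is therefore identical in the sequential and in the parallel build; only the compiler options differ.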

37 Conclusions
Benefits
- the MCSTL provides an easy way to incorporate data parallelism into programs on an algorithmic level
- fully generic
- performance is excellent for large inputs
- speedup is at hand for small inputs as well, depending on circumstances
- could transparently support new paradigms, e.g. transactional memory
- repository for parallel algorithm implementations
- use more (MC)STL

38 Demands to Language Spec, OS, and Hardware
OS
- allow specification of affinity between threads, e.g. "Threads 0, 2 and 4 should run close together (shared cache), otherwise widely separated (maximum bandwidth)."
- define penalties for switching cores (cache locality)
Hardware
- more memory bandwidth
- faster communication
- larger shared caches

39 Future Work
- performance estimation
- automatic switching point detection
- more application studies
- updates of complex data structures like priority queues
[Figure: switching the number of threads for balanced quicksort, preliminary results; speedup versus input size; curves: seq, and 1 to 8 threads (th)]

40 Performance Estimation Issues
- circumstances can hardly be determined at compile time
- execution times of functors, comparators, overloaded assignment operators and copy constructors are crucial for performance

41 References
[1] L. Frias and J. Singler. Parallelization of bulk operations for STL dictionaries. In Workshop on Highly Parallel Processing on a Chip (HPPC).
[2] J. Singler, P. Sanders, and F. Putze. The Multi-Core Standard Template Library. In Euro-Par 2007: Parallel Processing, volume 4641 of LNCS. Springer-Verlag.
[3] P. Tsigas and Y. Zhang. A simple, fast parallel implementation of quicksort and its performance evaluation on SUN Enterprise. In 11th Euromicro Conference on Parallel, Distributed and Network-Based Processing, page 372, 2003.
[4] P. J. Varman, S. D. Scheufler, B. R. Iyer, and G. R. Ricard. Merging Multiple Lists on Hierarchical-Memory Multiprocessors. Journal of Parallel and Distributed Computing, 12(2), 1991.

Parallel Partition Revisited

Parallel Partition Revisited Parallel Partition Revisited Leonor Frias and Jordi Petit Dep. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya WEA 2008 Overview Partitioning an array with respect to a pivot

More information

Parallel Merge Sort with Double Merging

Parallel Merge Sort with Double Merging Parallel with Double Merging Ahmet Uyar Department of Computer Engineering Meliksah University, Kayseri, Turkey auyar@meliksah.edu.tr Abstract ing is one of the fundamental problems in computer science.

More information

Single-Pass List Partitioning

Single-Pass List Partitioning Single-Pass List Partitioning Leonor Frias 1 Johannes Singler 2 Peter Sanders 2 1 Dep. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya 2 Institut für Theoretische Informatik,

More information

Studienarbeit Parallel Highway-Node Routing

Studienarbeit Parallel Highway-Node Routing Studienarbeit Parallel Highway-Node Routing Manuel Holtgrewe Betreuer: Dominik Schultes, Johannes Singler Verantwortlicher Betreuer: Prof. Dr. Peter Sanders January 15, 2008 Abstract Highway-Node Routing

More information

Course Review for Finals. Cpt S 223 Fall 2008

Course Review for Finals. Cpt S 223 Fall 2008 Course Review for Finals Cpt S 223 Fall 2008 1 Course Overview Introduction to advanced data structures Algorithmic asymptotic analysis Programming data structures Program design based on performance i.e.,

More information

CS301 - Data Structures Glossary By

CS301 - Data Structures Glossary By CS301 - Data Structures Glossary By Abstract Data Type : A set of data values and associated operations that are precisely specified independent of any particular implementation. Also known as ADT Algorithm

More information

Lesson 1 4. Prefix Sum Definitions. Scans. Parallel Scans. A Naive Parallel Scans

Lesson 1 4. Prefix Sum Definitions. Scans. Parallel Scans. A Naive Parallel Scans Lesson 1 4 Prefix Sum Definitions Prefix sum given an array...the prefix sum is the sum of all the elements in the array from the beginning to the position, including the value at the position. The sequential

More information

Data Structures and Algorithm Analysis in C++

Data Structures and Algorithm Analysis in C++ INTERNATIONAL EDITION Data Structures and Algorithm Analysis in C++ FOURTH EDITION Mark A. Weiss Data Structures and Algorithm Analysis in C++, International Edition Table of Contents Cover Title Contents

More information

First Swedish Workshop on Multi-Core Computing MCC 2008 Ronneby: On Sorting and Load Balancing on Graphics Processors

First Swedish Workshop on Multi-Core Computing MCC 2008 Ronneby: On Sorting and Load Balancing on Graphics Processors First Swedish Workshop on Multi-Core Computing MCC 2008 Ronneby: On Sorting and Load Balancing on Graphics Processors Daniel Cederman and Philippas Tsigas Distributed Computing Systems Chalmers University

More information

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading

More information

Lecture 3: Sorting 1

Lecture 3: Sorting 1 Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:

More information

DIVIDE AND CONQUER ALGORITHMS ANALYSIS WITH RECURRENCE EQUATIONS

DIVIDE AND CONQUER ALGORITHMS ANALYSIS WITH RECURRENCE EQUATIONS CHAPTER 11 SORTING ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA AND MOUNT (WILEY 2004) AND SLIDES FROM NANCY M. AMATO AND

More information

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix

Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix Spring 2010 Review Topics Big O Notation Heaps Sorting Selection, Bubble, Insertion, Merge, Heap, Quick Bucket, Radix Hashtables Tree Balancing: AVL trees and DSW algorithm Graphs: Basic terminology and

More information

Quick Sort. CSE Data Structures May 15, 2002

Quick Sort. CSE Data Structures May 15, 2002 Quick Sort CSE 373 - Data Structures May 15, 2002 Readings and References Reading Section 7.7, Data Structures and Algorithm Analysis in C, Weiss Other References C LR 15-May-02 CSE 373 - Data Structures

More information

Lecture 19 Sorting Goodrich, Tamassia

Lecture 19 Sorting Goodrich, Tamassia Lecture 19 Sorting 7 2 9 4 2 4 7 9 7 2 2 7 9 4 4 9 7 7 2 2 9 9 4 4 2004 Goodrich, Tamassia Outline Review 3 simple sorting algorithms: 1. selection Sort (in previous course) 2. insertion Sort (in previous

More information

Practice Problems for the Final

Practice Problems for the Final ECE-250 Algorithms and Data Structures (Winter 2012) Practice Problems for the Final Disclaimer: Please do keep in mind that this problem set does not reflect the exact topics or the fractions of each

More information

QuickSort

QuickSort QuickSort 7 4 9 6 2 2 4 6 7 9 4 2 2 4 7 9 7 9 2 2 9 9 1 QuickSort QuickSort on an input sequence S with n elements consists of three steps: n n n Divide: partition S into two sequences S 1 and S 2 of about

More information

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum

Table ADT and Sorting. Algorithm topics continuing (or reviewing?) CS 24 curriculum Table ADT and Sorting Algorithm topics continuing (or reviewing?) CS 24 curriculum A table ADT (a.k.a. Dictionary, Map) Table public interface: // Put information in the table, and a unique key to identify

More information

Sorting Goodrich, Tamassia Sorting 1

Sorting Goodrich, Tamassia Sorting 1 Sorting Put array A of n numbers in increasing order. A core algorithm with many applications. Simple algorithms are O(n 2 ). Optimal algorithms are O(n log n). We will see O(n) for restricted input in

More information

Intel Thread Building Blocks

Intel Thread Building Blocks Intel Thread Building Blocks SPD course 2017-18 Massimo Coppola 23/03/2018 1 Thread Building Blocks : History A library to simplify writing thread-parallel programs and debugging them Originated circa

More information

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large

More information

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION

Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION DESIGN AND ANALYSIS OF ALGORITHMS Unit 6 Chapter 15 EXAMPLES OF COMPLEXITY CALCULATION http://milanvachhani.blogspot.in EXAMPLES FROM THE SORTING WORLD Sorting provides a good set of examples for analyzing

More information

High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs

High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs High Performance Comparison-Based Sorting Algorithm on Many-Core GPUs Xiaochun Ye, Dongrui Fan, Wei Lin, Nan Yuan, and Paolo Ienne Key Laboratory of Computer System and Architecture ICT, CAS, China Outline

More information

Algorithms and Applications

Algorithms and Applications Algorithms and Applications 1 Areas done in textbook: Sorting Algorithms Numerical Algorithms Image Processing Searching and Optimization 2 Chapter 10 Sorting Algorithms - rearranging a list of numbers

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class

More information

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I

Algorithm Performance Factors. Memory Performance of Algorithms. Processor-Memory Performance Gap. Moore s Law. Program Model of Memory I Memory Performance of Algorithms CSE 32 Data Structures Lecture Algorithm Performance Factors Algorithm choices (asymptotic running time) O(n 2 ) or O(n log n) Data structure choices List or Arrays Language

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

Sorting Pearson Education, Inc. All rights reserved.

Sorting Pearson Education, Inc. All rights reserved. 1 19 Sorting 2 19.1 Introduction (Cont.) Sorting data Place data in order Typically ascending or descending Based on one or more sort keys Algorithms Insertion sort Selection sort Merge sort More efficient,

More information

COMP Parallel Computing. SMM (2) OpenMP Programming Model

COMP Parallel Computing. SMM (2) OpenMP Programming Model COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

SORTING, SETS, AND SELECTION

SORTING, SETS, AND SELECTION CHAPTER 11 SORTING, SETS, AND SELECTION ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN C++, GOODRICH, TAMASSIA AND MOUNT (WILEY 2004) AND SLIDES FROM

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

Chapter 1 Introduction

Chapter 1 Introduction Preface xv Chapter 1 Introduction 1.1 What's the Book About? 1 1.2 Mathematics Review 2 1.2.1 Exponents 3 1.2.2 Logarithms 3 1.2.3 Series 4 1.2.4 Modular Arithmetic 5 1.2.5 The P Word 6 1.3 A Brief Introduction

More information

Parallel Programming Principle and Practice. Lecture 7 Threads programming with TBB. Jin, Hai

Parallel Programming Principle and Practice. Lecture 7 Threads programming with TBB. Jin, Hai Parallel Programming Principle and Practice Lecture 7 Threads programming with TBB Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology Outline Intel Threading

More information

Algorithm Efficiency & Sorting. Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms

Algorithm Efficiency & Sorting. Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms Algorithm Efficiency & Sorting Algorithm efficiency Big-O notation Searching algorithms Sorting algorithms Overview Writing programs to solve problem consists of a large number of decisions how to represent

More information

Mergesort again. 1. Split the list into two equal parts

Mergesort again. 1. Split the list into two equal parts Quicksort Mergesort again 1. Split the list into two equal parts 5 3 9 2 8 7 3 2 1 4 5 3 9 2 8 7 3 2 1 4 Mergesort again 2. Recursively mergesort the two parts 5 3 9 2 8 7 3 2 1 4 2 3 5 8 9 1 2 3 4 7 Mergesort

More information

Introduction to Computers and Programming. Today

Introduction to Computers and Programming. Today Introduction to Computers and Programming Prof. I. K. Lundqvist Lecture 10 April 8 2004 Today How to determine Big-O Compare data structures and algorithms Sorting algorithms 2 How to determine Big-O Partition

More information

On the cost of managing data flow dependencies

On the cost of managing data flow dependencies On the cost of managing data flow dependencies - program scheduled by work stealing - Thierry Gautier, INRIA, EPI MOAIS, Grenoble France Workshop INRIA/UIUC/NCSA Outline Context - introduction of work

More information

Performance impact of dynamic parallelism on different clustering algorithms

Performance impact of dynamic parallelism on different clustering algorithms Performance impact of dynamic parallelism on different clustering algorithms Jeffrey DiMarco and Michela Taufer Computer and Information Sciences, University of Delaware E-mail: jdimarco@udel.edu, taufer@udel.edu

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...
