Overview of Java 8 Parallel Streams (Part 1)

Size: px
Start display at page:

Download "Overview of Java 8 Parallel Streams (Part 1)"

Transcription

1 Overview of Java 8 Parallel s (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA

2 Learning Objectives in this Part of the Lesson Know how aggregate operations & functional programming features are applied in the parallel streams framework Input x Aggregate operation (Function f) Output f(x) Aggregate operation (Function g) Output g(f(x)) Aggregate operation (Function h) Output h(g(f(x))) 2

3 Learning Objectives in this Part of the Lesson Know how aggregate operations & functional programming features are applied in the parallel streams framework Understand how a parallel stream works DataSource DataSource 1 DataSource 2 DataSource 1.1 DataSource 1.2 DataSource 2.1 DataSource 2.2 join join join 3

4 Overview of Java 8 Parallel s 4

5 Overview of Java 8 Parallel s A Java 8 parallel stream splits its elements into multiple chunks & uses a thread pool to process these chunks independently Input x Output f(x) Aggregate operation (behavior f) Aggregate operation (behavior g) Output g(f(x)) Aggregate operation (behavior h) Output h(g(f(x))) See docs.oracle.com/javase/tutorial/collections/streams/parallelism.html 5

6 Overview of Java 8 Parallel s A Java 8 parallel stream splits its elements into multiple chunks & uses a thread pool to process these chunks independently This splitting & thread pool are often invisible to programmers Input x Output f(x) Aggregate operation (behavior f) Aggregate operation (behavior g) Output g(f(x)) Aggregate operation (behavior h) Output h(g(f(x))) 6

7 Overview of Java 8 Parallel s A Java 8 parallel stream splits its elements into multiple chunks & uses a thread pool to process these chunks independently This splitting & thread pool are often invisible to programmers The order in which chunks are processed is likely non-deterministic Input x Aggregate operation (behavior f) Output f(x) Aggregate operation (behavior g) Output g(f(x)) Aggregate operation (behavior h) Output h(g(f(x))) i.e., programmers often have little/no 7control over how chunks are processed

8 Overview of Java 8 Parallel s A Java 8 parallel stream splits its elements into multiple chunks & uses a thread pool to process these chunks independently This splitting & thread pool are often invisible to programmers The order in which chunks are processed is likely non-deterministic This non-determinism is usually a good thing! Input x Aggregate operation (behavior f) Output f(x) Aggregate operation (behavior g) Output g(f(x)) Aggregate operation (behavior h) Output h(g(f(x))) 8

9 Overview of Java 8 Parallel s A Java 8 parallel stream splits its elements into multiple chunks & uses a thread pool to process these chunks independently This splitting & thread pool are often invisible to programmers The order in which chunks are processed is likely non-deterministic The results of the processing are likely deterministic Input x Aggregate operation (behavior f) Output f(x) Aggregate operation (behavior g) Output g(f(x)) Aggregate operation (behavior h) Output h(g(f(x))) Programmers have more control 9over how the results are presented

10 Overview of Java 8 Parallel s When a stream executes all of its aggregate operations run in a single thread List <String> <String> Search Phrases stream() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) List collect(tolist()) 10

11 Overview of Java 8 Parallel s When a stream executes in parallel, it is partitioned into multiple substream chunks that run in a common fork-join pool List <String> <String> Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) List collect(tolist()) See docs.oracle.com/javase/8/docs/api/java/util/concurrent/forkjoinpool.html 11

12 Overview of Java 8 Parallel s When a stream executes in parallel, it is partitioned into multiple substream chunks that run in a common fork-join pool List <String> <String> Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) List collect(tolist()) Threads in the pool process different 12 chunks in a non-deterministic order

13 Overview of Java 8 Parallel s When a stream executes in parallel, it is partitioned into multiple substream chunks that run in a common fork-join pool List <String> <String> List Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) Intermediate operations iterate 13over & process these chunks in parallel

14 Overview of Java 8 Parallel s When a stream executes in parallel, it is partitioned into multiple substream chunks that run in a common fork-join pool List <String> <String> List Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) A terminal operation then 14 combines the chunks into a single result

15 Overview of Java 8 Parallel s When a stream executes in parallel, it is partitioned into multiple substream chunks that run in a common fork-join pool List <String> <String> List Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) (Stateless) Java 8 lambda expressions & method 15 references are used to pass behaviors

16 Overview of Java 8 Parallel s The same aggregate operations can be used for sequential & parallel streams See docs.oracle.com/javase/8/docs/api/java/util/stream/.html 16

17 Overview of Java 8 Parallel s The same aggregate operations can be used for sequential & parallel streams e.g., SearchGang uses the same aggregate operations for both SearchWithSequentials & SearchWithParallels implementations Search Phrases stream() vs. parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) See github.com/douglascraigschmidt/livelessons/tree/master/searchgang 17

18 Overview of Java 8 Parallel s The same aggregate operations can be used for sequential & parallel streams Java 8 streams can thus treat parallelism as an optimization & leverage all available cores! Search Phrases parallel() map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) See qconlondon.com/london-2017/system/files/presentation-slides/concurrenttoparallel.pdf 18

19 Overview of Java 8 Parallel s The same aggregate operations can be used for sequential & parallel streams Java 8 streams can thus treat parallelism as an optimization & leverage all available cores! Naturally, behaviors run by these aggregate operations must be designed carefully to avoid accessing unsynchronized shared state.. parallel() Search Phrases map(phrase -> searchforphrase()) filter(not(searchresults::isempty)) collect(tolist()) See henrikeichenhardt.blogspot.com/2013/06/why-shared-mutable-state-is-root-of-all.html 19

20 Overview of How a Parallel Works 20

21 Overview of How a Parallel Works A Java 8 parallel stream implements a map/reduce variant optimized for multi-core processors Map Reduce Partition See en.wikipedia.org/wiki/mapreduce 21

22 Overview of How a Parallel Works A Java 8 parallel stream implements a map/reduce variant optimized for multi-core processors It s actually more like the split-apply-combine data analysis strategy DataSource 1 DataSource 2 DataSource DataSource 1.1 DataSource 1.2 DataSource 2.1 DataSource 2.2 join join join See 22

23 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks CollectionData CollectionData 1 CollectionData 2 CollectionData 1.1 CollectionData 1.2 CollectionData 2.1 CollectionData 2.2 See en.wikipedia.org/wiki/divide_and_conquer_algorithm 23

24 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks Spliterators are defined to partition collections in Java 8 CollectionData CollectionData 1 CollectionData 2 CollectionData 1.1 CollectionData 1.2 CollectionData 2.1 CollectionData 2.2 See docs.oracle.com/javase/8/docs/api/java/util/spliterator.html 24

25 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks Spliterators are defined to partition collections in Java 8 You can also define custom spliterators InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString 2.2 Input Strings to Search Search Phrases 45,000+ phrases See github.com/douglascraigschmidt/livelessons/tree/master/searchspliterator 25

26 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks Spliterators are defined to partition collections in Java 8 You can also define custom spliterators Parallel streams perform better on data sources that can be split efficiently & evenly InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString 2.2 See 26

27 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks 2. Apply chunks independently in a pool of threads InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString 2.2 Splitting & applying run simultaneously 27 (after certain limit met), not

28 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks 2. Apply chunks independently in a pool of threads Programmers have some control over how many threads are in the pool InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString

29 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks 2. Apply chunks independently in a pool of threads 3. Combine Join partial results into a single result InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString 2.2 join join join 29

30 Overview of How a Parallel Works Split-apply-combine works as follows: 1. Split Recursively partition a data source into independent chunks 2. Apply chunks independently in a pool of threads 3. Combine Join partial results into a single result Performed by terminal operations like collect() & reduce() join InputString InputString 1 InputString 2 InputString 1.1 InputString 1.2 InputString 2.1 InputString 2.2 join join See 30

31 End of Overview of Java 8 Parallel s (Part 1) 31

Java 8 Parallel Stream Internals (Part 2)

Java 8 Parallel Stream Internals (Part 2) Java 8 Parallel Stream Internals (Part 2) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt

More information

External vs. Internal Iteration in Java 8

External vs. Internal Iteration in Java 8 External vs. Internal Iteration in Java 8 Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt

More information

Java 8 Parallel ImageStreamGang Example (Part 3)

Java 8 Parallel ImageStreamGang Example (Part 3) Java 8 Parallel ImageStreamGang Example (Part 3) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt

More information

The Java Executor (Part 1)

The Java Executor (Part 1) The Java Executor (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning

More information

Overview of Advanced Java 8 CompletableFuture Features (Part 2)

Overview of Advanced Java 8 CompletableFuture Features (Part 2) Overview of Advanced Java 8 CompletableFuture Features (Part 2) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated

More information

Overview of Advanced Java 8 CompletableFuture Features (Part 3)

Overview of Advanced Java 8 CompletableFuture Features (Part 3) Overview of Advanced Java 8 CompletableFuture Features (Part 3) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated

More information

Infrastructure Middleware (Part 1): Hardware Abstraction Layer (HAL)

Infrastructure Middleware (Part 1): Hardware Abstraction Layer (HAL) Infrastructure Middleware (Part 1): Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA

More information

The Java ExecutorService (Part 1)

The Java ExecutorService (Part 1) The Java ExecutorService (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning

More information

Overview of Java Threads (Part 3)

Overview of Java Threads (Part 3) Overview of Java Threads (Part 3) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning

More information

Overview of Java Threads (Part 2)

Overview of Java Threads (Part 2) Overview of Java Threads (Part 2) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning

More information

Benefits of Concurrency in Java & Android: Program Structure

Benefits of Concurrency in Java & Android: Program Structure Benefits of Concurrency in Java & Android: Program Structure Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University

More information

Java Barrier Synchronizers: CyclicBarrier (Part 1)

Java Barrier Synchronizers: CyclicBarrier (Part 1) Java Barrier Synchronizers: CyclicBarrier (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville,

More information

A Case Study of Gang of Four (GoF) Patterns : Part 7

A Case Study of Gang of Four (GoF) Patterns : Part 7 A Case Study of Gang of Four (GoF) Patterns : Part 7 d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University

More information

Java Barrier Synchronizers: CountDownLatch (Part 1)

Java Barrier Synchronizers: CountDownLatch (Part 1) Java Barrier Synchronizers: CountDownLatch (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville,

More information

Overview of Java s Support for Polymorphism

Overview of Java s Support for Polymorphism Overview of Java s Support for Polymorphism Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt

More information

Java 8 Stream Performance Angelika Langer & Klaus Kreft

Java 8 Stream Performance Angelika Langer & Klaus Kreft Java 8 Stream Performance Angelika Langer & Klaus Kreft agenda introduction loop vs. sequential stream sequential vs. parallel stream Stream Performance (2) what is a stream? equivalent of sequence from

More information

The Java FutureTask. Douglas C. Schmidt

The Java FutureTask. Douglas C. Schmidt The Java FutureTask Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning Objectives

More information

Java Semaphore (Part 1)

Java Semaphore (Part 1) Java Semaphore (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning Objectives

More information

Java 8 Stream Performance Angelika Langer & Klaus Kreft

Java 8 Stream Performance Angelika Langer & Klaus Kreft Java 8 Stream Performance Angelika Langer & Klaus Kreft objective how do streams perform? explore whether / when parallel streams outperfom seq. streams compare performance of streams to performance of

More information

Infrastructure Middleware (Part 3): Android Runtime Core & Native Libraries

Infrastructure Middleware (Part 3): Android Runtime Core & Native Libraries Infrastructure Middleware (Part 3): Android Runtime Core & Native Libraries Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt

More information

Overview of Java 8 Functional Interfaces

Overview of Java 8 Functional Interfaces Overview of Java 8 Functional Interfaces Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University

More information

Overview of Frameworks: Part 3

Overview of Frameworks: Part 3 : Part 3 d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA CS 282 Principles of Operating Systems II Systems

More information

Overview of Layered Architectures

Overview of Layered Architectures Overview of ed Architectures Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University Nashville,

More information

Java ReentrantLock (Part 1)

Java ReentrantLock (Part 1) Java ReentrantLock (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Learning

More information

Testing Properties of Dataflow Program Operators

Testing Properties of Dataflow Program Operators Testing Properties of Dataflow Program Operators Zhihong Xu Martin Hirzel Gregg Rothermel Kun-Lung Wu U. of Nebraska IBM Research U. of Nebraska IBM Research Presentation at ASE, November 2013 Big Data

More information

Multicore Programming

Multicore Programming Multicore Programming Java Streams Louis-Claude Canon louis-claude.canon@univ-fcomte.fr Bureau 414C Master 1 informatique Semestre 8 Louis-Claude Canon MCP Java Streams 1 / 124 Motivations Express simple

More information

Java Monitor Objects: Synchronization (Part 1)

Java Monitor Objects: Synchronization (Part 1) Java Monitor Objects: Synchronization (Part 1) Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee,

More information

Introduction to Spark

Introduction to Spark Introduction to Spark Outlines A brief history of Spark Programming with RDDs Transformations Actions A brief history Limitations of MapReduce MapReduce use cases showed two major limitations: Difficulty

More information

Overview of Patterns

Overview of Patterns d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA Topics Covered in this Module

More information

Java Concurrency. Towards a better life By - -

Java Concurrency. Towards a better life By - - Java Concurrency Towards a better life By - Srinivasan.raghavan@oracle.com - Vaibhav.x.choudhary@oracle.com Java Releases J2SE 6: - Collection Framework enhancement -Drag and Drop -Improve IO support J2SE

More information

Language Support for Stream. Processing. Robert Soulé

Language Support for Stream. Processing. Robert Soulé Language Support for Stream Processing Robert Soulé Introduction Proposal Work Plan Outline Formalism Translators & Abstractions Optimizations Related Work Conclusion 2 Introduction Stream processing is

More information

Programming Systems for Big Data

Programming Systems for Big Data Programming Systems for Big Data CS315B Lecture 17 Including material from Kunle Olukotun Prof. Aiken CS 315B Lecture 17 1 Big Data We ve focused on parallel programming for computational science There

More information

A Case Study of Gang of Four (GoF) Patterns: Part 3

A Case Study of Gang of Four (GoF) Patterns: Part 3 A Case Study of Gang of Four (GoF) Patterns: Part 3 d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Professor of Computer Science Institute for Software Integrated Systems Vanderbilt University

More information

Parallel Programming

Parallel Programming Parallel Programming Lecture delivered by: Venkatanatha Sarma Y Assistant Professor MSRSAS-Bangalore 1 Session Objectives To understand the parallelization in terms of computational solutions. To understand

More information

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II

EE382N (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II EE382 (20): Computer Architecture - Parallelism and Locality Fall 2011 Lecture 11 Parallelism in Software II Mattan Erez The University of Texas at Austin EE382: Parallelilsm and Locality, Fall 2011 --

More information

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

CS 470 Spring Mike Lam, Professor. Advanced OpenMP CS 470 Spring 2017 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement

More information

Strategies for Incremental Updates on Hive

Strategies for Incremental Updates on Hive Strategies for Incremental Updates on Hive Copyright Informatica LLC 2017. Informatica, the Informatica logo, and Big Data Management are trademarks or registered trademarks of Informatica LLC in the United

More information

Introduction to Concepts in Functional Programming. CS16: Introduction to Data Structures & Algorithms Spring 2017

Introduction to Concepts in Functional Programming. CS16: Introduction to Data Structures & Algorithms Spring 2017 Introduction to Concepts in Functional Programming CS16: Introduction to Data Structures & Algorithms Spring 2017 Outline Functions State Functions as building blocks Higher order functions Map Reduce

More information

Refactoring to Functional. Hadi Hariri

Refactoring to Functional. Hadi Hariri Refactoring to Functional Hadi Hariri Functional Programming In computer science, functional programming is a programming paradigm, a style of building the structure and elements of computer programs,

More information

Lecture 11 Parallel Computation Patterns Reduction Trees

Lecture 11 Parallel Computation Patterns Reduction Trees ECE408 Applied Parallel Programming Lecture 11 Parallel Computation Patterns Reduction Trees 1 Objective To master Reduction Trees, arguably the most widely used parallel computation pattern Basic concept

More information

Elasticsearch Scalability and Performance

Elasticsearch Scalability and Performance The Do's and Don ts of Elasticsearch Scalability and Performance Patrick Peschlow Think hard about your mapping Think hard about your mapping Which fields to analyze? How to analyze them? Need term frequencies,

More information

STREAMS VS PARALLELSTREAMS BY VAIBHAV CHOUDHARY JAVA PLATFORMS TEAM, ORACLE

STREAMS VS PARALLELSTREAMS BY VAIBHAV CHOUDHARY JAVA PLATFORMS TEAM, ORACLE STREAMS VS PARALLELSTREAMS BY VAIBHAV CHOUDHARY (@VAIBHAV_C) JAVA PLATFORMS TEAM, ORACLE HTTP://BLOGS.ORACLE.COM/VAIBHAV IN SHORT, WE WILL DISCUSS Concurrency and Parallelism Stream support in Java 8 Streams

More information

Shared Memory Programming with OpenMP (3)

Shared Memory Programming with OpenMP (3) Shared Memory Programming with OpenMP (3) 2014 Spring Jinkyu Jeong (jinkyu@skku.edu) 1 SCHEDULING LOOPS 2 Scheduling Loops (2) parallel for directive Basic partitioning policy block partitioning Iteration

More information

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev

Transactum Business Process Manager with High-Performance Elastic Scaling. November 2011 Ivan Klianev Transactum Business Process Manager with High-Performance Elastic Scaling November 2011 Ivan Klianev Transactum BPM serves three primary objectives: To make it possible for developers unfamiliar with distributed

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 22: MapReduce We are talking about parallel query processing There exist two main types of engines: Parallel DBMSs (last lecture + quick review)

More information

Programming Iterative Loops. for while

Programming Iterative Loops. for while Programming Iterative Loops for while What was an iterative loop, again? Recall this definition: Iteration is when the same procedure is repeated multiple times. Some examples were long division, the Fibonacci

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1

Lecture 16: Recapitulations. Lecture 16: Recapitulations p. 1 Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently

More information

PGAS: Partitioned Global Address Space

PGAS: Partitioned Global Address Space .... PGAS: Partitioned Global Address Space presenter: Qingpeng Niu January 26, 2012 presenter: Qingpeng Niu : PGAS: Partitioned Global Address Space 1 Outline presenter: Qingpeng Niu : PGAS: Partitioned

More information

GLADE: A Scalable Framework for Efficient Analytics. Florin Rusu University of California, Merced

GLADE: A Scalable Framework for Efficient Analytics. Florin Rusu University of California, Merced GLADE: A Scalable Framework for Efficient Analytics Florin Rusu University of California, Merced Motivation and Objective Large scale data processing Map-Reduce is standard technique Targeted to distributed

More information

Parallel Nested Loops

Parallel Nested Loops Parallel Nested Loops For each tuple s i in S For each tuple t j in T If s i =t j, then add (s i,t j ) to output Create partitions S 1, S 2, T 1, and T 2 Have processors work on (S 1,T 1 ), (S 1,T 2 ),

More information

Parallel Partition-Based. Parallel Nested Loops. Median. More Join Thoughts. Parallel Office Tools 9/15/2011

Parallel Partition-Based. Parallel Nested Loops. Median. More Join Thoughts. Parallel Office Tools 9/15/2011 Parallel Nested Loops Parallel Partition-Based For each tuple s i in S For each tuple t j in T If s i =t j, then add (s i,t j ) to output Create partitions S 1, S 2, T 1, and T 2 Have processors work on

More information

N N Sudoku Solver. Sequential and Parallel Computing

N N Sudoku Solver. Sequential and Parallel Computing N N Sudoku Solver Sequential and Parallel Computing Abdulaziz Aljohani Computer Science. Rochester Institute of Technology, RIT Rochester, United States aaa4020@rit.edu Abstract 'Sudoku' is a logic-based

More information

Perchance to Stream with Java 8

Perchance to Stream with Java 8 Perchance to Stream with Java 8 Paul Sandoz Oracle The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated $$ into

More information

Streams in Java 8. Start programming in a more functional style

Streams in Java 8. Start programming in a more functional style Streams in Java 8 Start programming in a more functional style Background Who am I? Tobias Coetzee I m a Technical Lead at BBD I present the Java Expert Level Certifications at BBD (EJB, JPA, etc.) I m

More information

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors.

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors. About the Tutorial Storm was originally created by Nathan Marz and team at BackType. BackType is a social analytics company. Later, Storm was acquired and open-sourced by Twitter. In a short time, Apache

More information

CS 470 Spring Mike Lam, Professor. Advanced OpenMP

CS 470 Spring Mike Lam, Professor. Advanced OpenMP CS 470 Spring 2018 Mike Lam, Professor Advanced OpenMP Atomics OpenMP provides access to highly-efficient hardware synchronization mechanisms Use the atomic pragma to annotate a single statement Statement

More information

Practical Concurrency. Copyright 2007 SpringSource. Copying, publishing or distributing without express written permission is prohibited.

Practical Concurrency. Copyright 2007 SpringSource. Copying, publishing or distributing without express written permission is prohibited. Practical Concurrency Agenda Motivation Java Memory Model Basics Common Bug Patterns JDK Concurrency Utilities Patterns of Concurrent Processing Testing Concurrent Applications Concurrency in Java 7 2

More information

Tree-Structured Indexes

Tree-Structured Indexes Tree-Structured Indexes Yanlei Diao UMass Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke Access Methods v File of records: Abstraction of disk storage for query processing (1) Sequential scan;

More information

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Big Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela

More information

MATE-EC2: A Middleware for Processing Data with Amazon Web Services

MATE-EC2: A Middleware for Processing Data with Amazon Web Services MATE-EC2: A Middleware for Processing Data with Amazon Web Services Tekin Bicer David Chiu* and Gagan Agrawal Department of Compute Science and Engineering Ohio State University * School of Engineering

More information

Techniques for Dynamic Swapping in the Lightweight CORBA Component Model

Techniques for Dynamic Swapping in the Lightweight CORBA Component Model in the Lightweight CORBA Component Model jai@dre.vanderbilt.edu www.dre.vanderbilt.edu/~jai Dr. Aniruddha Gokhale gokhale@dre.vanderbilt.edu www.dre.vanderbilt.edu/~gokhale Dr. Douglas C. Schmidt schmidt@dre.vanderbilt.edu

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 23 Parallel Compilation Parallel Compilation Two approaches to compilation Parallelize a program manually Sequential code converted to parallel code Develop

More information

Java SE 8 Programming

Java SE 8 Programming Oracle University Contact Us: +52 1 55 8525 3225 Java SE 8 Programming Duration: 5 Days What you will learn This Java SE 8 Programming training covers the core language features and Application Programming

More information

Parallel Merge Sort with Double Merging

Parallel Merge Sort with Double Merging Parallel with Double Merging Ahmet Uyar Department of Computer Engineering Meliksah University, Kayseri, Turkey auyar@meliksah.edu.tr Abstract ing is one of the fundamental problems in computer science.

More information

EE382N (20): Computer Architecture - Parallelism and Locality Lecture 13 Parallelism in Software IV

EE382N (20): Computer Architecture - Parallelism and Locality Lecture 13 Parallelism in Software IV EE382 (20): Computer Architecture - Parallelism and Locality Lecture 13 Parallelism in Software IV Mattan Erez The University of Texas at Austin EE382: Parallelilsm and Locality (c) Rodric Rabbah, Mattan

More information

Scan Primitives for GPU Computing

Scan Primitives for GPU Computing Scan Primitives for GPU Computing Shubho Sengupta, Mark Harris *, Yao Zhang, John Owens University of California Davis, *NVIDIA Corporation Motivation Raw compute power and bandwidth of GPUs increasing

More information

CS/EE 217 GPU Architecture and Parallel Programming. Lecture 10. Reduction Trees

CS/EE 217 GPU Architecture and Parallel Programming. Lecture 10. Reduction Trees CS/EE 217 GPU Architecture and Parallel Programming Lecture 10 Reduction Trees David Kirk/NVIDIA and Wen-mei W. Hwu University of Illinois, 2007-2012 1 Objective To master Reduction Trees, arguably the

More information

Interactions A link message

Interactions A link message Interactions An interaction is a behavior that is composed of a set of messages exchanged among a set of objects within a context to accomplish a purpose. A message specifies the communication between

More information

Apache Flink. Alessandro Margara

Apache Flink. Alessandro Margara Apache Flink Alessandro Margara alessandro.margara@polimi.it http://home.deib.polimi.it/margara Recap: scenario Big Data Volume and velocity Process large volumes of data possibly produced at high rate

More information

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins MapReduce 1 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins 2 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce

More information

Patterns of Parallel Programming with.net 4. Ade Miller Microsoft patterns & practices

Patterns of Parallel Programming with.net 4. Ade Miller Microsoft patterns & practices Patterns of Parallel Programming with.net 4 Ade Miller (adem@microsoft.com) Microsoft patterns & practices Introduction Why you should care? Where to start? Patterns walkthrough Conclusions (and a quiz)

More information

Classification and Regression Trees

Classification and Regression Trees Classification and Regression Trees Matthew S. Shotwell, Ph.D. Department of Biostatistics Vanderbilt University School of Medicine Nashville, TN, USA March 16, 2018 Introduction trees partition feature

More information

Java SE 8 Programming

Java SE 8 Programming Java SE 8 Programming Training Calendar Date Training Time Location 16 September 2019 5 Days Bilginç IT Academy 28 October 2019 5 Days Bilginç IT Academy Training Details Training Time : 5 Days Capacity

More information

MapReduce: Algorithm Design for Relational Operations

MapReduce: Algorithm Design for Relational Operations MapReduce: Algorithm Design for Relational Operations Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec Projection π Projection in MapReduce Easy Map over tuples, emit

More information

Overview of Activities

Overview of Activities d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute for Software Integrated Systems Vanderbilt University Nashville, Tennessee, USA CS 282 Principles of Operating Systems II Systems Programming

More information

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne

Chapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne Chapter 4: Threads Silberschatz, Galvin and Gagne Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Linux Threads 4.2 Silberschatz, Galvin and

More information

Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker

Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker 1 Turbo Charge CPU Utilization in Fork/Join Using the ManagedBlocker 2017 Heinz Kabutz, All Rights Reserved Dr Heinz M. Kabutz Last Updated 2017-01-24 2 Regular 3 ManagedBlocker 4 Speeding Up Fibonacci

More information

Fast Track to Core Java 8 Programming for OO Developers (TT2101-J8) Day(s): 3. Course Code: GK1965. Overview

Fast Track to Core Java 8 Programming for OO Developers (TT2101-J8) Day(s): 3. Course Code: GK1965. Overview Fast Track to Core Java 8 Programming for OO Developers (TT2101-J8) Day(s): 3 Course Code: GK1965 Overview Java 8 Essentials for OO Developers is a three-day, fast-paced, quick start to Java 8 training

More information

Principles of Software Construction: Objects, Design and Concurrency. The Perils of Concurrency, part 4 Can't live with it. Can't live without it.

Principles of Software Construction: Objects, Design and Concurrency. The Perils of Concurrency, part 4 Can't live with it. Can't live without it. Principles of Software Construction: Objects, Design and Concurrency 15-214 toad The Perils of Concurrency, part 4 Can't live with it. Can't live without it. Fall 2012 Jonathan Aldrich Charlie Garrod School

More information

Java SE 8 Programming

Java SE 8 Programming Oracle University Contact Us: Local: 1800 103 4775 Intl: +91 80 67863102 Java SE 8 Programming Duration: 5 Days What you will learn This Java SE 8 Programming training covers the core language features

More information

Shared state model. April 3, / 29

Shared state model. April 3, / 29 Shared state April 3, 2012 1 / 29 the s s limitations of explicit state: cells equivalence of the two s programming in limiting interleavings locks, monitors, transactions comparing the 3 s 2 / 29 Message

More information

The Return of Gearman

The Return of Gearman The Return of Gearman Eric Day eday@oddments.org Percona Performance Conference 2009 http://www.gearman.org/ Overview History Recent development How Gearman works Map/Reduce with Gearman Simple example

More information

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms :Algorithms in the Real World Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms Page1 Andrew Chien, 2008 2 Outline Concurrency vs. Parallelism Quicksort example Nested

More information

PYTHON TRAINING COURSE CONTENT

PYTHON TRAINING COURSE CONTENT SECTION 1: INTRODUCTION What s python? Why do people use python? Some quotable quotes A python history lesson Advocacy news What s python good for? What s python not good for? The compulsory features list

More information

CPS343 Parallel and High Performance Computing Project 1 Spring 2018

CPS343 Parallel and High Performance Computing Project 1 Spring 2018 CPS343 Parallel and High Performance Computing Project 1 Spring 2018 Assignment Write a program using OpenMP to compute the estimate of the dominant eigenvalue of a matrix Due: Wednesday March 21 The program

More information

Announcements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing

Announcements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing Announcements Database Systems CSE 414 HW4 is due tomorrow 11pm Lectures 18: Parallel Databases (Ch. 20.1) 1 2 Why compute in parallel? Multi-cores: Most processors have multiple cores This trend will

More information

CSC 273 Data Structures

CSC 273 Data Structures CSC 273 Data Structures Lecture 6 - Faster Sorting Methods Merge Sort Divides an array into halves Sorts the two halves, Then merges them into one sorted array. The algorithm for merge sort is usually

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Introduction to Functional Programming in Java 8

Introduction to Functional Programming in Java 8 1 Introduction to Functional Programming in Java 8 Java 8 is the current version of Java that was released in March, 2014. While there are many new features in Java 8, the core addition is functional programming

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

Dremel: Interactive Analysis of Web-Scale Datasets

Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Presented by: Sameer Agarwal sameerag@cs.berkeley.edu

More information

EE/CSCI 451 Midterm 1

EE/CSCI 451 Midterm 1 EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming

More information

CS 261 Fall Mike Lam, Professor. Threads

CS 261 Fall Mike Lam, Professor. Threads CS 261 Fall 2017 Mike Lam, Professor Threads Parallel computing Goal: concurrent or parallel computing Take advantage of multiple hardware units to solve multiple problems simultaneously Motivations: Maintain

More information

It also suggests creating more reduce tasks to deal with problems in keeping reducer inputs in memory.

It also suggests creating more reduce tasks to deal with problems in keeping reducer inputs in memory. Reference 1. Processing Theta-Joins using MapReduce, Alper Okcan, Mirek Riedewald Northeastern University 1.1. Video of presentation on above paper 2. Optimizing Joins in map-reduce environment F.N. Afrati,

More information

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System

Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Large Scale Complex Network Analysis using the Hybrid Combination of a MapReduce Cluster and a Highly Multithreaded System Seunghwa Kang David A. Bader 1 A Challenge Problem Extracting a subgraph from

More information

MapReduce. Stony Brook University CSE545, Fall 2016

MapReduce. Stony Brook University CSE545, Fall 2016 MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining

More information

Adaptive System Infrastructure for Ultra-Large. Large-Scale Systems. SMART Conference, Thursday, March 6 th, 2008

Adaptive System Infrastructure for Ultra-Large. Large-Scale Systems. SMART Conference, Thursday, March 6 th, 2008 Adaptive System Infrastructure for Ultra-Large Large-Scale Systems SMART Conference, Thursday, March 6 th, 2008 Dr. Douglas C. Schmidt d.schmidt@vanderbilt.edu www.dre.vanderbilt.edu/~schmidt Institute

More information