Scaling Up & Out. Haidar Osman


1- Crash course in Scala
   - Classes
   - Objects
2- Actors
   - The Actor Model
   - Examples I, II, III, IV
3- Apache Spark
   - RDD & DAG
   - Word Count Example

1- Crash course in Scala

Java:

    public class Person {
        private String name;
        private String address;

        public Person(String name, String address) {
            this.name = name;
            this.address = address;
            System.out.println("Person object is constructed");
        }

        public String getName() { return name; }
        public String getAddress() { return address; }
    }

Scala, translated line by line:

    class Person {
      private var name: String = null
      private var address: String = null

      def this(name: String, address: String) = {
        this()
        this.name = name
        this.address = address
        println("Person object is constructed")
      }

      def getName(): String = name
      def getAddress(): String = address
    }

Java, with the message held in a final field:

    public class Person {
        private String name;
        private String address;
        private final String welcomeNote = "Person object is constructed";

        public Person(String name, String address) {
            this.name = name;
            this.address = address;
            System.out.println(welcomeNote);
        }

        public String getName() { return name; }
        public String getAddress() { return address; }
    }

Scala, where the final field becomes a val:

    class Person {
      private var name: String = null
      private var address: String = null
      private val welcomeNote: String = "Person object is constructed"

      def this(name: String, address: String) = {
        this()
        this.name = name
        this.address = address
        println(welcomeNote)
      }

      def getName(): String = name
      def getAddress(): String = address
    }

Scala, dropping the empty parentheses from the getters:

    class Person {
      private var name: String = null
      private var address: String = null
      private val welcomeNote: String = "Person object is constructed"

      def this(name: String, address: String) = {
        this()
        this.name = name
        this.address = address
        println(welcomeNote)
      }

      def getName: String = name
      def getAddress: String = address
    }

Scala, letting the compiler infer the types of welcomeNote and the getters:

    class Person {
      private var name: String = null
      private var address: String = null
      private val welcomeNote = "Person object is constructed"

      def this(name: String, address: String) = {
        this()
        this.name = name
        this.address = address
        println(welcomeNote)
      }

      def getName = name
      def getAddress = address
    }

Scala, without the private modifiers (members are public by default):

    class Person {
      var name: String = null
      var address: String = null
      val welcomeNote = "Person object is constructed"

      def this(name: String, address: String) = {
        this()
        this.name = name
        this.address = address
        println(welcomeNote)
      }

      def getName = name
      def getAddress = address
    }

Scala, with the fields declared in the primary constructor (the class body is the constructor body):

    class Person(var name: String, var address: String) {
      val welcomeNote = "Person object is constructed"
      println(welcomeNote)
    }


Scala, with the message moved into a companion object:

    class Person(var name: String, var address: String) {
      Person.welcome
    }

    object Person {
      def welcome = {
        val welcomeNote = "Person object is constructed"
        println(welcomeNote)
      }
    }

Java:

    public class MainClass {
        public static void main(String[] args) {
            System.out.println("Hello World");
        }
    }

Scala:

    object MainClass {
      def main(args: Array[String]): Unit = {
        println("Hello World")
      }
    }


Scala, as a case class:

    case class Person(var name: String, var address: String) {
      Person.welcome
    }

    object Person {
      def welcome = {
        val welcomeNote = "Person object is constructed"
        println(welcomeNote)
      }
    }

Pattern matching on the case class Person:

    object MainClass {
      def main(args: Array[String]): Unit = {
        val person = new Person("John", "My Address")
        person match {
          case Person("John", address) => println("John's address: " + address)
          case _                       => println("not John")
        }
      }
    }
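To make concrete what the case keyword adds beyond pattern matching, here is a minimal sketch. The object name CaseClassDemo, the immutable (val) fields and the sample values are assumptions made for illustration; they are not part of the lecture code.

```scala
// Immutable variant of Person; using plain (val) parameters is an assumption
// made here so that equality stays stable.
final case class Person(name: String, address: String)

object CaseClassDemo {
  // Pattern matching uses the extractor that the compiler generates for a case class.
  def describe(p: Person): String = p match {
    case Person("John", address) => s"John's address: $address"
    case _                       => "not John"
  }

  def main(args: Array[String]): Unit = {
    val john = Person("John", "My Address")        // no `new` needed: apply is generated
    println(john == Person("John", "My Address"))  // structural equality
    println(john.copy(address = "New Address"))    // copy with one field changed
    println(describe(john))
    println(describe(Person("Jane", "Elsewhere")))
  }
}
```

The generated equals, copy and extractor are what make case classes a natural fit for the immutable messages used in the actor examples.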

2- Actors

What are actors?
> The Actor Model is another model for expressing computational problems. It is equivalent to Turing machines and the lambda calculus.
> Actors are not new. The model is a solid and scientifically robust idea (Carl Hewitt, 1973).

What are actors?
Actors are fundamental units of computation that embody:
> Processing (behavior)
> Storage (state)
> Communication (messages)
When an actor receives a message, it can:
> create more actors
> send messages to other actors
> designate what to do with the next message

Rules of bare-metal actors
> Every actor has an address.
> An actor processes one message at a time.
> Everything in an actor system is an actor.
> Message delivery is best-effort.
> There are no guarantees that messages are received in the same order they are sent.
> Messages to actors should always be immutable to avoid shared state.
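The rules above can be illustrated without any library. The following is a minimal sketch in plain Scala, not Akka: MiniActor, Counter and MiniActorDemo are hypothetical names, the object reference plays the role of the address, and a single thread draining a mailbox gives the one-message-at-a-time rule. Note that the FIFO queue used here is stronger than the model, which promises neither delivery nor ordering.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue}

object MiniActor {
  // Internal marker that tells the mailbox loop to terminate.
  private case object Stop
}

// Hypothetical mini-actor: one thread drains the mailbox,
// so the actor handles one message at a time.
abstract class MiniActor {
  private val mailbox = new LinkedBlockingQueue[Any]()
  private val finished = new CountDownLatch(1)

  // Behavior supplied by the concrete actor.
  protected def receive(message: Any): Unit

  // Asynchronous send: enqueue the message and return immediately.
  def !(message: Any): Unit = mailbox.put(message)

  def stop(): Unit = mailbox.put(MiniActor.Stop)
  def awaitTermination(): Unit = finished.await()

  private val loop = new Thread(() => {
    var running = true
    while (running) {
      mailbox.take() match {
        case MiniActor.Stop => running = false
        case message        => receive(message)
      }
    }
    finished.countDown()
  })
  loop.setDaemon(true)
  loop.start()
}

// State is private to the actor and only touched by its own mailbox thread.
final class Counter extends MiniActor {
  private var count = 0
  def current: Int = count
  protected def receive(message: Any): Unit = message match {
    case "increase" => count += 1
    case other      => println(s"unknown message: $other")
  }
}

object MiniActorDemo {
  // Sends `increments` messages, stops the actor, and reads the final count.
  def run(increments: Int): Int = {
    val counter = new Counter
    (1 to increments).foreach(_ => counter ! "increase")
    counter.stop()
    counter.awaitTermination()
    counter.current
  }

  def main(args: Array[String]): Unit = println(run(1000))
}
```

Reading current only after awaitTermination() keeps the sketch free of shared-state races; a real actor would answer a "get" message instead.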

Example 1 - Simple Actor Definition

    import akka.actor.Actor

    class Counter extends Actor {
      var count = 0

      def receive = {
        case "increase" => count += 1
        case "get"      => sender() ! count
        case _          => println("unknown message")
      }
    }

Example 1 - Simple Actor Definition

    import akka.actor.Actor

    class StatelessCounter extends Actor {
      // type Receive = PartialFunction[Any, Unit]
      def count(n: Int): Receive = {
        // for the next message, receive = count(n + 1)
        case "increase" => context.become(count(n + 1))
        case "get"      => sender() ! n
      }

      def receive = count(0)
    }
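The idea behind context.become can be tried out without Akka. The sketch below is an illustration under assumptions: CounterCell and BecomeSketch are hypothetical names, deliver stands in for the actor's mailbox, and the reply callback stands in for sender(). It shows that the count lives in the parameter n of the current behavior rather than in a mutable counter field.

```scala
object BecomeSketch {
  type Receive = PartialFunction[Any, Unit]

  // Hypothetical stand-in for an actor: it only remembers the behavior
  // that will handle the next message.
  final class CounterCell(reply: Int => Unit) {
    private def count(n: Int): Receive = {
      case "increase" => behavior = count(n + 1) // plays the role of context.become
      case "get"      => reply(n)
    }

    private var behavior: Receive = count(0)

    // Messages not covered by the partial function are reported, not thrown.
    def deliver(message: Any): Unit =
      behavior.applyOrElse(message, (m: Any) => println(s"unhandled: $m"))
  }

  // Delivers `increments` increase messages followed by one get.
  def run(increments: Int): Int = {
    var answer = -1
    val cell = new CounterCell(n => answer = n)
    (1 to increments).foreach(_ => cell.deliver("increase"))
    cell.deliver("get")
    answer
  }

  def main(args: Array[String]): Unit = println(run(3))
}
```

Because Receive is a PartialFunction, an unmatched message is simply "not defined" for the current behavior, which is why applyOrElse is a natural fit here.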

Example 1 - Simple Actor Definition

    import akka.actor.{ActorSystem, Props}

    object Application extends App {
      val system = ActorSystem("SimpleActorSystem")
      val counter = system.actorOf(Props[Counter], "counter")

      counter ! "increase"
      counter ! "increase"
      counter ! "get"
    }

Example 2 - Ping Pong

    import akka.actor.{Actor, ActorRef, ActorSystem, PoisonPill, Props}
    import scala.language.postfixOps

    case object Ping
    case object Pong

    class Pinger extends Actor {
      var countDown = 5

      def receive = {
        case Pong =>
          println(s"${self.path} received pong, count down $countDown")
          if (countDown > 0) {
            countDown -= 1
            sender() ! Ping
          } else {
            sender() ! PoisonPill
            self ! PoisonPill
          }
      }
    }

Example 2 - Ping Pong

    class Ponger(pinger: ActorRef) extends Actor {
      def receive = {
        case Ping =>
          println(s"${self.path} received ping")
          pinger ! Pong
      }
    }

    object PingPongApplication extends App {
      val system = ActorSystem("PingPong")
      val pinger = system.actorOf(Props[Pinger], "pinger")
      val ponger = system.actorOf(Props(classOf[Ponger], pinger), "ponger")

      ponger ! Ping
    }

Example 2 - Ping Pong (output)

    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 5
    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 4
    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 3
    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 2
    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 1
    akka://PingPong/user/ponger received ping
    akka://PingPong/user/pinger received pong, count down 0

Example 3 - Approximating π using the Leibniz formula
$\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} = \frac{\pi}{4}$
π = 4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 - 1/15 + ...)


Example 3 - Designing The Actor System
> A Master actor coordinates a set of Worker actors.
> Calculate is sent to the Master to start the computation.
> The Master sends Work messages to the Workers.
> Each Worker answers the Master with a Result message.
> The Master sends the final PiApproximation to a Listener.

Example 3 - Approximating π using the Leibniz formula
System configurations:
> Number of workers
> Number of messages
> Number of elements per message
π = 4 * (1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 - 1/15 + ...)
(Figure: consecutive chunks of the series are labeled worker1, worker2, worker3, worker1.)
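How the three configuration values split up the series can be sketched sequentially in plain Scala. This is an illustration under assumptions, not the lecture's actor code: the names LeibnizChunks, chunkSum and workerFor are hypothetical, and the round-robin assignment of chunk m to worker m % nrOfWorkers is inferred from the worker1, worker2, worker3, worker1 labels.

```scala
object LeibnizChunks {
  // Sum of nrOfElements consecutive terms of 4 * (1 - 1/3 + 1/5 - ...),
  // starting at term index `start` (index 0 is the term 4 * 1).
  def chunkSum(start: Int, nrOfElements: Int): Double =
    (start until start + nrOfElements)
      .map(i => 4.0 * (1 - (i % 2) * 2) / (2 * i + 1))
      .sum

  // Assumed round-robin assignment of chunks to workers.
  def workerFor(chunk: Int, nrOfWorkers: Int): Int = chunk % nrOfWorkers

  def approximatePi(nrOfWorkers: Int, nrOfMessages: Int, nrOfElements: Int): Double = {
    // Partial sum contributed by each worker, keyed by worker index.
    val perWorker: Map[Int, Double] =
      (0 until nrOfMessages)
        .groupBy(m => workerFor(m, nrOfWorkers))
        .map { case (worker, chunks) =>
          worker -> chunks.map(m => chunkSum(m * nrOfElements, nrOfElements)).sum
        }
    perWorker.values.sum
  }

  def main(args: Array[String]): Unit =
    println(approximatePi(nrOfWorkers = 3, nrOfMessages = 100, nrOfElements = 1000))
}
```

The number of workers changes only who computes each chunk; the accuracy depends on the total number of terms, nrOfMessages * nrOfElements.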

Example 3 - Messages
> Calculate
> Work
> Result
> PiApproximation
(Figure: the Master exchanges Work and Result messages with worker1, worker2 and worker3, and sends PiApproximation to the Listener.)
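The four message names can be modeled as immutable Scala types. In the sketch below only the names Calculate, Work, Result and PiApproximation come from the slide; the fields, the MasterState bookkeeping and the onResult function are assumptions made for illustration of how a master might collect results.

```scala
object PiMessagesSketch {
  // Immutable messages; the field names are assumptions, the slide shows only the message names.
  sealed trait PiMessage
  case object Calculate extends PiMessage
  final case class Work(start: Int, nrOfElements: Int) extends PiMessage
  final case class Result(value: Double) extends PiMessage
  final case class PiApproximation(pi: Double) extends PiMessage

  // Hypothetical master bookkeeping: results still expected and the running sum.
  final case class MasterState(pending: Int, sum: Double)

  // Folds one Result into the state. Left means "keep waiting",
  // Right carries the message that would go to the listener.
  def onResult(state: MasterState, result: Result): Either[MasterState, PiApproximation] = {
    val next = MasterState(state.pending - 1, state.sum + result.value)
    if (next.pending == 0) Right(PiApproximation(next.sum)) else Left(next)
  }

  def main(args: Array[String]): Unit = {
    val afterFirst = onResult(MasterState(pending = 2, sum = 0.0), Result(4.0))
    println(afterFirst)
    afterFirst.left.foreach(state => println(onResult(state, Result(-1.0))))
  }
}
```

Keeping the master's state in an immutable value mirrors the rule that nothing mutable should be shared between actors.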

Example 3 - Worker, Master (1/2 and 2/2), Listener, Putting It Together
(Slides: the code for these four parts was shown as images and is not part of the transcription.)

With actors
> It is easier to build scalable software: scalability becomes a deployment problem.
> They are a convenient abstraction for modeling, similarly to OOP.
> Fault tolerance can be implemented on top: actors can fail and get replaced at run time.

3- Apache Spark

High-Level Spark Architecture

Resilient Distributed Dataset
> An RDD is a partitioned collection of records.
> An RDD is read-only (immutable).
> RDDs can only be created through deterministic operations on:
  > data from stable storage, or
  > other RDDs
(Figure: RDD -> Transformation -> RDD -> Transformation -> RDD)

We call these deterministic operations transformations. Examples of transformations include map, filter, and join. RDDs do not need to be materialized at all times. Instead, an RDD has enough information about how it was derived from other datasets (its lineage) to compute its partitions from data in stable storage.
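The point that an RDD need not be materialized can be imitated locally with a lazy Scala view. This is only an analogy under assumptions, not Spark: LineageSketch is a hypothetical name, the view plays the role of the recorded lineage, and the evaluations counter exists purely to show when work actually happens.

```scala
object LineageSketch {
  // Returns (evaluations before forcing, sum, evaluations after the first pass,
  // evaluations after the second pass).
  def run(): (Int, Int, Int, Int) = {
    var evaluations = 0
    val source = Vector(1, 2, 3, 4, 5, 6)           // stands in for stable storage

    // Transformations on a view are only recorded, not executed.
    val derived = source.view
      .map { x => evaluations += 1; x * 10 }
      .filter(_ > 20)

    val before = evaluations                        // nothing has been computed yet
    val total = derived.sum                         // first pass over the data
    val afterFirst = evaluations
    val totalAgain = derived.toList.sum             // second pass recomputes from the source
    require(totalAgain == total)
    val afterSecond = evaluations
    (before, total, afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Each pass re-derives the elements from the source, much as a partition can be recomputed from its lineage when it is not kept in memory.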

Directed Acyclic Graph (DAG)
(Figure: a graph in which data sources and RDDs are connected by transformations, with an action applied at the end.)

Word Count Example

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setMaster("local[2]").setAppName("wordcounter")
    val sparkContext = new SparkContext(conf)

    val lines = sparkContext.textFile("/var/log/commerce.log")               // RDD[String]
    val words = lines.flatMap(line => line.split(" "))                       // RDD[String]
    val pairs = words.map(word => (word, 1))                                 // RDD[(String, Int)]
    val wordCounts = pairs.reduceByKey((count1, count2) => count1 + count2)  // RDD[(String, Int)]
    val results = wordCounts.take(100)                                       // Array[(String, Int)]
    results.foreach(resultPair => println(resultPair))

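The same pipeline can be followed step by step on ordinary Scala collections, which may help when checking the intermediate types without a Spark installation. This is a local analogue under assumptions: LocalWordCount is a hypothetical name, and groupBy followed by a per-key reduce stands in for reduceByKey; it is not how Spark executes the job.

```scala
object LocalWordCount {
  def wordCounts(lines: Seq[String]): Map[String, Int] = {
    val words: Seq[String] = lines.flatMap(line => line.split(" "))
    val pairs: Seq[(String, Int)] = words.map(word => (word, 1))
    // Local stand-in for reduceByKey: group the pairs by word,
    // then reduce the counts of each group.
    pairs
      .groupBy { case (word, _) => word }
      .map { case (word, group) => word -> group.map(_._2).reduce(_ + _) }
  }

  def main(args: Array[String]): Unit =
    wordCounts(Seq("to be or not to be", "to scale up and out"))
      .take(100)
      .foreach(resultPair => println(resultPair))
}
```

The difference from the Spark version is where the work happens: here everything runs eagerly in one process, whereas the RDD version is only evaluated when take(100) is called.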

References
> The original actor paper [Hewitt73]: A universal modular ACTOR formalism for artificial intelligence
> Hewitt explaining actors: https://channel9.msdn.com/shows/going+deep/hewitt-meijer-and-szyperski-the-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask
> Akka Tutorial: https://doc.akka.io/docs/akka/2.0/intro/getting-started-first-scala.html
> Scaling up and out with actors: https://www.youtube.com/watch?v=3jbqtxstlc4
> RDD original paper: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

Exercise
> Implement a naïve actor framework in Java where:
  > each actor is a monitor Java class
  > each actor has a mailbox where messages from other actors are saved
  > messages are asynchronous
> Using your framework, reimplement the three examples in the lecture.