Master-Worker pattern


COSC 6397 Big Data Analytics
Master Worker Programming Pattern
Edgar Gabriel
Fall 2018

Master-Worker pattern
General idea: distribute the work among a number of processes.
Two logically different entities:
- master process: manages assignments to the worker processes
- worker processes: execute a task assigned to them by the master
Communication occurs only between the master process and the worker processes; no direct worker-to-worker communication is possible.

Master-Worker framework
[Figure: master process exchanging work with worker processes 1 and 2 via a task queue and a result queue]

Master process pseudo code:

    for each worker w = 0 ... no. of workers {
        f = FETCH next piece of work
        SEND f to w
    }
    while ( not done ) {
        RECEIVE result from any process
        w = process who sent result
        STORE result
        f = FETCH next piece of work
        if f != "no more work left"
            SEND f to w
        else
            SEND termination message to w
    }

Master process
List of things to worry about:
- Number of workers: to maximize performance you want many workers, but with too many workers the master process can become the bottleneck.
- Work distribution: too little work per worker can lead to large management overhead for the master.
- SEND/RECEIVE can be implemented using any communication paradigm (sockets, MPI, HTTP, files on a shared file system, ...). More than one message might be required per communication step.

Master process
Support for process failure:
- The master needs to track which work items have been assigned to which worker.
- If a worker fails, its work is reassigned to a new worker.
- Resilience of the master can be achieved through reliable back-end storage.
Correctness verification: assign the same work to more than one worker and compare the results obtained.

Worker process pseudo code:

    while ( not done ) {
        RECEIVE work f from master
        if f equals termination signal
            exit
        else {
            result r = execute work on f
            SEND r to master
        }
    }

Worker process
- Dynamically generating worker instances vs. using a fixed number of workers.
- A worker can support multiple functions; an additional message might be required to identify the task that the worker needs to execute.
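The master and worker pseudo code above can be turned into a small runnable program. The following is a minimal sketch, with threads standing in for worker processes and a shared task queue taking the place of the master's explicit FETCH/SEND loop; the names `master`, `worker`, and the squaring task are purely illustrative:

```python
import threading
import queue

POISON = None  # stands in for the termination message

def worker(tasks, results):
    # Worker: receive work, execute it, send the result back to the master
    while True:
        item = tasks.get()
        if item is POISON:            # termination signal from the master
            break
        idx, x = item
        results.put((idx, x * x))     # "execute work on f": square the number

def master(work, n_workers=3):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in enumerate(work):      # send the pieces of work
        tasks.put(item)
    out = [None] * len(work)
    for _ in work:                    # receive one result per piece of work
        idx, r = results.get()
        out[idx] = r
    for _ in workers:                 # send a termination message to each worker
        tasks.put(POISON)
    for w in workers:
        w.join()
    return out

print(master([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Note that the shared queue already provides the dynamic load balancing described above: a faster worker simply dequeues more tasks than a slower one.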

1st Example: Word Count
Application counting the number of occurrences of each word in a given input file.
- Input: a file with n lines
- Output: list of words and the number of occurrences

Two-step approach:
- Step 1: each worker gets one (or a fixed number of) line(s) of the file. The worker returns a list of <word, #occurrences> to the master. The master writes the list of words and #occurrences into a temporary file.
- Step 2: the same word will appear multiple times in the temporary file. Sort the temporary file, add up the number of occurrences of each word, and write each word and its final number of occurrences to the output file.

Word count step 1:

    for each worker w = 0 ... no. of workers {
        f = READLINE (file)
        SEND f to w
    }
    while ( not done ) {
        RECEIVE number of elements in the list returned
        allocate temporary buffer to receive list
        RECEIVE list <words, occurrences>
        WRITE list of <words, occurrences> to temporary file
        release temporary buffer
        f = READLINE (file)
        if f != EOF
            SEND f to w
        else
            SEND termination message to w
    }

Word count step 2:

    SORT temporary file based on word
    <word, occurrence> = READ (temporary file)
    currentword = word
    currentcount = occurrence
    while ( not done ) {
        <word, occurrence> = READ (temporary file)
        if ( word != currentword ) {
            WRITE (outputfile, currentword, currentcount)
            currentword = word
            currentcount = 0
        }
        currentcount += occurrence
    }
    WRITE (outputfile, currentword, currentcount)

Alternative design possibilities:
- The master searches for each word and adds the value received from a worker process immediately. This avoids the temporary file, but requires a large number of search operations in the file; only the final output file is sorted.
- The worker process does not return a list of <word, occurrence> for each line received. It just signals that it is ready for a new assignment and returns only its final list after the termination signal.
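The two-step approach can be sketched in a few lines of Python. Here a plain list stands in for the temporary file, and the per-line counting in `count_line` is the work a worker would perform (both function names are illustrative):

```python
from collections import defaultdict

def count_line(line):
    # Worker task for step 1: count word occurrences in a single line
    counts = defaultdict(int)
    for word in line.split():
        counts[word] += 1
    return list(counts.items())

def word_count(lines):
    # Step 1: the master collects one <word, occurrences> list per line
    # (the list `temp` stands in for the temporary file)
    temp = []
    for line in lines:
        temp.extend(count_line(line))
    # Step 2: sort by word, then add up the occurrences of the same word
    temp.sort(key=lambda pair: pair[0])
    result = {}
    for word, n in temp:
        result[word] = result.get(word, 0) + n
    return result

print(word_count(["the cat sat", "the cat"]))  # {'cat': 2, 'sat': 1, 'the': 2}
```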

Alternative design possibilities (II)
Using workers for step 2 is often also necessary:
- Using workers for sorting requires sending large data volumes and direct worker-to-worker communication -> virtually impossible in a classic master-worker setting -> Amdahl's law strikes!
- Aggregation of multiple entries for the same word can, however, be distributed to workers.

2nd Example: k-means clustering
- An unsupervised clustering algorithm.
- K stands for the number of clusters, typically a user input to the algorithm.
- Iterative algorithm; might obtain only a local minimum.
- Works only for numerical data.
- x_1, ..., x_N are the data points; each data point (vector x_i) will be assigned to exactly one cluster.
- Dissimilarity measure: Euclidean distance metric.

K-means Algorithm
C(i): cluster that data point x_i is currently assigned to.
For the current set of cluster means m_k, assign each data point to exactly one cluster:

    C(i) = argmin_{1 <= k <= K} || x_i - m_k ||^2,   i = 1, ..., N

N_k: number of data points in cluster k.
For a given cluster assignment C of the data points, compute the cluster means m_k:

    m_k = ( sum_{i : C(i) = k} x_i ) / N_k,   k = 1, ..., K

Iterate the above two steps until convergence.

K-means clustering: sequential version
Input:
    X = {x_1, ..., x_N}: instances to be clustered
    K: number of clusters
Output:
    M = {m_1, ..., m_K}: cluster centroids
    C: X -> {1, ..., K}: cluster membership

Algorithm:

    set initial value for M
    while M has changed {
        c_n = 0,  n = 1, ..., K      // per-cluster coordinate sums
        N_n = 0,  n = 1, ..., K      // per-cluster point counts
        for each x_j in X {
            C(j) = argmin_n distance (x_j, m_n),  n = 1, ..., K
            c_C(j) += x_j
            N_C(j)++
        }
        recompute M based on c and N
    }
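The sequential pseudo code above corresponds to the following minimal sketch (assuming Python 3.8+ for `math.dist`; the function name and the convergence check by exact mean comparison are implementation choices, not part of the slide):

```python
import math

def kmeans(points, means, max_iters=100):
    # Iterate the assignment and mean-update steps until the means stop changing
    assign = []
    for _ in range(max_iters):
        # Step 1: assign each point to the nearest cluster mean
        assign = [min(range(len(means)),
                      key=lambda k: math.dist(p, means[k])) for p in points]
        # Step 2: recompute each cluster mean from its assigned points
        new_means = []
        for k in range(len(means)):
            members = [p for p, c in zip(points, assign) if c == k]
            if not members:              # keep an empty cluster's old mean
                new_means.append(means[k])
                continue
            new_means.append(tuple(sum(d) / len(members) for d in zip(*members)))
        if new_means == means:           # converged
            break
        means = new_means
    return means, assign

means, assign = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)],
                       [(0.0, 0.0), (10.0, 10.0)])
print(means)   # [(0.0, 0.5), (10.0, 10.5)]
print(assign)  # [0, 0, 1, 1]
```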

K-means clustering
Initial cluster means:
- Using some other/simpler pre-clustering algorithm
- Random values
  - Random values are difficult when using multiple processes and pseudo-random number generators
  - The master process creates the random numbers and distributes them to the workers

K-means example
Assume a 2-D problem with 6 data points and 2 clusters:

    x_0 = (0.6, 0.325)    x_1 = (1.0, 0.87)    x_2 = (0.35, 0.5)
    x_3 = (0.2, 0.7)      x_4 = (0.9, 0.65)    x_5 = (0.85, 0.9)

Initial cluster centroids (chosen arbitrarily):

    m_0 = (0.8, 0.4)      m_1 = (1.1, 1.0)

K-means example
Distance calculation, e.g. data point 0 to cluster centroid 0:

    d(x_0, m_0) = sqrt( (x_0(0) - m_0(0))^2 + (x_0(1) - m_0(1))^2 )
                = sqrt( (0.6 - 0.8)^2 + (0.325 - 0.4)^2 ) = 0.2136

    Data point        0         1         2         3         4         5
    Dist. cluster 0   0.2136    0.510784  0.460977  0.67082   0.269258  0.502494
    Dist. cluster 1   0.840015  0.164012  0.901388  0.948683  0.403113  0.269258
    Chosen cluster    0         1         0         0         0         1

Per-cluster sums and counts:

    C_0 = x_0 + x_2 + x_3 + x_4
        = (0.6, 0.325) + (0.35, 0.5) + (0.2, 0.7) + (0.9, 0.65) = (2.05, 2.175)
    C_1 = x_1 + x_5 = (1.0, 0.87) + (0.85, 0.9) = (1.85, 1.77)
    N_0 = 4    N_1 = 2

New cluster means:

    m_0 = (2.05 / 4, 2.175 / 4) = (0.5125, 0.54375)
    m_1 = (1.85 / 2, 1.77 / 2)  = (0.925, 0.885)
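The arithmetic of this example can be checked with a short script; it recomputes the distances, the cluster choices, and the new means from the numbers above (assuming Python 3.8+ for `math.dist`):

```python
import math

x = [(0.6, 0.325), (1.0, 0.87), (0.35, 0.5),
     (0.2, 0.7), (0.9, 0.65), (0.85, 0.9)]
m = [(0.8, 0.4), (1.1, 1.0)]

# Distances from every data point to both centroids
dist = [[math.dist(p, c) for p in x] for c in m]
# Each point picks the closer centroid
chosen = [0 if dist[0][i] <= dist[1][i] else 1 for i in range(len(x))]
print(round(dist[0][0], 4))   # 0.2136
print(chosen)                 # [0, 1, 0, 0, 0, 1]

# New means: per-cluster coordinate sums divided by the cluster sizes
new_means = []
for k in (0, 1):
    members = [x[i] for i in range(len(x)) if chosen[i] == k]
    new_means.append(tuple(sum(d) / len(members) for d in zip(*members)))
print([tuple(round(v, 5) for v in mk) for mk in new_means])
# [(0.5125, 0.54375), (0.925, 0.885)]
```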

Master-worker k-means

    m = randomly generated initial cluster means
    x = data points
    for each worker w = 0 ... no. of workers {
        SEND portion of x for worker w
    }
    do {
        for each worker w = 0 ... no. of workers {
            SEND m to w
        }
        for each worker w = 0 ... no. of workers {
            RECEIVE C from w
        }
        calculate N and c for all clusters
        recalculate m
    } while ( m has changed )

Master-worker k-means
- The portion of the data points x assigned to a worker remains constant -> it has to be sent only once.
- Separate loops are needed to send m and to receive C, to avoid serialization of the worker processes.
- The problem is strongly synchronized: all worker processes have to finish before the master process can recalculate N_k, c and m.
- Calculating N_k, c and m is sequential in this version -> Amdahl's law strikes!

Example: Worker 0 holds x_0, x_1, x_2; Worker 1 holds x_3, x_4, x_5. Both start from m_0 = (0.8, 0.4) and m_1 = (1.1, 1.0).

    Worker 0: data point   0         1         2
    Dist. cluster 0        0.2136    0.510784  0.460977
    Dist. cluster 1        0.840015  0.164012  0.901388
    Chosen cluster         0         1         0
    C_0 = x_0 + x_2, N_0 = 2;   C_1 = x_1, N_1 = 1

    Worker 1: data point   3         4         5
    Dist. cluster 0        0.67082   0.269258  0.502494
    Dist. cluster 1        0.948683  0.403113  0.269258
    Chosen cluster         0         0         1
    C_0 = x_3 + x_4, N_0 = 2;   C_1 = x_5, N_1 = 1

Master:

    C_0 = C_0 of Worker 0 + C_0 of Worker 1
    N_0 = N_0 of Worker 0 + N_0 of Worker 1
    recalculate m_0 and m_1
    send new m_0 and m_1 to Worker 0 and Worker 1
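The split across two workers can be simulated in a few lines: each call to the (illustratively named) `worker_partial` returns its partial sums C and counts N over its local points only, and the master merges them and recomputes the means, reproducing the values from the earlier example. This is a sequential sketch of the data flow, not an actual multi-process implementation:

```python
import math

def worker_partial(points, means):
    # Worker: assign its local points to the nearest mean and return
    # per-cluster coordinate sums C and point counts N
    K = len(means)
    C = [[0.0, 0.0] for _ in range(K)]
    N = [0] * K
    for p in points:
        k = min(range(K), key=lambda j: math.dist(p, means[j]))
        C[k][0] += p[0]
        C[k][1] += p[1]
        N[k] += 1
    return C, N

x = [(0.6, 0.325), (1.0, 0.87), (0.35, 0.5),
     (0.2, 0.7), (0.9, 0.65), (0.85, 0.9)]
m = [(0.8, 0.4), (1.1, 1.0)]

# Worker 0 holds x_0..x_2, worker 1 holds x_3..x_5
partials = [worker_partial(x[:3], m), worker_partial(x[3:], m)]

# Master: add up the partial sums and counts, then recompute the means
K = len(m)
N = [sum(p[1][k] for p in partials) for k in range(K)]
C = [[sum(p[0][k][d] for p in partials) for d in (0, 1)] for k in range(K)]
m = [tuple(C[k][d] / N[k] for d in (0, 1)) for k in range(K)]
print(N)  # [4, 2]
print([tuple(round(v, 5) for v in mk) for mk in m])
# [(0.5125, 0.54375), (0.925, 0.885)]
```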

Summary
- The master-worker pattern is useful for a number of application scenarios, mostly trivially parallel problems.
- It supports dynamic load balancing: a more powerful worker will return work faster and get more work assigned than a slow worker.
- Well suited for heterogeneous environments.
- Easy to integrate fault tolerance.
- Not well suited for strongly synchronized problems: Amdahl's law will prevent the model from scaling up to extreme problem sizes.