Algorithm Design for MapReduce

1 Algorithm Design for MapReduce
CprE 419X, Spring 2014, Iowa State University
Srikanta Tirthapura
2/4/14

2 Problem 0: Sum
Find the sum and average of many input integers.
Write the map and reduce pseudocode.
What are the map cost, reduce cost, per-reducer cost, and communication cost of your method?
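One possible answer to Problem 0 can be sketched in Python. This is a minimal in-memory simulation, not Hadoop code: `run_mapreduce`, `sum_mapper`, `sum_reducer`, and the single key `"all"` are names assumed here for illustration and do not come from the slides.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def sum_mapper(x):
    # Assumption: every integer is sent to one shared key as a (sum, count) pair.
    yield ("all", (x, 1))

def sum_reducer(key, values):
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    yield ("sum", total)
    yield ("average", total / count)

result = dict(run_mapreduce([3, 5, 10, 2], sum_mapper, sum_reducer))
print(result)
```

With a single key, one reducer receives all n values, so the per-reducer cost equals the total reduce cost. Emitting one partial (sum, count) pair per mapper instead of one pair per integer would cut the communication from one record per input to one record per mapper.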

3 Problem 1: Set Difference
Input: Two sets A and B
Output: Set difference (A - B), i.e. all those elements that are present in A but not in B
Ex: Find all IP addresses that appeared in one log but not the other
What are the map cost, reduce cost, per-reducer cost, and communication cost of your method?

4 Set Difference MR Algorithm
map(key = set_name, value = elementid):
    emit(key = elementid, value = set_name)
reduce(key = elementid, values):
    if (A in values) and (B not in values):
        emit(key = elementid, value = 1)
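The set difference pseudocode can be tried out with a small in-memory simulation. The sketch below assumes each input record is a `(set_name, element)` pair with set names `"A"` and `"B"`; the helper `run_mapreduce` and the sample IP addresses are made up for illustration.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def sd_mapper(record):
    # Swap the roles: the element becomes the key, the set name the value.
    set_name, element = record
    yield (element, set_name)

def sd_reducer(element, set_names):
    # Keep the element only if it was seen in A and never in B.
    if "A" in set_names and "B" not in set_names:
        yield (element, 1)

records = [
    ("A", "10.0.0.1"), ("A", "10.0.0.2"),
    ("B", "10.0.0.2"), ("B", "10.0.0.3"),
]
difference = [element for element, _ in run_mapreduce(records, sd_mapper, sd_reducer)]
print(difference)
```

Each input record produces exactly one key-value pair, which is where the Theta(n) communication cost comes from.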

5 Observation
Also works if A and B are multisets (i.e. the same value may appear multiple times within A and B)

6 Analysis
Is the set difference algorithm good? How much does it cost?
Not an easy question to answer. Consider:
Per Mapper Cost, Total Map Cost
Per Reducer Cost, Total Reduce Cost
Total Bytes of Communication

7 Set Difference
Let n = input size (|A| + |B|), M = number of mappers, R = number of reducers
Total Map Cost = Theta(n)
Per Mapper Cost = Theta(n/M)
Total Reduce Cost = Theta(n)
Per Reducer Cost = Theta(n/R)
Total Communication Cost = Theta(n)

8 Problem 2: Matrix-Vector Multiplication
Input:
n x n matrix M, with M[i,j] provided as (i, j, m[i,j]) within an HDFS file
Rows numbered 1..n, columns similar
Element at row i and column j denoted M[i,j]
n x 1 vector A, with A[j] provided as (j, a[j]) within an HDFS file
Output: Vector B = M A, where B[i] = M[i,1] * A[1] + M[i,2] * A[2] + ... + M[i,n] * A[n]

9 Matrix-Vector Using MapReduce
Suppose the reduce key was i, i = 1..n
Reduce function for key i computes B[i]
Needs values M[i,1]*A[1], M[i,2]*A[2], ...
How do we compute these?
Computing M[i,j] * A[j]: use j as the key in one MapReduce round

10 Matrix-Vector MR Round 1
map1(key, value = (i, j, m[i,j])):
    emit(key = j, value = (i, j, m[i,j]))
map1(key, value = (j, a[j])):
    emit(key = j, value = A[j])
reduce1(key = j, values = [A[j], M[1,j], M[2,j], ...]):
    for (i = 1 to n):
        emit(i, M[i,j] * A[j])

11 Matrix-Vector MR Round 2
map2(key = i, value):
    emit(key = i, value)
reduce2(key = i, values):
    B[i] = 0
    for (v in values):
        B[i] += v
    emit(key = i, value = B[i])
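The two rounds can be chained in a small simulation. The sketch below assumes the matrix and vector records are tagged `("M", i, j, m)` and `("A", j, a)` so the mapper can tell them apart; that tagging and the helper `run_mapreduce` are illustration choices, not part of the slides.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def map1(record):
    # Round 1: key every matrix entry and vector entry by column index j.
    if record[0] == "M":
        _, i, j, m = record
        yield (j, ("M", i, m))
    else:
        _, j, a = record
        yield (j, ("A", a))

def reduce1(j, values):
    # Reducer j holds A[j] and column j of M; emit the products keyed by row i.
    a_j = next(v[1] for v in values if v[0] == "A")
    for v in values:
        if v[0] == "M":
            _, i, m = v
            yield (i, m * a_j)

def map2(record):
    # Round 2: identity map.
    i, product = record
    yield (i, product)

def reduce2(i, products):
    yield (i, sum(products))

matrix = [[1, 2], [3, 4]]
vector = [5, 6]
records = [("M", i + 1, j + 1, matrix[i][j]) for i in range(2) for j in range(2)]
records += [("A", j + 1, vector[j]) for j in range(2)]

round1 = run_mapreduce(records, map1, reduce1)
b = dict(run_mapreduce(round1, map2, reduce2))
print(b)
```

Round 1 emits one product per matrix entry, so its output, and therefore the input to round 2, has n^2 records.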

12 Matrix-Vector Analysis
First MR Round:
Total Map Cost = O(n^2 + n) = O(n^2)
Per Mapper Cost = O(n^2/M)
Total Reduce Cost = O(n^2)
Per Reducer Cost = O(n^2/R)
Communication = O(n^2)
Second round is similar
Q: One-round algorithm for matrix-vector using MapReduce?

13 Problem 3: Length 2 Paths in a Graph
Input: A graph G presented as a list of edges
Output: All paths of length 2 in the graph
Solution: Similar to the problem of enumerating triangles in a graph
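The slides leave the solution to Problem 3 open. One common approach, sketched here under the assumption that the graph is undirected and simple, keys each edge by both of its endpoints so that the reducer for a vertex sees all of that vertex's neighbors and can pair them up. The function names and the helper `run_mapreduce` are hypothetical, and this may differ from the solution intended in the lecture.

```python
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def path_mapper(edge):
    # Send each edge to both endpoints; the value is the other endpoint.
    u, v = edge
    yield (u, v)
    yield (v, u)

def path_reducer(middle, neighbors):
    # Every pair of distinct neighbors (a, b) of `middle` forms a path a - middle - b.
    for a, b in combinations(sorted(set(neighbors)), 2):
        yield ((a, middle, b), 1)

edges = [(1, 2), (2, 3), (2, 4)]
paths = sorted(path for path, _ in run_mapreduce(edges, path_mapper, path_reducer))
print(paths)
```

The reducer for a vertex of degree d emits d(d-1)/2 paths, so the per-reducer cost depends heavily on the maximum degree.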

14 Problem 4: Finding Pairs of Nearby Bit Strings
Input: Set of bitstrings, each of length k
Output: All pairs of bit strings such that the two strings in the pair differ at no more than two positions, i.e. are at a Hamming distance of no more than 2.

15 Example: Nearby Bit Strings
Input: 10010, 00010, 11000, 11110, 00000
Output:
(10010, 00010), (10010, 11000)
(10010, 11110), (10010, 00000)
(00010, 00000), (11000, 11110)
(11000, 00000)

16 Algorithm 1
Compare all pairs of strings and see if they are within a Hamming distance of 2
Mapper: For each string b, send a copy of b to each reducer
Reducer i:
Receives entire set S
Computes a subset S_i
Examines all pairs in S_i x S
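Algorithm 1 can be simulated as follows. The slides do not say how reducer i chooses its subset S_i; the sketch assumes S_i is the set of strings whose integer value modulo R equals i, and it reports a pair only from the reducer that owns the lexicographically smaller string so that each pair appears once. The input is the example set from the slides, including 00000, which appears in the example output. `R`, `run_mapreduce`, and the function names are illustration choices.

```python
from collections import defaultdict

R = 3  # assumed number of reducers

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def alg1_mapper(b):
    # Broadcast: every reducer receives a copy of every string.
    for i in range(R):
        yield (i, b)

def alg1_reducer(i, strings):
    # Assumed partition: reducer i owns the strings with int(s, 2) % R == i.
    subset = [s for s in strings if int(s, 2) % R == i]
    for a in subset:
        for b in strings:
            if a < b and hamming(a, b) <= 2:
                yield ((a, b), 1)

strings = ["10010", "00010", "11000", "11110", "00000"]
pairs = [pair for pair, _ in run_mapreduce(strings, alg1_mapper, alg1_reducer)]
print(sorted(pairs))
```

The broadcast in the mapper is what produces the factor R in the map and communication costs.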

17 Algorithm 1 Analysis
Let n = total number of input strings, M = # of mappers, R = # of reducers
Total Map Cost = O(nkR)
Per Mapper Cost = O(nkR/M)
Total Reduce Cost = O(n^2 k)
Per Reducer Cost = O(n^2 k/R)
Total Communication = O(nkR)

18 Algorithm 2
For each bitstring b, there are only
(k choose 2) strings at Hamming distance 2
(k choose 1) strings at Hamming distance 1
Search all these possibilities

19 Algorithm 2
Map(key, value = bitstring b):
    emit(key = b, value = b)
    for each b' formed by flipping 1 bit of b:
        emit(key = b', value = b)
Reduce(key = bitstring b, values):
    for any two strings b1, b2 in values:
        emit(key = (b1, b2), value = 1)
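A small simulation of Algorithm 2 on the example input from the slides (including 00000, which appears in the example output) is sketched below. Two strings that share a key are each within one bit flip of that key, so they are within Hamming distance 2 of each other. The same pair can be produced under more than one key, so the sketch collects the results in a set; that deduplication step, the helper `run_mapreduce`, and the function names are assumptions added for illustration.

```python
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def flip(b, p):
    """Return b with the bit at position p flipped."""
    return b[:p] + ("1" if b[p] == "0" else "0") + b[p + 1:]

def alg2_mapper(b):
    # Emit b under its own key and under every string one bit flip away.
    yield (b, b)
    for p in range(len(b)):
        yield (flip(b, p), b)

def alg2_reducer(key, strings):
    # All strings under one key are within distance 1 of the key,
    # hence within distance 2 of each other.
    for b1, b2 in combinations(sorted(set(strings)), 2):
        yield ((b1, b2), 1)

strings = ["10010", "00010", "11000", "11110", "00000"]
pairs = {pair for pair, _ in run_mapreduce(strings, alg2_mapper, alg2_reducer)}
print(sorted(pairs))
```

Each string is emitted k + 1 times and each copy has length k, which matches the O(nk^2) map and communication cost.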

20 Analysis of Algorithm 2
Let n = total number of input strings, M = # mappers, R = # reducers
Total Map Cost = O(nk^2)
Per Mapper Cost = O(nk^2/M)
Total Reduce Cost = O(kn + output size)
Per Reducer Cost = O(kn/R + output size)
Total Communication = O(nk^2)

21 Generalization
List all pairs of bitstrings that are at a Hamming distance of at most t.
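The slides pose the generalization without giving a solution. One possible extension of Algorithm 2, offered only as a sketch, has each string emit itself under every key within distance ceil(t/2); two strings at distance at most t then share at least one key. When t is odd, strings sharing a key can be up to t + 1 apart, so the reducer checks the actual distance. All names below, including `run_mapreduce`, `neighbors_within`, `make_mapper`, and `make_reducer`, are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def neighbors_within(b, radius):
    """All strings at Hamming distance <= radius from b, including b itself."""
    out = []
    for r in range(radius + 1):
        for positions in combinations(range(len(b)), r):
            bits = list(b)
            for p in positions:
                bits[p] = "1" if bits[p] == "0" else "0"
            out.append("".join(bits))
    return out

def make_mapper(t):
    radius = (t + 1) // 2  # ceil(t / 2)
    def mapper(b):
        for key in neighbors_within(b, radius):
            yield (key, b)
    return mapper

def make_reducer(t):
    def reducer(key, strings):
        for b1, b2 in combinations(sorted(set(strings)), 2):
            if hamming(b1, b2) <= t:  # filter needed when t is odd
                yield ((b1, b2), 1)
    return reducer

strings = ["10010", "00010", "11000", "11110", "00000"]
pairs_t3 = {p for p, _ in run_mapreduce(strings, make_mapper(3), make_reducer(3))}
print(sorted(pairs_t3))
```

The number of keys per string grows as the sum of (k choose r) for r up to ceil(t/2), so this approach is only attractive for small t.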

22 Problem 5: Graph Connectivity
Input: A graph, presented as a list of edges
Output: Yes if the graph is connected, and No if it is not connected.
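The slides state Problem 5 without a solution. One well-known iterative approach, sketched here as an assumption rather than as the lecture's answer, is minimum-label propagation: every vertex starts with its own id as a label, each round replaces a vertex's label with the smallest label seen across its incident edges, and the graph is connected when all labels end up equal. For simplicity the driver attaches the current labels to each edge record, which a real job would do with a join; it also assumes every vertex appears in at least one edge. All names are hypothetical.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Simulate one MapReduce round in memory: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def label_mapper(record):
    # Record is (u, v, label_u, label_v); both endpoints learn the smaller label.
    u, v, label_u, label_v = record
    smallest = min(label_u, label_v)
    yield (u, smallest)
    yield (v, smallest)

def min_reducer(vertex, labels):
    yield (vertex, min(labels))

def is_connected(edges):
    # Start with each vertex labeled by its own id.
    labels = {v: v for edge in edges for v in edge}
    while True:
        records = [(u, v, labels[u], labels[v]) for u, v in edges]
        new_labels = dict(run_mapreduce(records, label_mapper, min_reducer))
        if new_labels == labels:  # no label changed: propagation has converged
            break
        labels = new_labels
    return len(set(labels.values())) <= 1

print(is_connected([(1, 2), (2, 3), (3, 4)]))
print(is_connected([(1, 2), (3, 4)]))
```

The number of rounds grows with the diameter of the graph, so this sketch trades simplicity for a potentially large number of MapReduce rounds.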

23 Readings
Chapter 2 of Ullman-Rajaraman, Ver 1.2, MapReduce and the New Software Stack
Chapter 2 of Hadoop: The Definitive Guide

Jeffrey D. Ullman Stanford University

Jeffrey D. Ullman Stanford University Jeffrey D. Ullman Stanford University for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must

More information

Matrix Multiplication in MapReduce. Overview. Matrix Multiplication. .. CSC 369 Distributed Computing Alexander Dekhtyar.. Matrix Multiplication:

Matrix Multiplication in MapReduce. Overview. Matrix Multiplication. .. CSC 369 Distributed Computing Alexander Dekhtyar.. Matrix Multiplication: .. CSC 369 Distributed Computing Alexander Dekhtyar.. Overview Matrix Multiplication: Matrix Multiplication in MapReduce is extremely important in computing. It is critical to a large number of tasks from

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

Rectangle-Efficient Aggregation in Spatial Data Streams

Rectangle-Efficient Aggregation in Spatial Data Streams Rectangle-Efficient Aggregation in Spatial Data Streams Srikanta Tirthapura Iowa State David Woodruff IBM Almaden The Data Stream Model Stream S of additive updates (i, Δ) to an underlying vector v: v

More information

Algorithms for Grid Graphs in the MapReduce Model

Algorithms for Grid Graphs in the MapReduce Model University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department

More information

Agenda. Arrays 01/12/2009 INTRODUCTION TO VBA PROGRAMMING. Arrays Matrices.

Agenda. Arrays 01/12/2009 INTRODUCTION TO VBA PROGRAMMING. Arrays Matrices. INTRODUCTION TO VBA PROGRAMMING LESSON6 dario.bonino@polito.it Agenda Matrices 1 Allow to store vectorial data Geometric vectors Sets of data having something in common... Declared as Dim array_name (begin

More information

2.3 Algorithms Using Map-Reduce

2.3 Algorithms Using Map-Reduce 28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure

More information

Data Partitioning and MapReduce

Data Partitioning and MapReduce Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,

More information

Determining the k in k-means with MapReduce

Determining the k in k-means with MapReduce Algorithms for MapReduce and Beyond 2014 Determining the k in k-means with MapReduce Thibault Debatty, Pietro Michiardi, Wim Mees & Olivier Thonnard Clustering & k-means Clustering K-means [Stuart P. Lloyd.

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

Parallel Dijkstra s Algorithm

Parallel Dijkstra s Algorithm CSCI4180 Tutorial-6 Parallel Dijkstra s Algorithm ZHANG, Mi mzhang@cse.cuhk.edu.hk Nov. 5, 2015 Definition Model the Twitter network as a directed graph. Each user is represented as a node with a unique

More information

CS 345A Data Mining. MapReduce

CS 345A Data Mining. MapReduce CS 345A Data Mining MapReduce Single-node architecture CPU Machine Learning, Statistics Memory Classical Data Mining Disk Commodity Clusters Web data sets can be very large Tens to hundreds of terabytes

More information

STA141C: Big Data & High Performance Statistical Computing

STA141C: Big Data & High Performance Statistical Computing STA141C: Big Data & High Performance Statistical Computing Lecture 7: Parallel Computing Cho-Jui Hsieh UC Davis May 3, 2018 Outline Multi-core computing, distributed computing Multi-core computing tools

More information

INTRODUCTION OF LOOPS

INTRODUCTION OF LOOPS INTRODUCTION OF LOOPS For Loop Example 1: based on an existing array L, create another array R with each element having the absolute value of the corresponding one in array L. Input: L = [11-25 32-2 0

More information

Improving the MapReduce Big Data Processing Framework

Improving the MapReduce Big Data Processing Framework Improving the MapReduce Big Data Processing Framework Gistau, Reza Akbarinia, Patrick Valduriez INRIA & LIRMM, Montpellier, France In collaboration with Divyakant Agrawal, UCSB Esther Pacitti, UM2, LIRMM

More information

MapReduce Algorithm Design

MapReduce Algorithm Design MapReduce Algorithm Design Contents Combiner and in mapper combining Complex keys and values Secondary Sorting Combiner and in mapper combining Purpose Carry out local aggregation before shuffle and sort

More information

CS246: Mining Massive Data Sets Winter Final

CS246: Mining Massive Data Sets Winter Final CS246: Mining Massive Data Sets Winter 2013 Final These questions require thought, but do not require long answers. Be as concise as possible. You have three hours to complete this final. The exam has

More information

c) the set of students at your school who either are sophomores or are taking discrete mathematics

c) the set of students at your school who either are sophomores or are taking discrete mathematics Exercises Exercises Page 136 1. Let A be the set of students who live within one mile of school and let B be the set of students who walk to classes. Describe the students in each of these sets. a) A B

More information

Extreme Computing. Introduction to MapReduce. Cluster Outline Map Reduce

Extreme Computing. Introduction to MapReduce. Cluster Outline Map Reduce Extreme Computing Introduction to MapReduce 1 Cluster We have 12 servers: scutter01, scutter02,... scutter12 If working outside Informatics, first: ssh student.ssh.inf.ed.ac.uk Then log into a random server:

More information

UCSD ECE154C Handout #21 Prof. Young-Han Kim Thursday, June 8, Solutions to Practice Final Examination (Spring 2016)

UCSD ECE154C Handout #21 Prof. Young-Han Kim Thursday, June 8, Solutions to Practice Final Examination (Spring 2016) UCSD ECE54C Handout #2 Prof. Young-Han Kim Thursday, June 8, 27 Solutions to Practice Final Examination (Spring 26) There are 4 problems, each problem with multiple parts, each part worth points. Your

More information

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins

Outline. Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins MapReduce 1 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce Algorithm Evaluation Computing Joins 2 Outline Distributed File System Map-Reduce The Computational Model Map-Reduce

More information

MIT805 BIG DATA MAPREDUCE

MIT805 BIG DATA MAPREDUCE MIT805 BIG DATA MAPREDUCE Christoph Stallmann Department of Computer Science University of Pretoria Admin Part 2 & 3 of the assignment Team registrations Concept Roman Empire Concept Roman Empire Concept

More information

Databases 2 (VU) ( / )

Databases 2 (VU) ( / ) Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:

More information

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare

COMP6237 Data Mining Data Mining & Machine Learning with Big Data. Jonathon Hare COMP6237 Data Mining Data Mining & Machine Learning with Big Data Jonathon Hare jsh2@ecs.soton.ac.uk Contents Going to look at two case-studies looking at how we can make machine-learning algorithms work

More information

Hamming Codes. s 0 s 1 s 2 Error bit No error has occurred c c d3 [E1] c0. Topics in Computer Mathematics

Hamming Codes. s 0 s 1 s 2 Error bit No error has occurred c c d3 [E1] c0. Topics in Computer Mathematics Hamming Codes Hamming codes belong to the class of codes known as Linear Block Codes. We will discuss the generation of single error correction Hamming codes and give several mathematical descriptions

More information

Hadoop Map Reduce 10/17/2018 1

Hadoop Map Reduce 10/17/2018 1 Hadoop Map Reduce 10/17/2018 1 MapReduce 2-in-1 A programming paradigm A query execution engine A kind of functional programming We focus on the MapReduce execution engine of Hadoop through YARN 10/17/2018

More information

A gentle introduction to Matlab

A gentle introduction to Matlab A gentle introduction to Matlab The Mat in Matlab does not stand for mathematics, but for matrix.. all objects in matlab are matrices of some sort! Keep this in mind when using it. Matlab is a high level

More information

CSE 4/531 Solution 3

CSE 4/531 Solution 3 CSE 4/531 Solution 3 Edited by Le Fang November 7, 2017 Problem 1 M is a given n n matrix and we want to find a longest sequence S from elements of M such that the indexes of elements in M increase and

More information

Data-Intensive Distributed Computing

Data-Intensive Distributed Computing Data-Intensive Distributed Computing CS 451/651 431/631 (Winter 2018) Part 1: MapReduce Algorithm Design (4/4) January 16, 2018 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

Databases 2 (VU) ( )

Databases 2 (VU) ( ) Databases 2 (VU) (707.030) Map-Reduce Denis Helic KMI, TU Graz Nov 4, 2013 Denis Helic (KMI, TU Graz) Map-Reduce Nov 4, 2013 1 / 90 Outline 1 Motivation 2 Large Scale Computation 3 Map-Reduce 4 Environment

More information

Enumerating Maximal Bicliques from a Large Graph using MapReduce

Enumerating Maximal Bicliques from a Large Graph using MapReduce 1 Enumerating Maximal Bicliques from a Large Graph using MapReduce Arko Provo Mukherjee, Student Member, IEEE, and Srikanta Tirthapura, Senior Member, IEEE Abstract We consider the enumeration of maximal

More information

Solutions to Midterm 2 - Monday, July 11th, 2009

Solutions to Midterm 2 - Monday, July 11th, 2009 Solutions to Midterm - Monday, July 11th, 009 CPSC30, Summer009. Instructor: Dr. Lior Malka. (liorma@cs.ubc.ca) 1. Dynamic programming. Let A be a set of n integers A 1,..., A n such that 1 A i n for each

More information

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1

Matrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1 Matrix-Vector Multiplication by MapReduce From Rajaraman / Ullman- Ch.2 Part 1 Google implementation of MapReduce created to execute very large matrix-vector multiplications When ranking of Web pages that

More information

The Data Link Layer. CS158a Chris Pollett Feb 26, 2007.

The Data Link Layer. CS158a Chris Pollett Feb 26, 2007. The Data Link Layer CS158a Chris Pollett Feb 26, 2007. Outline Finish up Overview of Data Link Layer Error Detecting and Correcting Codes Finish up Overview of Data Link Layer Last day we were explaining

More information

Importing and Exporting Data Between Hadoop and MySQL

Importing and Exporting Data Between Hadoop and MySQL Importing and Exporting Data Between Hadoop and MySQL + 1 About me Sarah Sproehnle Former MySQL instructor Joined Cloudera in March 2010 sarah@cloudera.com 2 What is Hadoop? An open-source framework for

More information

Complexity Theory for Map-Reduce. Communication and Computation Costs Enumerating Triangles and Other Sample Graphs Theory of Mapping Schemas

Complexity Theory for Map-Reduce. Communication and Computation Costs Enumerating Triangles and Other Sample Graphs Theory of Mapping Schemas Complexity Theory for Map-Reduce Communication and Computation Costs Enumerating Triangles and Other Sample Graphs Theory of Mapping Schemas 1 Coauthors Foto Aftrati, Anish das Sarma, Dimitris Fotakis,

More information

Data Structures CHAPTER 11. Review Questions. Multiple-Choice Questions. Exercises. (Solutions to Odd-Numbered Problems)

Data Structures CHAPTER 11. Review Questions. Multiple-Choice Questions. Exercises. (Solutions to Odd-Numbered Problems) CHAPTER 11 Data Structures (Solutions to Odd-Numbered Problems) Review Questions 1. Arrays, records, and linked lists are three types of data structures discussed in this chapter. 3. Elements of an array

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

The amount of data increases every day Some numbers ( 2012):

The amount of data increases every day Some numbers ( 2012): 1 The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect

More information

2/26/2017. The amount of data increases every day Some numbers ( 2012):

2/26/2017. The amount of data increases every day Some numbers ( 2012): The amount of data increases every day Some numbers ( 2012): Data processed by Google every day: 100+ PB Data processed by Facebook every day: 10+ PB To analyze them, systems that scale with respect to

More information

Collective Communication Patterns for Iterative MapReduce

Collective Communication Patterns for Iterative MapReduce Collective Communication Patterns for Iterative MapReduce CONTENTS 1 Introduction... 4 2 Background... 6 2.1 Collective Communication... 6 2.2 MapReduce... 7 2.3 Iterative MapReduce... 8 3 MapReduce-MergeBroadcast...

More information

TI2736-B Big Data Processing. Claudia Hauff

TI2736-B Big Data Processing. Claudia Hauff TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement

More information

Computer Science Spring 2005 Final Examination, May 12, 2005

Computer Science Spring 2005 Final Examination, May 12, 2005 Computer Science 302 00 Spring 2005 Final Examination, May 2, 2005 Name: No books, notes, or scratch paper. Use pen or pencil, any color. Use the backs of the pages for scratch paper. If you need more

More information

Distributed Data Deduplication

Distributed Data Deduplication Distributed Data Deduplication Xu Chu University of Waterloo x4chu@uwaterloo.ca Ihab F. Ilyas University of Waterloo ilyas@uwaterloo.ca Paraschos Koutris University of Wisconsin-Madison paris@cs.wisc.edu

More information

Hortonworks HDPCD. Hortonworks Data Platform Certified Developer. Download Full Version :

Hortonworks HDPCD. Hortonworks Data Platform Certified Developer. Download Full Version : Hortonworks HDPCD Hortonworks Data Platform Certified Developer Download Full Version : https://killexams.com/pass4sure/exam-detail/hdpcd QUESTION: 97 You write MapReduce job to process 100 files in HDFS.

More information

Hadoop Lab 3 Creating your first Map-Reduce Process

Hadoop Lab 3 Creating your first Map-Reduce Process Programming for Big Data Hadoop Lab 3 Creating your first Map-Reduce Process Lab work Take the map-reduce code from these notes and get it running on your Hadoop VM Driver Code Mapper Code Reducer Code

More information

CS294-1 Assignment 3 Report

CS294-1 Assignment 3 Report CS294-1 Assignment 3 Report Huasha Zhao and Keling Chen April 18, 2012 1 Problem The task of this assignment is to run some clustering algorithms on a moderately large dataset. The dataset is a recent

More information

Homework 3: Map-Reduce, Frequent Itemsets, LSH, Streams (due March 16 th, 9:30am in class hard-copy please)

Homework 3: Map-Reduce, Frequent Itemsets, LSH, Streams (due March 16 th, 9:30am in class hard-copy please) Virginia Tech. Computer Science CS 5614 (Big) Data Management Systems Spring 2017, Prakash Homework 3: Map-Reduce, Frequent Itemsets, LSH, Streams (due March 16 th, 9:30am in class hard-copy please) Reminders:

More information

Example : Write a Hadoop MapReduce program for Movie Recommendation System.

Example : Write a Hadoop MapReduce program for Movie Recommendation System. Example : Write a Hadoop MapReduce program for Movie Recommendation System. Object: Using given dataset, find Movie Recommendations using Hadoop MapReduce program. Dataset: Our example is conducted on

More information

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)! Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm

More information

The University of Sydney MATH 2009

The University of Sydney MATH 2009 The University of Sydney MTH 009 GRPH THORY Tutorial 10 Solutions 004 1. In a tournament, the score of a vertex is its out-degree, and the score sequence is a list of all the scores in non-decreasing order.

More information

C2: How to work with a petabyte

C2: How to work with a petabyte GREAT 2011 Summer School C2: How to work with a petabyte Matthew J. Graham (Caltech, VAO) Overview Strategy MapReduce Hadoop family GPUs 2/17 Divide-and-conquer strategy Most problems in astronomy are

More information

Chapter 10 Error Detection and Correction 10.1

Chapter 10 Error Detection and Correction 10.1 Chapter 10 Error Detection and Correction 10.1 10-1 INTRODUCTION some issues related, directly or indirectly, to error detection and correction. Topics discussed in this section: Types of Errors Redundancy

More information

NOI 2012 TASKS OVERVIEW

NOI 2012 TASKS OVERVIEW NOI 2012 TASKS OVERVIEW Tasks Task 1: MODSUM Task 2: PANCAKE Task 3: FORENSIC Task 4: WALKING Notes: 1. Each task is worth 25 marks. 2. Each task will be tested on a few sets of input instances. Each set

More information

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators)

CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators) Name: Email address: Quiz Section: CSE 332 Spring 2013: Midterm Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering. We will

More information

CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science

CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science CHENNAI MATHEMATICAL INSTITUTE M.Sc. / Ph.D. Programme in Computer Science Entrance Examination, 5 May 23 This question paper has 4 printed sides. Part A has questions of 3 marks each. Part B has 7 questions

More information

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414

Announcements. Parallel Data Processing in the 20 th Century. Parallel Join Illustration. Introduction to Database Systems CSE 414 Introduction to Database Systems CSE 414 Lecture 17: MapReduce and Spark Announcements Midterm this Friday in class! Review session tonight See course website for OHs Includes everything up to Monday s

More information

Chapter 10 Error Detection and Correction. Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Chapter 10 Error Detection and Correction. Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 10 Error Detection and Correction 0. Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Note The Hamming distance between two words is the number of differences

More information

CS535 Big Data Fall 2017 Colorado State University 9/5/2017. Week 3 - A. FAQs. This material is built based on,

CS535 Big Data Fall 2017 Colorado State University  9/5/2017. Week 3 - A. FAQs. This material is built based on, S535 ig ata Fall 217 olorado State University 9/5/217 Week 3-9/5/217 S535 ig ata - Fall 217 Week 3--1 S535 IG T FQs Programming ssignment 1 We will discuss link analysis in week3 Installation/configuration

More information

Graphs (Part II) Shannon Quinn

Graphs (Part II) Shannon Quinn Graphs (Part II) Shannon Quinn (with thanks to William Cohen and Aapo Kyrola of CMU, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University) Parallel Graph Computation Distributed computation

More information

15/03/2018. Counters

15/03/2018. Counters Counters 2 1 Hadoop provides a set of basic, built-in, counters to store some statistics about jobs, mappers, reducers E.g., number of input and output records E.g., number of transmitted bytes Ad-hoc,

More information

More NP-complete Problems. CS255 Chris Pollett May 3, 2006.

More NP-complete Problems. CS255 Chris Pollett May 3, 2006. More NP-complete Problems CS255 Chris Pollett May 3, 2006. Outline More NP-Complete Problems Hamiltonian Cycle Recall a hamiltonian cycle is a permutation of the vertices v i_1,, v i_n of a graph G so

More information

Data Partitioning Method for Mining Frequent Itemset Using MapReduce

Data Partitioning Method for Mining Frequent Itemset Using MapReduce 1st International Conference on Applied Soft Computing Techniques 22 & 23.04.2017 In association with International Journal of Scientific Research in Science and Technology Data Partitioning Method for

More information

Java in MapReduce. Scope

Java in MapReduce. Scope Java in MapReduce Kevin Swingler Scope A specific look at the Java code you might use for performing MapReduce in Hadoop Java program recap The map method The reduce method The whole program Running on

More information

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing

ECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing ECE5610/CSC6220 Introduction to Parallel and Distribution Computing Lecture 6: MapReduce in Parallel Computing 1 MapReduce: Simplified Data Processing Motivation Large-Scale Data Processing on Large Clusters

More information

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm

Announcements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm Announcements HW5 is due tomorrow 11pm Database Systems CSE 414 Lecture 19: MapReduce (Ch. 20.2) HW6 is posted and due Nov. 27 11pm Section Thursday on setting up Spark on AWS Create your AWS account before

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 2: MapReduce Algorithm Design (2/2) January 12, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo

More information

D-BAUG Informatik I. Exercise session: week 5 HS 2018

D-BAUG Informatik I. Exercise session: week 5 HS 2018 1 D-BAUG Informatik I Exercise session: week 5 HS 2018 Homework 2 Questions? Matrix and Vector in Java 3 Vector v of length n: Matrix and Vector in Java 3 Vector v of length n: double[] v = new double[n];

More information

Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph. Nechama Florans. Mentor: Dr. Boram Park

Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph. Nechama Florans. Mentor: Dr. Boram Park Research Question Presentation on the Edge Clique Covers of a Complete Multipartite Graph Nechama Florans Mentor: Dr. Boram Park G: V 5 Vertex Clique Covers and Edge Clique Covers: Suppose we have a graph

More information

CS61C Summer 2013 Final Exam

CS61C Summer 2013 Final Exam Login: cs61c CS61C Summer 2013 Final Exam Your Name: SID: Your TA (Circle): Albert Kevin Justin Shaun Jeffrey Sagar Name of person to your LEFT: Name of person to your RIGHT: This exam is worth 90 points

More information

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case 1 / 39 Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case PAN Jie 1 Yann LE BIANNIC 2 Frédéric MAGOULES 1 1 Ecole Centrale Paris-Applied Mathematics and Systems

More information

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b

Implementation of Parallel CASINO Algorithm Based on MapReduce. Li Zhang a, Yijie Shi b International Conference on Artificial Intelligence and Engineering Applications (AIEA 2016) Implementation of Parallel CASINO Algorithm Based on MapReduce Li Zhang a, Yijie Shi b State key laboratory

More information

CSE 123: Computer Networks

CSE 123: Computer Networks Student Name: PID: UCSD email: CSE 123: Computer Networks Homework 1 Solution (Due 10/12 in class) Total Points: 30 Instructions: Turn in a physical copy at the beginning of the class on 10/10. Problems:

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia Final Exam. Administrivia Final Exam

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia Final Exam. Administrivia Final Exam Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#28: Modern Database Systems Administrivia Final Exam Who: You What: R&G Chapters 15-22 When: Tuesday

More information
