Big Data Analysis using Hadoop Lecture 3


1 Big Data Analysis using Hadoop Lecture 3

Last Week - Recap
- Driver Class
- Mapper Class
- Reducer Class
- Created our first MR process
- Ran it on Hadoop
- Monitored it on the web pages
- Checked outputs using the HDFS command line and web pages

2 In this Class
- Counters
- Combiners
- Partitioners
- Reading and Writing Data
- Chaining MapReduce Jobs (Workflows)
- Lab work
- Assignment

Counters

3 Passing information back to the driver
Counters provide a way for Mappers or Reducers to pass aggregated values back to the driver after the job has completed.
The framework provides built-in counters:
- Map-Reduce counters - e.g. number of input and output records for mappers and reducers, time and memory statistics
- File System counters - e.g. number of bytes read or written
- Job counters - e.g. launched tasks, failed tasks, etc.
Counters are visible from the JobTracker UI and are reported on the console when the job finishes.

User Defined Counters
User-defined counters are a useful mechanism for gathering statistics about the job, e.g.:
- quality control - track different types of input records, e.g. bad input records
- instrumentation of code - number of warnings, number of errors, etc.
The framework aggregates all user-defined counters over all mappers and reducers and reports them back on the UI.
Counters are set up as enums in the Mapper or Reducer:
    enum GroupName { Counter1, Counter2, ... }
Counters are retrieved from the Context object which is passed to the Mapper and Reducer:
    context.getCounter(Enum<?> counterName);
Increment or set the counter value:
    increment(long incrementAmt);
    setValue(long value);

4 Using Counters
Set up the user-defined counters:

    public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        static enum MyCounters { Bad, Good, Missing }

        public void map(LongWritable key, Text value, Context context) {
            ...
            // Increment the counter when necessary
            if (<input data problem>) {
                context.getCounter(MyCounters.Bad).increment(1);
            }
            ...
        }
    }

Print out the counters at the end of the job in the Driver:

    ...
    job.waitForCompletion(true);
    System.out.println("Job is complete - printing counters now:");
    Counters counters = job.getCounters();
    Counter bad = counters.findCounter(WordMapper.MyCounters.Bad);
    System.out.println("Number of bad records is " + bad.getValue());
    ...
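The aggregation the framework performs can be mimicked without a cluster. Below is a plain-Java sketch (no Hadoop dependency; class and method names are illustrative, not Hadoop API): each "task" keeps its own enum-keyed counts, and the driver sees the sum across all tasks, just as with user-defined counters.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of user-defined counter aggregation:
// each task reports local counts; the framework sums them for the driver.
public class CounterDemo {
    public enum MyCounters { BAD, GOOD, MISSING }

    public static Map<MyCounters, Long> aggregate(Iterable<Map<MyCounters, Long>> tasks) {
        Map<MyCounters, Long> totals = new EnumMap<>(MyCounters.class);
        for (MyCounters c : MyCounters.values()) totals.put(c, 0L);
        for (Map<MyCounters, Long> task : tasks) {
            // merge each task's local counter values into the global totals
            task.forEach((c, v) -> totals.merge(c, v, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<MyCounters, Long> task1 = new EnumMap<>(MyCounters.class);
        task1.put(MyCounters.BAD, 2L);
        Map<MyCounters, Long> task2 = new EnumMap<>(MyCounters.class);
        task2.put(MyCounters.BAD, 1L);
        task2.put(MyCounters.GOOD, 5L);
        Map<MyCounters, Long> totals = aggregate(List.of(task1, task2));
        System.out.println("Number of bad records is " + totals.get(MyCounters.BAD)); // 3
    }
}
```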


6 Combiners
Mappers can produce a large amount of intermediate data, generating significant network traffic when it is passed to the Reducers.
A Combiner is a mini-reducer:
- runs locally on a single Mapper's output
- passes its output on to the Reducer
- reduces the intermediate data passed to the Reducer
Can lead to faster jobs and less network traffic:
- often reduces the amount of work to be done by the Reducer
- may be the same code as the Reducer

7 Typical MR process / MR process with Combiner [diagrams]

WordCount Example
Input to the Mapper:
(124, "this one I think is called a yink")
(158, "he likes to wink, he likes to drink")
(195, "he likes to drink and drink and drink")
Output from the Mapper:
(this,1) (one,1) (I,1) (think,1) (is,1) (called,1) (a,1) (yink,1)
(he,1) (likes,1) (to,1) (wink,1) (he,1) (likes,1) (to,1) (drink,1)
(he,1) (likes,1) (to,1) (drink,1) (and,1) (drink,1) (and,1) (drink,1)

8 Example Job - What happens with only a Reducer
Intermediate data sent to the Reducer:
(a,[1]) (and,[1,1]) (called,[1]) (drink,[1,1,1,1]) (he,[1,1,1]) (I,[1]) (is,[1]) (likes,[1,1]) (one,[1]) (think,[1]) (this,[1]) (to,[1,1,1]) (wink,[1]) (yink,[1])
Reducer output:
(a,1) (and,2) (called,1) (drink,4) (he,3) (I,1) (is,1) (likes,2) (one,1) (think,1) (this,1) (to,3) (wink,1) (yink,1)

When we use a Combiner
Input to the Combiner (grouped Mapper output):
(a,[1]) (and,[1,1]) (called,[1]) (drink,[1,1,1,1]) (he,[1,1,1]) (I,[1]) (is,[1]) (likes,[1,1]) (one,[1]) (think,[1]) (this,[1]) (to,[1,1,1]) (wink,[1]) (yink,[1])
Combiner output:
(a,[1]) (and,[2]) (called,[1]) (drink,[4]) (he,[3]) (I,[1]) (is,[1]) (likes,[2]) (one,[1]) (think,[1]) (this,[1]) (to,[3]) (wink,[1]) (yink,[1])
Combiner output (as emitted key/value pairs):
(a,1) (and,2) (called,1) (drink,4) (he,3) (I,1) (is,1) (likes,2) (one,1) (think,1) (this,1) (to,3) (wink,1) (yink,1)

9 When we use a Combiner
Intermediate data sent to the Reducer (the Combiner output):
(a,[1]) (and,[2]) (called,[1]) (drink,[4]) (he,[3]) (I,[1]) (is,[1]) (likes,[2]) (one,[1]) (think,[1]) (this,[1]) (to,[3]) (wink,[1]) (yink,[1])
Reducer output:
(a,1) (and,2) (called,1) (drink,4) (he,3) (I,1) (is,1) (likes,2) (one,1) (think,1) (this,1) (to,3) (wink,1) (yink,1)
[Other Combiner outputs - diagram]

Specifying a Combiner
Set the Combiner up in the job configuration in the Driver:
    job.setCombinerClass(YourCombinerClass.class);
The Combiner uses the same interface as the Reducer:
- takes in a key and a list of values (output from the Mapper)
- outputs zero or more key/value pairs
- the work is done in the reduce method
Note:
- The Combiner input types must be the same as the Mapper output types (K2, V2)
- The Combiner output types must be the same as the Reducer input types (K2, V2)
- Don't put code in the Combiner that alters the data from the Mapper
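The effect of the Combiner on the third input line above can be simulated in plain Java (illustrative names, not Hadoop API): one mapper's (word, 1) pairs collapse into a single (word, localSum) pair per distinct word before anything crosses the network.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of a word-count Combiner: sum one mapper's
// (word, 1) pairs locally, before the shuffle.
public class CombinerDemo {
    public static Map<String, Integer> combine(String[] words) {
        Map<String, Integer> localSums = new HashMap<>();
        for (String w : words) {
            localSums.merge(w, 1, Integer::sum); // (w,1) pairs collapse to (w,sum)
        }
        return localSums;
    }

    public static void main(String[] args) {
        String[] words = "he likes to drink and drink and drink".split(" ");
        Map<String, Integer> out = combine(words);
        // 8 (word,1) pairs shrink to 5 keys; (drink,[1,1,1]) becomes (drink,3)
        System.out.println(out.size() + " pairs, drink=" + out.get("drink"));
    }
}
```

Eight pairs become five, which is exactly the saving in intermediate data the slide describes.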

10 Example Code
The Reducer code is used for the Combiner.
Sometimes the code can be slightly different between the Combiner and the Reducer - you need to be careful to maintain the input and output formats.

11 Partitioners
The number of Reducers that run is specified in the job configuration; the default number is 1.
The number of Reducers can be set when setting up the job:
    job.setNumReduceTasks(10);
No need to set this value if you just want one Reducer.
The Partitioner implementation directs key/value pairs to a specific Reducer.
Number of Partitions = Number of Reducers
The default is to hash the key to determine the partition, implemented by HashPartitioner<K,V>.

12 Partitioners - Shuffle & Sort [diagram: shape = key, inner pattern = value]
Note:
- All pairs with the same key go to the same Reducer
- A Reducer can handle different keys
- Reducers can have different loads

13 Partitioners
Partitioners determine which Reducer the map output data is sent to in the shuffle & sort phase.
This is normally determined using a hash function on the key value.
It is important that the Partitioner distributes the map output records evenly over the Reducers.
    job.setPartitionerClass(CasePartitioner.class);
    job.setNumReduceTasks(10);

Default Partitioner
The default Partitioner is the HashPartitioner.
The key's hash code is turned into a non-negative integer by bitwise ANDing it with the largest integer value. It is then reduced modulo the number of partitions to find the index of the partition (the Reducer number) that the record belongs to.

    public int getPartition(K2 key, V2 value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

Keys are distributed evenly across the available reduce tasks, assuming a good hashCode() function. Records with the same key will make it into the same reduce task.

14 Default Partitioner example (3 Reducers)

Key     Hashcode  Modulo 3
This    1         1
is      2         2
not     3         0
my      4         1
office  5         2
Colour  6         0
Pen     7         1
Money   8         2

Reducer 0: not, Colour
Reducer 1: This, my, Pen
Reducer 2: is, office, Money

To Implement a Custom Partitioner
- Set the number of Reducers in the job configuration
- Create a custom Partitioner class by extending the Partitioner class
- Implement the getPartition() method to return a number between 0 and the number of Reducers, indexing which Reducer the key/value pair should be sent to

    public class MyPartitioner extends Partitioner<KEY, VALUE> {
        public int getPartition(KEY key, VALUE value, int numPartitions) {
            // put code here to decide, based on the key, which
            // reducer the map output should go to
            ...
            return partitionNumber;
        }
    }
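The default HashPartitioner logic is plain arithmetic and can be tried outside Hadoop. This sketch (class name illustrative) applies the same mask-then-modulo expression to String keys and their real Java hash codes:

```java
// Plain-Java sketch of the default HashPartitioner logic: mask off the
// sign bit of the key's hash code, then take it modulo the reducer count.
public class HashPartitionDemo {
    public static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"This", "is", "not", "my", "office"}) {
            System.out.println(key + " -> reducer " + getPartition(key, 3));
        }
    }
}
```

The result is always in [0, numPartitions) and is deterministic, so every record with the same key lands on the same reducer.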

15 Simple MapReduce Job / Complete MapReduce Job [diagrams]

16 Drawbacks
- You need to know the number of partitions/reducers at the start; it is not dynamic.
- Letting the application fix the number of reducers, rather than the cluster, can result in inefficient use of the cluster and uneven reduce tasks that can dominate the job execution time.
- See MultipleOutputs later on for an alternative solution.

Partitioner example code

17 Reading & Writing Data

18 Input Data
A logical split is created by an InputFormat.
Each split is processed by a single mapper (data locality).
Each record (key/value pair) is processed by the map method.
[diagram: File -> HDFS Blocks (physical locations) -> InputSplits -> Mappers -> map]

The input data is split into chunks called input splits (a logical division):
- the split size is normally the size of a block, but it is configurable
- the size of the splits determines the level of parallelisation: one input split -> one mapper
- all input data in one single split -> no parallelisation
- small input splits are useful for parallelisation of CPU-bound tasks
HDFS stores input data in blocks spread across the nodes (a physical division):
- one block -> one input split is very efficient for I/O-bound tasks
- a record may span block boundaries; Hadoop guarantees the processing of all records, but data-local mappers will occasionally need to access remote data
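The split count described above is just a ceiling division of the file size by the split size. This plain-Java sketch (names and the 128 MB figure are illustrative assumptions, not read from a cluster) shows how many mappers a file would get:

```java
// Plain-Java sketch of the logical split calculation: a file of size S with
// split size B yields ceil(S / B) input splits, one mapper per split.
public class SplitDemo {
    public static long numSplits(long fileSizeBytes, long splitSizeBytes) {
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes; // ceiling division
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // assume a 128 MB block/split size
        System.out.println(numSplits(300L * 1024 * 1024, block)); // 3 splits -> 3 mappers
    }
}
```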

19 Input Data
Data input is supported by:
- InputFormat: indicates how the input files should be split into input splits
- RecordReader: performs the reading of the data, providing a key/value pair for input to the mapper
Hadoop provides predefined InputFormats:
- TextInputFormat: the default input format
- KeyValueTextInputFormat
- SequenceFileInputFormat (normally used for chaining multiple MapReduce jobs)
- NLineInputFormat
The input format can be set in the job configuration in your driver, e.g.:
    job.setInputFormatClass(KeyValueTextInputFormat.class);
To read input data in a way not supported by the standard InputFormat classes you can create a custom InputFormat.

20 Output Data
Each reducer writes its output to its own file, normally named part-r-nnnnn, where nnnnn is the partition ID of the reducer.
Data output is supported by:
- OutputFormat
- RecordWriter
Hadoop provides predefined OutputFormats:
- TextOutputFormat: the default output format
- SequenceFileOutputFormat (normally used for chaining multiple MapReduce jobs)
- NullOutputFormat
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
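The output file naming is easy to reproduce: the partition ID is zero-padded to five digits. A minimal plain-Java sketch (the class name is illustrative):

```java
// Plain-Java sketch of the reducer output file naming convention:
// partition ID zero-padded to five digits, e.g. part-r-00000.
public class OutputNameDemo {
    public static String partFileName(int partitionId) {
        return String.format("part-r-%05d", partitionId);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(0));  // part-r-00000
        System.out.println(partFileName(10)); // part-r-00010
    }
}
```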

21 Writing Multiple Files
MultipleOutputs allows you to write data to multiple files whose names are derived from the output keys and values.
Output file names are of the form name-x-nnnnn:
- name is set by the code
- x = m for mapper output, x = r for reducer output
- nnnnn is an integer designating the part number

Using MultipleOutputs
Create an instance of MultipleOutputs in the reducer or mapper where the output is being generated, normally in setup():

    protected void setup(Context context) throws IOException, InterruptedException {
        multipleOutputs = new MultipleOutputs<KEY, VALUE>(context);
    }

Close the MultipleOutputs instance once finished with it, normally in cleanup():

    protected void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }

22 Using MultipleOutputs
Write the output key/value pair to the instance of MultipleOutputs, where name identifies the base output path:
    multipleOutputs.write(key, value, name);
name is interpreted relative to the output directory, so it is possible to create subdirectories by including file path separator characters in name.
Include logic to determine which output file to write to, normally dependent on the key and/or value:
- e.g. monthly or weekly reports - files identified by time periods
- e.g. store or branch reports - files identified by store or branch, or both

MultipleOutputs delegates to the given OutputFormat. Separate named outputs can be set up in the driver using addNamedOutput, each with its own OutputFormat and key/value types:
    MultipleOutputs.addNamedOutput(job, name, OUTPUTFORMAT, KEY, VALUE);
A single record can be written to multiple output files.

23 Reading/Writing other types of data
Reading from / writing to a database using JDBC:
- You can use DBInputFormat and DBOutputFormat
- DBInputFormat doesn't have sharding capabilities, so you have to be careful not to overwhelm the database by reading with too many mappers
- DBOutputFormat is very useful for outputting data to a database
Reading XML:
- Create a custom InputFormat to read a whole file at a time (see Chapter 7 in Hadoop: The Definitive Guide), suitable for small XML files, or
- Use XmlInputFormat from Mahout (the machine learning library implemented on Hadoop)

Chaining MapReduce Jobs

24 Chaining MapReduce Jobs
We've looked at single MapReduce jobs, but not every problem can be solved with a single MapReduce job.
MapReduce can get very complex with multiple MR jobs.
Many problems can be solved with MapReduce by writing several MapReduce steps which run in series, in parallel, or both.
You can control these steps and get them to interact with each other (dependencies), which gives better control and allows for greater computational capabilities.

Chaining Jobs in a Sequence
MapReduce 1 -> MapReduce 2 -> MapReduce 3 -> ...
Run MapReduce jobs sequentially, with the output of one job being the input of another.
Note: watch the intermediate file format - SequenceFileOutputFormat & SequenceFileInputFormat are useful for this.
Remember: Sequence files are Hadoop's compressed binary file format for storing key/value pairs.
Set up a Job (job1) in the Driver and run job1; then set up a new Job (job2) in the Driver, with its input path = the output path of job1, and run job2; etc.

25 Chaining Jobs with Dependencies
MapReduce 1 -> MapReduce 2; MapReduce 3 [diagram]
Dependencies can occur when tasks don't run sequentially.
Use the Job, ControlledJob & JobControl classes to set up and manage job dependencies.

Setting up workflows
A JobControl is created to hold the workflow:
- allows for the creation of simple workflows
- represents a graph of Jobs to run
- dependencies are specified in code
    JobControl control = new JobControl("Workflow-Example");
A ControlledJob is set up for each job in the workflow:
    ControlledJob step1 = new ControlledJob(job1, null);
ControlledJob is a wrapper for Job; the ControlledJob constructor can take in job dependencies:
    List<ControlledJob> dependencies = new ArrayList<ControlledJob>();
    dependencies.add(step1);
    ControlledJob step2 = new ControlledJob(job2, dependencies);
Dependencies between jobs can also be set up using addDependingJob(), e.g. step2.addDependingJob(step1) means step2 will not start until step1 has finished.

26 Setting up workflows
Each ControlledJob is added to the JobControl object using addJob():

    ...
    control.addJob(step1);
    control.addJob(step2);
    ...

The JobControl is executed in a thread (JobControl implements Runnable):

    ...
    Thread workflowThread = new Thread(control, "Workflow-Thread");
    workflowThread.setDaemon(true);
    workflowThread.start();
    ...

Wait for the JobControl to complete and report results:

    ...
    while (!control.allFinished()) {
        Thread.sleep(500);
    }
    if (control.getFailedJobList().size() > 0) {
        log.error(control.getFailedJobList().size() + " jobs failed!");
        for (ControlledJob job : control.getFailedJobList()) {
            log.error(job.getJobName() + " failed");
        }
    } else {
        log.info("Success!! Workflow completed [" + control.getSuccessfulJobList().size() + "] jobs");
    }
    ...

JobControl has methods to allow monitoring and tracking of its jobs.
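The ordering guarantee JobControl enforces can be illustrated without Hadoop. This plain-Java sketch (all names illustrative; "jobs" are just strings with dependency lists, and cyclic dependencies are not handled) runs jobs in an order where every job starts only after its dependencies have finished, mirroring step2.addDependingJob(step1):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java simulation of JobControl's dependency rule: a job becomes
// runnable only once all jobs it depends on are done.
public class WorkflowDemo {
    public static List<String> runOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < deps.size()) {
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                // run a job if it hasn't run yet and all its dependencies have
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    order.add(e.getKey());
                    done.add(e.getKey());
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("step2", List.of("step1")); // like step2.addDependingJob(step1)
        deps.put("step1", List.of());
        System.out.println(runOrder(deps)); // [step1, step2]
    }
}
```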

27 Chaining preprocessing & postprocessing steps
Sequential jobs: [ map reduce ]+
Modular jobs: map+ reduce map*
Preprocessing (and postprocessing) might require a number of Mappers to run sequentially, e.g. text preprocessing.
Sequential jobs (using an identity Reducer) are inefficient.
Use ChainMapper and ChainReducer to implement modular pre- and postprocessing steps:
- each mapper is added with its own job configuration parameters to ChainMapper or ChainReducer
- each mapper can be run individually, useful for testing/debugging

MapReduce Algorithms - available in pdf at

28 Programmer Control
- The ability to construct complex data structures as keys and values to store and communicate partial results.
- The ability to execute user-specified initialisation code at the beginning of a map or reduce task, and the ability to execute user-specified termination code at the end of a map or reduce task.
- The ability to preserve state in both mappers and reducers across multiple input or intermediate keys.
- The ability to control the sort order of intermediate keys, and therefore the order in which a reducer will encounter particular keys.
- The ability to control the partitioning of the key space, and therefore the set of keys that will be encountered by a particular reducer.

29 Useful code!

30 User-defined types - example

    public class TextPair implements WritableComparable<TextPair> {
        private Text first;
        private Text second;

        public TextPair() {
            set(new Text(), new Text());
        }
        public TextPair(String first, String second) {
            set(new Text(first), new Text(second));
        }
        public TextPair(Text first, Text second) {
            set(first, second);
        }
        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }
        public Text getFirst() {
            return first;
        }
        public Text getSecond() {
            return second;
        }
        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }
        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }
        @Override
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }
        @Override
        public int hashCode() {
            return first.hashCode() * 163 + second.hashCode();
        }
        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }
        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }
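TextPair's ordering contract - compare on the first field, tie-break on the second - can be exercised in plain Java by swapping Text for String. A minimal sketch (class and method names illustrative):

```java
// Plain-Java analog of TextPair's compareTo: order by the first component,
// and only fall back to the second component on a tie.
public class StringPairDemo {
    public static int compare(String f1, String s1, String f2, String s2) {
        int cmp = f1.compareTo(f2);
        return (cmp != 0) ? cmp : s1.compareTo(s2);
    }

    public static void main(String[] args) {
        System.out.println(compare("a", "b", "a", "c") < 0); // true: tie on first, "b" < "c"
        System.out.println(compare("b", "a", "a", "z") > 0); // true: "b" > "a" decides
    }
}
```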

WritableComparator is a factory for RawComparator instances (that Writable implementations have registered). For example, to obtain a comparator for IntWritable, we just use:

    RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);

    IntWritable w1 = new IntWritable(163);
    IntWritable w2 = new IntWritable(67);
    assertThat(comparator.compare(w1, w2), greaterThan(0));
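The assertion above just says that a natural-order comparator ranks 163 above 67. The same check in plain Java, without Hadoop (the class name is illustrative):

```java
import java.util.Comparator;

// Plain-Java sketch of the comparator assertion: natural integer order
// ranks 163 above 67, so compare() returns a positive value, just as the
// registered RawComparator for IntWritable does.
public class ComparatorDemo {
    static final Comparator<Integer> COMPARATOR = Comparator.naturalOrder();

    public static void main(String[] args) {
        System.out.println(COMPARATOR.compare(163, 67) > 0); // true
    }
}
```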


More information

Big Data: Architectures and Data Analytics

Big Data: Architectures and Data Analytics Big Data: Architectures and Data Analytics June 26, 2018 Student ID First Name Last Name The exam is open book and lasts 2 hours. Part I Answer to the following questions. There is only one right answer

More information

Big Data: Architectures and Data Analytics

Big Data: Architectures and Data Analytics Big Data: Architectures and Data Analytics June 26, 2018 Student ID First Name Last Name The exam is open book and lasts 2 hours. Part I Answer to the following questions. There is only one right answer

More information

MI-PDB, MIE-PDB: Advanced Database Systems

MI-PDB, MIE-PDB: Advanced Database Systems MI-PDB, MIE-PDB: Advanced Database Systems http://www.ksi.mff.cuni.cz/~svoboda/courses/2015-2-mie-pdb/ Lecture 10: MapReduce, Hadoop 26. 4. 2016 Lecturer: Martin Svoboda svoboda@ksi.mff.cuni.cz Author:

More information

itpass4sure Helps you pass the actual test with valid and latest training material.

itpass4sure   Helps you pass the actual test with valid and latest training material. itpass4sure http://www.itpass4sure.com/ Helps you pass the actual test with valid and latest training material. Exam : CCD-410 Title : Cloudera Certified Developer for Apache Hadoop (CCDH) Vendor : Cloudera

More information

Introduction to MapReduce

Introduction to MapReduce 732A54 Big Data Analytics Introduction to MapReduce Christoph Kessler IDA, Linköping University Towards Parallel Processing of Big-Data Big Data too large to be read+processed in reasonable time by 1 server

More information

MapReduce Simplified Data Processing on Large Clusters

MapReduce Simplified Data Processing on Large Clusters MapReduce Simplified Data Processing on Large Clusters Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) MapReduce 1393/8/5 1 /

More information

Parallel Processing - MapReduce and FlumeJava. Amir H. Payberah 14/09/2018

Parallel Processing - MapReduce and FlumeJava. Amir H. Payberah 14/09/2018 Parallel Processing - MapReduce and FlumeJava Amir H. Payberah payberah@kth.se 14/09/2018 The Course Web Page https://id2221kth.github.io 1 / 83 Where Are We? 2 / 83 What do we do when there is too much

More information

sqoop Easy, parallel database import/export Aaron Kimball Cloudera Inc. June 8, 2010

sqoop Easy, parallel database import/export Aaron Kimball Cloudera Inc. June 8, 2010 sqoop Easy, parallel database import/export Aaron Kimball Cloudera Inc. June 8, 2010 Your database Holds a lot of really valuable data! Many structured tables of several hundred GB Provides fast access

More information

CS 378 Big Data Programming

CS 378 Big Data Programming CS 378 Big Data Programming Lecture 5 Summariza9on Pa:erns CS 378 Fall 2017 Big Data Programming 1 Review Assignment 2 Ques9ons? mrunit How do you test map() or reduce() calls that produce mul9ple outputs?

More information

MapReduce Design Patterns

MapReduce Design Patterns MapReduce Design Patterns MapReduce Restrictions Any algorithm that needs to be implemented using MapReduce must be expressed in terms of a small number of rigidly defined components that must fit together

More information

Hadoop MapReduce Framework

Hadoop MapReduce Framework Hadoop MapReduce Framework Contents Hadoop MapReduce Framework Architecture Interaction Diagram of MapReduce Framework (Hadoop 1.0) Interaction Diagram of MapReduce Framework (Hadoop 2.0) Hadoop MapReduce

More information

Recommended Literature

Recommended Literature COSC 6339 Big Data Analytics Introduction to Map Reduce (I) Edgar Gabriel Fall 2018 Recommended Literature Original MapReduce paper by google http://research.google.com/archive/mapreduce-osdi04.pdf Fantastic

More information

Map Reduce and Design Patterns Lecture 4

Map Reduce and Design Patterns Lecture 4 Map Reduce and Design Patterns Lecture 4 Fang Yu Software Security Lab. Department of Management Information Systems College of Commerce, National Chengchi University http://soslab.nccu.edu.tw Cloud Computation,

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) Hadoop Lecture 24, April 23, 2014 Mohammad Hammoud Today Last Session: NoSQL databases Today s Session: Hadoop = HDFS + MapReduce Announcements: Final Exam is on Sunday April

More information

Announcements. Lab Friday, 1-2:30 and 3-4:30 in Boot your laptop and start Forte, if you brought your laptop

Announcements. Lab Friday, 1-2:30 and 3-4:30 in Boot your laptop and start Forte, if you brought your laptop Announcements Lab Friday, 1-2:30 and 3-4:30 in 26-152 Boot your laptop and start Forte, if you brought your laptop Create an empty file called Lecture4 and create an empty main() method in a class: 1.00

More information

Processing Distributed Data Using MapReduce, Part I

Processing Distributed Data Using MapReduce, Part I Processing Distributed Data Using MapReduce, Part I Computer Science E-66 Harvard University David G. Sullivan, Ph.D. MapReduce A framework for computation on large data sets that are fragmented and replicated

More information

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP

TITLE: PRE-REQUISITE THEORY. 1. Introduction to Hadoop. 2. Cluster. Implement sort algorithm and run it using HADOOP TITLE: Implement sort algorithm and run it using HADOOP PRE-REQUISITE Preliminary knowledge of clusters and overview of Hadoop and its basic functionality. THEORY 1. Introduction to Hadoop The Apache Hadoop

More information

Vendor: Cloudera. Exam Code: CCD-410. Exam Name: Cloudera Certified Developer for Apache Hadoop. Version: Demo

Vendor: Cloudera. Exam Code: CCD-410. Exam Name: Cloudera Certified Developer for Apache Hadoop. Version: Demo Vendor: Cloudera Exam Code: CCD-410 Exam Name: Cloudera Certified Developer for Apache Hadoop Version: Demo QUESTION 1 When is the earliest point at which the reduce method of a given Reducer can be called?

More information

TI2736-B Big Data Processing. Claudia Hauff

TI2736-B Big Data Processing. Claudia Hauff TI2736-B Big Data Processing Claudia Hauff ti2736b-ewi@tudelft.nl Intro Streams Streams Map Reduce HDFS Pig Pig Design Patterns Hadoop Ctd. Graphs Giraph Spark Zoo Keeper Spark Learning objectives Implement

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

Agenda CS121/IS223. Reminder. Object Declaration, Creation, Assignment. What is Going On? Variables in Java

Agenda CS121/IS223. Reminder. Object Declaration, Creation, Assignment. What is Going On? Variables in Java CS121/IS223 Object Reference Variables Dr Olly Gotel ogotel@pace.edu http://csis.pace.edu/~ogotel Having problems? -- Come see me or call me in my office hours -- Use the CSIS programming tutors Agenda

More information

Big Data: Architectures and Data Analytics

Big Data: Architectures and Data Analytics Big Data: Architectures and Data Analytics January 22, 2018 Student ID First Name Last Name The exam is open book and lasts 2 hours. Part I Answer to the following questions. There is only one right answer

More information

CS435 Introduction to Big Data Spring 2018 Colorado State University. 2/12/2018 Week 5-A Sangmi Lee Pallickara

CS435 Introduction to Big Data Spring 2018 Colorado State University. 2/12/2018 Week 5-A Sangmi Lee Pallickara W5.A.0.0 CS435 Introduction to Big Data W5.A.1 FAQs PA1 has been posted Feb. 21, 5:00PM via Canvas Individual submission (No team submission) Source code of examples in lectures: https://github.com/adamjshook/mapreducepatterns

More information

Hadoop & Big Data Analytics Complete Practical & Real-time Training

Hadoop & Big Data Analytics Complete Practical & Real-time Training An ISO Certified Training Institute A Unit of Sequelgate Innovative Technologies Pvt. Ltd. www.sqlschool.com Hadoop & Big Data Analytics Complete Practical & Real-time Training Mode : Instructor Led LIVE

More information

Programming with Hadoop MapReduce. Kostas Solomos Computer Science Department University of Crete, Greece

Programming with Hadoop MapReduce. Kostas Solomos Computer Science Department University of Crete, Greece Programming with Hadoop MapReduce Kostas Solomos Computer Science Department University of Crete, Greece What we will cover Diving deeper into Hadoop architecture Using a shared input for the mappers/reducers

More information

Map Reduce. Yerevan.

Map Reduce. Yerevan. Map Reduce Erasmus+ @ Yerevan dacosta@irit.fr Divide and conquer at PaaS 100 % // Typical problem Iterate over a large number of records Extract something of interest from each Shuffle and sort intermediate

More information

CS121/IS223. Object Reference Variables. Dr Olly Gotel

CS121/IS223. Object Reference Variables. Dr Olly Gotel CS121/IS223 Object Reference Variables Dr Olly Gotel ogotel@pace.edu http://csis.pace.edu/~ogotel Having problems? -- Come see me or call me in my office hours -- Use the CSIS programming tutors CS121/IS223

More information

Giraph: Large-scale graph processing infrastructure on Hadoop. Qu Zhi

Giraph: Large-scale graph processing infrastructure on Hadoop. Qu Zhi Giraph: Large-scale graph processing infrastructure on Hadoop Qu Zhi Why scalable graph processing? Web and social graphs are at immense scale and continuing to grow In 2008, Google estimated the number

More information

Expert Lecture plan proposal Hadoop& itsapplication

Expert Lecture plan proposal Hadoop& itsapplication Expert Lecture plan proposal Hadoop& itsapplication STARTING UP WITH BIG Introduction to BIG Data Use cases of Big Data The Big data core components Knowing the requirements, knowledge on Analyst job profile

More information

Vendor: Hortonworks. Exam Code: HDPCD. Exam Name: Hortonworks Data Platform Certified Developer. Version: Demo

Vendor: Hortonworks. Exam Code: HDPCD. Exam Name: Hortonworks Data Platform Certified Developer. Version: Demo Vendor: Hortonworks Exam Code: HDPCD Exam Name: Hortonworks Data Platform Certified Developer Version: Demo QUESTION 1 Workflows expressed in Oozie can contain: A. Sequences of MapReduce and Pig. These

More information

MapReduce-style data processing

MapReduce-style data processing MapReduce-style data processing Software Languages Team University of Koblenz-Landau Ralf Lämmel and Andrei Varanovich Related meanings of MapReduce Functional programming with map & reduce An algorithmic

More information

Snapshots and Repeatable reads for HBase Tables

Snapshots and Repeatable reads for HBase Tables Snapshots and Repeatable reads for HBase Tables Note: This document is work in progress. Contributors (alphabetical): Vandana Ayyalasomayajula, Francis Liu, Andreas Neumann, Thomas Weise Objective The

More information

Data Analytics Job Guarantee Program

Data Analytics Job Guarantee Program Data Analytics Job Guarantee Program 1. INSTALLATION OF VMWARE 2. MYSQL DATABASE 3. CORE JAVA 1.1 Types of Variable 1.2 Types of Datatype 1.3 Types of Modifiers 1.4 Types of constructors 1.5 Introduction

More information

CS 231 Data Structures and Algorithms, Fall 2016

CS 231 Data Structures and Algorithms, Fall 2016 CS 231 Data Structures and Algorithms, Fall 2016 Dr. Bruce A. Maxwell Department of Computer Science Colby College Course Description Focuses on the common structures used to store data and the standard

More information

Data abstractions: ADTs Invariants, Abstraction function. Lecture 4: OOP, autumn 2003

Data abstractions: ADTs Invariants, Abstraction function. Lecture 4: OOP, autumn 2003 Data abstractions: ADTs Invariants, Abstraction function Lecture 4: OOP, autumn 2003 Limits of procedural abstractions Isolate implementation from specification Dependency on the types of parameters representation

More information

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University

CS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [SPARK] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Return type for collect()? Can

More information

ExamTorrent. Best exam torrent, excellent test torrent, valid exam dumps are here waiting for you

ExamTorrent.   Best exam torrent, excellent test torrent, valid exam dumps are here waiting for you ExamTorrent http://www.examtorrent.com Best exam torrent, excellent test torrent, valid exam dumps are here waiting for you Exam : Apache-Hadoop-Developer Title : Hadoop 2.0 Certification exam for Pig

More information

Introduction to HDFS and MapReduce

Introduction to HDFS and MapReduce Introduction to HDFS and MapReduce Who Am I - Ryan Tabora - Data Developer at Think Big Analytics - Big Data Consulting - Experience working with Hadoop, HBase, Hive, Solr, Cassandra, etc. 2 Who Am I -

More information

sqoop Automatic database import Aaron Kimball Cloudera Inc. June 18, 2009

sqoop Automatic database import Aaron Kimball Cloudera Inc. June 18, 2009 sqoop Automatic database import Aaron Kimball Cloudera Inc. June 18, 2009 The problem Structured data already captured in databases should be used with unstructured data in Hadoop Tedious glue code necessary

More information

Actual4Dumps. Provide you with the latest actual exam dumps, and help you succeed

Actual4Dumps.   Provide you with the latest actual exam dumps, and help you succeed Actual4Dumps http://www.actual4dumps.com Provide you with the latest actual exam dumps, and help you succeed Exam : HDPCD Title : Hortonworks Data Platform Certified Developer Vendor : Hortonworks Version

More information

Section 05: Solutions

Section 05: Solutions Section 05: Solutions 1. Asymptotic Analysis (a) Applying definitions For each of the following, choose a c and n 0 which show f(n) O(g(n)). Explain why your values of c and n 0 work. (i) f(n) = 5000n

More information

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)!

Lecture 7: MapReduce design patterns! Claudia Hauff (Web Information Systems)! Big Data Processing, 2014/15 Lecture 7: MapReduce design patterns!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm

More information

MAT 3670: Lab 3 Bits, Data Types, and Operations

MAT 3670: Lab 3 Bits, Data Types, and Operations MAT 3670: Lab 3 Bits, Data Types, and Operations Background In previous labs, we have used Turing machines to manipulate bit strings. In this lab, we will continue to focus on bit strings, placing more

More information

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung

HDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per

More information

Hadoop Streaming. Table of contents. Content-Type text/html; utf-8

Hadoop Streaming. Table of contents. Content-Type text/html; utf-8 Content-Type text/html; utf-8 Table of contents 1 Hadoop Streaming...3 2 How Does Streaming Work... 3 3 Package Files With Job Submissions...4 4 Streaming Options and Usage...4 4.1 Mapper-Only Jobs...

More information

ML from Large Datasets

ML from Large Datasets 10-605 ML from Large Datasets 1 Announcements HW1b is going out today You should now be on autolab have a an account on stoat a locally-administered Hadoop cluster shortly receive a coupon for Amazon Web

More information

Introduction to MapReduce

Introduction to MapReduce Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed

More information

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.

April Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. 1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map

More information

Complex stories about Sqooping PostgreSQL data

Complex stories about Sqooping PostgreSQL data Presentation slide for Sqoop User Meetup (Strata + Hadoop World NYC 2013) Complex stories about Sqooping PostgreSQL data 10/28/2013 NTT DATA Corporation Masatake Iwasaki Introduction 2 About Me Masatake

More information

Assignment 4: Hashtables

Assignment 4: Hashtables Assignment 4: Hashtables In this assignment we'll be revisiting the rhyming dictionary from assignment 2. But this time we'll be loading it into a hashtable and using the hashtable ADT to implement a bad

More information

1/30/2019 Week 2- B Sangmi Lee Pallickara

1/30/2019 Week 2- B Sangmi Lee Pallickara Week 2-A-0 1/30/2019 Colorado State University, Spring 2019 Week 2-A-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING Term project deliverable

More information