Introduction to Data Management CSE 344
- Arleen McKinney
- 5 years ago
1 Introduction to Data Management CSE 344
  Lecture 27: MapReduce and Pig Latin
  CSE 344 - Fall 2014
2 Announcements
  HW8 is out now, due the last Thursday of the quarter
    You should have received AWS credit code via . Send mail to if problems.
    Start setting up your account now(!). It takes time.
    And follow the instructions!! That is usually the biggest problem.
  Final exam: Mon. 12/8, 2:30-4:20, this room
    Review: Sun. 12/7, 2 pm, EE 37
    Comprehensive (but no relational calculus or Datalog)
    Closed book; reference notes included in the test
  HW grading: yes, we're behind. You should have HW4 and HW5 by Friday night. HW5 and HW6 solutions are posted now.
3 Outline
  Example of a large MapReduce job
  Whirlwind tour of Pig Latin for HW8
    You'll need to learn from the slides, starter code, Hadoop and related web pages; we will not go through the details in class as we did for SQL
4 Executing a Large MapReduce Job
5 Anatomy of a Query Execution
  Running problem #4
  20 nodes = 1 master + 19 workers
  Using PARALLEL 50
6 March 2013
7 Some other time (March 2012)
  Let's see what happened
8 1h 16min
  [JobTracker status: map 33.17% complete, reduce 4.17% complete; reduce completion graph with phases copy, sort, reduce]
9 1h 16min
  Only 19 reducers active, out of 50. Why?
  Copying by 19 reducers in parallel with mappers.
  When will the other 31 reducers be scheduled?
  [JobTracker status: map 33.17% complete, reduce 4.17% complete; reduce completion graph with phases copy, sort, reduce]
10 1h 16min
  Only 19 reducers active, out of 50. Why?
  Copying by 19 reducers in parallel with mappers.
  When will the other 31 reducers be scheduled?
  [JobTracker screenshot: job PigLatin:DefaultJobName, user hadoop, status Running, running for 1hrs, 16mins, 33sec; map 33.17% complete, reduce 4.17% complete]
  3h 5min
  [JobTracker screenshot: map 100% complete, reduce 32.42% complete; job counters table and completion graphs (copy, sort, reduce) not recoverable]
11 1h 16min
  Only 19 reducers active, out of 50. Why?
  Copying by 19 reducers in parallel with mappers.
  When will the other 31 reducers be scheduled?
  [JobTracker screenshot: job PigLatin:DefaultJobName, user hadoop, status Running, running for 1hrs, 16mins, 33sec; map 33.17% complete, reduce 4.17% complete]
  3h 5min
  Map completed. Sorting, and the rest of Reduce, may proceed now.
  Speculative execution
  [JobTracker screenshot: map 100% complete, reduce 32.42% complete; job counters table and completion graphs (copy, sort, reduce) not recoverable]
12 3h 51min
  [JobTracker status: map 100% complete, reduce 37.72% complete; map and reduce completion graphs with phases copy, sort, reduce]
13 3h 51min
  Some of the 19 reducers have finished
  Next batch of reducers started
  [JobTracker status: map 100% complete, reduce 37.72% complete; map and reduce completion graphs with phases copy, sort, reduce]
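The batching of reducers seen in this run can be approximated with simple arithmetic. The sketch below assumes 50 reduce tasks (PARALLEL 50) and one reduce slot on each of the 19 workers; the one-slot-per-worker figure is an assumption that the slides imply but do not state, and `reduce_waves` is a hypothetical helper, not part of Hadoop.

```python
def reduce_waves(num_reducers, reduce_slots):
    """Return how many reduce tasks run in each successive batch,
    assuming a new batch starts only when slots become free."""
    waves = []
    remaining = num_reducers
    while remaining > 0:
        batch = min(remaining, reduce_slots)
        waves.append(batch)
        remaining -= batch
    return waves


# Assumed setup: 50 reduce tasks, 19 reduce slots in total.
waves = reduce_waves(num_reducers=50, reduce_slots=19)
print(waves)
```

A real scheduler hands out a slot as soon as any single reducer finishes, so the batches overlap rather than forming strict waves; the sketch only explains why 31 reducers had to wait at the start.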
14 3h 51min and 3h 52min
  Some of the 19 reducers have finished
  Next batch of reducers started
  [Two JobTracker screenshots of job PigLatin:DefaultJobName, user hadoop, status Running.
   At 3hrs, 51mins, 19sec: map 100% complete, reduce 37.72% complete, launched reduce tasks 31.
   At 3hrs, 52mins, 51sec: map 100% complete, reduce 42.35% complete, launched reduce tasks 39.
   Remaining job counters and the completion graphs (copy, sort, reduce) are not recoverable.]
15 4h 18min
  Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
  [JobTracker status: map 99.88% complete, reduce 48.42% complete; reduce completion graph with phases copy, sort, reduce]
16 4h 18min
  Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
  Why did we lose some reducers?
  [JobTracker status: map 99.88% complete, reduce 48.42% complete; reduce completion graph with phases copy, sort, reduce]
17 4h 18min
  Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
  Why did we lose some reducers?
  [JobTracker status: map 99.88% complete, reduce 48.42% complete]
  7h 1min
  Mappers finished, reducers resumed.
  [JobTracker screenshot: job PigLatin:DefaultJobName, user hadoop, status Running, running for 7hrs, 1mins, 54sec; black-listed TaskTrackers: 3; map 100% complete, reduce 94.15% complete; launched reduce tasks 62; remaining job counters and completion graphs (copy, sort, reduce) not recoverable]
18 7h 20min
  Success! 7hrs, 20mins.
  [JobTracker screenshot: job PigLatin:DefaultJobName, user hadoop, status Succeeded; started Sun Mar 4 19:08:29 UTC 2012, finished Mon Mar 5 02:28:39 UTC 2012, finished in 7hrs, 20mins, 10sec; black-listed TaskTrackers: 3; map 100% complete, reduce 100% complete]
19 Pig Latin Mini-Tutorial (will not discuss in detail in class; please read in order to do homework 8)
20 Pig Latin Overview
  Data model = loosely typed nested relations
  Query model = a SQL-like, dataflow language
  Execution model:
    Option 1: run locally on your machine, e.g. to debug
    Option 2: compile into a graph of MapReduce jobs, run on a cluster supporting Hadoop
21 Example
  Input: a table of urls: (url, category, pagerank)
  Compute the average pagerank of all sufficiently high pageranks, for each category
  Return the answers only for categories with sufficiently many such pages
22 First in SQL
  Page(url, category, pagerank)
  SELECT category, AVG(pagerank)
  FROM Page
  WHERE pagerank > 0.2
  GROUP BY category
  HAVING COUNT(*) > 10^6
23 ...then in Pig Latin
  Page(url, category, pagerank)
  good_urls = FILTER urls BY pagerank > 0.2
  groups = GROUP good_urls BY category
  big_groups = FILTER groups BY COUNT(good_urls) > 10^6
  output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank)
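To make the dataflow reading of this query concrete, here is a minimal Python sketch that mirrors the four Pig Latin steps one by one. It illustrates the semantics only, not how Pig executes the query; the function name, the sample rows, and the small `min_count` threshold (standing in for 10^6) are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def avg_pagerank_by_category(urls, min_rank=0.2, min_count=1):
    """urls is a list of (url, category, pagerank) tuples."""
    # good_urls = FILTER urls BY pagerank > min_rank
    good_urls = [row for row in urls if row[2] > min_rank]

    # groups = GROUP good_urls BY category
    groups = defaultdict(list)
    for _url, category, pagerank in good_urls:
        groups[category].append(pagerank)

    # big_groups = FILTER groups BY COUNT(good_urls) > min_count
    big_groups = {c: ranks for c, ranks in groups.items() if len(ranks) > min_count}

    # output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank)
    return {c: mean(ranks) for c, ranks in big_groups.items()}


# Hypothetical sample data.
pages = [
    ("a.com", "news", 0.5),
    ("b.com", "news", 0.75),
    ("c.com", "sports", 0.25),
    ("d.com", "blog", 0.125),
]
result = avg_pagerank_by_category(pages)
print(result)
```

Each intermediate variable corresponds to one named relation in the Pig Latin version, which is the main stylistic difference from the single nested SQL statement.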
24 Types in Pig Latin
  Atomic: string or number, e.g. 'Alice' or 55
  Tuple: ('Alice', 55, 'salesperson')
  Bag: {('Alice', 55, 'salesperson'), ('Betty', 44, 'manager'), ...}
  Maps: we will try not to use these
25 Types in Pig Latin
  Tuple components can be referenced by number: $0, $1, $2, ...
  Bags can be nested! Non-1st Normal Form:
  {('a', {1,4,3}), ('c', { }), ('d', {2,2,5,3,2})}
26 [Olston 2008]
27 [Olston 2008] Loading data
  Input data = FILES! Heard that before?
  The LOAD command parses an input file into a bag of records
  Both the parser (= "deserializer") and the output type are provided by the user
  For HW8: simply use the code provided
28 [Olston 2008] Loading data
  queries = LOAD 'query_log.txt' USING userloadfcn( ) AS (userid, querystring, timestamp)
  Pig provides a set of built-in load/store functions:
  A = LOAD 'student' USING PigStorage('\t') AS (name: chararray, age: int, gpa: float);
  is the same as
  A = LOAD 'student' AS (name: chararray, age: int, gpa: float);
29 [Olston 2008] Loading data
  USING userfunction( ) is optional
    The default deserializer expects a tab-delimited file
  AS type is optional
    The default is a record with unnamed fields; refer to them as $0, $1, ...
  The return value of LOAD is just a handle to a bag
    The actual reading is done in pull mode, or parallelized
30 [Olston 2008] FOREACH
  expanded_queries = FOREACH queries GENERATE userid, expandquery(querystring)
  expandquery( ) is a UDF that produces likely expansions
  Note: it returns a bag, hence expanded_queries is a nested bag
31 [Olston 2008] FOREACH
  expanded_queries = FOREACH queries GENERATE userid, flatten(expandquery(querystring))
  Now we get a flat collection
32 [Olston 2008]
33 [Olston 2008] FLATTEN
  Note that it is NOT a normal function! (That's one thing I don't like about Pig Latin.)
  A normal FLATTEN would do this:
    FLATTEN({{2,3},{5},{},{4,5,6}}) = {2,3,5,4,5,6}
    Its type is: {{T}} → {T}
  The Pig Latin FLATTEN does this:
    FLATTEN({4,5,6}) = 4, 5, 6
    What is its type? {T} → T, T, T, ..., T ?????
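The difference between the two notions of flattening can be sketched in Python. This is an illustration under assumptions, not Pig's implementation: bags are modeled as Python lists, and both function names are made up for the example. The second function models FLATTEN as it appears inside FOREACH ... GENERATE, where each input row yields one output row per element of its bag.

```python
def flatten_bag_of_bags(bags):
    """Conventional flatten, type {{T}} -> {T}: concatenate the inner bags."""
    return [item for bag in bags for item in bag]


def foreach_generate_flatten(rows):
    """Pig-style FLATTEN inside FOREACH ... GENERATE key, FLATTEN(bag):
    each (key, bag) row is replaced by one (key, item) row per bag element."""
    return [(key, item) for key, bag in rows for item in bag]


normal = flatten_bag_of_bags([[2, 3], [5], [], [4, 5, 6]])
print(normal)

# Hypothetical nested rows, e.g. the result of a UDF that returns a bag.
nested = [("u1", ["cat food", "cat toys"]), ("u2", [])]
flat = foreach_generate_flatten(nested)
print(flat)
```

Seen this way, the Pig Latin FLATTEN behaves less like a function on a value and more like an instruction to un-nest rows, which is why it has no ordinary type. Note that a row whose bag is empty produces no output rows.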
34 [Olston 2008] FILTER
  Remove all queries from Web bots:
    real_queries = FILTER queries BY userid neq 'bot'
  Better: use a complex UDF to detect Web bots:
    real_queries = FILTER queries BY NOT isbot(userid)
35 [Olston 2008] JOIN
  results: {(querystring, url, position)}
  revenue: {(querystring, adslot, amount)}
  join_result = JOIN results BY querystring, revenue BY querystring
  join_result: {(querystring, url, position, adslot, amount)}
36 [Olston 2008]
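As an illustration of what this JOIN computes, here is a minimal hash-join sketch in Python. It follows the output schema shown on the slide, (querystring, url, position, adslot, amount); the function name and the sample rows are assumptions, and the code says nothing about how Pig actually evaluates joins.

```python
from collections import defaultdict


def join_by_querystring(results, revenue):
    """Equi-join results(querystring, url, position) with
    revenue(querystring, adslot, amount) on querystring."""
    # Build a lookup table from querystring to its revenue rows.
    revenue_by_query = defaultdict(list)
    for querystring, adslot, amount in revenue:
        revenue_by_query[querystring].append((adslot, amount))

    # Probe the table with each results row; unmatched rows are dropped.
    joined = []
    for querystring, url, position in results:
        for adslot, amount in revenue_by_query.get(querystring, []):
            joined.append((querystring, url, position, adslot, amount))
    return joined


# Hypothetical sample data.
results = [("lakers", "nba.com", 1), ("lakers", "espn.com", 2), ("kings", "nhl.com", 1)]
revenue = [("lakers", "top", 50), ("lakers", "side", 20), ("iphone", "top", 10)]
join_result = join_by_querystring(results, revenue)
print(join_result)
```

Query strings that appear in only one input ("kings" and "iphone" in the sample) contribute nothing to the output, as expected for an inner join.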
37 [Olston 2008] GROUP BY
  revenue: {(querystring, adslot, amount)}
  grouped_revenue = GROUP revenue BY querystring
  query_revenues = FOREACH grouped_revenue GENERATE querystring, SUM(revenue.amount) AS totalrevenue
  grouped_revenue: {(querystring, {(adslot, amount)})}
  query_revenues: {(querystring, totalrevenue)}
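The two steps on this slide, grouping into a nested bag and then aggregating over that bag, can be sketched separately in Python. This is a semantic illustration only; the helper names and sample rows are assumptions, and bags are modeled as lists.

```python
from collections import defaultdict


def group_revenue(revenue):
    """GROUP revenue BY querystring:
    {(querystring, adslot, amount)} -> {querystring: [(adslot, amount), ...]}"""
    grouped = defaultdict(list)
    for querystring, adslot, amount in revenue:
        grouped[querystring].append((adslot, amount))
    return dict(grouped)


def query_revenues(grouped):
    """FOREACH grouped_revenue GENERATE querystring, SUM(revenue.amount)"""
    return {q: sum(amount for _adslot, amount in bag) for q, bag in grouped.items()}


# Hypothetical sample data.
revenue = [("lakers", "top", 50), ("lakers", "side", 20), ("kings", "top", 30)]
grouped_revenue = group_revenue(revenue)
totals = query_revenues(grouped_revenue)
print(grouped_revenue)
print(totals)
```

Unlike SQL's GROUP BY, the grouping step here is useful on its own: `grouped_revenue` is a first-class nested relation that could be passed to any other operator before aggregating.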
38 [Olston 2008] Simple MapReduce
  input: {(field1, field2, field3, ...)}
  map_result = FOREACH input GENERATE FLATTEN(map(*))
  key_groups = GROUP map_result BY $0
  output = FOREACH key_groups GENERATE $0, reduce($1)
  map_result: {(a1, a2, a3, ...)}
  key_groups: {(a1, {(a2, a3, ...)})}
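The same three-step pattern (flattened map, group by key, reduce per group) can be written as a small in-memory Python function. This is a sketch of the programming model only, with no distribution or fault tolerance; `map_reduce`, `tokenize`, and the word-count input are assumptions chosen for the example.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map_fn on every record, group the emitted (key, value) pairs
    by key, then apply reduce_fn to each group's list of values."""
    # map_result = FOREACH input GENERATE FLATTEN(map(*))
    map_result = [pair for record in records for pair in map_fn(record)]

    # key_groups = GROUP map_result BY $0
    key_groups = defaultdict(list)
    for key, value in map_result:
        key_groups[key].append(value)

    # output = FOREACH key_groups GENERATE $0, reduce($1)
    return {key: reduce_fn(values) for key, values in key_groups.items()}


def tokenize(line):
    """Map function for word count: emit (word, 1) for every word."""
    return [(word, 1) for word in line.split()]


counts = map_reduce(["a b a", "b c"], tokenize, sum)
print(counts)
```

The FLATTEN in the first step matters: the map function returns a bag of pairs per record, and un-nesting those bags is what produces one flat relation to group.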
39 [Olston 2008] Co-Group
  results: {(querystring, url, position)}
  revenue: {(querystring, adslot, amount)}
  grouped_data = COGROUP results BY querystring, revenue BY querystring;
  grouped_data: {(querystring, results:{(url, position)}, revenue:{(adslot, amount)})}
  What is the output type in general?
40 [Olston 2008] Co-Group
  Is this an inner join, or an outer join?
41 [Olston 2008] Co-Group
  grouped_data: {(querystring, results:{(url, position)}, revenue:{(adslot, amount)})}
  url_revenues = FOREACH grouped_data GENERATE FLATTEN(distributeRevenue(results, revenue));
  distributeRevenue is a UDF that accepts the search results and revenue information for one query string at a time, and outputs a bag of urls and the revenue attributed to them.
42 [Olston 2008] Co-Group vs. Join
  grouped_data = COGROUP results BY querystring, revenue BY querystring;
  grouped_data: {(querystring, results:{(url, position)}, revenue:{(adslot, amount)})}
  join_result = FOREACH grouped_data GENERATE FLATTEN(results), FLATTEN(revenue);
  Result is the same as JOIN
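A small Python sketch can show why COGROUP followed by two FLATTENs behaves like a join. It is an illustration under assumptions: bags are lists, the helper names and sample rows are made up, and the output uses a single querystring column as on the slides.

```python
def cogroup(results, revenue):
    """COGROUP results BY querystring, revenue BY querystring.
    Every key from either input gets a row; a side with no match gets an empty bag."""
    keys = {row[0] for row in results} | {row[0] for row in revenue}
    return {
        key: (
            [row[1:] for row in results if row[0] == key],
            [row[1:] for row in revenue if row[0] == key],
        )
        for key in sorted(keys)
    }


def flatten_both(grouped):
    """FOREACH grouped_data GENERATE FLATTEN(results), FLATTEN(revenue):
    cross product of the two bags within each group."""
    return [
        (key,) + res_row + rev_row
        for key, (res_bag, rev_bag) in grouped.items()
        for res_row in res_bag
        for rev_row in rev_bag
    ]


# Hypothetical sample data.
results = [("lakers", "nba.com", 1), ("kings", "nhl.com", 1)]
revenue = [("lakers", "top", 50), ("iphone", "top", 10)]
grouped_data = cogroup(results, revenue)
join_result = flatten_both(grouped_data)
print(grouped_data)
print(join_result)
```

In the sample, COGROUP keeps "kings" and "iphone" with one empty bag each, much like an outer join would keep unmatched rows, but flattening an empty bag yields no rows, so the final result matches an inner join.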
43 [Olston 2008] Asking for Output: STORE
  STORE query_revenues INTO 'theoutput' USING userstorefcn();
  Meaning: write query_revenues to the file 'theoutput'
44 [Olston 2008] Implementation
  Over Hadoop!
  Parse query: everything between LOAD and STORE → one logical plan
  Logical plan → graph of MapReduce ops
  All statements between two (CO)GROUPs → one MapReduce job
45 [Olston 2008] Implementation
46 Review: MapReduce
  Data is typically a file in the Google File System
    HDFS for Hadoop
    The file system partitions the file into chunks
    Each chunk is replicated on k (typically 3) machines
  Each machine can run a few map and reduce tasks simultaneously
  Each map task consumes one chunk
    Can adjust how much data goes into each map task using "splits"
    The scheduler tries to schedule a map task where its input data is located
  Map output is partitioned across reducers
  Map output is also written locally to disk
  The number of reduce tasks is configurable
  The system shuffles data between map and reduce tasks
  Reducers sort-merge data before consuming it
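The partition, shuffle, and sort steps from this review can be sketched in a few lines of Python. This is a single-process illustration under assumptions: `zlib.crc32` stands in for whatever hash function the system uses, and `partition_for` and `shuffle` are hypothetical names, not Hadoop APIs.

```python
import zlib


def partition_for(key, num_reducers):
    """Pick a reducer for a key with a deterministic hash, so every
    mapper sends the same key to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """map_outputs holds one list of (key, value) pairs per map task.
    Returns one key-sorted list of pairs per reducer."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for task_output in map_outputs:
        for key, value in task_output:
            reducer_inputs[partition_for(key, num_reducers)].append((key, value))
    # Each reducer sorts its input so equal keys become adjacent.
    return [sorted(pairs) for pairs in reducer_inputs]


# Hypothetical output of two map tasks.
map_outputs = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
reducer_inputs = shuffle(map_outputs, num_reducers=2)
print(reducer_inputs)
```

A real reducer merges runs that each mapper has already sorted rather than sorting from scratch, which is what "sort-merge" refers to; the plain `sorted` call here is a simplification.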
47 MapReduce Phases
  [Figure: MapReduce phases, with local storage]
More informationCloud Computing CS
Cloud Computing CS 15-319 Programming Models- Part III Lecture 6, Feb 1, 2012 Majd F. Sakr and Mohammad Hammoud 1 Today Last session Programming Models- Part II Today s session Programming Models Part
More informationMap- Reduce. Everything Data CompSci Spring 2014
Map- Reduce Everything Data CompSci 290.01 Spring 2014 2 Announcements (Thu. Feb 27) Homework #8 will be posted by noon tomorrow. Project deadlines: 2/25: Project team formation 3/4: Project Proposal is
More informationActual4Dumps. Provide you with the latest actual exam dumps, and help you succeed
Actual4Dumps http://www.actual4dumps.com Provide you with the latest actual exam dumps, and help you succeed Exam : HDPCD Title : Hortonworks Data Platform Certified Developer Vendor : Hortonworks Version
More informationMapReduce: Recap. Juliana Freire & Cláudio Silva. Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec
MapReduce: Recap Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec MapReduce: Recap Sequentially read a lot of data Why? Map: extract something we care about map (k, v)
More information/ Cloud Computing. Recitation 3 Sep 13 & 15, 2016
15-319 / 15-619 Cloud Computing Recitation 3 Sep 13 & 15, 2016 1 Overview Administrative Issues Last Week s Reflection Project 1.1, OLI Unit 1, Quiz 1 This Week s Schedule Project1.2, OLI Unit 2, Module