Computer Science Department Picnic. FAQs. Topics. This material is developed based on, 8/27/2018 Week 2-A Sangmi Lee Pallickara
|
|
- Pearl Bradford
- 5 years ago
- Views:
Transcription
1 Week 2- W FLL FLL 2018 W2..1 omputer Science epartment Picnic No ell-phones in the class. Welcome to the cademic year! If you need to use a laptop, please sit in the back row. I will ask you to turn off your laptop if it seems to be distracting to others. eet your faculty, department staff, and fellow students in a social setting. Food and drink will be provided. PT 1. LGE SLE T NLYTIS When: September 1st Time: 1pm 4pm Where: ity Park Shelter #7 omputer Science, olorado State University - FLL 2018 W FLL 2018 FQs pics P1 has been posted Introduction to apeduce apeduce esign Pattern I. Summarization Patterns September 18, 5:00P via anvas Individual submission (No team submission) W2..3 Team for term project 3-4 members ug. 30, 5:00P via anvas Looking for team members? Post your advertisement on Piazza! - FLL 2018 W FLL 2018 W2..5 This material is developed based on, nand ajaraman, Jure Leskovec, and Jeffrey Ullman, ining of assive atasets, ambridge University Press, hapter 2 Part 1. Large Scale ata nalytics Introduction to apeduce ownload this chapter from the S435 schedule page Hadoop: The definitive Guide, m White, eilly, 3rd Edition, 2014 apeduce esign Patterns, onald iner and dam Shook, eilly,
2 Week 2- - FLL 2018 W FLL 2018 W2..7 N data example national climate data center record Find the maximum temperature of a year (1900 ~ 1999 ) apeduce Example # USF weather station identifier # WN weather station identifier # observation date 0300 # observation time # latitude (degrees x 1000) # longitude (degrees x 1000) F # elevation (meters) V # wind direction (degrees) 1 # quality code N - FLL 2018 W2..8 The first entries for 1990 % ls raw/ 1990 head gz gz gz gz gz gz gz gz gz gz - FLL 2018 W2..9 nalyzing the data with Unix ols (1/2) program for finding the maximum recorded temperature by year from N weather records #!/ usr/ bin/ env bash for year in all/* do echo -ne ` basename $ year.gz ` \t gunzip -c $ year \awk '{ temp = substr( $ 0, 88, 5) + 0; q = substr( $ 0, 93, 1); if (temp!= 9999 && q ~ /[01459]/ && temp > max) max = temp EN { print max ' one - FLL 2018 W2..10 nalyzing the data with Unix ols (2/2) The script loops through the compressed year files Printing the year Processing each file using awk Extracts two fields ir temperature and the quality code heck if it is greater than the maximum value seen so far %./ max_temperature.sh FLL 2018 W2..11 esults? The complete run for the century took 42 minutes speed up the processing We need to run parts of the program in parallel Process different years in different processes What will be the problems? 2
3 Week 2- - FLL 2018 W2..12 hallenges ividing the work into equal-size pieces ata size per year? ombining the results from independent processes ombining results and sorting by year? You are still limited by the processing capacity of a single machine (the worst one)! - FLL 2018 W2..13 ap and educe apeduce works by breaking the processing into two phases The map phase The reduce phase Each phase has key-value pairs as input and output Programmers should specify Types of input/output key-values The map function The reduce function - FLL 2018 W2..14 Visualizing the way the apeduce works (1/3) Sample lines of input data N N N N N These lines are presented to the map function as the key-value pairs (0, N ) (106, N ) (212, N ) The keys are the line offsets within the file (optional) - FLL 2018 W2..15 Visualizing the way the apeduce works (2/3) The map function extracts the year and the air temperature and emit them as its output (1950, 0) (1950, 22) (1950, 11) (1949, 111) (1949, 78) This output key-value pairs will be sorted (by key) and grouped by key s passed to each reducer are NT sorted ur reduce function will see the following input: (1949, [111, 78]) (1950, [0, 22, 11]) - FLL 2018 W2..16 Visualizing the way the apeduce works (3/3) - FLL 2018 W2..17 omparison with other systems educe function iterates through the list and pick up the maximum reading (1949, 111) (1950, 22) This is the final output input r (0, ) (106, ) (212, ) (318, ) map (1950, 0)(1950, 22) (1950, -11) suffle (1949, [111,78]) (1950, [0,22,-11] reduce (1949, 111) (1950, 22) output 1949, , 22 PI vs. apeduce apeduce tries to collocate the data with the compute node ata access is fast ata is local! Volunteer computing vs. apeduce SETI@home Using donated PU time 3
4 Week 2- - FLL 2018 W FLL 2018 W2..19 ataset Network communication log file apeduce Example 3 Source estination There are two attributes and (IP1, IP2) : There was a communication from IP1 to IP2 - FLL 2018 W2..20 etrieve records with as a source IP heck each record and provide as output only those records containing as a source IP FLL 2018 W2..21 etrieve records with as the source ap function For each record, check the condition (source == and produce the key-value pair ( , record) How many reducers will be needed? educe function For the shuffled pairs (( , [record, record, record]) eturns ( ,,[record, record]) as the result of eliminating duplications - FLL 2018 W FLL 2018 W2..23 This material is developed based on, Part 1. Large Scale ata nalytics esign Pattern 1: Summarization Patterns nand ajaraman, Jure Leskovec, and Jeffrey Ullman, ining of assive atasets, ambridge University Press, hapter 2 ownload this chapter from the S435 schedule page Hadoop: The definitive Guide, m White, eilly, 3 rd Edition, 2014 apeduce esign Patterns, onald iner and dam Shook, eilly,
5 Week 2- - FLL 2018 W2..24 Why esign Patterns? ols for solving problems eusable and providing a general framework evelopers can spend less time figuring out how she/he is going to solve the problem - FLL 2018 W2..25 Summarization Patterns esign patterns that produce a top-level, summarized view of your data Helps you to glean insights not available from looking at a localized set of records alone Numerical summarizations ax/in, ount, verage, edian, Standard eviation Inverted index - FLL 2018 W FLL 2018 W2..27 s: Introduction esign Pattern 1: Summarization Patterns s Pattern escription General pattern for calculating aggregate statistical values over data Intent Group records together by a key field and calculate a numerical aggregate per group to get a top-level view of the larger data set pplicability Numerical summarizations should be used when both of the followings are true Numerical data or counting The data can be grouped by specific fields - FLL 2018 W2..28 s: Structure utput utput - FLL 2018 W2..29 Example 1. inimum, aximum and ount alculating the in, ax and ount of s per user USE-I ap function For each record, ap function produces the key-value pair (key, value) 1. esign your ap function educe function For the shuffled pairs (key, [a list of values]) eturns (key, value) as the result 2. esign your educe function 5
6 Week 2- - FLL 2018 W2..30 Example 1. inimum, aximum and ount alculating the in, ax and ount of per user USE-I ap function For each record, ap function produces the key-value pair (key, value) Input: <dummy_key, a list of values (a line of string)> utput: < USE-I, a data structure contains (in, ax, ount)> Functionality: kenizing the string, fill out the output data structure, emit an <key,value> pair educe function Input: (userid, [a list of data structures]) eturns (userid, [ax, in, ount]) as the result Functionality: Iterates over the list and calculate in, ax and ount. - FLL 2018 W2..31 apper ode public static class inaxountapper extends apper < bject, Text, Text, inaxounttuple > { // ur output key and value Writables private Text outuserid = new Text(); private inaxounttuple outtuple = new inaxounttuple(); public void map(bject key, Text value, ontext context) throws IException, InterruptedException { ap < String, String > parsed = transformxmlap(value.tostring()); // ssume we have developed a class, parse. // Grab the " field since it is what we are finding // the min and max value of String value = parsed.get( value ); // Grab the UserI since it is what we are grouping by String userid = parsed.get("userid"); // Set the minimum and maximum date values to the value outtuple.setin(value); outtuple.setax(value); // Set the comment count to 1 outtuple.setount(1); // Set our user I as the output key outuserid.set(userid); // Write out the hour and the average comment length context.write(outuserid, outtuple); - FLL 2018 W2..32 educer ode public static class inaxounteducer extends educer < Text, inaxounttuple, Text, inaxounttuple > { // ur output value Writable private inaxounttuple result = new inaxounttuple(); public void reduce(text key, Iterable < inaxounttuple > values, ontext context) throws IException, InterruptedException { // Initialize our result result.setin(null); result.setax(null); result.setount(0); int sum = 0; - FLL 2018 W2..33 educer ode --continued // Iterate through all input values for this key for (inaxounttuple val : values) { // If the value's min is less than the result's min // Set the result's min to value's if (result.getin()==null val.getin().compare(result.getin())<0) { result.setin(val.getin()); // If the value's max is more than the result's max // Set the result's max to value's if (result.getax()==null val.getax(). compare(result.getax())>0) { result.setax(val.getax()); // dd to our sum the count for value sum += val.getount(); //Set our count to the number of input values result.setount(sum); context.write(key, result); - FLL 2018 W2..34 Example 2. verage alculating an average of s per user USE-I ap function For each record, ap function produces the key-value pair (key, value) 1. esign your ap function 2. esign your educe function educe function For the shuffled pairs (key, [a list of values]) eturns (key, value) as the result - FLL 2018 W2..35 Example 2. verage alculating an verage of s per user USE-I ap function emit <useri, > educe function For the shuffled pairs (useri, [a list of values]) eturns (useri, average-value) as the result 6
7 Week 2- - FLL 2018 W FLL 2018 W2..37 Hadoop ombiner inimizes the data transferred between map and reduce tasks esign Pattern 1: Summarization Patterns ombiner Functions Users can specify a combiner function be run on the map output replace the map output with the combiner output Hadoop does NT guarantee how many times it will call combiner for a particular map output record - FLL 2018 W2..38 Example: Find the maximum temperature First map produces (1950, 0) (1950, 20) (1950, 10) Second map produces (1950, 25) (1950, 15) Input to the reduce function (1950, [0, 20, 10, 25, 15]) utput (1950, 25) - FLL 2018 W2..39 If a combiner finds the maximum temperature for each map output: First map produces (1950, 0) (1950, 20) (1950, 10) à (1950, 20) Second map produces (1950, 25) (1950, 15) à (1950, 25) Input to the reduce function (1950, [0, 20, 10, 25, 15]) à(1950, [20, 25]) utput (1950, 25)à(1950, 25) - FLL 2018 W2..40 an we use a combiner function for finding average value? - FLL 2018 W2..41 Example 2. verage alculating the verage of s per user USE-I ap function For each record, ap function produces the key-value pair (key, value) 1. esign your ap function 2. esign your ombiner function 3. esign your educe function educe function For the shuffled pairs (key, [a list of values]) eturns (key, value) as the result 7
8 Week 2- - FLL 2018 W2..42 Example 2. verage alculating the verage of s per user USE-I ap function emit <useri, (, count)> ombiner function alculate local average eturns (useri, (local-average-value, local-count)) as the result educe function For the shuffled pairs (useri, [a list of (local-average-values, local-count)]) eturns (useri, global-average-value) as the result - FLL 2018 W2..43 esign Pattern 1: Summarization Patterns Inverted Index - FLL 2018 W2..44 Inverted Index Generate Index from a data dataset to map from contents, such as words or numbers educes the amount of time to find related items Keyword based search, Web search, and document search e.g. dding Stackverflow links to each Wikipedia page that is reference in a Stackverflow comment - FLL 2018 W2..45 Structure * Example of an inverted index for document 1 and 2 (1, and 2) 1= olorado State University {olorado, State, University 2= University of olorado {University, of, olorado Inverted Index olorado {1, 2 State - {1 University {1, 2 of {2 - FLL 2018 W2..46 Inverted index of Stackverflow links to Wikipedia dding Stackverflow links to each Wikipedia page that is reference in a Stackverflow comment - FLL 2018 W2..47 Inverted index of Stackverflow links to Wikipedia ommenti E F G H I J omment omment-1 omment-2 omment-3 omment-4 omment-5 omment-6 omment-7 omment-8 omment-9 omment-10 ap function For each record, ap function produces the key-value pair (key, value) 1. esign your ap function 2. esign your educe function educe function For the shuffled pairs (key, [a list of values]) eturns (key, value) as the result 8
9 Week 2- - FLL 2018 W2..48 Inverted index of Stackverflow links to Wikipedia ommenti E F G H I J omment omment-1 omment-2 omment-3 omment-4 omment-5 omment-6 omment-7 omment-8 omment-9 omment-10 ap function If there is any link included in comment emit <link, commenti> educe function ll the commentis that referred same wikipedia page will be grouped together - FLL 2018 W2..49 apper ode public static class WikipediaExtractor extends apper < bject, Text, Text, Text > { private Text link = new Text(); private Text outkey = new Text(); public void map(bject key, Text value, ontext context) throws IException, InterruptedException { ap <String, String> parsed = PUtils.transformXmlap(value.toString()); // Grab the necessary XL attributes String txt = parsed.get("ody"); String posttype = parsed.get("posttypeid"); String row_id = parsed.get("id"); // if the body is null, or the post is a question (1), skip if (txt == null (posttype!= null && posttype.equals("1"))) { return; // Unescape the HTL because the S data is escaped. txt = StringEscapeUtils.unescapeHtml(txt.toLowerase()); link.set(getwikipediaul(txt)); outkey.set(row_id); context.write(link, outkey); - FLL 2018 W2..50 Questions? 9
1/30/2019 Week 2- B Sangmi Lee Pallickara
Week 2-A-0 1/30/2019 Colorado State University, Spring 2019 Week 2-A-1 CS535 BIG DATA FAQs PART A. BIG DATA TECHNOLOGY 3. DISTRIBUTED COMPUTING MODELS FOR SCALABLE BATCH COMPUTING Term project deliverable
More informationCS435 Introduction to Big Data FALL 2018 Colorado State University. 9/24/2018 Week 6-A Sangmi Lee Pallickara. Topics. This material is built based on,
FLL 218 olorado State University 9/24/218 Week 6-9/24/218 S435 Introduction to ig ata - FLL 218 W6... PRT 1. LRGE SLE T NLYTIS WE-SLE LINK N SOIL NETWORK NLYSIS omputer Science, olorado State University
More informationCS535 Big Data Fall 2017 Colorado State University 9/5/2017. Week 3 - A. FAQs. This material is built based on,
S535 ig ata Fall 217 olorado State University 9/5/217 Week 3-9/5/217 S535 ig ata - Fall 217 Week 3--1 S535 IG T FQs Programming ssignment 1 We will discuss link analysis in week3 Installation/configuration
More informationCS455: Introduction to Distributed Systems [Spring 2018] Dept. Of Computer Science, Colorado State University
CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [MAPREDUCE & HADOOP] Does Shrideep write the poems on these title slides? Yes, he does. These musing are resolutely on track For obscurity shores, from whence
More informationFAQs. Topics. This Material is Built Based on, Analytics Process Model. 8/22/2018 Week 1-B Sangmi Lee Pallickara
CS435 Introduction to Big Data Week 1-B W1.B.0 CS435 Introduction to Big Data No Cell-phones in the class. W1.B.1 FAQs PA0 has been posted If you need to use a laptop, please sit in the back row. August
More informationTopics covered in this lecture
9/5/2018 CS435 Introduction to Big Data - FALL 2018 W3.B.0 CS435 Introduction to Big Data 9/5/2018 CS435 Introduction to Big Data - FALL 2018 W3.B.1 FAQs How does Hadoop mapreduce run the map instance?
More informationParallel Processing - MapReduce and FlumeJava. Amir H. Payberah 14/09/2018
Parallel Processing - MapReduce and FlumeJava Amir H. Payberah payberah@kth.se 14/09/2018 The Course Web Page https://id2221kth.github.io 1 / 83 Where Are We? 2 / 83 What do we do when there is too much
More informationW1.A.0 W2.A.0 1/22/2018 1/22/2018. CS435 Introduction to Big Data. FAQs. Readings
CS435 Introduction to Big Data 1/17/2018 W2.A.0 W1.A.0 CS435 Introduction to Big Data W2.A.1.A.1 FAQs PA0 has been posted Feb. 6, 5:00PM via Canvas Individual submission (No team submission) Accommodation
More informationComputer Science 572 Exam Prof. Horowitz Monday, November 27, 2017, 8:00am 9:00am
Computer Science 572 Exam Prof. Horowitz Monday, November 27, 2017, 8:00am 9:00am Name: Student Id Number: 1. This is a closed book exam. 2. Please answer all questions. 3. There are a total of 40 questions.
More informationCS435 Introduction to Big Data Spring 2018 Colorado State University. 2/5/2018 Week 4-A Sangmi Lee Pallickara. FAQs. Total Order Sorting Pattern
W4.A.0.0 CS435 Introduction to Big Data W4.A.1 FAQs PA0 submission is open Feb. 6, 5:00PM via Canvas Individual submission (No team submission) If you have not been assigned the port range, please contact
More informationBig Data and Scripting map reduce in Hadoop
Big Data and Scripting map reduce in Hadoop 1, 2, connecting to last session set up a local map reduce distribution enable execution of map reduce implementations using local file system only all tasks
More informationVi & Shell Scripting
Vi & Shell Scripting Comp-206 : Introduction to Week 3 Joseph Vybihal Computer Science McGill University Announcements Sina Meraji's office hours Trottier 3rd floor open area Tuesday 1:30 2:30 PM Thursday
More informationMap Reduce & Hadoop Recommended Text:
Map Reduce & Hadoop Recommended Text: Hadoop: The Definitive Guide Tom White O Reilly 2010 VMware Inc. All rights reserved Big Data! Large datasets are becoming more common The New York Stock Exchange
More informationOutline. MAC (Medium Access Control) General MAC Requirements. Typical MAC protocols. Typical MAC protocols
Outline Medium ccess ontrol With oordinated daptive Sleeping for Wireless Sensor Networks Presented by: rik rooks Introduction to M S-M Overview S-M Evaluation ritique omparison to MW Washington University
More informationMapReduce: Algorithm Design for Relational Operations
MapReduce: Algorithm Design for Relational Operations Some slides borrowed from Jimmy Lin, Jeff Ullman, Jerome Simeon, and Jure Leskovec Projection π Projection in MapReduce Easy Map over tuples, emit
More informationComputer Science 572 Exam Prof. Horowitz Tuesday, April 24, 2017, 8:00am 9:00am
Computer Science 572 Exam Prof. Horowitz Tuesday, April 24, 2017, 8:00am 9:00am Name: Student Id Number: 1. This is a closed book exam. 2. Please answer all questions. 3. There are a total of 40 questions.
More informationSix Core Data Wrangling Activities. An introductory guide to data wrangling with Trifacta
Six Core Data Wrangling Activities An introductory guide to data wrangling with Trifacta Today s Data Driven Culture Are you inundated with data? Today, most organizations are collecting as much data in
More informationJava in MapReduce. Scope
Java in MapReduce Kevin Swingler Scope A specific look at the Java code you might use for performing MapReduce in Hadoop Java program recap The map method The reduce method The whole program Running on
More informationMapReduce Simplified Data Processing on Large Clusters
MapReduce Simplified Data Processing on Large Clusters Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) MapReduce 1393/8/5 1 /
More informationData Logging In this Chapter... Cloud Data Logging Pausing, Deactivating, & Terminating Your Datalogging Subscription...
ata Logging In this hapter... hapter loud ata Logging... - Why loud ata Logging?... - How oes the loud ata Logger Work?... - onfigure loud ata Logging On Your evice... - Live Monitor... - ata Reports...
More informationCS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [MAPREDUCE] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Bit Torrent What is the right chunk/piece
More informationShell scripting and system variables. HORT Lecture 5 Instructor: Kranthi Varala
Shell scripting and system variables HORT 59000 Lecture 5 Instructor: Kranthi Varala Text editors Programs built to assist creation and manipulation of text files, typically scripts. nano : easy-to-learn,
More informationData Analyst Nanodegree Syllabus
Data Analyst Nanodegree Syllabus Discover Insights from Data with Python, R, SQL, and Tableau Before You Start Prerequisites : In order to succeed in this program, we recommend having experience working
More informationCOLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA
COLLABORATIVE LOCATION AND ACTIVITY RECOMMENDATIONS WITH GPS HISTORY DATA Vincent W. Zheng, Yu Zheng, Xing Xie, Qiang Yang Hong Kong University of Science and Technology Microsoft Research Asia WWW 2010
More informationDr. Chuck Cartledge. 4 Feb. 2015
CS-495/595 Hadoop (part 1) Lecture #3 Dr. Chuck Cartledge 4 Feb. 2015 1/23 Table of contents I 1 Miscellanea 2 Assignment 3 The Book 4 Chapter 1 5 Chapter 2 7 Break 8 Assignment #2 9 Conclusion 10 References
More informationCOMPUTER SCIENCE DEPARTMENT PICNIC. Operations. Push the power button and hold. Once the light begins blinking, enter the room code
COMPUTER SCIENCE DEPARTMENT PICNIC Welcome to the 2016-2017 Academic year! Meet your faculty, department staff, and fellow students in a social setting. Food and drink will be provided. When: Saturday,
More informationAssignment 3, Due October 4
Assignment 3, Due October 4 1 Summary This assignment gives you practice with writing shell scripts. Shell scripting is also known as bash programming. Your shell is bash, and when you write a shell script
More informationFormat specification for the SMET Weather Station Meteorological Data Format version 1.1
Format specification for the SMET Weather Station Meteorological Data Format version 1.1 Mathias Bavay November 28, 2017 Abstract The goal of this data format is to ease the exchange of meteorological
More information6.189 D ay 6. Name: R eadings. E x er cise 6. 1 { W ar mup: F inding B ugs
6.189 D ay 6 Name: T urn in your written answers to 6.1 and 6.2, stapled to your printed code les, tomorrow at the start of lecture. R eadings How T o T hink L ike A C omputer Scientist, chapter 10. E
More informationClick Stream Data Analysis Using Hadoop
Governors State University OPUS Open Portal to University Scholarship All Capstone Projects Student Capstone Projects Spring 2015 Click Stream Data Analysis Using Hadoop Krishna Chand Reddy Gaddam Governors
More informationData-Intensive Computing with MapReduce
Data-Intensive Computing with MapReduce Session 2: Hadoop Nuts and Bolts Jimmy Lin University of Maryland Thursday, January 31, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationHadoop 3.X more examples
Hadoop 3.X more examples Big Data - 09/04/2018 Let s start with some examples! http://www.dia.uniroma3.it/~dvr/es2_material.zip Example: LastFM Listeners per Track Consider the following log file UserId
More informationParallel Computing: MapReduce Jin, Hai
Parallel Computing: MapReduce Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! MapReduce is a distributed/parallel computing framework introduced by Google
More informationLarge-scale Information Processing
Sommer 2013 Large-scale Information Processing Ulf Brefeld Knowledge Mining & Assessment brefeld@kma.informatik.tu-darmstadt.de Anecdotal evidence... I think there is a world market for about five computers,
More information28-Nov CSCI 2132 Software Development Lecture 33: Shell Scripting. 26 Shell Scripting. Faculty of Computer Science, Dalhousie University
Lecture 33 p.1 Faculty of Computer Science, Dalhousie University CSCI 2132 Software Development Lecture 33: Shell Scripting 28-Nov-2018 Location: Chemistry 125 Time: 12:35 13:25 Instructor: Vla Keselj
More informationECE5610/CSC6220 Introduction to Parallel and Distribution Computing. Lecture 6: MapReduce in Parallel Computing
ECE5610/CSC6220 Introduction to Parallel and Distribution Computing Lecture 6: MapReduce in Parallel Computing 1 MapReduce: Simplified Data Processing Motivation Large-Scale Data Processing on Large Clusters
More informationChuck Cartledge, PhD. 19 January 2018
Big Data: Data Analysis Boot Camp Introduction and Overview Chuck Cartledge, PhD 19 January 2018 1/11 Table of contents (1 of 1) 1 Introduction The global view 2 Overview The world from 50,000 feet. 3
More informationHEAPS & PRIORITY QUEUES
HEAPS & PRIORITY QUEUES Lecture 13 CS2110 Spring 2018 Announcements 2 A4 goes out today! Prelim 1: regrades are open a few rubrics have changed No Recitations next week (Fall Break Mon & Tue) We ll spend
More informationFall 2017 ECEN Special Topics in Data Mining and Analysis
Fall 2017 ECEN 689-600 Special Topics in Data Mining and Analysis Nick Duffield Department of Electrical & Computer Engineering Teas A&M University Organization Organization Instructor: Nick Duffield,
More informationWHAT IS A DATABASE? There are at least six commonly known database types: flat, hierarchical, network, relational, dimensional, and object.
1 WHAT IS A DATABASE? A database is any organized collection of data that fulfills some purpose. As weather researchers, you will often have to access and evaluate large amounts of weather data, and this
More informationCOMPUTER SCIENCE DEPARTMENT PICNIC. Operations. Push the power button and hold. Once the light begins blinking, enter the room code
COMPUTER SCIENCE DEPARTMENT PICNIC Welcome to the 2016-2017 Academic year! Meet your faculty, department staff, and fellow students in a social setting. Food and drink will be provided. When: Saturday,
More informationBig Data Analysis using Hadoop. Map-Reduce An Introduction. Lecture 2
Big Data Analysis using Hadoop Map-Reduce An Introduction Lecture 2 Last Week - Recap 1 In this class Examine the Map-Reduce Framework What work each of the MR stages does Mapper Shuffle and Sort Reducer
More informationClustering Documents. Document Retrieval. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April, 2017 Sham Kakade 2017 1 Document Retrieval n Goal: Retrieve
More informationData science e tecnologie per le basi di dati Practice 2 Data Studio
copyright Politecnico di Torino - Tutti i diritti riservati Data science e tecnologie per le basi di dati Practice 2 Data Studio 1. Login Connect to Google Data Studio, login with your Google Account or
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model
More informationChapter 8. I/O operations initiated by program instructions. Requests to processor for service from an I/O device
hapter 8 The I/O subsystem I/O buses and addresses Programmed I/O I/O operations initiated by program instructions I/O interrupts Requests to processor for service from an I/O device irect Memory ccess
More informationIntroduction to Map/Reduce. Kostas Solomos Computer Science Department University of Crete, Greece
Introduction to Map/Reduce Kostas Solomos Computer Science Department University of Crete, Greece What we will cover What is MapReduce? How does it work? A simple word count example (the Hello World! of
More informationDATA CUBE : A RELATIONAL AGGREGATION OPERATOR GENERALIZING GROUP-BY, CROSS-TAB AND SUB-TOTALS SNEHA REDDY BEZAWADA CMPT 843
DATA CUBE : A RELATIONAL AGGREGATION OPERATOR GENERALIZING GROUP-BY, CROSS-TAB AND SUB-TOTALS SNEHA REDDY BEZAWADA CMPT 843 WHAT IS A DATA CUBE? The Data Cube or Cube operator produces N-dimensional answers
More informationGuest Lecture. Daniel Dao & Nick Buroojy
Guest Lecture Daniel Dao & Nick Buroojy OVERVIEW What is Civitas Learning What We Do Mission Statement Demo What I Do How I Use Databases Nick Buroojy WHAT IS CIVITAS LEARNING Civitas Learning Mid-sized
More informationCS 109 C/C ++ Programming for Engineers w. MatLab Summer 2012 Homework Assignment 4 Functions Involving Barycentric Coordinates and Files
CS 109 C/C ++ Programming for Engineers w. MatLab Summer 2012 Homework Assignment 4 Functions Involving Barycentric Coordinates and Files Due: Wednesday July 11th by 8:00 a.m., via Blackboard. Optional
More informationParallel Programming Principle and Practice. Lecture 10 Big Data Processing with MapReduce
Parallel Programming Principle and Practice Lecture 10 Big Data Processing with MapReduce Outline MapReduce Programming Model MapReduce Examples Hadoop 2 Incredible Things That Happen Every Minute On The
More information[3-5] Consider a Combiner that tracks local counts of followers and emits only the local top10 users with their number of followers.
Quiz 6 We design of a MapReduce application that generates the top-10 most popular Twitter users (based on the number of followers) of a month based on the number of followers. Suppose that you have the
More informationCS:APP Chapter 4 Computer Architecture Logic Design Randal E. Bryant Carnegie Mellon University
S:PP hapter 4 omputer rchitecture Logic esign Randal E. Bryant arnegie Mellon University http://csapp.cs.cmu.edu S:PP Overview of Logic esign Fundamental Hardware Requirements ommunication How to get values
More informationAfter each stream monitoring event, someone (only one person, please) must enter your team s data in SWIMS. These instructions explain the process.
Entering Volunteer Stream Monitoring Data into SWIMS (SWIMS) is the acronym for the Wisconsin Department of Natural Resources (WDNR) Surface Water Integrated Monitoring System database. This database is
More information3.2 BINARY SEARCH TREES. BSTs ordered operations iteration deletion. Algorithms ROBERT SEDGEWICK KEVIN WAYNE.
3.2 BINY T lgorithms BTs ordered operations iteration deletion OBT DGWIK KVIN WYN http://algs4.cs.princeton.edu Binary search trees Definition. BT is a binary tree in symmetric order. binary tree is either:
More information/ Cloud Computing. Recitation 2 January 19 & 21, 2016
15-319 / 15-619 Cloud Computing Recitation 2 January 19 & 21, 2016 Accessing the Course Open Learning Initiative (OLI) Course Access via Blackboard http://theproject.zone AWS Account Setup Azure Account
More informationStudent Name: University of California at Berkeley College of Engineering Department of Electrical Engineering and Computer Science
University of alifornia at erkeley ollege of Engineering epartment of Electrical Engineering and omputer Science EES 5 all 2 R. H. Katz IRST MITERM EXMINTION Tuesday, 3 October 2 INSTRUTIONS RE THEM NOW!
More informationQuick Start Guide. RainWise CC-2000 Edition. Visit our Website to Register Your Copy (weatherview32.com)
Quick Start Guide RainWise CC-2000 Edition Visit our Website to Register Your Copy (weatherview32.com) Insert the WV32 installation CD in an available drive, or run the downloaded install file wvsetup80.exe.
More informationData Structure Chapter 6
ata Structure hapter Non-inary Trees r. Patrick han School of omputer Science and ngineering South hina University of Technology Outline Non-inary (eneral) Tree (h.) Parent Pointer mplementation (h.) ist
More informationClustering Documents. Case Study 2: Document Retrieval
Case Study 2: Document Retrieval Clustering Documents Machine Learning for Big Data CSE547/STAT548, University of Washington Sham Kakade April 21 th, 2015 Sham Kakade 2016 1 Document Retrieval Goal: Retrieve
More informationCSE Lecture 11: Map/Reduce 7 October Nate Nystrom UTA
CSE 3302 Lecture 11: Map/Reduce 7 October 2010 Nate Nystrom UTA 378,000 results in 0.17 seconds including images and video communicates with 1000s of machines web server index servers document servers
More informationCOMP4442. Service and Cloud Computing. Lab 12: MapReduce. Prof. George Baciu PQ838.
COMP4442 Service and Cloud Computing Lab 12: MapReduce www.comp.polyu.edu.hk/~csgeorge/comp4442 Prof. George Baciu csgeorge@comp.polyu.edu.hk PQ838 1 Contents Introduction to MapReduce A WordCount example
More informationHadoop 2.X on a cluster environment
Hadoop 2.X on a cluster environment Big Data - 05/04/2017 Hadoop 2 on AMAZON Hadoop 2 on AMAZON Hadoop 2 on AMAZON Regions Hadoop 2 on AMAZON S3 and buckets Hadoop 2 on AMAZON S3 and buckets Hadoop 2 on
More informationThe Esterel Language. The Esterel Version. Basic Ideas of Esterel
The Synchronous Language Esterel OMS W4995-02 Prof. Stephen. Edwards Fall 2002 olumbia University epartment of omputer Science The Esterel Language eveloped by Gérard erry starting 1983 Originally for
More informationCHAPTER CONFIGURING THE ERM AND SLAVE MODULES WITH ERM WORKBENCH. In This Chapter:
ONFIGURING THE ERM N SLVE MOULES WITH ERM WORKENH HPTER In This hapter: ERM Workbench Software............................................ Launching ERM Workbench...........................................
More informationHomework 6: Pose Tracking EE267 Virtual Reality 2018
Homework 6: Pose Tracking EE267 Virtual Reality 218 Due: 5/17/218, 11:59pm Instructions Students should use the Arduino environment and JavaScript for this assignment, building on top of the provided starter
More informationPace University. Fundamental Concepts of CS121 1
Pace University Fundamental Concepts of CS121 1 Dr. Lixin Tao http://csis.pace.edu/~lixin Computer Science Department Pace University October 12, 2005 This document complements my tutorial Introduction
More informationBig Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)
Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 2: MapReduce Algorithm Design (1/2) January 10, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationBigData and MapReduce with Hadoop
BigData and MapReduce with Hadoop Ivan Tomašić 1, Roman Trobec 1, Aleksandra Rashkovska 1, Matjaž Depolli 1, Peter Mežnar 2, Andrej Lipej 2 1 Jožef Stefan Institute, Jamova 39, 1000 Ljubljana 2 TURBOINŠTITUT
More informationWhat is Routing? EE 122: Shortest Path Routing. Example. Internet Routing. Ion Stoica TAs: Junda Liu, DK Moon, David Zats
What is Routing? Routing implements the core function of a network: : Shortest Path Routing Ion Stoica Ts: Junda Liu, K Moon, avid Zats http://inst.eecs.berkeley.edu/~ee/fa9 (Materials with thanks to Vern
More informationexample: name1=jan name2=mike export name1 In this example, name1 is an environmental variable while name2 is a local variable.
Bourne Shell Programming Variables - creating and assigning variables Bourne shell use the set and unset to create and assign values to variables or typing the variable name, an equal sign and the value
More informationParallel Data Processing with Hadoop/MapReduce. CS140 Tao Yang, 2014
Parallel Data Processing with Hadoop/MapReduce CS140 Tao Yang, 2014 Overview What is MapReduce? Example with word counting Parallel data processing with MapReduce Hadoop file system More application example
More informationCS 457 Networking and the Internet. Shortest-Path Problem. Dijkstra s Shortest-Path Algorithm 9/29/16. Fall 2016
9/9/6 S 7 Networking and the Internet Fall 06 Shortest-Path Problem Given: network topology with link costs c(x,y): link cost from node x to node y Infinity if x and y are not direct neighbors ompute:
More informationCOMP2012H Spring 2014 Dekai Wu. Sorting. (more on sorting algorithms: mergesort, quicksort, heapsort)
COMP2012H Spring 2014 Dekai Wu Sorting (more on sorting algorithms: mergesort, quicksort, heapsort) Merge Sort Recursive sorting strategy. Let s look at merge(.. ) first. COMP2012H (Sorting) 2 COMP2012H
More informationOperations. I Forgot 9/4/2016 COMPUTER SCIENCE DEPARTMENT PICNIC. If you forgot your IClicker, or your batteries fail during the exam
COMPUTER SCIENCE DEPARTMENT PICNIC Welcome to the 2016-2017 Academic year! Meet your faculty, department staff, and fellow students in a social setting. Food and drink will be provided. When: Saturday,
More informationChuck Cartledge, PhD. 12 October 2018
Big Data: Data Analysis Boot Camp Introduction and Overview Chuck Cartledge, PhD 12 October 2018 1/14 Table of contents (1 of 1) 1 Introduction The global view 2 Overview The world from 50,000 feet. Text
More informationDepartment of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 16. Big Data Management VI (MapReduce Programming)
Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 16 Big Data Management VI (MapReduce Programming) Credits: Pietro Michiardi (Eurecom): Scalable Algorithm
More informationINTERMEDIATE SQL GOING BEYOND THE SELECT. Created by Brian Duffey
INTERMEDIATE SQL GOING BEYOND THE SELECT Created by Brian Duffey WHO I AM Brian Duffey 3 years consultant at michaels, ross, and cole 9+ years SQL user What have I used SQL for? ROADMAP Introduction 1.
More informationMap-Reduce Applications: Counting, Graph Shortest Paths
Map-Reduce Applications: Counting, Graph Shortest Paths Adapted from UMD Jimmy Lin s slides, which is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States. See http://creativecommons.org/licenses/by-nc-sa/3.0/us/
More informationAlgorithms. Algorithms 3.2 BINARY SEARCH TREES. BSTs ordered operations iteration deletion (see book or videos) ROBERT SEDGEWICK KEVIN WAYNE
lgorithms OBT DGWIK KVIN WYN 3.2 BINY T lgorithms F O U T D I T I O N BTs ordered operations iteration deletion (see book or videos) OBT DGWIK KVIN WYN https://algs4.cs.princeton.edu Last updated on 10/9/18
More informationDept. Of Computer Science, Colorado State University
CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [THREAD SAFETY & MAPREDUCE] Are you set on reinventing the wheel? Shunning libraries and frameworks, are you, despite the peril? Emerge scathed, from arduous
More informationDatabases 2 (VU) ( / )
Databases 2 (VU) (706.711 / 707.030) MapReduce (Part 3) Mark Kröll ISDS, TU Graz Nov. 27, 2017 Mark Kröll (ISDS, TU Graz) MapReduce Nov. 27, 2017 1 / 42 Outline 1 Problems Suited for Map-Reduce 2 MapReduce:
More informationHierarchy. No or very little supervision Some heuristic quality guidances on the quality of the hierarchy. Jian Pei: CMPT 459/741 Clustering (2) 1
Hierarchy An arrangement or classification of things according to inclusiveness A natural way of abstraction, summarization, compression, and simplification for understanding Typical setting: organize
More informationQ1: Functions / 33 Q2: Arrays / 47 Q3: Multiple choice / 20 TOTAL SCORE / 100 Q4: EXTRA CREDIT / 10
EECE.2160: ECE Application Programming Spring 2018 Exam 2 March 30, 2018 Name: Lecture time (circle 1): 8-8:50 (Sec. 201) 12-12:50 (Sec. 202) For this exam, you may use only one 8.5 x 11 double-sided page
More informationChapter. System design and. configuration. In This Chapter
hapter System design and configuration In This hapter L0 System esign Strategies... Module Placement... Power udgeting... onfiguring the L0 s omm Ports... onnecting to MOUS and irectnet Networks... Non
More informationBig Data Programming: an Introduction. Spring 2015, X. Zhang Fordham Univ.
Big Data Programming: an Introduction Spring 2015, X. Zhang Fordham Univ. Outline What the course is about? scope Introduction to big data programming Opportunity and challenge of big data Origin of Hadoop
More informationCS 231 Data Structures and Algorithms Fall Binary Search Trees Lecture 23 October 29, Prof. Zadia Codabux
CS 231 Data Structures and Algorithms Fall 2018 Binary Search Trees Lecture 23 October 29, 2018 Prof. Zadia Codabux 1 Agenda Ternary Operator Binary Search Tree Node based implementation Complexity 2 Administrative
More informationSymbol Tables 1 / 15
Symbol Tables 1 / 15 Outline 1 What is a Symbol Table? 2 API 3 Sample Clients 4 Implementations 5 Performance Characteristics 2 / 15 What is a Symbol Table? A symbol table is a data structure for key-value
More informationDS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li
Welcome to DS595/CS525: Urban Network Analysis --Urban Mobility Prof. Yanhua Li Time: 6:00pm 8:50pm Wednesday Location: Fuller 320 Spring 2017 2 Team assignment Finalized. (Great!) Guest Speaker 2/22 A
More informationECE 2036 Lab 1: Introduction to Software Objects
ECE 2036 Lab 1: Introduction to Software Objects Assigned: Aug 24/25 2015 Due: September 1, 2015 by 11:59 PM Reading: Deitel& Deitel Chapter 2-4 Student Name: Check Off/Score Part 1: Check Off/Score Part
More informationConditional Control Structures. Dr.T.Logeswari
Conditional Control Structures Dr.T.Logeswari TEST COMMAND test expression Or [ expression ] Syntax Ex: a=5; b=10 test $a eq $b ; echo $? [ $a eq $b] ; echo $? 2 Unix Shell Programming - Forouzan 2 TEST
More informationHadoop Streaming. Table of contents. Content-Type text/html; utf-8
Content-Type text/html; utf-8 Table of contents 1 Hadoop Streaming...3 2 How Does Streaming Work... 3 3 Package Files With Job Submissions...4 4 Streaming Options and Usage...4 4.1 Mapper-Only Jobs...
More informationTCP/IP Networking. Part 3: Forwarding and Routing
TP/IP Networking Part 3: Forwarding and Routing Routing of IP Packets There are two parts to routing IP packets:. How to pass a packet from an input interface to the output interface of a router ( IP forwarding
More informationWAN Technology and Routing
PS 60 - Network Programming WN Technology and Routing Michele Weigle epartment of omputer Science lemson University mweigle@cs.clemson.edu March, 00 http://www.cs.clemson.edu/~mweigle/courses/cpsc60 WN
More informationCSC209. Software Tools and Systems Programming. https://mcs.utm.utoronto.ca/~209
CSC209 Software Tools and Systems Programming https://mcs.utm.utoronto.ca/~209 What is this Course About? Software Tools Using them Building them Systems Programming Quirks of C The file system System
More informationCS 61C: Great Ideas in Computer Architecture. MapReduce
CS 61C: Great Ideas in Computer Architecture MapReduce Guest Lecturer: Justin Hsia 3/06/2013 Spring 2013 Lecture #18 1 Review of Last Lecture Performance latency and throughput Warehouse Scale Computing
More informationAssignment 2. Summary. Some Important bash Instructions. CSci132 Practical UNIX and Programming Assignment 2, Fall Prof.
Assignment 2 Summary The purpose of this assignment is to give you some practice in bash scripting. When you write a bash script, you are really writing a program in the bash programming language. In class
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationCS 112 Introduction to Computing II. Wayne Snyder Computer Science Department Boston University
9/5/6 CS Introduction to Computing II Wayne Snyder Department Boston University Today: Arrays (D and D) Methods Program structure Fields vs local variables Next time: Program structure continued: Classes
More information