Implementing MapReduce Algorithms in Hadoop Framework (Guide: Dr. Sobhan Babu)
1 Implementing MapReduce Algorithms in Hadoop Framework
Guide: Dr. Sobhan Babu
CS13B1033 T Satya Vasanth Reddy
CS13B1035 Hrishikesh Vaidya
CS13S1041 Arjun V Anand
2 Hadoop Architecture
3 Hadoop Architecture
NameNode: The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself.
DataNode: A DataNode stores data in the Hadoop Distributed File System (HDFS). A functional file system has more than one DataNode, with data replicated across them. On startup, a DataNode connects to the NameNode and then responds to requests from the NameNode.
TaskTracker: A TaskTracker is a node in the cluster that accepts tasks (Map, Reduce and Shuffle operations) from a JobTracker.
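The split between the NameNode (metadata only) and the DataNodes (the actual bytes, replicated) can be illustrated with a small in-memory model. This is a hypothetical sketch, not the HDFS API: the class names, the 4-byte block size, the replication factor of 2 and the round-robin placement are assumptions chosen for illustration (real HDFS uses far larger blocks and a rack-aware placement policy).

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the HDFS roles; not the real HDFS API.

@dataclass
class DataNode:
    name: str
    blocks: dict = field(default_factory=dict)   # block id -> bytes (the file data)

@dataclass
class NameNode:
    datanodes: list
    block_size: int = 4        # assumption: tiny blocks so the example splits
    replication: int = 2       # assumption: each block kept on 2 distinct DataNodes
    namespace: dict = field(default_factory=dict)  # path -> [block ids] (metadata only)
    locations: dict = field(default_factory=dict)  # block id -> [DataNode names]
    next_block: int = 0

    def write(self, path: str, data: bytes) -> None:
        block_ids = []
        for offset in range(0, len(data), self.block_size):
            block_id = self.next_block
            self.next_block += 1
            # Round-robin placement on `replication` distinct DataNodes
            # (requires replication <= number of DataNodes).
            targets = [self.datanodes[(block_id + r) % len(self.datanodes)]
                       for r in range(self.replication)]
            for dn in targets:
                dn.blocks[block_id] = data[offset:offset + self.block_size]
            self.locations[block_id] = [dn.name for dn in targets]
            block_ids.append(block_id)
        self.namespace[path] = block_ids

    def read(self, path: str) -> bytes:
        by_name = {dn.name: dn for dn in self.datanodes}
        out = b""
        for block_id in self.namespace[path]:
            for name in self.locations[block_id]:   # try each replica in turn
                if block_id in by_name[name].blocks:
                    out += by_name[name].blocks[block_id]
                    break
            else:
                raise IOError(f"all replicas of block {block_id} are lost")
        return out

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
namenode = NameNode(nodes)
namenode.write("/input/graph.txt", b"1 2-2,3-6,6-3 0 GRAY")
nodes[0].blocks.clear()                          # simulate losing dn1's data
restored = namenode.read("/input/graph.txt")     # served from the remaining replicas
```

Losing one DataNode does not lose the file here because every block has a second replica, and the NameNode only ever holds block IDs and locations, never file bytes.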
4 Hadoop Architecture
JobTracker: The JobTracker is the service within Hadoop that farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that hold the data.
1. Client applications submit jobs to the JobTracker.
2. The JobTracker talks to the NameNode to determine the location of the data.
3. The JobTracker locates TaskTracker nodes with available slots at or near the data.
4. The JobTracker submits the work to the chosen TaskTracker nodes.
5. The TaskTracker nodes are monitored. If they do not send heartbeat signals often enough, they are deemed to have failed and their work is scheduled on a different TaskTracker.
5 Hadoop Architecture
6. A TaskTracker notifies the JobTracker when a task fails. The JobTracker then decides what to do: it may resubmit the task elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
7. When the work is completed, the JobTracker updates its status.
The steps and a detailed explanation of setting up the Hadoop multi-node cluster are in the comprehensive report.
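The scheduling and failure-handling steps above can be modeled in a few lines. This is a hypothetical sketch, not Hadoop's JobTracker code: the class names, slot counts, the HEARTBEAT_TIMEOUT value and the "first local tracker, else first free tracker" rule are assumptions. It covers data-locality assignment (steps 2 to 4) and heartbeat-based failure detection with rescheduling (step 5).

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT = 10.0   # assumption: seconds of silence before a tracker is deemed failed

@dataclass
class TaskTracker:
    host: str
    slots: int
    last_heartbeat: float = 0.0
    tasks: list = field(default_factory=list)

class JobTracker:
    def __init__(self, trackers, data_hosts):
        self.trackers = {t.host: t for t in trackers}
        # task -> hosts holding its input split (what the NameNode would report)
        self.data_hosts = data_hosts

    def alive(self, now):
        return [t for t in self.trackers.values()
                if now - t.last_heartbeat <= HEARTBEAT_TIMEOUT]

    def assign(self, task, now):
        free = [t for t in self.alive(now) if len(t.tasks) < t.slots]
        if not free:
            raise RuntimeError(f"no free slot for {task}")
        # Prefer a tracker on a host that already stores the task's data.
        local = [t for t in free if t.host in self.data_hosts[task]]
        chosen = (local or free)[0]
        chosen.tasks.append(task)
        return chosen.host

    def heartbeat(self, host, now):
        self.trackers[host].last_heartbeat = now

    def check_failures(self, now):
        # A tracker that missed its heartbeats loses its tasks; reschedule them.
        for t in self.trackers.values():
            if now - t.last_heartbeat > HEARTBEAT_TIMEOUT and t.tasks:
                orphaned, t.tasks = t.tasks, []
                for task in orphaned:
                    self.assign(task, now)

jt = JobTracker([TaskTracker("dn1", 2), TaskTracker("dn2", 2), TaskTracker("dn3", 2)],
                {"map-0": {"dn2"}, "map-1": {"dn3"}})
jt.assign("map-0", now=0.0)        # runs on dn2, where its data lives
jt.assign("map-1", now=0.0)        # runs on dn3
jt.heartbeat("dn1", now=15.0)      # dn1 and dn3 keep reporting; dn2 goes silent
jt.heartbeat("dn3", now=15.0)
jt.check_failures(now=15.0)        # map-0 moves off the failed dn2
```

After the check, map-0 has been moved to dn1: its data host dn2 is no longer alive, so the model falls back to the first tracker with a free slot.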
6 Single-Source Shortest Path Algorithm
OVERVIEW: A path in a graph is a sequence of nodes such that there is an edge from each node to the next node in the sequence. The shortest path between two nodes is the path with the minimum total weight of the edges along it. A variant of breadth-first search is used to solve the single-source shortest path problem.
7 LOGIC OF ALGORITHM
The single-source shortest path problem can be solved with MapReduce using an iterative, parallel breadth-first search (BFS). The source node is processed first, then the nodes connected to the source node, and so on.
Input format: ID EDGE-EDGEWEIGHT,EDGE-EDGEWEIGHT,... DISTANCE_FROM_SOURCE COLOR
Color code: WHITE = unvisited, GRAY = visited, BLACK = finished
8 INPUT FORMAT
Before the first iteration, all nodes are colored WHITE except the source, which is GRAY. For example, in
1 2-2,3-6,6-3 0 GRAY
edge 1-2 has weight 2, edge 1-3 has weight 6 and edge 1-6 has weight 3; the distance 0 from the source indicates that this node is the source, and GRAY indicates that it has been discovered.
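Reading and writing this record format is straightforward. The sketch below is a hypothetical Python helper (Node, parse_node and format_node are illustrative names, not from the project's code); it maps the Integer.MAX_VALUE placeholder to the same numeric value Java uses and treats an edge list of NULL as "unknown", since the mapper output emits such records.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INF = 2**31 - 1                       # same value as Java's Integer.MAX_VALUE
COLORS = {"WHITE", "GRAY", "BLACK"}

@dataclass
class Node:
    node_id: int
    edges: Optional[Dict[int, int]]   # neighbor -> edge weight; None means NULL
    distance: int
    color: str

def parse_node(line: str) -> Node:
    node_id, edges, dist, color = line.split()
    if color not in COLORS:
        raise ValueError(f"unknown color: {color}")
    adj = None if edges == "NULL" else {
        int(v): int(w) for v, w in (e.split("-") for e in edges.split(","))}
    distance = INF if dist == "Integer.MAX_VALUE" else int(dist)
    return Node(int(node_id), adj, distance, color)

def format_node(node: Node) -> str:
    edges = "NULL" if node.edges is None else ",".join(
        f"{v}-{w}" for v, w in node.edges.items())
    dist = "Integer.MAX_VALUE" if node.distance == INF else str(node.distance)
    return f"{node.node_id} {edges} {dist} {node.color}"

source = parse_node("1 2-2,3-6,6-3 0 GRAY")
# source.edges == {2: 2, 3: 6, 6: 3}: edge 1-2 weighs 2, 1-3 weighs 6, 1-6 weighs 3
```

Because Python dictionaries keep insertion order, format_node writes the edges back in the order they were read, so a record survives a parse/format round trip unchanged.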
9 Algorithm
A GRAY node has been visited, but its neighbors still have to be processed. Every WHITE node adjacent to a GRAY node is colored GRAY, indicating that it has been visited. The original GRAY node is then colored BLACK, indicating that all its neighbors have been visited and the processing of the node is finished. The process continues until there are no more GRAY nodes in the graph.
INPUT
1 2-2,3-6,6-3 0 GRAY
2 1-2 Integer.MAX_VALUE WHITE
3 1-6,4-4,5-1,6-1 Integer.MAX_VALUE WHITE
4 3-4,5-2 Integer.MAX_VALUE WHITE
5 3-1,4-2 Integer.MAX_VALUE WHITE
6 1-3,3-1 Integer.MAX_VALUE WHITE
10 Stages of Algorithm
11 Mapper
The mapper is responsible for "exploding" all GRAY nodes, i.e. for expanding all nodes that live at the current depth of the search. For each GRAY node, the mapper emits, for every neighbor, a new GRAY node whose distance is the GRAY node's distance from the source plus the weight of the connecting edge, and whose edge list is NULL. It also emits the input GRAY node itself, colored BLACK. All non-GRAY nodes are emitted unchanged.
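The mapper's three rules fit in one generator function. This is a minimal Python sketch of the logic, not a Hadoop Mapper class; a node value is assumed to be an (edges, distance, color) tuple with edges=None standing for NULL, and the name bfs_map is hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

INF = 2**31 - 1                                    # Integer.MAX_VALUE
Value = Tuple[Optional[Dict[int, int]], int, str]  # (edges or None, distance, color)

def bfs_map(node_id: int, value: Value) -> Iterator[Tuple[int, Value]]:
    edges, dist, color = value
    if color != "GRAY":
        yield node_id, value                      # non-gray nodes pass through unchanged
        return
    yield node_id, (edges, dist, "BLACK")         # the exploded node is finished
    for neighbor, weight in edges.items():
        # The neighbor's edge list is unknown here, so it is emitted as NULL (None).
        yield neighbor, (None, dist + weight, "GRAY")

emitted = list(bfs_map(1, ({2: 2, 3: 6, 6: 3}, 0, "GRAY")))
```

For the source node this yields node 1 colored BLACK plus GRAY copies of nodes 2, 3 and 6 at distances 2, 6 and 3; a WHITE node such as node 4 comes back exactly as it went in.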
12 After Map Iteration
1 2-2,3-6,6-3 0 BLACK
2 NULL 2 GRAY
3 NULL 6 GRAY
6 NULL 3 GRAY
2 1-2 Integer.MAX_VALUE WHITE
3 1-6,4-4,5-1,6-1 Integer.MAX_VALUE WHITE
4 3-4,5-2 Integer.MAX_VALUE WHITE
5 3-1,4-2 Integer.MAX_VALUE WHITE
6 1-3,3-1 Integer.MAX_VALUE WHITE
13 Reducer
Each reducer receives all data for a given key, which here means all "copies" of one node. For example, the reducer for key = 2 receives the following list of values:
2 NULL 2 GRAY
2 1-2 Integer.MAX_VALUE WHITE
The reducer's job is to combine all of this into a new node using:
- the non-null list of edges,
- the minimum distance, and
- the darkest color.
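The reducer's three rules can be sketched the same way. This is a hypothetical Python function (bfs_reduce), not the project's Java code; darkness is ranked WHITE < GRAY < BLACK, and node 2's edge list is taken to be 1-2 (weight 2), the undirected counterpart of node 1's edge 1-2.

```python
from typing import Dict, Iterable, Optional, Tuple

INF = 2**31 - 1                                    # Integer.MAX_VALUE
Value = Tuple[Optional[Dict[int, int]], int, str]  # (edges or None, distance, color)
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}

def bfs_reduce(node_id: int, values: Iterable[Value]) -> Tuple[int, Value]:
    edges, best, darkest = None, INF, "WHITE"
    for e, d, c in values:
        if e is not None:
            edges = e                       # keep the non-null edge list
        best = min(best, d)                 # keep the minimum distance
        if DARKNESS[c] > DARKNESS[darkest]:
            darkest = c                     # keep the darkest color
    return node_id, (edges, best, darkest)

node2 = bfs_reduce(2, [(None, 2, "GRAY"), ({1: 2}, INF, "WHITE")])
node3 = bfs_reduce(3, [({1: 6, 4: 4, 5: 1, 6: 1}, 6, "BLACK"), (None, 4, "GRAY")])
```

The second call shows the rules interacting: a BLACK copy at distance 6 and a GRAY copy at distance 4 merge into a BLACK node at distance 4.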
14 After Iteration 1
Using this logic, the output after the first iteration is:
1 2-2,3-6,6-3 0 BLACK
2 1-2 2 GRAY
3 1-6,4-4,5-1,6-1 6 GRAY
4 3-4,5-2 Integer.MAX_VALUE WHITE
5 3-1,4-2 Integer.MAX_VALUE WHITE
6 1-3,3-1 3 GRAY
15 Terminating Condition
The iteration stops when there are no more GRAY nodes to process in the graph.
FINAL OUTPUT:
1 2-2,3-6,6-3 0 BLACK
2 1-2 2 BLACK
3 1-6,4-4,5-1,6-1 4 BLACK
4 3-4,5-2 9 BLACK
5 3-1,4-2 7 BLACK
6 1-3,3-1 3 BLACK
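Putting the mapper, an in-memory shuffle and the reducer in a loop gives a driver that runs until the terminating condition holds. This is a self-contained Python simulation, not the project's Hadoop driver: node values are (edges, distance, color) tuples, the function names are hypothetical, and node 2's row is taken to be 2 1-2 Integer.MAX_VALUE WHITE (the undirected counterpart of node 1's edge 1-2). On a cluster, each pass of the loop would typically be one MapReduce job reading the previous job's output.

```python
from collections import defaultdict

INF = 2**31 - 1                      # stands in for Integer.MAX_VALUE
DARKNESS = {"WHITE": 0, "GRAY": 1, "BLACK": 2}

def parse(line):
    node_id, edges, dist, color = line.split()
    adj = None if edges == "NULL" else {
        int(v): int(w) for v, w in (e.split("-") for e in edges.split(","))}
    return int(node_id), (adj, INF if dist == "Integer.MAX_VALUE" else int(dist), color)

def bfs_map(node_id, value):
    edges, dist, color = value
    if color != "GRAY":
        yield node_id, value                         # pass through unchanged
        return
    yield node_id, (edges, dist, "BLACK")            # exploded node is finished
    for neighbor, weight in edges.items():
        yield neighbor, (None, dist + weight, "GRAY")

def bfs_reduce(values):
    edges, best, darkest = None, INF, "WHITE"
    for e, d, c in values:
        edges = e if e is not None else edges        # non-null edge list
        best = min(best, d)                          # minimum distance
        darkest = c if DARKNESS[c] > DARKNESS[darkest] else darkest
    return edges, best, darkest

def run(lines):
    graph = dict(parse(line) for line in lines)
    iterations = 0
    # Terminating condition: stop once no GRAY node is left.
    while any(color == "GRAY" for _, _, color in graph.values()):
        shuffled = defaultdict(list)                 # shuffle: group values by key
        for node_id, value in graph.items():
            for key, out in bfs_map(node_id, value):
                shuffled[key].append(out)
        graph = {k: bfs_reduce(vs) for k, vs in shuffled.items()}
        iterations += 1
    return graph, iterations

INPUT = [
    "1 2-2,3-6,6-3 0 GRAY",
    "2 1-2 Integer.MAX_VALUE WHITE",   # assumed edge list for node 2
    "3 1-6,4-4,5-1,6-1 Integer.MAX_VALUE WHITE",
    "4 3-4,5-2 Integer.MAX_VALUE WHITE",
    "5 3-1,4-2 Integer.MAX_VALUE WHITE",
    "6 1-3,3-1 Integer.MAX_VALUE WHITE",
]
final, rounds = run(INPUT)
distances = {k: v[1] for k, v in sorted(final.items())}
```

On the example graph this reproduces the final output above in three rounds. It also exposes a limitation of the coloring rule with weighted edges: a node is expanded only once, while it is GRAY, so a distance that improves later is never passed on. Node 3 is expanded in the second round with distance 6, the same round in which the path 1 → 6 → 3 lowers it to 4; as a result node 5 ends at 7 and node 4 at 9, although 1 → 6 → 3 → 5 costs 3 + 1 + 1 = 5 and 1 → 6 → 3 → 5 → 4 costs 7. A variant that re-expands every node whose distance decreased, and stops only when a round changes no distance, returns the true shortest distances.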
16 Analysis of Running Times on a Single Node and a Multi-Node Cluster
[Chart: time taken on a single node and on a 3-node cluster vs. number of nodes in the input graph]
17 All-Pairs Shortest Path
Overview: To compute shortest paths between all pairs of vertices without parallel computing, we use the standard Floyd-Warshall algorithm. To implement it in the Hadoop framework, the main task is to express the problem as key-value pairs. After the nth round of relaxation, dist(i, j) is no larger than the weight of the shortest path from node i to node j that uses at most n edges.
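For reference, the sequential Floyd-Warshall algorithm mentioned above takes only a few lines. This Python sketch is illustrative: the function name and the (u, v, weight) edge list are assumptions, and parallel edges are collapsed to their minimum weight. The edges are read off the example input on slide 18, treating each pair listed under a node as a separate undirected edge.

```python
from itertools import product

INF = float("inf")

def floyd_warshall(n, edges):
    """Sequential all-pairs shortest paths on vertices 1..n.

    edges: iterable of (u, v, w) for an undirected multigraph; parallel
    edges are collapsed to their minimum weight.
    """
    dist = {(i, j): (0 if i == j else INF)
            for i, j in product(range(1, n + 1), repeat=2)}
    for u, v, w in edges:
        dist[u, v] = dist[v, u] = min(dist[u, v], w)
    for k in range(1, n + 1):                 # now allow k as an intermediate vertex
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if dist[i, k] + dist[k, j] < dist[i, j]:
                    dist[i, j] = dist[i, k] + dist[k, j]
    return dist

EDGES = [(1, 3, 43), (2, 4, 18), (3, 2, 31), (3, 1, 32), (3, 4, 14),
         (4, 2, 27), (4, 3, 23), (4, 5, 48), (5, 1, 23)]
d = floyd_warshall(5, EDGES)
```

On this graph it gives, for example, dist(2, 5) = 66 (2 → 4 → 5, 18 + 48) and dist(3, 5) = 55 (3 → 1 → 5, 32 + 23), which is what a converged MapReduce version should reach as well.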
18 Input format
- The input consists of a node ID followed by its adjacency list.
- The graph is undirected, and there can be multiple edges between a pair of vertices.
The adjacency list is a list of pairs, each giving a neighboring vertex and the weight of the edge joining them:
1 3,43
2 4,18
3 2,31 1,32 4,14
4 2,27 3,23 5,48
5 1,23
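A record can be parsed by splitting off the node ID and reading the remaining tokens as neighbor,weight pairs. The Python helpers below are hypothetical; they also group the weights of every undirected vertex pair, assuming that each pair listed under a node is a separate edge (so 1 3,43 under node 1 and 1,32 under node 3 are two parallel edges between vertices 1 and 3).

```python
from collections import defaultdict

def parse_line(line):
    node, *pairs = line.split()
    return int(node), [(int(v), int(w)) for v, w in (p.split(",") for p in pairs)]

def parallel_weights(lines):
    # Each undirected edge is listed under one endpoint, and a pair of
    # vertices may be joined by several edges; gather all weights per pair.
    weights = defaultdict(list)
    for line in lines:
        u, adj = parse_line(line)
        for v, w in adj:
            weights[tuple(sorted((u, v)))].append(w)
    return dict(weights)

SAMPLE = ["1 3,43", "2 4,18", "3 2,31 1,32 4,14", "4 2,27 3,23 5,48", "5 1,23"]
pairs = parallel_weights(SAMPLE)
```

The sample contains three vertex pairs with two parallel edges each: 1-3 (43 and 32), 2-4 (18 and 27) and 3-4 (14 and 23).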
19 Mapper Class
- The mapper class takes the entire file as input and parses it line by line.
- For each trio of vertices in the graph, it relaxes the edge weights: if node i and node j are both adjacent to node k, it emits dist(i,k) + dist(k,j) as a candidate value for dist(i,j).
- The implementation is similar to Floyd-Warshall: it considers the k-th vertex as an intermediate vertex on the path from i to j.
- For every vertex in the adjacency list, it emits a new record whose node ID is that adjacent vertex.
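The mapper's relaxation step can be sketched as a Python generator (apsp_map is a hypothetical name, and values are plain lists of (node, weight) pairs rather than Hadoop Writables). For the record of node k, every neighbor i receives k itself at weight w(i,k) and every other neighbor j of k at w(i,k) + w(k,j); the record is also re-emitted under k so that its direct edges survive.

```python
from typing import Iterator, List, Tuple

Adjacency = List[Tuple[int, int]]   # [(node id, path weight), ...]

def apsp_map(k: int, adj: Adjacency) -> Iterator[Tuple[int, Adjacency]]:
    yield k, adj                    # re-emit k's own list so direct edges survive
    for i, w_ik in adj:
        # Neighbor i learns about k itself and, through k, about every other neighbor j.
        through_k = [(j, w_ik + w_kj) for j, w_kj in adj if j != i]
        yield i, [(k, w_ik)] + through_k

out3 = list(apsp_map(3, [(2, 31), (1, 32), (4, 14)]))
out4 = list(apsp_map(4, [(2, 27), (3, 23), (5, 48)]))
```

For node 3's record this produces the records 2 3,31 1,63 4,45, then 1 3,32 2,63 4,46, then 4 3,14 2,45 1,46, matching the mapper output below.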
20 Output of Mapper
1 3,43
2 4,18
3 2,31 1,32 4,14
4 2,27 3,23 5,48
5 1,23
3 1,43
4 2,18
2 3,31 1,63 4,45
1 3,32 2,63 4,46
4 3,14 2,45 1,46
2 4,27 3,50 5,75
3 4,23 2,50 5,71
5 4,48 2,75 3,71
21 Reducer
- The output of the mapper is fed to the reducer; all values with the same key are sent to the same reducer.
- Each value is an adjacency list of (node ID, path weight) pairs, giving the weight of a path from the key to that node.
- For each destination vertex, the minimum of all the path weights is taken as the current shortest path and added to the key's adjacency list.
- After each iteration, the reducer writes an output file with the current shortest path weight from each node i to each node j.
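The reducer then keeps the minimum weight per destination. This is a hypothetical Python sketch (apsp_reduce), not the project's Java reducer. The example feeds key 1 the lists it receives under the mapper rule: its own record 1 3,43, the list 3,32 2,63 4,46 emitted from node 3's record, and 5,23 emitted from node 5's record 5 1,23.

```python
from typing import Dict, Iterable, List, Tuple

def apsp_reduce(key: int, lists: Iterable[List[Tuple[int, int]]]) -> Tuple[int, Dict[int, int]]:
    best: Dict[int, int] = {}
    for adj in lists:
        for node, w in adj:
            # Keep the minimum weight per destination; skip self-distances.
            if node != key and w < best.get(node, float("inf")):
                best[node] = w
    return key, dict(sorted(best.items()))

row1 = apsp_reduce(1, [[(3, 43)], [(3, 32), (2, 63), (4, 46)], [(5, 23)]])
```

Of the two parallel edges between vertices 1 and 3, only the cheaper one (32) survives, giving the row 1 2,63 3,32 4,46 5,23.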
22 Output of Reducer
1 2,63 3,32 4,46 5,23
2 1,63 3,31 4,18 5,75
3 1,32 2,31 4,14 5,71
4 1,46 2,18 3,14 5,48
5 1,23 2,75 3,71 4,48
23 Final output
For each vertex, a boolean variable isconverged is maintained to check whether the minimum weights for all pairs have been reached. If a path weight is updated for a particular node, isconverged is set to false, indicating that the current distance may not yet be the shortest. The final output for the above graph is:
1 2,63 3,32 4,46 5,23
2 1,63 3,31 4,18 5,75
3 1,32 2,31 4,14 5,71
4 1,46 2,18 3,14 5,48
5 1,23 2,75 3,71 4,48
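The convergence check can be sketched as a driver loop that reruns map, shuffle and reduce until a round changes no weight, i.e. until isconverged stays true for every vertex. This is a hypothetical in-memory Python simulation (function names and max_rounds are assumptions), not the project's Hadoop driver.

```python
from collections import defaultdict

def apsp_map(k, adj):
    yield k, adj                                        # keep direct edges
    for i, w_ik in adj:
        yield i, [(k, w_ik)] + [(j, w_ik + w_kj) for j, w_kj in adj if j != i]

def apsp_reduce(key, lists):
    best = {}
    for adj in lists:
        for node, w in adj:
            if node != key and w < best.get(node, float("inf")):
                best[node] = w                          # minimum per destination
    return sorted(best.items())

def all_pairs(lines, max_rounds=10):
    graph = {}
    for line in lines:
        node, *pairs = line.split()
        graph[int(node)] = [(int(v), int(w)) for v, w in (p.split(",") for p in pairs)]
    for rounds in range(1, max_rounds + 1):
        shuffled = defaultdict(list)
        for k, adj in graph.items():
            for key, value in apsp_map(k, adj):
                shuffled[key].append(value)
        new_graph = {key: apsp_reduce(key, vals) for key, vals in shuffled.items()}
        # isconverged stays True only if no vertex's weights changed this round.
        isconverged = new_graph == graph
        graph = new_graph
        if isconverged:
            return graph, rounds
    raise RuntimeError("did not converge within max_rounds")

SAMPLE = ["1 3,43", "2 4,18", "3 2,31 1,32 4,14", "4 2,27 3,23 5,48", "5 1,23"]
result, rounds = all_pairs(SAMPLE)
```

On the sample input the loop stops in the third round, the first in which no weight changes. Compared with the first-round table above, the second round still lowers two entries: dist(2, 5) from 75 to 66 (2 → 4 → 5, 18 + 48) and dist(3, 5) from 71 to 55 (3 → 1 → 5, 32 + 23). The check therefore has to run across rounds, since the first-round table is not yet the all-pairs result.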
24 Analysis of running times of single node and multi-node cluster
S.No. | No. of nodes in input graph | Time taken on single node (sec) | Time taken on 3-node cluster (sec)
25 Summary and future enhancements
MapReduce is not an efficient choice for small inputs, because the time needed to set up the map and reduce tasks is comparable to the processing time itself.
Before formatting the NameNode, it is better practice to delete the namenode and datanode directories in Hadoop_store, so that all nodes get the same cluster ID.
The analysis was done over a wireless network; performance could be improved on a wired LAN, which has greater bandwidth.
26 Summary and future enhancements
Dijkstra's algorithm is more efficient because at each step it only pursues edges from the minimum-cost path inside the frontier, whereas our algorithm explores all paths in parallel, which is less efficient overall. We compute only the shortest path length here, but the shortest path itself can be recovered along with its distance by keeping track of each vertex's parent.
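For comparison with the parallel BFS, here is a sequential Dijkstra sketch that also records each vertex's parent, the enhancement described above for recovering the path itself. It is plain Python using heapq, run on the single-source example graph; the function names are hypothetical, and node 2's edge list (1-2, weight 2) is assumed as the undirected counterpart of node 1's edge 1-2.

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra's algorithm with parent pointers; adj: node -> {neighbor: weight}."""
    dist, parent = {source: 0}, {source: None}
    heap, done = [(0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                           # stale heap entry
        done.add(u)
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u  # shorter path found: remember its parent
                heapq.heappush(heap, (d + w, v))
    return dist, parent

def path_to(parent, target):
    path = []
    while target is not None:                  # walk parent pointers back to the source
        path.append(target)
        target = parent[target]
    return path[::-1]

GRAPH = {1: {2: 2, 3: 6, 6: 3}, 2: {1: 2}, 3: {1: 6, 4: 4, 5: 1, 6: 1},
         4: {3: 4, 5: 2}, 5: {3: 1, 4: 2}, 6: {1: 3, 3: 1}}
dist, parent = dijkstra(GRAPH, 1)
route = path_to(parent, 4)
```

Here node 4 is reached at distance 7 along 1 → 6 → 3 → 5 → 4, and node 5 at distance 5.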
27 Acknowledgement
We would like to thank Dr. Sobhan Babu for guiding us, and Ms. Samanvi and Mr. Kanishka Chauhan for helping us understand the concepts from time to time. Thanks to Tanya Marwah for giving us an extra slave node.
PS: Complete details of the implementation in the framework and of the analysis procedure are in the extensive reports and the video.
29 THANK YOU
More informationGraph Algorithms. Chapter 22. CPTR 430 Algorithms Graph Algorithms 1
Graph Algorithms Chapter 22 CPTR 430 Algorithms Graph Algorithms Why Study Graph Algorithms? Mathematical graphs seem to be relatively specialized and abstract Why spend so much time and effort on algorithms
More informationBigData and Map Reduce VITMAC03
BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to
More informationBreadth First Search. cse2011 section 13.3 of textbook
Breadth irst Search cse section. of textbook Graph raversal (.) Application example Given a graph representation and a vertex s in the graph, find all paths from s to the other vertices. wo common graph
More informationA Novel Approach for Workload Optimization and Improving Security in Cloud Computing Environments
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 2, Ver. 1 (Mar Apr. 2015), PP 20-27 www.iosrjournals.org A Novel Approach for Workload Optimization
More informationBreadth First Search. Graph Traversal. CSE 2011 Winter Application examples. Two common graph traversal algorithms
Breadth irst Search CSE Winter Graph raversal Application examples Given a graph representation and a vertex s in the graph ind all paths from s to the other vertices wo common graph traversal algorithms
More informationEvaluation of Apache Hadoop for parallel data analysis with ROOT
Evaluation of Apache Hadoop for parallel data analysis with ROOT S Lehrack, G Duckeck, J Ebke Ludwigs-Maximilians-University Munich, Chair of elementary particle physics, Am Coulombwall 1, D-85748 Garching,
More informationHadoop/MapReduce Computing Paradigm
Hadoop/Reduce Computing Paradigm 1 Large-Scale Data Analytics Reduce computing paradigm (E.g., Hadoop) vs. Traditional database systems vs. Database Many enterprises are turning to Hadoop Especially applications
More informationCSC263 Week 8. Larry Zhang.
CSC263 Week 8 Larry Zhang http://goo.gl/forms/s9yie3597b Announcements (strike related) Lectures go as normal Tutorial this week everyone go to BA32 (T8, F2, F2, F3) Problem sets / Assignments are submitted
More informationFacilitating Consistency Check between Specification & Implementation with MapReduce Framework
Facilitating Consistency Check between Specification & Implementation with MapReduce Framework Shigeru KUSAKABE, Yoichi OMORI, Keijiro ARAKI Kyushu University, Japan 2 Our expectation Light-weight formal
More informationDeployment Planning Guide
Deployment Planning Guide Community 1.5.1 release The purpose of this document is to educate the user about the different strategies that can be adopted to optimize the usage of Jumbune on Hadoop and also
More informationApril Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map
More informationImproved MapReduce k-means Clustering Algorithm with Combiner
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering
More informationLocal Algorithms for Sparse Spanning Graphs
Local Algorithms for Sparse Spanning Graphs Reut Levi Dana Ron Ronitt Rubinfeld Intro slides based on a talk given by Reut Levi Minimum Spanning Graph (Spanning Tree) Local Access to a Minimum Spanning
More information