From prefix computation on PRAM for finding Euler tours to usage of the Hadoop framework for distributed breadth-first search
Mark Sevalnev

November 22

1 Introduction

In the era of parallelism, problems can be solved in less time simply by increasing the number of computing nodes. The challenge is then, of course, to express the problem so that parallel computing can be used. Many problems are impossible to parallelize by their nature; an example of such a problem is depth-first search (DFS). Also, since parallel algorithms are much more complicated than sequential ones, there is a strong desire to derive parallel versions of sequential algorithms systematically. The Euler tour is a problem that can be computed in parallel. Different techniques can be used for that; one of them is prefix computation. This technique is simple but covers a very broad set of problems. The drawback of prefix computation is that we need to reshape problems to make them solvable by prefix computation. Hadoop is a software framework for creating and running parallel programs. It implements the map/reduce paradigm, in which you reshape the problem into two parts: you specify a mapper program and a reducer program, after which you just feed data to them. Hadoop works in parallel by splitting the input files and sending them first to mappers, from which the processed data is sent to reducers. As Hadoop hides all the details of parallelism, a programmer need not understand the underlying implementation of parallel algorithms. It suffices to fit the whole program into a mapper and a reducer, and Hadoop takes care of the rest. This paper introduces two techniques for solving problems. The first is an algorithm design technique, which tells you what can be done if some constraints are satisfied. The second is an application platform which implements another design technique.
We will study which problems can be solved by prefix computation and Hadoop, how efficiently they can be solved, and how much of the designer's work is required.

2 Algorithms on pointer-based data structures

2.1 Prefix computation

The prefix computation technique is an efficient way to process data stored in linked lists in parallel. Many computational problems can be solved by transforming them into linked lists and then applying the prefix computation technique. The main idea behind prefix computation is pointer jumping. If we have a tree in which a leaf wants to access the root, then using a sequential algorithm we can perform this in time linearly proportional to the number of nodes between the leaf and the root. On the other hand, using a parallel algorithm allows us to access the root from the leaf in logarithmic time. As we know, each tree node except the root has a pointer to its parent. The logarithmic time can be achieved if we instruct the nodes to update their parent pointers in the following manner: if node x points to node y and node y points to node z, then node x makes its pointer point to node z. The figure shows how the nodes update their parent pointers. Here is another example where prefix computation is used. Given a linked list L, it is required to perform the following computation: x_1, x_1 * x_2, x_1 * x_2 * x_3, ... In a sequential setting, we have a pointer
Figure 1: An example of prefix computation
to the head of the list, and we perform the prefix computation by a single traversal of L; the time required is linear in the number of nodes in the list. Now assume we want to solve the same problem in parallel. If all we have is a pointer to the head of the list, there is very little to improve. But in a typical parallel setting it is likely that each processor has a pointer to its own node: L has probably been constructed in parallel, each processor contributing a node. A parallel algorithm for prefix computation on L uses pointer jumping. In each iteration a processor: 1) uses the *-operation to combine its own value with the value stored in its successor node; 2) makes its successor pointer point to its successor's successor node. Although no processor knows the number of nodes in the list, the algorithm terminates when all nodes point to nil. The algorithm PRAM LINKED LIST PREFIX is given below. It is assumed that we have as many processors as there are nodes in the list. The algorithm uses next fields so that the original succ fields are not destroyed.

    for all i do in parallel
        next(i) := succ(i)
    end for
    finished := false
    while not finished do
        finished := true
        for all i do in parallel
            if next(i) != nil then
                val(next(i)) := val(i) * val(next(i))
                next(i) := next(next(i))
            end if
            if next(i) != nil then
                finished := (COMMON) false
            end if
        end for
    end while

The algorithm runs in logarithmic time and uses a linear number of processors. We can use prefix computation as a subroutine to perform another useful computation called list ranking. Given a linked list L, we may want to know the distance from each node to the end of the list. Specifically, for each node i for which succ(i) ≠ nil, we wish to compute rank(i) = rank(succ(i)) + 1; if succ(i) = nil, then rank(i) = 0. Sequentially the problem is solved by traversing the list from beginning to end, converting each node to point to its predecessor.
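As a sanity check, the pointer-jumping algorithm PRAM LINKED LIST PREFIX above can be simulated sequentially in Python. The names val, succ and next follow the pseudocode; the snapshot copies stand in for the synchronous steps of the PRAM, so this is only a sketch of the algorithm's logic, not a parallel implementation:

```python
# Sequential simulation of PRAM LINKED LIST PREFIX. One round of the
# while-loop is simulated by updating all nodes from a snapshot of the
# previous state, mimicking one synchronous parallel step.

def linked_list_prefix(vals, succ, op=lambda a, b: a + b):
    """vals[i]: value of node i; succ[i]: index of node i's successor or None."""
    val = list(vals)
    nxt = list(succ)
    finished = False
    while not finished:
        finished = True
        new_val, new_nxt = list(val), list(nxt)   # snapshot = synchronous step
        for i in range(len(val)):                 # "for all i do in parallel"
            if nxt[i] is not None:
                new_val[nxt[i]] = op(val[i], val[nxt[i]])  # combine into successor
                new_nxt[i] = nxt[nxt[i]]                   # pointer jumping
                if new_nxt[i] is not None:
                    finished = False
        val, nxt = new_val, new_nxt
    return val
```

With op taken as +, node i ends up holding x_1 + ... + x_{i+1}, exactly the prefix values described above; passing a different op gives prefix products and the like.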
After that, the list is traversed in the opposite direction, assigning a rank to each node visited. This takes O(n) time. In parallel this can again be solved in O(log n) time. First, each processor reverses its node's successor pointer so that each node points to its predecessor. Second, the value 1 is assigned to each node. Finally, the prefix computation is performed with the operation taken as +. [1]

2.2 Euler tours

Now that we have learned the concept of prefix computation, we can use it to compute something more practical, namely Euler tours. An Euler tour in a graph is a list of edges such that every edge of the graph is present exactly once in the list and consecutive edges are neighbors in the graph. If an Euler tour exists in a graph, then all vertices have an even degree. This comes from the fact that each vertex must be entered and left, which contributes two edges; if the vertex is visited n times, 2n edges are needed. This leads to the fact that Euler tours always exist in directed trees. By a directed tree we mean any undirected tree which is turned into a directed graph by splitting every edge into two edges going in both directions. Given a directed tree DT with n vertices, we describe a parallel algorithm for computing an Euler tour ET of DT. The input to the algorithm is the set of linked lists in which DT is stored. Every vertex has its own linked list in which its outgoing edges are stored. A node ij in the linked list for vertex v_i consists of two fields: a field edge containing the edge (v_i, v_j) and a field next containing a pointer to the next node. The purpose of the algorithm is to arrange the edges of DT into a single list such that each edge (v_i, v_j) is followed by an edge (v_j, v_k).
On a PRAM, we assume the availability of n - 1 processors, with each processor P_ij, i < j, in charge of two edges of DT, namely (v_i, v_j) and (v_j, v_i). The task of each processor is to determine the position of each of its edges in the final linked list forming ET. It does so by determining the successor of its two edges as follows. If, in the linked list for v_j, edge (v_j, v_i) is followed by some edge (v_j, v_k), then the successor of (v_i, v_j) in ET is (v_j, v_k). Otherwise, that is, if (v_j, v_i) is the last edge in the linked list for v_j, then the successor of (v_i, v_j) in ET is the first edge in the linked list of v_j. The successor of (v_j, v_i) is computed similarly. The same formally:

    Successor of (v_i, v_j):
        if next(ji) == jk then succ(ij) := jk
        else succ(ij) := head(v_j)

    Successor of (v_j, v_i):
        if next(ij) == im then succ(ji) := im
        else succ(ji) := head(v_i)

The successor of each edge is found in constant time; thus the ET of a DT with n vertices is found in constant time using n - 1 processors. [1]

3 Solving problems using an application platform

3.1 Map/reduce paradigm

Hadoop is a software platform which implements the map/reduce paradigm. The map/reduce paradigm was inspired by functional programming. A map is any user-defined function which, applied to a list of values, produces a list of results; for example, it can be the power-of-three function cube(x) = x * x * x. Calling it on the list [1, 2, 3, 4, 5] results in [1, 8, 27, 64, 125]. A reduce is also any user-defined function that takes as input a list of values, processes them in some order, and returns a single result. For instance, it can be the product function (which computes the factorial when fed [1, 2, ..., n]):

    res := 1
    while there are elements in input do
        res := res * next_element
    return res

Thus given the list [1, 2, 3, 4, 5] the reducer returns 1*2*3*4*5, in other words 120. Another common example of map/reduce is word counting. One can count the words occurring in a text document in a parallel manner.
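The Euler-tour successor rule above can be written out sequentially in Python. Here adj plays the role of the per-vertex linked lists (plain Python lists for simplicity); this is an illustrative sketch, not PRAM code:

```python
# Successor computation for an Euler tour of a directed tree.
# adj maps each vertex v to the ordered list of its outgoing edges (v, w).

def euler_tour_successors(adj):
    """Return succ: edge -> next edge in the Euler tour."""
    # Record each edge's position in its vertex's list.
    pos = {e: k for edges in adj.values() for k, e in enumerate(edges)}
    succ = {}
    for edges in adj.values():
        for (v, w) in edges:
            k = pos[(w, v)]                   # position of the reverse edge in w's list
            if k + 1 < len(adj[w]):
                succ[(v, w)] = adj[w][k + 1]  # edge following (w, v)
            else:
                succ[(v, w)] = adj[w][0]      # wrap to the head of w's list
    return succ
```

Following succ from any starting edge visits all 2(n - 1) directed edges exactly once before returning to the start, which is exactly the Euler tour ET.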
To implement this, the map function is defined to assign every word the number 1, and the reduce function to sum together the numbers attached to each word. The separate words work as keys and are grouped together, after which their ones are summed; this of course gives the number of occurrences in the text. [2]

3.2 Hadoop implementation

Hadoop is an implementation of the map/reduce paradigm. Using Hadoop you can process petabytes of data in a distributed manner. Hadoop takes data in key/value form, which is then split and fed to a number of user-defined map functions. Each mapper processes the data appropriately and outputs a list of intermediate key/value pairs. The key/value list is sorted according to the keys and partitioned among a number of reducers such that no two pieces of data with the same key are fed to different reducers. Every partition is then sent to the user-defined reduce function. A reducer outputs the final list of key/value pairs after processing its own data. Hadoop's map/reduce framework is built on top of the Hadoop distributed file system (HDFS), whose architecture closely resembles the Google file system. [3] The basic idea of HDFS is that it is meant to run on inexpensive hardware for large data-intensive applications. To improve fault tolerance and efficiency, HDFS distributes the same data among several nodes. If some datanode of the cluster goes down, the data which was assigned to it is resent to other datanodes from the replica datanodes still storing this data. HDFS is also aware of the data location of all replicas. It is rack-aware and tries to send data to the nearest available datanode with respect to that data's location, thus optimising data flow across the network.
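The word-counting map/reduce from Section 3.1 can be sketched in plain Python; the functions below only mimic what Hadoop does (map, sort by key, reduce) and use no Hadoop API:

```python
# A minimal word count in map/reduce style: the mapper emits (word, 1)
# pairs, the framework sorts and groups by key, and the reducer sums
# the ones attached to each word.

from itertools import groupby

def mapper(line):
    for word in line.split():
        yield (word, 1)

def reducer(word, counts):
    return (word, sum(counts))

def map_reduce(lines):
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=lambda kv: kv[0])            # Hadoop's sort phase
    return dict(reducer(k, [v for _, v in g])   # one reduce call per key
                for k, g in groupby(pairs, key=lambda kv: kv[0]))
```

On Hadoop the sort-and-group step happens inside the framework between the map and reduce phases; here it is spelled out explicitly.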
Figure 2: The master/slave architecture of Hadoop

Hadoop consists of two layers: the map/reduce layer and the HDFS layer. The HDFS layer provides the infrastructure for the map/reduce layer, and the map/reduce layer executes the tasks sent to the mappers and reducers. HDFS provides a familiar file system interface: files are organized hierarchically and identified by pathnames. The HDFS layer contains a namenode, a secondary namenode and datanodes. The namenode is set up on an arbitrary computer which is going to be the master, and it keeps information on all data replicas. HDFS splits data into blocks such that all blocks have the same size (64 MB by default) except possibly the last block of a file. A large block size offers some advantages; for example, it reduces the need to access the namenode, as files consist of few blocks. The namenode also ensures that every piece of data is replicated on three different machines. When a client wants to access some piece of data, the namenode translates the request into the locations of the blocks the data consists of and returns the block numbers and their locations to the client. The secondary namenode periodically takes snapshots of the namenode's logs for further debugging in case the namenode crashes. Datanodes work on the slave machines, the computers in the cluster on which Hadoop's datanodes are running. In the map/reduce layer there are a jobtracker and tasktrackers. The jobtracker works on the master machine and decides which task will be sent to which tasktracker. Tasktrackers execute the tasks assigned to them.

3.3 Distributed breadth-first search

For state space exploration, two well-known algorithms can be used: depth-first search (DFS) and breadth-first search (BFS). They are equally efficient and widely used to traverse the nodes of a graph. DFS is also used as a subroutine for finding the strongly connected components of a graph. However, the problem with DFS is that it cannot be parallelized [4].
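For reference, plain sequential BFS, the algorithm whose parallel version is discussed in this section, can be sketched over an adjacency-list graph; it runs in O(V + E) time:

```python
# Standard sequential breadth-first search: visit vertices level by level,
# marking each vertex as seen the first time it is reached.

from collections import deque

def bfs(adj, start):
    """adj: dict vertex -> list of neighbors. Returns vertices in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)       # mark before enqueueing to avoid duplicates
                queue.append(w)
    return order
```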
As DFS is a central algorithm in many model checkers, other graph-traversal algorithms were introduced to work around its resistance to parallelization. However, most of them are not as efficient as DFS or BFS. Instead of DFS, BFS can be parallelized. Distributed breadth-first search (DBFS), as we will call the parallel version of BFS, works as follows. It shares the nodes of the frontier between execution nodes, and every execution node generates the successors of its nodes. Then the produced nodes are gathered together; duplicates are removed, as well as nodes that were already seen in previous iterations. The rest is the new frontier. It is concatenated with the list of seen nodes and fed again to the execution nodes. This is repeated until there are no unseen nodes. The efficiency of DBFS is
O((V + E) · V log V), where V is the total number of nodes and E is the total number of edges in the graph. O(V + E) is the time efficiency of BFS, and the factor V log V comes from the fact that on every iteration of DBFS a frontier must be extracted. Theoretically we can achieve in DBFS the same efficiency as in BFS. For that we need a table in which we can inspect in constant time which nodes have already been visited. The problem is that such a table would have exponential size with respect to the length of a node's representation.

3.4 Simple node space exploration

We designed several implementations of DBFS to run on Hadoop. Each of them has its weaknesses and strengths. The simpler implementations give a rough idea of DBFS but are not optimal. The more efficient ones are, on the other hand, more complicated and thus error-prone. The first implementation, called simple node space exploration, puts nodes into the input folder and calls Hadoop, which in turn generates the successors of those nodes plus the nodes themselves and outputs them into the output folder. This is repeated until the file containing nodes does not grow any more. Formally:

    S_0 = initial_node
    S_{i+1} = ∪_{j=0}^{i} get_successors(S_j), i ∈ N    (1)

In terms of an algorithm, the above mathematical notation looks like this:

    S := getinitialstate(model)
    new_size := getsize(S)
    do
        old_size := new_size
        S := generatesuccessors(model, S)
        S := sort(S)
        S := removeduplicates(S)
        new_size := getsize(S)
    while (old_size != new_size)

Nodes of the graph are expressed as bit-vectors. Because the final purpose was to test this approach on Hadoop, and presumably the bottleneck of Hadoop is the network's bandwidth, we decided to compress the character bit-vector representing a node into a binary bit-vector. In this approach we divide a node into groups of eight digits and every group is encoded into one integer. Because eight bits can represent numbers up to 255 (a number written with three characters), we use at most three characters instead of the original eight.
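The compression step just described might be sketched as follows; the comma-separated output format is our assumption for illustration, since the exact encoding is not specified:

```python
# Pack a node's character bit-vector into groups of eight bits, each
# group becoming one integer 0..255 (at most three characters instead
# of the original eight).

def compress(bits):
    """bits: string of '0'/'1' characters, length a multiple of 8."""
    return ",".join(str(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))

def decompress(packed):
    """Inverse of compress: expand each integer back to eight bit characters."""
    return "".join(format(int(x), "08b") for x in packed.split(","))
```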
In that way we save space at the price of additional computation. For efficient removal of duplicates we first need to sort the node file. For that we used the standard UNIX command sort. After that we remove duplicates with the self-made program Non-duplicate-output. The size of the node file is obtained using the standard UNIX command wc. We implemented the above algorithm to run sequentially. In this way we get a rough upper bound for the parallel version of the same algorithm. It is not exact, because the same thing can be done more efficiently, but it gives a bound we do not want to exceed. The parallel version of this algorithm looks much the same, with the exception that sorting and duplicate removal are done by Hadoop. So here the keys are the nodes of the node space, values are not used, the mapper is a program which outputs the given node and the successors of that node, and the reducer is the identity function. Internally Hadoop works as follows: it splits the input file and assigns every split to some mapper; the mappers in turn output the keys, in other words the nodes they received and the successors of those nodes. Hadoop sorts the output of the mappers with respect to the keys (there are actually no values). The reducers copy the sorted output from each mapper over the network using HTTP (this part is called the shuffle). Simultaneously the reducers merge-sort the keys, because the same key may come from different mappers. This phase, shuffle and sort, will turn out to be the slowest part. After that the reducers would execute a secondary sort on the values of every key, but this
Figure 3: The master/slave architecture of Hadoop

is not done, because there are no values. Then the actual reduce function, the identity function, is called, and the result is output.

3.5 Node space exploration using set subtraction

The second version of the algorithm distinguishes between internal nodes, whose successors we have already seen, and frontier nodes, whose successors might include new, unseen nodes. So here, in addition to the keys (nodes), we use values, which can take two different values, f or s: s means that the node is internal and there is no reason to generate its successors, and f means that it is a frontier node and its successors should be generated. The algorithm puts nodes into the input folder and calls Hadoop, which in turn generates the successors of the frontier nodes plus the nodes themselves and outputs them into the output folder. This is repeated as long as the file containing frontier nodes has at least one node. Formally:

    S_0 = initial_node
    F_0 = initial_node
    S_{i+1} = S_i ∪ get_successors(F_i), i ∈ N    (2)
    F_{i+1} = S_{i+1} \ S_i, i ∈ N

In terms of an algorithm, the above mathematical notation looks like this:

    S := getinitialstate(model)
    F := S
    frontier_size := getsize(F)
    do
        F' := generatesuccessors(model, F)
        F := F' \ S
        S := S ∪ F
        frontier_size := getsize(F)
    while (frontier_size > 0)

As in the case of the first algorithm, we implemented this algorithm as a sequential script to get a rough bound on the running time. When run on Hadoop, it works as follows. First, the internal nodes (keys) are put into one input file with the value s and the frontier nodes into another with the value f. The input folder with these files is split and fed to the mappers. The mapper first reads the value of a key: if it is s, the key (node) is output as is; if it is f, the mapper outputs this key with the value s, generates the successors of that node, and outputs them with the value f. To reduce network traffic we also added combiners at this point.
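The frontier-based exploration just described can be simulated sequentially in Python; get_successors stands in for the model's transition function, which on Hadoop is computed by the mappers:

```python
# Sequential sketch of frontier-based node space exploration with set
# subtraction: S accumulates all seen nodes, F is the frontier whose
# successors still need to be generated.

def explore(initial, get_successors):
    S = {initial}
    F = {initial}
    while F:                      # repeat while the frontier is non-empty
        F_new = set()
        for node in F:
            F_new |= set(get_successors(node))
        F = F_new - S             # set subtraction removes already-seen nodes
        S |= F                    # the new frontier joins the seen set
    return S
```

Only the (usually small) frontier is expanded on each round, which is exactly what the f/s values buy on Hadoop.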
Usually the combiner executes the reducer's function; it processes a mapper's output more efficiently than the reducer, since the output of the mapper is available
in memory. The combiner in our algorithm performs a slightly different function than the reducer, because it sees the output of a single mapper only. Our combiner looks through the value list of each key: if it finds the value s there, it outputs the key with the value s, and if it does not, it outputs the key with the value f. In such a way we save time by doing some of the reducer's work on data already available in memory (not on disk), and save space (and thus time) by sending, for example, only <node, <f>> and not <node, <f, f, f, f, f, f>>.

4 Conclusions

There are many ways to solve problems in parallel. In this paper we concentrated on two approaches: an algorithm design technique named prefix computation, and map/reduce. We also introduced an application platform which implements map/reduce. Some pros and cons of both approaches were discussed, and many different examples of problems solvable by those techniques were inspected. As we said, prefix computation is more general and thus applicable to a broader set of problems. The drawback is that more time is needed for algorithm design. Map/reduce is simpler, but much of the design work is already done for you. As we saw from the experiments on Hadoop, the running time of node space generation depends, besides on the size of the graph, strongly on the number of iterations. The minimum running time of one Hadoop call is about half a minute, no matter how small the amount of data fed to it; this time is needed for starting the mappers and reducers. Thus we get a lower bound on how fast Hadoop can solve some instance. To work around this problem, multiple iterations can be done within one run: we can generate not only the successors of a given node but also the successors of the successors.

References

[1] Selim G. Akl. Parallel Computation: Models and Methods. Prentice Hall, 1997.

[2] Jeffrey Dean and Sanjay Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008.

[3] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung.
The Google file system. In 19th ACM Symposium on Operating Systems Principles, 2003.

[4] John H. Reif. Depth-first search is inherently sequential. Information Processing Letters, 1985.
More informationDepartment of Computer Science San Marcos, TX Report Number TXSTATE-CS-TR Clustering in the Cloud. Xuan Wang
Department of Computer Science San Marcos, TX 78666 Report Number TXSTATE-CS-TR-2010-24 Clustering in the Cloud Xuan Wang 2010-05-05 !"#$%&'()*+()+%,&+!"-#. + /+!"#$%&'()*+0"*-'(%,1$+0.23%(-)+%-+42.--3+52367&.#8&+9'21&:-';
More informationtime using O( n log n ) processors on the EREW PRAM. Thus, our algorithm improves on the previous results, either in time complexity or in the model o
Reconstructing a Binary Tree from its Traversals in Doubly-Logarithmic CREW Time Stephan Olariu Michael Overstreet Department of Computer Science, Old Dominion University, Norfolk, VA 23529 Zhaofang Wen
More informationYour First Hadoop App, Step by Step
Learn Hadoop in one evening Your First Hadoop App, Step by Step Martynas 1 Miliauskas @mmiliauskas Your First Hadoop App, Step by Step By Martynas Miliauskas Published in 2013 by Martynas Miliauskas On
More informationChapter Fourteen Bonus Lessons: Algorithms and Efficiency
: Algorithms and Efficiency The following lessons take a deeper look at Chapter 14 topics regarding algorithms, efficiency, and Big O measurements. They can be completed by AP students after Chapter 14.
More informationGraph implementations :
Graphs Graph implementations : The two standard ways of representing a graph G = (V, E) are adjacency-matrices and collections of adjacencylists. The adjacency-lists are ideal for sparse trees those where
More informationTIE Graph algorithms
TIE-20106 239 11 Graph algorithms This chapter discusses the data structure that is a collection of points (called nodes or vertices) and connections between them (called edges or arcs) a graph. The common
More informationParallel Euler tour and Post Ordering for Parallel Tree Accumulations
Parallel Euler tour and Post Ordering for Parallel Tree Accumulations An implementation technical report Sinan Al-Saffar & David Bader University Of New Mexico Dec. 2003 Introduction Tree accumulation
More informationWhat Is Datacenter (Warehouse) Computing. Distributed and Parallel Technology. Datacenter Computing Architecture
What Is Datacenter (Warehouse) Computing Distributed and Parallel Technology Datacenter, Warehouse and Cloud Computing Hans-Wolfgang Loidl School of Mathematical and Computer Sciences Heriot-Watt University,
More information6 ROUTING PROBLEMS VEHICLE ROUTING PROBLEMS. Vehicle Routing Problem, VRP:
6 ROUTING PROBLEMS VEHICLE ROUTING PROBLEMS Vehicle Routing Problem, VRP: Customers i=1,...,n with demands of a product must be served using a fleet of vehicles for the deliveries. The vehicles, with given
More informationBig Data Management and NoSQL Databases
NDBI040 Big Data Management and NoSQL Databases Lecture 2. MapReduce Doc. RNDr. Irena Holubova, Ph.D. holubova@ksi.mff.cuni.cz http://www.ksi.mff.cuni.cz/~holubova/ndbi040/ Framework A programming model
More informationEvaluation of Apache Hadoop for parallel data analysis with ROOT
Evaluation of Apache Hadoop for parallel data analysis with ROOT S Lehrack, G Duckeck, J Ebke Ludwigs-Maximilians-University Munich, Chair of elementary particle physics, Am Coulombwall 1, D-85748 Garching,
More informationBatch Inherence of Map Reduce Framework
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.287
More informationBigData and Map Reduce VITMAC03
BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to
More informationThe Google File System. Alexandru Costan
1 The Google File System Alexandru Costan Actions on Big Data 2 Storage Analysis Acquisition Handling the data stream Data structured unstructured semi-structured Results Transactions Outline File systems
More informationCPSC W1: Midterm 1 Sample Solution
CPSC 320 2017W1: Midterm 1 Sample Solution January 26, 2018 Problem reminders: EMERGENCY DISTRIBUTION PROBLEM (EDP) EDP's input is an undirected, unweighted graph G = (V, E) plus a set of distribution
More informationA BigData Tour HDFS, Ceph and MapReduce
A BigData Tour HDFS, Ceph and MapReduce These slides are possible thanks to these sources Jonathan Drusi - SCInet Toronto Hadoop Tutorial, Amir Payberah - Course in Data Intensive Computing SICS; Yahoo!
More informationOptimum Alphabetic Binary Trees T. C. Hu and J. D. Morgenthaler Department of Computer Science and Engineering, School of Engineering, University of C
Optimum Alphabetic Binary Trees T. C. Hu and J. D. Morgenthaler Department of Computer Science and Engineering, School of Engineering, University of California, San Diego CA 92093{0114, USA Abstract. We
More informationLocality Aware Fair Scheduling for Hammr
Locality Aware Fair Scheduling for Hammr Li Jin January 12, 2012 Abstract Hammr is a distributed execution engine for data parallel applications modeled after Dryad. In this report, we present a locality
More informationHadoop. copyright 2011 Trainologic LTD
Hadoop Hadoop is a framework for processing large amounts of data in a distributed manner. It can scale up to thousands of machines. It provides high-availability. Provides map-reduce functionality. Hides
More informationInternational Journal of Advance Engineering and Research Development. A Study: Hadoop Framework
Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja
More informationApril Final Quiz COSC MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model.
1. MapReduce Programming a) Explain briefly the main ideas and components of the MapReduce programming model. MapReduce is a framework for processing big data which processes data in two phases, a Map
More informationData Structure. IBPS SO (IT- Officer) Exam 2017
Data Structure IBPS SO (IT- Officer) Exam 2017 Data Structure: In computer science, a data structure is a way of storing and organizing data in a computer s memory so that it can be used efficiently. Data
More informationHadoop MapReduce Framework
Hadoop MapReduce Framework Contents Hadoop MapReduce Framework Architecture Interaction Diagram of MapReduce Framework (Hadoop 1.0) Interaction Diagram of MapReduce Framework (Hadoop 2.0) Hadoop MapReduce
More informationCLIENT DATA NODE NAME NODE
Volume 6, Issue 12, December 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Efficiency
More informationAlgorithms and Data Structures. Marcin Sydow. Introduction. QuickSort. Sorting 2. Partition. Limit. CountSort. RadixSort. Summary
Sorting 2 Topics covered by this lecture: Stability of Sorting Quick Sort Is it possible to sort faster than with Θ(n log(n)) complexity? Countsort Stability A sorting algorithm is stable if it preserves
More informationAn On-line Variable Length Binary. Institute for Systems Research and. Institute for Advanced Computer Studies. University of Maryland
An On-line Variable Length inary Encoding Tinku Acharya Joseph F. Ja Ja Institute for Systems Research and Institute for Advanced Computer Studies University of Maryland College Park, MD 242 facharya,
More informationMore PRAM Algorithms. Techniques Covered
More PRAM Algorithms Arvind Krishnamurthy Fall 24 Analysis technique: Brent s scheduling lemma Techniques Covered Parallel algorithm is simply characterized by W(n) and S(n) Parallel techniques: Scans
More informationMap-Reduce. John Hughes
Map-Reduce John Hughes The Problem 850TB in 2006 The Solution? Thousands of commodity computers networked together 1,000 computers 850GB each How to make them work together? Early Days Hundreds of ad-hoc
More informationChapter 6. Parallel Algorithms. Chapter by M. Ghaari. Last update 1 : January 2, 2019.
Chapter 6 Parallel Algorithms Chapter by M. Ghaari. Last update 1 : January 2, 2019. This chapter provides an introduction to parallel algorithms. Our highlevel goal is to present \how to think in parallel"
More information(Refer Slide Time: 01.26)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture # 22 Why Sorting? Today we are going to be looking at sorting.
More informationPLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS
PLATFORM AND SOFTWARE AS A SERVICE THE MAPREDUCE PROGRAMMING MODEL AND IMPLEMENTATIONS By HAI JIN, SHADI IBRAHIM, LI QI, HAIJUN CAO, SONG WU and XUANHUA SHI Prepared by: Dr. Faramarz Safi Islamic Azad
More informationBigtable. Presenter: Yijun Hou, Yixiao Peng
Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng
More informationPhysical Level of Databases: B+-Trees
Physical Level of Databases: B+-Trees Adnan YAZICI Computer Engineering Department METU (Fall 2005) 1 B + -Tree Index Files l Disadvantage of indexed-sequential files: performance degrades as file grows,
More informationD. Θ nlogn ( ) D. Ο. ). Which of the following is not necessarily true? . Which of the following cannot be shown as an improvement? D.
CSE 0 Name Test Fall 00 Last Digits of Mav ID # Multiple Choice. Write your answer to the LEFT of each problem. points each. The time to convert an array, with priorities stored at subscripts through n,
More informationHadoop and HDFS Overview. Madhu Ankam
Hadoop and HDFS Overview Madhu Ankam Why Hadoop We are gathering more data than ever Examples of data : Server logs Web logs Financial transactions Analytics Emails and text messages Social media like
More information17/05/2018. Outline. Outline. Divide and Conquer. Control Abstraction for Divide &Conquer. Outline. Module 2: Divide and Conquer
Module 2: Divide and Conquer Divide and Conquer Control Abstraction for Divide &Conquer 1 Recurrence equation for Divide and Conquer: If the size of problem p is n and the sizes of the k sub problems are
More informationMITOCW watch?v=w_-sx4vr53m
MITOCW watch?v=w_-sx4vr53m The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To
More informationWe assume uniform hashing (UH):
We assume uniform hashing (UH): the probe sequence of each key is equally likely to be any of the! permutations of 0,1,, 1 UH generalizes the notion of SUH that produces not just a single number, but a
More informationYuval Carmel Tel-Aviv University "Advanced Topics in Storage Systems" - Spring 2013
Yuval Carmel Tel-Aviv University "Advanced Topics in About & Keywords Motivation & Purpose Assumptions Architecture overview & Comparison Measurements How does it fit in? The Future 2 About & Keywords
More informationCOMP Parallel Computing. PRAM (3) PRAM algorithm design techniques
COMP 633 - Parallel Computing Lecture 4 August 30, 2018 PRAM algorithm design techniques Reading for next class PRAM handout section 5 1 Topics Parallel connected components algorithm representation of
More information2/26/2017. For instance, consider running Word Count across 20 splits
Based on the slides of prof. Pietro Michiardi Hadoop Internals https://github.com/michiard/disc-cloud-course/raw/master/hadoop/hadoop.pdf Job: execution of a MapReduce application across a data set Task:
More informationOutline. Graphs. Divide and Conquer.
GRAPHS COMP 321 McGill University These slides are mainly compiled from the following resources. - Professor Jaehyun Park slides CS 97SI - Top-coder tutorials. - Programming Challenges books. Outline Graphs.
More informationresidual residual program final result
C-Mix: Making Easily Maintainable C-Programs run FAST The C-Mix Group, DIKU, University of Copenhagen Abstract C-Mix is a tool based on state-of-the-art technology that solves the dilemma of whether to
More informationHash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell
Hash Tables CS 311 Data Structures and Algorithms Lecture Slides Wednesday, April 22, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005
More informationAlgorithms for Grid Graphs in the MapReduce Model
University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Computer Science and Engineering: Theses, Dissertations, and Student Research Computer Science and Engineering, Department
More informationDistributed computing: index building and use
Distributed computing: index building and use Distributed computing Goals Distributing computation across several machines to Do one computation faster - latency Do more computations in given time - throughput
More informationGlobal Journal of Engineering Science and Research Management
A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV
More informationIntroduction to MapReduce
732A54 Big Data Analytics Introduction to MapReduce Christoph Kessler IDA, Linköping University Towards Parallel Processing of Big-Data Big Data too large to be read+processed in reasonable time by 1 server
More informationSolutions to relevant spring 2000 exam problems
Problem 2, exam Here s Prim s algorithm, modified slightly to use C syntax. MSTPrim (G, w, r): Q = V[G]; for (each u Q) { key[u] = ; key[r] = 0; π[r] = 0; while (Q not empty) { u = ExtractMin (Q); for
More informationImproved MapReduce k-means Clustering Algorithm with Combiner
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Improved MapReduce k-means Clustering Algorithm with Combiner Prajesh P Anchalia Department Of Computer Science and Engineering
More informationTHE EULER TOUR TECHNIQUE: EVALUATION OF TREE FUNCTIONS
PARALLEL AND DISTRIBUTED ALGORITHMS BY DEBDEEP MUKHOPADHYAY AND ABHISHEK SOMANI http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/palgo/index.htm THE EULER TOUR TECHNIQUE: EVALUATION OF TREE FUNCTIONS 2
More informationHadoop On Demand: Configuration Guide
Hadoop On Demand: Configuration Guide Table of contents 1 1. Introduction...2 2 2. Sections... 2 3 3. HOD Configuration Options...2 3.1 3.1 Common configuration options...2 3.2 3.2 hod options... 3 3.3
More information