Fault Tolerant Parallel Data-Intensive Algorithms


Mucahid Kutlu, Department of Computer Science and Engineering, Ohio State University, Columbus, OH
Gagan Agrawal, Department of Computer Science and Engineering, Ohio State University, Columbus, OH
Oguz Kurt, Department of Mathematics, Ohio State University, Columbus, OH

Abstract—Fault-tolerance is rapidly becoming a crucial issue in high-end and distributed computing, as the increasing number of cores is decreasing the mean-time to failure of these systems. While checkpointing, including checkpointing of parallel programs like MPI applications, provides a general solution, the overhead of this approach is becoming increasingly unacceptable. Thus, algorithm-based fault-tolerance provides a practical alternative, though it is less general. Although this approach has been studied for many applications, there is no existing work on algorithm-based fault-tolerance for the growing class of data-intensive parallel applications. In this paper, we present an algorithm-based fault tolerance solution that handles fail-stop failures for a class of data-intensive algorithms. We divide the dataset into smaller data blocks and, in the replication step, distribute the replicated blocks so as to minimize the maximum data intersection between any two processors. This allows us to have minimum data loss when multiple failures occur. In addition, our approach enables better load balance after a failure and decreases the amount of re-processing of the lost data. We have evaluated our approach using two popular parallel data mining algorithms, k-means and apriori. We show that our approach has negligible overhead when there are no failures, and allows us to gracefully handle different numbers of failures, as well as failures at different points of processing. We also compare our approach with a MapReduce-based solution for fault tolerance, and show that we outperform Hadoop both in the absence and in the presence of failures.

I. INTRODUCTION

Growing computational and data processing needs are currently being met with an increasing number of cores, as there is almost no improvement in a single core's performance. However, with a growing number of cores, the Mean-Time To Failure (MTTF) of the systems is decreasing. As a result, fault-tolerance is rapidly becoming a major topic in high-end computing.

Several different approaches to fault-tolerance have been taken, depending upon the nature of the parallel application and the programming model used. Since a large number of high-end applications are developed using MPI, there is a large body of work on MPI fault-tolerance, which focuses on checkpointing [28], [5], [15], [34], [23], [21], [2]. However, the main issue with checkpointing is its high overhead, especially as systems are becoming larger while disk bandwidths are not improving. For future exascale systems, it is being argued that checkpointing and recovery time (with current methods) will even exceed the MTTF, leading to the need for alternative methods [9]. A promising alternative, which has resulted in lower overheads, is algorithm-based fault-tolerance [12], [16], [6], often based on disk-less checkpointing [36], [22]. These methods use specific properties of the algorithm to reduce the amount of information that needs to be cached. Most of the work in this area has been for scientific computations, like linear algebra routines [36], [16] and iterative computations [12], including conjugate gradient [13].
While large parallel systems were traditionally used for scientific computations, data-intensive computing has rapidly emerged as a major application class in recent years [8]. Though most recent work in this area has been in the context of MapReduce [17] and its variants, stand-alone implementations of parallel data-intensive algorithms are also very common [24], [39]. In this paper, we examine algorithm-level fault-tolerance for data-intensive algorithms. We show how the common properties of many similar data mining algorithms can be exploited to develop an approach for algorithm-level fault-tolerance. These algorithms involve an iterative structure, where the communication at the end of each iteration is limited to generalized reductions. We focus on fail-stop failures [38], in which the failed processors stop working and all their data is lost. We also assume that no additional processors are used for the recovery. Note that continuing the execution with the remaining nodes is more challenging than using backup nodes, but it is also more practical, since backup nodes may not always be available. Therefore, we have to read the lost data again and assign these data portions to the running processors. To avoid degrading performance, we also need to ensure good load balance over the remaining nodes.

Our new approach is as follows. In order to minimize the amount of data loss, we first divide the data that each processor will normally process into smaller data blocks. Then we replicate these data blocks and distribute them among processors such that the maximum intersection between any two processors is minimized. In case of failures, the master node assigns the data that would normally be processed by the failed processors to the slaves that already store replicas of it. Having smaller parts as the unit of processing and replication allows us to have better load balance after a failure, and decreases the amount of data loss when we have multiple failures. Moreover, we divide each data block into smaller data portions and augment the algorithm to perform a summary exchange after processing each data portion. This reduces the amount of work to be redone after a failure.

We have extensively evaluated our algorithms. First, we show that replication of data and summary exchanges add very little overhead. Starting from an execution on 16 nodes, we could recover from 1, 2, and 3 failures with an overall slowdown of 16.6%, 17.1%, and 26.3%, respectively. We also compared our approach with Hadoop's support for fault tolerance [41], using implementations of the same algorithms with the MapReduce API. Our approach had much lower slowdown while handling failures.

The rest of the paper is organized as follows. In Section 2, we explain the serial and parallel versions of the k-means algorithm. In Section 3, we explain the details of our replication approach and present the distribution algorithm. In Section 4, we describe recovery from different failure cases. In Section 5, we report a detailed evaluation of our approach, including a comparison against Hadoop. We compare our work with related research efforts in Section 6 and conclude in Section 7.

II. DATA MINING ALGORITHMS

In our study, we focus on two representative data-intensive algorithms, k-means clustering [29] and apriori association mining [1]. In this section, we explain the k-means algorithm, presenting both its sequential and parallel versions. The explanation of the apriori algorithm is not given because of space limits.

A. K-means Clustering

Clustering is one of the most commonly studied problems in machine learning and data mining. The goal in clustering is to divide a set of data records into k parts or clusters, maximizing similarity within each cluster and dissimilarity across clusters. K-means is an iterative clustering algorithm, and its pseudo-code is shown as Algorithm 1. In the k-means algorithm, we initially select k centroids randomly (Line 2), before the iterative step. We then assign each object to the nearest cluster (Lines 5-7); any distance calculation method can be used in this step. Once such assignments are completed, we calculate the new centroid of each cluster (Line 8); this can be done by averaging the coordinates of the objects assigned to the corresponding cluster. Then, we calculate delta in order to find how much the centroids of the clusters have changed in the current iteration (Line 9). If this change is not greater than a prespecified threshold, the algorithm has converged and we can finish the clustering process. To avoid the possibility of an infinite loop, we also put a bound on the number of iterations.

Algorithm 1 Serial K-means Clustering Algorithm
 1: input: D = {d_1, d_2, ..., d_n} (data records to be clustered), k (number of clusters), MaxIter (maximum number of iterations), Threshold
 2: Select k cluster centroids randomly
 3: iteration = 0
 4: repeat
 5:   for i = 1 to n do
 6:     Assign d_i to the nearest cluster
 7:   end for
 8:   Calculate new centroids of clusters
 9:   delta = sum_{j=1..k} |newcentroid_j - oldcentroid_j|
10:   Increment iteration by 1
11: until iteration >= MaxIter or delta <= Threshold

Now, we consider the parallel version of the k-means algorithm. The pseudo-code for the master and slave nodes is given in Algorithm 2. We distribute the data among slave nodes equally, so each slave node is responsible for n/p data records, where p is the number of processors and n is the number of data records. On the master node, we select the initial k cluster centroids (Line 1). At the beginning of each iteration, we first broadcast the current k cluster centroids (Line 4), so that each slave node gets the same cluster centroids (Line 3). The master node waits until it gets all the results from the slaves (Line 5). During this time, the slave nodes calculate local new centroids (Line 5) and delta (Line 6). Then, they send delta and the centroids, with the number of data records in each cluster, to the master node. Once the master node gets all the results from the slaves, it calculates the global new centroids and the total delta (Lines 6-7) and broadcasts the delta. If delta is not greater than the threshold, the clustering finishes for all nodes. Otherwise, a new iteration begins.

Algorithm 2 Parallel K-means Clustering Algorithm
Master Node:
 1: Select k cluster centroids randomly
 2: iteration = 0
 3: repeat
 4:   Broadcast the k cluster centroids
 5:   Wait for all new centroids from the slaves
 6:   Calculate new centroids of clusters
 7:   Calculate total delta
 8:   Broadcast delta
 9:   Increment iteration by 1
10: until iteration >= MAXITER or delta <= Threshold

Slave Node:
 1: iteration = 0
 2: repeat
 3:   Receive the k cluster centroids
 4:   Assign each data record to the nearest cluster
 5:   Calculate new cluster centroids
 6:   delta = sum_{j=1..k} |newcentroid_j - oldcentroid_j|
 7:   Send delta and the cluster centroids with the number of data records of each cluster
 8:   Increment iteration by 1
 9:   Receive new delta
10: until iteration >= MAXITER or delta <= Threshold
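As a concrete illustration of the exchange just described, the following is a minimal C/MPI sketch of one iteration on a worker. It is a sketch only, not the paper's implementation: it uses MPI collectives (MPI_Bcast, MPI_Allreduce) instead of the explicit master/slave messages of Algorithm 2, and the constants K and DIM as well as the flat record layout are illustrative assumptions.

    /* Sketch of one parallel k-means iteration (illustrative only).
     * The paper uses explicit master/slave messages (Algorithm 2);
     * collectives are used here for brevity. */
    #include <mpi.h>

    #define K    8      /* number of clusters (assumed)        */
    #define DIM  4      /* dimensionality of a record (assumed) */

    void kmeans_iteration(const double *records, int n_local,
                          double centroids[K][DIM], MPI_Comm comm)
    {
        double local_sum[K][DIM] = {{0}}, global_sum[K][DIM];
        long   local_cnt[K] = {0},        global_cnt[K];

        /* every node starts the iteration with the same centroids */
        MPI_Bcast(centroids, K * DIM, MPI_DOUBLE, 0, comm);

        for (int i = 0; i < n_local; i++) {
            const double *r = records + (long)i * DIM;
            int best = 0; double best_d = 1e300;
            for (int k = 0; k < K; k++) {          /* nearest centroid */
                double d = 0;
                for (int j = 0; j < DIM; j++) {
                    double diff = r[j] - centroids[k][j];
                    d += diff * diff;
                }
                if (d < best_d) { best_d = d; best = k; }
            }
            for (int j = 0; j < DIM; j++)          /* local reduction */
                local_sum[best][j] += r[j];
            local_cnt[best]++;
        }

        /* combine partial sums and counts; every node can then compute
         * the new centroids (and delta) locally */
        MPI_Allreduce(local_sum, global_sum, K * DIM, MPI_DOUBLE, MPI_SUM, comm);
        MPI_Allreduce(local_cnt, global_cnt, K, MPI_LONG, MPI_SUM, comm);

        for (int k = 0; k < K; k++)
            if (global_cnt[k] > 0)
                for (int j = 0; j < DIM; j++)
                    centroids[k][j] = global_sum[k][j] / global_cnt[k];
    }

The per-cluster sums and counts in this sketch play the role of the reduction object that is exchanged in Algorithm 2.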
B. Generalization to Other Algorithms

    {* Outer Sequential Loop *}
    While ( ) {
        {* Reduction Loop *}
        Foreach (element e) {
            (i, val) = process(e);
            Reduc(i) = Reduc(i) op val;
        }
    }

Fig. 1. Generalized Reduction Processing Structure of Common Data Mining Algorithms

In our previous work [30], [31], we have made the observation that the parallel versions of several well-known data mining techniques share a relatively similar structure. Besides k-means clustering and apriori association mining, this structure applies to several other clustering and association mining algorithms, as well as to algorithms for Bayesian networks for classification [10], k-nearest neighbor classifiers [25], artificial neural networks [25], and decision tree classifiers [35]. The common structure behind these algorithms is summarized in Figure 1. The function op is an associative and commutative function; thus, the iterations of the foreach loop can be performed in any order. The data structure Reduc is referred to as the reduction object. The reduction performed is, however, irregular, in the sense that the specific elements of the reduction object that are updated depend upon the results of processing an element.

For algorithms following such a generalized reduction structure, parallelization can be done by dividing the data instances (or records or transactions) among the processing threads. The computation performed by each thread is iterative, and involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. In a distributed-memory setting, the reduction object needs to be replicated, and a global reduction is performed after the local reductions. Again, the parallel algorithm for k-means clustering described earlier and the apriori algorithm that we used in our experiments are specific instances of this structure.
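The following small C sketch makes the structure of Figure 1 concrete. The reduction_object layout, the classify callback standing in for process(e), and the scalar element type are assumptions for illustration; they are not the paper's API.

    /* Illustrative mapping of the generalized reduction structure of
     * Fig. 1 onto a concrete reduction object. */
    #include <stddef.h>

    typedef struct {
        size_t  size;      /* number of reduction entries             */
        double *val;       /* Reduc(i): accumulated values            */
        long   *cnt;       /* e.g., per-cluster or per-itemset counts */
    } reduction_object;

    /* op must be associative and commutative, so elements may be
     * processed in any order and partial objects can be merged. */
    typedef void (*reduce_op)(reduction_object *r, size_t i, double val);

    static void sum_op(reduction_object *r, size_t i, double val)
    {
        r->val[i] += val;
        r->cnt[i] += 1;
    }

    /* The reduction loop of Fig. 1: each element contributes to the
     * entry (or entries) selected by processing it. */
    void reduction_loop(const double *elements, size_t n,
                        size_t (*classify)(const double *e),  /* "process(e)" */
                        reduction_object *reduc, reduce_op op)
    {
        for (size_t e = 0; e < n; e++) {
            size_t i = classify(&elements[e]);  /* which entry is updated     */
            op(reduc, i, elements[e]);          /* Reduc(i) = Reduc(i) op val */
        }
    }

    /* Merging two partial reduction objects (the global reduction step). */
    void merge(reduction_object *dst, const reduction_object *src)
    {
        for (size_t i = 0; i < dst->size; i++) {
            dst->val[i] += src->val[i];
            dst->cnt[i] += src->cnt[i];
        }
    }

Because op is associative and commutative, the merge step can be applied to partial reduction objects in any order, which is exactly what later allows partial summaries to be exchanged and combined.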

III. OUR APPROACH

We now describe the overall approach taken for fault-tolerant data-intensive algorithms, like k-means clustering and apriori association mining.

A. System Model Assumptions and Goals

In our study, we focus only on fail-stop failures. When a node fails, we lose everything on that node, including the data that had been processed by the failed node. In addition, we do not use any additional node to recover or replace the failed one; instead, we continue the execution with the remaining nodes, since using backup nodes is not always practical. Unless otherwise stated, the word failure implies the failure of a slave node, and not the failure of the master node.

In developing fault-tolerant algorithms, we have two main goals. First, we want to minimize the data loss, since the lost data needs to be re-read from the storage cluster, resulting in slowdown. Second, we want to minimize re-execution, or repeated work, when a failure occurs. To meet these two goals, we take the following approach. First, we intelligently replicate the data across slave nodes, such that the amount of data lost because of a certain number of failures is minimized. Second, we have developed a method for summarizing the computations, and sharing these summaries, in a way that decreases re-execution. However, in our approach, we also need to combine these two steps, which itself leads to several challenges.

B. Replication

We now describe the data replication approach we use. In order to decrease the number of accesses to the storage cluster, when we read the data initially, we store the same data on more than one processor. Therefore, each processor has two types of data: primary data and replicas. Note that the loading time of replicas is much less than that of the primary data, since the data already exists in the cluster and we do not need to access the storage cluster for it. Each processor normally processes only its own primary data, but it can process replicas in case of failures. If we replicate the data R times, i.e., have R - 1 replicas besides the primary data, we can handle R - 1 failures and continue the application with replicas, without requiring any new I/O operation from the storage cluster. But if the R processors that store the same data all fail, we lose those particular data elements, and have to access the storage cluster to read the lost data. Clearly, one way to reduce the possibility of having to read additional data is to increase R, i.e., create additional copies of the data. However, there is a practical limit on R, as local data storage resources are limited.
Thus, our goal is to replicate the data in a fashion that, for a given replication factor R, minimizes the execution time for any number of failures. To meet this goal, we need to ensure that the common data between any given set of processors is the minimum possible among all options for the given replication factor. We achieve this in the following way. First, we divide the primary data of each processor into S smaller, equal-sized parts, each of which is denoted as D and has a size of |D|. The primary data of processor i is denoted as D_i, and its size is |D_i| = |D| * S. In the replication step, we distribute these S data components that form the primary data of processor i among different processors, so as to have R copies of the data in all. This process is repeated for the primary data of all processors. At the end, each processor should hold the same amount of data. The data at processor i, including the primary data and the replicas, is denoted as P_i, and its size is |P_i| = |D_i| * R.

Returning to our goal in replicating the data and allocating the replicas, we would like the intersection between the data held by any two processors to be either the null set or only a single data block. So, our goal can be written as follows:

    for all i, j (i != j):  |P_i ∩ P_j| <= |D|

We developed an algorithm that meets this goal, provided that a sufficient number of processors is available. We first explain our idea with an example. In Figure 2, a sample data distribution is shown where the replication factor R is 3 and the number of data components (or blocks) per processor is 6, which also means that S is 2. In general, if we have n processors, we allocate the data onto n * R virtual processors. The first n virtual processors are allocated for the primary data, and the remaining n * (R - 1) processors are allocated for replicas. For this example, we first divide all the data into n * S * R parts. In our example, n = 7, so there are 42 data blocks. We first distribute these 42 data blocks among the first n virtual processors. This distribution is shown under columns P0-P6 in Figure 2. This is our first processor block, with index zero. Then, we need to distribute the replicas. We allocate an additional n virtual processors, which form the processor block with index 1. We first distribute all the data blocks among these virtual processors in the same fashion as we did for the initial block. This process is repeated until we have R processor blocks allocated. Now, to minimize the overlap, we use a simple trick, which can be seen in Figure 2: we shift each row of a virtual processor block to the left, block index * row index times. There is no shift for the rows of the initial block (primary data), since its block index is 0. But row 1 (the second row from the top) of the next block is shifted by 1, row 2 is shifted by 2, and so on. Similarly, row 1 of the processor block with index 2 is shifted by 2, row 2 is shifted by 4, and so on. The final allocation for our example can be seen in Figure 2, where each processor block is colored differently.

To generalize and formalize the method, we proceed as follows. Our goal is to compute a distribution matrix C of dimensions (S*R) x p:

        | c_{0,0}        c_{0,1}        ...   c_{0,p-1}        |
    C = | c_{1,0}        c_{1,1}        ...   c_{1,p-1}        |        (1)
        | ...            ...            ...   ...              |
        | c_{S*R-1,0}    c_{S*R-1,1}    ...   c_{S*R-1,p-1}    |

In this matrix, the column numbers (j) represent the virtual processor number and the row numbers (i) represent the ranking of the data block in the corresponding processor.

Fig. 2. Example Data Distribution (Replication Factor is 3, No. of Blocks Per Processor is 6)

Each c_{i,j} value represents the index of a data block. The overall method is generalized and summarized as Algorithm 3. Note that in the implementation, each processor is assigned S data blocks as primary data; simply, the primary data of the processors in the i-th processor block are the data blocks in rows [S*i, S*(i+1)).

Algorithm 3 Data Distribution Algorithm
 1: Divide the entire data into n * R * S parts
 2: Initialize block 0 (columns 0 through n-1) of the matrix C
 3: Distribute the data parts among the n processors
 4: for i = 1 to R - 1 do
 5:   Allocate a virtual processor block with n processors
 6:   Copy columns 0 through n-1 as block i
 7:   for j = 0 to (S * R - 1) do
 8:     Shift the j-th row of the i-th block j * i times to the left
 9:   end for
10: end for

Returning to our matrix, according to our algorithm each c_{i,j} value is given by

    c_{i,j} = i * n + ((j + floor(j/n) * i) mod n)

where 0 <= i < S * R and 0 <= j < R * n. We can define all the data block indexes of any processor by using the matrix C as in Eq. (2):

    P_j = { c_{i,j} : i = 0, ..., S * R - 1 }        (2)

We now focus on proving the correctness of the method. First, an obvious observation about the algorithm is as follows. When we calculate the processor index of a data block in the next processor block, we just perform a shift operation within its row. Therefore, the number of possible positions for that data block is 1; in other words, each data block index appears exactly once within a processor block.

Theorem 3.1: |P_i ∩ P_j| <= 1 for any two distinct processors i and j, if p = n * R, where p is the number of processors and n is a prime number larger than S * R.

Proof: Let A, B ∈ P_i ∩ P_j. This implies Eqs. (3) and (4):

    A = c_{k,i} = c_{k,j}        (3)
    B = c_{m,i} = c_{m,j}        (4)

where 0 <= k, m <= S * R - 1. Eqs. (3) and (4) imply Eqs. (5) and (6):

    i + floor(i/n) * k  ≡  j + floor(j/n) * k    (mod n)        (5)
    i + floor(i/n) * m  ≡  j + floor(j/n) * m    (mod n)        (6)

When we subtract Eq. (6) from Eq. (5), we get Eq. (7):

    floor(i/n) * (k - m)  ≡  floor(j/n) * (k - m)    (mod n)        (7)

If k = m, then A = B. So we may assume that k != m. Since |k - m| < S * R < n and n is prime, (k - m) is invertible modulo n, and we can derive Eq. (8) from Eq. (7):

    floor(i/n)  ≡  floor(j/n)    (mod n)        (8)

Since both floor(i/n) and floor(j/n) are smaller than R <= n, this gives floor(i/n) = floor(j/n); say floor(i/n) = floor(j/n) = a. This means that P_i and P_j are both in the a-th processor block. But this contradicts the observation mentioned above. Therefore, the intersection between any two processors cannot contain more than one data block.

For all R and S values, the algorithm above can be used, and we can achieve our minimum-intersection goal if we have n * R processors. When the intersection is minimum, in the best case the data loss will be zero for up to (R - 1) * n/R failures; in the worst case, we lose only one data block with R failures. On the other hand, if we distributed the data randomly, we could lose S * R data blocks with R failures in the worst case.
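As a sanity check on this construction, the following small C program builds the matrix C of Eq. (1) for the example of Figure 2 (n = 7, S = 2, R = 3) using the closed form above, and verifies that any two virtual processors share at most one data block, as Theorem 3.1 states. It is an illustrative sketch under the theorem's assumptions (n prime, n > S*R); the mapping of virtual processors to physical nodes is not shown.

    /* Build the distribution matrix of Eq. (1) and check the
     * pairwise-intersection property of Theorem 3.1. */
    #include <stdio.h>

    int main(void)
    {
        enum { n = 7, S = 2, R = 3, ROWS = S * R, COLS = n * R };
        int c[ROWS][COLS];

        /* c[i][j] = i*n + ((j + floor(j/n)*i) mod n) */
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                c[i][j] = i * n + ((j + (j / n) * i) % n);

        /* any two virtual processors (columns) should share <= 1 block */
        int max_overlap = 0;
        for (int a = 0; a < COLS; a++)
            for (int b = a + 1; b < COLS; b++) {
                int overlap = 0;
                for (int i = 0; i < ROWS; i++)
                    for (int k = 0; k < ROWS; k++)
                        if (c[i][a] == c[k][b]) overlap++;
                if (overlap > max_overlap) max_overlap = overlap;
            }
        printf("maximum pairwise intersection = %d\n", max_overlap);
        return 0;   /* expected output: 1, per Theorem 3.1 */
    }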
C. Summarization

In a basic approach for iterative algorithms involving reductions, the slaves send their reduction objects after processing all of their data. However, this approach does not perform well when there are failures. Suppose that processors fail after processing 99% of their data. Since the master node does not receive any result from the failed nodes, all of that data has to be processed again. From the description of k-means given earlier, and from Figure 1, we can see that as data elements are processed, the state of the computation is captured in the reduction object. This fact can be used to improve the efficiency of these algorithms in the presence of failures. In particular, the slaves can send their reduction objects periodically within one iteration.

The advantage of this approach is that we do not need to re-process the data if we have a copy of the reduction object for that data. One question that arises is: how frequently should the summaries, or reduction objects, be exchanged? In our approach, each processor has S data blocks that it is responsible for processing and for which it must send summaries; therefore, each processor should send at least S summaries. However, we also divide each data block into smaller data portions, so a slave node sends a summary for each data portion. There are two advantages of dividing data blocks into smaller parts. First, in case of a failure, the amount of data to be re-processed decreases. Second, after a failure, we need to re-assign the data records of the failed nodes to the running ones. If the replication factor is greater than 2, having smaller data portions allows us to distribute the lost data in a more balanced way among the processors that store it as replicas. That is to say, we get better parallelization after a failure when we have more data portions.
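A possible slave-side sketch of this per-portion summary exchange is shown below. The message layout, tag, and the process_portion() helper are illustrative assumptions; the paper only specifies that a summary (the partial reduction object) is sent after each of the S*M data portions and that non-blocking messaging can be used.

    /* Sketch: one summary per data portion, sent with non-blocking
     * messages so the next portion can be processed while the send
     * completes.  K, DIM as in the earlier sketch; helpers assumed. */
    #include <mpi.h>
    #include <stdlib.h>

    #define K    8
    #define DIM  4
    #define SUMMARY_LEN  (K * DIM + K + 1)   /* centroids + counts + delta */
    #define TAG_SUMMARY  100                  /* assumed message tag */

    void process_portion(int portion, double *summary_out);  /* assumed */

    void send_portion_summaries(int S, int M,
                                double summaries[][SUMMARY_LEN],
                                int master_rank, MPI_Comm comm)
    {
        int total = S * M;
        MPI_Request *reqs = malloc(total * sizeof(MPI_Request));

        for (int p = 0; p < total; p++) {
            /* process the p-th data portion and fill its summary
             * (partial centroids, per-cluster counts, partial delta) */
            process_portion(p, summaries[p]);

            MPI_Isend(summaries[p], SUMMARY_LEN, MPI_DOUBLE,
                      master_rank, TAG_SUMMARY, comm, &reqs[p]);
        }
        /* ensure all S*M summaries of this iteration have left before
         * waiting for the summaries of the replica portions */
        MPI_Waitall(total, reqs, MPI_STATUSES_IGNORE);
        free(reqs);
    }

One separate buffer per portion is kept alive until MPI_Waitall returns, since a non-blocking send buffer must not be reused before the send completes.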

The use of such summaries requires certain changes to the algorithm, even if there are no failures. The modification is similar for k-means and apriori, and is explained only for the k-means algorithm. The modified algorithms for the master node and the slave node are stated as Algorithm 4. The master node waits for all summaries, instead of waiting for the final results from the slaves (Lines 5-8). Once it receives a summary from a slave, it keeps the summary to calculate the new centroids, and also forwards it to the slave nodes that store the corresponding data as a replica (Line 7). The other parts are similar to the original parallel algorithm. On the slave node, there are only two changes. 1) We divide the iteration into S * M parts, where S is the number of data blocks and M is the number of data portions per data block, and send the summaries S * M times. 2) One question that arises is: what is the effect of these summaries on network traffic if we use a large number of nodes? With many nodes, sending many more summaries may become a problem if we use a single master node. We can handle this by using more master nodes to decrease the network traffic; however, this introduces a master node failure problem. In order to handle it, we wait for the summaries of the replicas after sending all of our results (Line 10), so that we do not process the replicas. These summaries of replicas can be used if the master node also fails, together with some slave nodes. We can skip this last summary exchange if the possibility of failure of the master nodes is negligible. Note that the sending and receiving of such summaries can be done with non-blocking messaging. Therefore, we can start waiting for the summaries of the replicas at the beginning of the iteration to save some time. But, of course, we still have to wait after sending all our summaries if we have not yet received all of them.

IV. RECOVERY

Replication and exchange of summaries (or reduction objects) are important steps towards enabling fault-tolerance in algorithms. We now discuss how our algorithms actually recover from failures. Let us assume that we have the system shown in Figure 3. In this example, the replication factor R is 3, the number of data blocks per processor (for primary data) S is 1, the number of data portions per data block is 2, and the number of slave nodes is 7. In the figure, the blocks P1 to P7 represent the slave processors, the dashed boxes inside the processors represent the data blocks, and the numbered boxes inside each block represent the data portions. Each processor has to process its primary data when there is no failure. For example, P1 has to process d_1 and d_2, and will receive the summaries for d_7-d_10, where d_i is the i-th data portion. In particular, we will explain how we recover the system in two different failure scenarios: a single node failure in the middle of an iteration, and a multiple node failure.

In our approach, some bookkeeping is needed to support failure recovery. In particular, the master node keeps track of the availability of each data portion; at the beginning, it is R for all of them. The master node also keeps track of the workload of each slave node. In case of a failure, it tries to balance the workload by using this information.
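The C sketch below illustrates one possible form of this bookkeeping on the master. The structure fields, the helper functions, and the least-loaded reassignment policy are illustrative assumptions; the paper only states that per-portion availability and per-slave workload are tracked and used to balance the load.

    /* Assumed helpers, provided elsewhere by the runtime. */
    int  has_replica(int portion, int slave);     /* does slave store portion?   */
    int  is_primary(int portion, int slave);      /* was portion primary there?  */
    void assign_portion(int slave, int portion);          /* notify a slave      */
    void assign_portion_with_reload(int slave, int portion); /* re-read + assign */

    typedef struct {
        int *availability;   /* live copies per data portion, initialized to R */
        int *workload;       /* portions currently assigned to each slave      */
        int  n_portions, n_slaves;
    } master_state;

    /* Called when slave `failed` is detected as dead. */
    void handle_failure(master_state *m, int failed)
    {
        for (int d = 0; d < m->n_portions; d++) {
            if (!has_replica(d, failed)) continue;
            m->availability[d]--;                   /* one copy lost            */

            if (!is_primary(d, failed)) continue;   /* only a replica was lost  */

            int best = -1;
            if (m->availability[d] > 0) {
                /* least-loaded surviving slave that still holds a replica */
                for (int s = 0; s < m->n_slaves; s++)
                    if (s != failed && has_replica(d, s) &&
                        (best < 0 || m->workload[s] < m->workload[best]))
                        best = s;
                assign_portion(best, d);
            } else {
                /* all copies lost: least-loaded slave re-reads from storage */
                for (int s = 0; s < m->n_slaves; s++)
                    if (s != failed &&
                        (best < 0 || m->workload[s] < m->workload[best]))
                        best = s;
                assign_portion_with_reload(best, d);
            }
            m->workload[best]++;
        }
        m->workload[failed] = 0;
    }

The two failure scenarios discussed next are simply concrete traces of this decrement-and-reassign logic.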
A. Single Slave Node Failure in the Middle of an Iteration

In this scenario, let us assume that P1 fails in the middle of an iteration, after sending the summary for d_1.

Algorithm 4 Fault Tolerant Parallel K-Means Clustering Algorithm
Master Node:
 1: Select k cluster centroids randomly
 2: iteration = 0
 3: repeat
 4:   Broadcast the k cluster centroids
 5:   repeat
 6:     Wait for a summary from the slaves
 7:     Once a summary is received, send it to the slave nodes that store the replica of the related data
 8:   until there are no unreceived summaries
 9:   Calculate the new centroids of the clusters
10:   Calculate the total delta and broadcast it
11:   Increment iteration by 1
12: until iteration >= MAXITER or delta <= Threshold

Slave Node:
 1: iteration = 0
 2: repeat
 3:   Receive the k cluster centroids
 4:   for i = 1 to S * M do
 5:     Assign each data record of the slave node's i-th data portion to the nearest cluster
 6:     Calculate new cluster centroids
 7:     delta = sum_{j=1..k} |newcentroid_j - oldcentroid_j|
 8:     Send delta and the cluster centroids with the number of data records of each cluster
 9:   end for
10:   Wait for the summaries of the replica data portions
11:   Increment iteration by 1
12:   Receive new delta
13: until iteration >= MAXITER or delta <= Threshold

Fig. 3. Sample System with 7 Slave Nodes

The recovery for this case is as follows. As soon as the master node notices that P1 has failed, it decreases the number of available copies of d_1, d_2, d_7, d_8, d_9, and d_10 by 1. Since R = 3, we did not lose any data and do not have to do any additional reads from the storage cluster. Now, P2 and P3 hold the data portions d_1 and d_2, since both store the same data block as a replica. So the master node notifies P2 to process d_1 and P3 to process d_2 after the failure. But P2 does not have to process d_1 in this failure iteration, since the master node had already received its summary before the failure. Overall, the application execution will be finished with 6 slave nodes. Because we have 2 data portions per block, we can balance the remaining workload to some extent, i.e., the work from P1 is shared by P2 and P3. A larger number of data blocks and a higher replication factor will allow us to balance the workload even better.

B. Multiple Slave Node Failures

In this scenario, let us assume that P1, P2, and P3 fail at the beginning of an iteration. For simplicity, let us further assume that the master node notices all three failures at the same time. Then, the master node decreases the availability of d_1 and d_2 by 3, and that of d_3-d_14 by 1. Among the blocks that are no longer available as primary data, we still have d_3-d_6 available on slave nodes that are still operational. So, the master node notifies P4 to process d_4 and d_6, P5 to process d_3, and P6 to process d_5. However, d_1 and d_2 are not available on any of the slave nodes, and therefore need to be read from the storage cluster. Since P7 has the least workload, the master node notifies P7 to read the data from the storage cluster and process d_1 and d_2. Note that the bookkeeping of the master node allows us to balance the workload of the slave nodes.

V. EXPERIMENTS

This section reports results from a number of experiments we conducted to evaluate our approach and to compare it against the approach implemented in Hadoop. More specifically, we had the following goals in our experiments:
- To quantify the overheads associated with our approach and to examine its ability to handle different numbers of failures.
- To understand the significance of the parameters used in our approach, i.e., how the replication factor and the number of summaries exchanged impact the performance both in the absence and in the presence of failures.
- To compare our approach against the MapReduce approach for fault-tolerance, by comparing performance with the Hadoop implementation.

A. Experimental Setup

In our experiments, we used a cluster where each node had an 8-core 2.53 GHz Intel Xeon processor and 12 GB of memory. We implemented the k-means and apriori algorithms in the C programming language with the MPI library. We assume a setup where data can be read at high speeds from within the cluster, but requires higher latencies if it needs to be loaded from outside the cluster. Thus, depending on whether the data is being read to be stored as a replica or as primary data, it can either be read from within the cluster or may need to be read from outside the cluster. The former is very fast, whereas the latter adds substantial slowdowns. Note that in case of failures where we lose all copies of some particular data, we have to read that data again from outside the cluster. The size of the datasets is 4.87 GB and 4.79 GB for k-means and apriori, respectively. The parameters used for describing our results are given in Table I. Note that both of the algorithms we are considering involve multiple iterations, and the failure iteration refers to the iteration in which a failure occurs.

B. Evaluation of Our Approach

We now evaluate the effectiveness of our approach, varying a number of parameters. Except for one experiment, we used 16 nodes and a single core per node, in order to have fully distributed memory for the execution of the algorithms.

Initial Evaluation: The first experiment focused on the effect of replication in our approach. We fixed I, S, M, and Per to certain values and changed the number of failures and the number of replicas. The results are shown in Figure 4.
TABLE I
THE PARAMETERS USED IN EXPERIMENTS

  R   : The number of replicas of the dataset (including the original)
  S   : The number of data blocks per primary data
  M   : The number of data portions per data block
  I   : The failure iteration
  Per : The percentage of the data that had been processed (within the failure iteration) before the failure occurs
  F   : The number of failures
  P   : The number of processors on which the code is executed

The top part of each bar gives the initial data loading time and the bottom part gives the data processing time. For k-means and apriori, the failure occurs after processing half of the data in the 10th (out of 25) and 3rd (out of 6) iterations, respectively.

Fig. 4. Overall Effectiveness of our Approach: Varying Replication Factor and Number of Failures; Top (Darker) Portion of Each Chart Reflects Initial Data Loading Times. (a) K-Means (I=10, Per=50%, S=1, M=2); (b) Apriori (I=3, Per=50%, S=1, M=2)

Several observations can be made from this figure. First, the average overhead of the additional replicas in both algorithms is only 0.2% and 0.5% for R = 2 and R = 3, respectively. However, as seen from the figure, the slowdown in the case of failure(s) is much higher. In addition, the relative overhead of the data loading time becomes smaller as the total number of iterations of the algorithm increases. As shown here, we can handle failures quite effectively with different replication factors. Clearly, when we do not have any additional copies, even one failure requires that additional data be read from the storage cluster, which has a high cost.

Having additional copies of the data alleviates this need, and failures (up to a certain number) can be handled with only modest slowdowns. Because the remaining processing (approximately half of the total work) is being done with fewer nodes, some slowdown is to be expected as the number of failures increases. However, as we have more replicas, we are able to get better parallelization after a failure by distributing the work of the failed node among the remaining running nodes. In our next set of experiments, we show this re-parallelization effect and how the slowdown can be lowered with a higher value of S.

Impact of Frequency of Summary Exchange: Besides the number of replicas, another factor in our scheme is the number of summaries, or the frequency of summary exchanges. The number of summary exchanges depends on the number of data blocks and the number of messages sent per data block. Therefore, this parameter impacts how the data is distributed, and how much work might have to be redone if a failure occurs in the middle of an iteration. We conducted two sets of experiments to evaluate this factor.

Fig. 5. Impact of Frequency of Summary Exchange: Varying Number of Failures. (a) K-Means (I=10, Per=0%, R=3, M=1); (b) Apriori (I=3, Per=0%, R=3, M=1)

In the first set of experiments, we fixed I, R, M, and Per and varied F and S, to see the impact of the size of the data blocks, and of the frequency of summary exchange, with different numbers of failures. When we have a larger S value, we have more summaries to be exchanged and smaller data blocks. All failures occur at the beginning of the iteration in this set of experiments. The results are shown in Figure 5. When there is no failure, the execution time is nearly the same for all S values for both algorithms, which shows that the overhead of summary exchange is negligible. When there is one failure, we get the best results in the S = 4 case and the worst results in the S = 1 case; in other words, increasing the value of S helps to improve performance in case of failures. There are two reasons for the better performance of the S = 4 case. First, as we divide the data into more parts, we are able to get better load balance in case of failures. Second, when there are 3 failures, we need to read one data block from the storage cluster; as we have more data blocks, the block size is smaller, so the I/O operation takes less time when we have a larger number of data blocks. Note that distributing the data blocks to different processors so as to keep the intersection among processors minimal decreases the number of lost data blocks. Without this kind of distribution, the amount of lost data would increase for larger S values, and the advantage of reading smaller blocks would be lost.

Fig. 6. Impact of Frequency of Summary Exchange: Varying Fractions of Work Completed at the Time of Failure. (a) K-Means (I=5, F=1, R=2, S=1); (b) Apriori (I=2, F=1, R=2, S=1)

In the next set of experiments, we evaluate the impact of the frequency of summary exchange as failures occur at different times during an iteration. The goal here is to understand the recovery time with different frequencies of summary exchange. We fixed I, S, R, and F, and varied Per and M.

The execution times of the failure iteration for the different cases are shown in Figure 6. We can see that we get better results as the frequency of summary exchange increases, for both algorithms. The reason is that the failed node is able to send some of the results for its data before the failure; therefore, we do not need to re-process that data in that iteration.

Fig. 7. Effect of Processor Numbers with Different Numbers of Failures. (a) K-Means (I=10, Per=50%, R=3, S=4); (b) Apriori (I=3, Per=50%, R=3, S=4)

Scalability of the Approach: In all of the previous experiments, we used only 16 slave nodes. Now, we show the scalability of the approach by considering larger configurations. Thus, we look at the execution times with different numbers of failures, as the original number of nodes varies. The results are shown in Figure 7. Several observations can be made about these results. First, the execution times without failures scale quite well. Second, our approach can handle failures effectively in all cases. In fact, the relative slowdown with two or three failures is lower when the original number of processors is higher, because the remaining computing capacity is then a larger fraction of the original capacity.

C. Comparison with Hadoop

We compared our approach with MapReduce, which is a popular solution for developing data-intensive applications. The reasons why we chose MapReduce for the comparison are as follows. First, MapReduce is a well-known fault-tolerant framework, besides its popularity. Second, unlike MPI-based solutions for fault-tolerance, which do not allow recovery using a different number of nodes, both MapReduce and our approach allow recovery using the remaining (fewer) nodes in case of failure. Thus, we have compared our solution extensively against Hadoop, the open-source implementation of MapReduce. It should be noted that there is a significant programmability difference between our approach and the one from MapReduce. However, we believe that our approach can also be implemented as part of a high-level solution in the future.

Fig. 8. Total Execution Time when a Single Failure Occurred at Different Percentages

We used Hadoop implementations of the k-means and apriori algorithms, which were used in an earlier study as well [4]. The configurations and parameters used were as follows. We did not use any backup nodes for either of the implementations, i.e., failure recovery occurs with fewer nodes. In order to detect failures faster in Hadoop, we decreased the tasktracker expiry interval from the default value of 10 minutes to 10 seconds. The replication factor is set to 3 in both implementations. Default chunk sizes are used in Hadoop, and the frequency of summary exchange is 12 (S = 4 and M = 3) in our implementation. The execution time of Hadoop had high variance for several failure cases; therefore, we ran each experiment 5 times and calculated the average after eliminating the maximum and minimum values. Because of differences in how Hadoop works and how our algorithm-based approach is implemented, we made some changes in the experiments and the metrics we use. First, because it is well known that Hadoop does not work well for iterative computations [19], we executed both applications for only a single iteration. Second, Hadoop and our approach are implemented with different programming languages (C vs. Java) and file systems, so comparing their execution times directly would not necessarily be fair; therefore, we compare their relative slowdowns in the presence of failures, instead of absolute execution times.

First, we wanted to see how the two systems behave when a failure occurs at different times during an iteration. The results are shown in Figure 8. Our approach has much better results than Hadoop. For apriori, our approach has similar results for all cases, since the execution lasts for only one iteration and the data processing time is only 9% of the overall execution time. In the Hadoop tests, the average slowdown of both algorithms is similar for failures at 0% and 25%, and it becomes higher as the failure percentage increases. The average slowdown of both algorithms in our approach's tests is 5% for all failure points, while it changes from 20% to 25% in the Hadoop tests.

Fig. 9. Total Execution Time that Changes with the Number of Failures. (a) K-Means; (b) Apriori

Next, we wanted to examine how the two systems behave when multiple failures occur. We simulated failures at the beginning of the iteration, since Hadoop's performance for failures at 0% and 25% is similar to, and better than, its performance at the other failure points, as seen from Figure 8. In the 3-failure scenario, we killed 3 nodes that share the same data block in our approach; thus, when there is such a failure, we have to read the data from the storage cluster. Note that with another choice of 3 nodes we might not lose any data, e.g., the failure of the first 3 nodes in Figure 2. That is to say, we killed the 3 nodes that cause the highest slowdown for our approach. The results are shown in Figure 9. Our approach has much better performance than Hadoop for all cases. Hadoop's average slowdown over both algorithms increases from 23% to 33% as the number of failures increases. In our approach, the average slowdown over both algorithms increases from 5% to 28% as the number of failures increases.

VI. RELATED WORK

Fault-tolerance is a widely studied topic. We restrict our discussion to three topics: 1) data distribution, 2) fail-stop failures [38] in the context of parallel and data-intensive applications, and 3) fault tolerance in MapReduce.

Data distribution approaches are mostly discussed for file systems and storage clusters. For example, CRUSH [40] is focused on minimizing data movement in case of addition or removal of disks. FARM [42] divides data into fixed-size blocks and stores each replica of a block on a different disk; it improves recovery time by distributing the lost data to be reconstructed over a number of drives in the disk array. The RUSH [26] algorithms also try to minimize the redistribution of data elements, and they guarantee that no two replicas of a particular object will be stored on the same server. However, none of these studies focused on minimizing the data intersection of servers in order to minimize the data loss in case of multiple failures. To the best of our knowledge, there is no previous study that focused on minimizing this intersection.

A common way of handling fail-stop failures is checkpointing, including synchronous checkpointing [18] and asynchronous checkpointing [37]. In order to decrease the amount of data to be saved, application-level checkpointing (ALC) has been proposed [7], [21]. In ALC, only certain specific data elements that can recover the application are saved, instead of the whole system state. To decrease the overhead by eliminating stable storage, Plank [36] proposed the diskless checkpointing approach, which checkpoints the state of each processor in memory and uses checkpointing processors that encode these in-memory checkpoints to reconstruct the state of the failed processors. Another way of avoiding time-consuming I/O operations is to use algorithm-based recovery (ABR) methods. Here, the recovery can be performed by using information that already exists on the running processors; if the algorithm itself does not contain such redundant information, it can be added by modifying the algorithm. Huang and Abraham [27] demonstrated that miscalculations can be detected and corrected by using a checksum relationship that is preserved in the final computation results.
Chen [11] extended this work to tolerate fail-stop failures in the outer-product version of matrix-matrix multiplication. Davies et al. extended the checksum method to the LU decomposition in the Linpack benchmark [16]. Liu et al. [32] force processors to send redundant information to their neighbors for use in recovery, and apply this technique to Newton's method. Chen [12] has developed an ABR scheme for iterative methods, based on the observation that many iterative algorithms can be recovered without checkpointing if they satisfy certain conditions. In Chen's study, one of the goals is to start the recovery from the iteration in which the failure occurred, without any roll-back. In our study, in the worst case we start the recovery from the failure iteration, and even some fraction of the failure iteration may not need to be reprocessed. In addition, in Chen's study, the failed processor needs to become available again and is used for recovery; in comparison, in our study, we continue the processing with the remaining (fewer) processors. None of the above efforts have considered data-intensive applications of the nature we have considered.

Almost all work on fault-tolerant data processing has been in the context of MapReduce [17]. The fault-tolerance approach in MapReduce is as follows. The work is divided into a number of tasks, as specified by the user. In the case of a slave node failure, any running tasks, as well as the completed map tasks, are re-executed by another slave; the completed map tasks are re-executed because they store their output on local disks. Most of the research on MapReduce has focused on improving its performance (e.g., [14]). The limited amount of work on improving failure recovery in MapReduce includes the work by Zheng [44], who proposes using passive replication on top of re-execution. Costa et al. [2] propose a Byzantine fault [3] tolerant MapReduce that re-executes each task more than once, but tries to minimize the number of these re-executions.

Twister [19] and iMapReduce [43] are both designed to improve the performance of MapReduce for iterative algorithms. For fault tolerance, both use checkpointing and roll back to the last checkpointed iteration in the presence of a failure. Martin et al. [33] introduce a fault-tolerant mechanism for a streaming version of MapReduce, where reducers that can process key/value pairs as the mappers emit them are used instead of stateless reducers that wait for all mappers to finish. They use a combination of uncoordinated checkpointing and in-memory logging for fault tolerance.

VII. CONCLUSION

This paper has developed an algorithm-based fault tolerance approach for handling fail-stop failures in a class of data-intensive algorithms. Our approach combines replication and summarization to decrease the latency of execution in the presence of failures. We use a novel replication and distribution algorithm, which allows us to distribute the data in a way that minimizes the maximum data intersection between processors. Our approach allows recovery using a smaller number of nodes, and achieves good load balance among the remaining nodes. The main observations from our detailed evaluation are as follows. First, the overhead of our approach when there are no failures is negligible. We show how different numbers of failures, and failures at different points of processing, can be gracefully handled by our approach. Finally, in comparing our approach with the MapReduce approach (as implemented in Hadoop), we show that our approach performs better both in the absence and in the presence of failures.

REFERENCES

[1] R. Agrawal and J. Shafer. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering, 8(6), 1996.
[2] Pedro Costa, Marcelo Pasin, Alysson N. Bessani, and Miguel Correia. Byzantine fault-tolerant MapReduce: Faults are not just crashes. In 3rd IEEE International Conference on Cloud Computing Technology and Science, 2011.
[3] Algirdas Avizienis, Jean-Claude Laprie, Brian Randell, and Carl Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11-33, 2004.
[4] T. Bicer, Wei Jiang, and G. Agrawal. Supporting fault tolerance in a data-intensive computing middleware. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2010), pages 1-12, April 2010.
[5] G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In Supercomputing, ACM/IEEE 2002 Conference, November 2002.
[6] George Bosilca, Rémi Delmas, Jack Dongarra, and Julien Langou. Algorithm-based fault tolerance applied to high performance computing. Journal of Parallel and Distributed Computing, 69(4):410-416, April 2009.
[7] Greg Bronevetsky, Daniel Marques, Keshav Pingali, and Paul Stodghill. C3: A system for automating application-level checkpointing of MPI programs. In 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), 2003.
[8] Randal E. Bryant. Data-Intensive Supercomputing: The Case for DISC. Technical Report CMU-CS-07-128, School of Computer Science, Carnegie Mellon University, 2007.
[9] Franck Cappello, Al Geist, Bill Gropp, Laxmikant V. Kalé, Bill Kramer, and Marc Snir. Toward exascale resilience. International Journal of High Performance Computing Applications, 23(4), 2009.
[10] P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and practice. In Advances in Knowledge Discovery and Data Mining. AAAI Press / MIT Press, 1996.
[11] Zizhong Chen. Extending algorithm-based fault tolerance to tolerate fail-stop failures in high performance distributed environments. In IPDPS, pages 1-8, 2008.
[12] Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing. In Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing (HPDC 2011), San Jose, CA, USA, June 2011, pages 73-84.
[13] Zizhong Chen and J. Dongarra. A scalable checkpoint encoding algorithm for diskless checkpointing. In IEEE High Assurance Systems Engineering Symposium (HASE 2008), pages 71-79, December 2008.
[14] Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, and Russell Sears. MapReduce online. In Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation (NSDI 2010), Berkeley, CA, USA, 2010. USENIX Association.
[15] Camille Coti, Thomas Herault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, and Franck Cappello. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC 2006). ACM, 2006.
[16] Teresa Davies, Christer Karlsson, Hui Liu, Chong Ding, and Zizhong Chen. High performance Linpack benchmark: A fault tolerant implementation without checkpointing. In Proceedings of the International Conference on Supercomputing (ICS 2011). ACM, 2011.
[17] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004.
[18] G. Deconinck and R. Lauwereins. User-triggered checkpointing: System-independent and scalable application recovery. In Proceedings of the Second IEEE Symposium on Computers and Communications, July 1997.
[19] Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. Twister: A runtime for iterative MapReduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing (HPDC 2010), New York, NY, USA, 2010. ACM.
[20] G. Bronevetsky, D. Marques, K. Pingali, and P. Stodghill. Automated application-level checkpointing of MPI programs. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2003), pages 84-94, October 2003.
[21] G. Bronevetsky, D. Marques, M. Schulz, P. Szwed, and K. Pingali. Application-level checkpointing for shared memory programs. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2004), October 2004.
[22] Leonardo Arturo Bautista Gomez, Naoya Maruyama, Franck Cappello, and Satoshi Matsuoka. Distributed diskless checkpoint for large scale systems. In Proceedings of the 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGRID 2010). IEEE Computer Society, 2010.
[23] Amina Guermouche, Thomas Ropars, Elisabeth Brunet, Marc Snir, and Franck Cappello. Uncoordinated checkpointing without domino effect for send-deterministic MPI applications. In 25th IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2011), Anchorage, Alaska, USA, May 2011.
[24] E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association rules. IEEE Transactions on Knowledge and Data Engineering, 12(3), May/June 2000.
[25] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
[26] R. J. Honicky and Ethan L. Miller. Replication under scalable hashing: A family of algorithms for scalable decentralized data distribution. April 2004.
[27] Kuang-Hua Huang and Jacob A. Abraham. Algorithm-based fault tolerance for matrix operations. IEEE Transactions on Computers, 33(6), 1984.
[28] J. Hursey, J. M. Squyres, T. I. Mattox, and A. Lumsdaine. The design and implementation of checkpoint/restart process fault tolerance for Open MPI. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1-8, March 2007.
[29] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
[30] Ruoming Jin and Gagan Agrawal. A middleware for developing parallel data mining implementations. In Proceedings of the First SIAM Conference on Data Mining, 2001.
[31] Ruoming Jin and Gagan Agrawal. Shared memory parallelization of data mining algorithms: Techniques, programming interface, and performance. In Proceedings of the Second SIAM Conference on Data Mining, 2002.
[32] Hui Liu, T. Davies, Chong Ding, C. Karlsson, and Zizhong Chen. Algorithm-based recovery for Newton's method without checkpointing. In IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW 2011), May 2011.
[33] A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, C. Fetzer, and A. Brito. Low-overhead fault tolerance for high-throughput data processing systems. In 31st International Conference on Distributed Computing Systems (ICDCS 2011), June 2011.
[34] A. Moody, G. Bronevetsky, K. Mohror, and B. R. de Supinski. Design, modeling, and evaluation of a scalable multi-level checkpointing system. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2010), pages 1-11, November 2010.
[35] S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4), 1998.
[36] J. S. Plank, Youngbae Kim, and J. J. Dongarra. Algorithm-based diskless checkpointing for fault tolerant matrix operations. In Twenty-Fifth International Symposium on Fault-Tolerant Computing (FTCS-25), June 1995.
[37] G. C. Richard III and M. Singhal. Using logging and asynchronous checkpointing to implement recoverable distributed shared memory. In Proceedings of the 12th Symposium on Reliable Distributed Systems, pages 58-67, October 1993.
[38] Richard D. Schlichting and Fred B. Schneider. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems, 1(3), 1983.
[39] David B. Skillicorn. Strategies for parallel data mining. IEEE Concurrency, October-December 1999.
[40] Sage A. Weil, Scott A. Brandt, Ethan L. Miller, and Carlos Maltzahn. CRUSH: Controlled, scalable, decentralized placement of replicated data. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC 2006), New York, NY, USA, 2006. ACM.
[41] Tom White. Hadoop: The Definitive Guide. O'Reilly, first edition, June 2009.
[42] Qin Xin, Ethan L. Miller, and Thomas J. E. Schwarz. Evaluation of distributed recovery in large-scale storage systems. In Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC 2004), Washington, DC, USA, 2004. IEEE Computer Society.
[43] Yanfeng Zhang, Qinxin Gao, Lixin Gao, and Cuirong Wang. iMapReduce: A distributed computing framework for iterative computation. In IPDPS Workshops, 2011.
[44] Qin Zheng. Improving MapReduce fault tolerance in the cloud. In IPDPS Workshops, pages 1-6, 2010.
