Improving MapReduce Energy Efficiency for Computation Intensive Workloads


Thomas Wirtz and Rong Ge
Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, WI
{thomas.wirtz,

Abstract: MapReduce is a programming model for data intensive computing on large-scale distributed systems. With its wide acceptance and deployment, improving the energy efficiency of MapReduce will lead to significant energy savings for data centers and computational grids. In this paper, we study the performance and energy efficiency of the Hadoop implementation of MapReduce in the context of energy-proportional computing. We consider how MapReduce efficiency varies with two runtime configurations: resource allocation, which changes the number of available concurrent workers, and DVFS (Dynamic Voltage and Frequency Scaling), which adjusts the processor frequency to the workload's computational needs. Our experimental results indicate that significant energy savings can be achieved through judicious resource allocation and intelligent DVFS scheduling for computation intensive applications, though the level of improvement depends on both the workload characteristics of the MapReduce application and the resource and DVFS scheduling policy.

I. INTRODUCTION

MapReduce [1] is a programming model for data intensive computing on large-scale distributed systems that supports automatic parallel processing of large data sets. With the MapReduce framework, programmers can focus on application algorithm design without dealing with low-level workload distribution and management. Today, MapReduce based applications are widely deployed in many business and educational datacenters. With data volume doubling every three years [2], MapReduce will potentially become a major computing paradigm in future data centers.

Energy efficient MapReduce is critical for green data centers. Data centers are estimated to account for 1.5% of the overall U.S. electricity use [3], and electricity costs are already the second highest expense after labor costs in data centers [4]. Nevertheless, efficiency is not among the top MapReduce design constraints. In a MapReduce based application, computations are broken into many short-lived map and reduce tasks. Map tasks communicate with reduce tasks via intermediate results stored on distributed storage. Process management and local and remote disk I/O accesses are likely to cause both performance and energy inefficiencies for MapReduce applications.

There is a large body of work on applications of the MapReduce programming model [5, 6], library support for various programming languages [7], and debugging and tracing tools for the MapReduce framework [8, 9]. Nevertheless, little work has studied MapReduce energy efficiency in depth. GreenHDFS [10] separated cluster servers into cold and hot zones and placed data in these zones according to data classification; it conserved energy by transitioning the servers in the cold zone to high energy saving power states. Chen et al. [11, 12] and Leverich et al. [13] studied MapReduce energy efficiency with a varying number of worker nodes and found energy-saving potential for MapReduce applications.

Our focus is on MapReduce energy efficiency for data- and computation-intensive applications. This work is motivated by two trends: (1) a vast number of scientific applications are becoming data intensive as available data grow explosively; and (2) many of these applications and their supporting software are being ported to the MapReduce framework [6, 14, 15].
Unlike traditional MapReduce applications, such applications perform a larger number of operations per byte and place a higher demand on computational power. As with other parallel applications, the performance and energy efficiency of MapReduce applications are affected by the degree of parallelism and the computational intensity (i.e., the ratio of on-chip CPU computation to off-chip memory and I/O access). For a given application, optimal efficiency is achieved when resource allocation matches application characteristics: the number of allocated processing cores matches the degree of parallelism of the application, and the processor performance state matches the application's computational intensity.

In this work, we use an experimental approach to prove the above concept. We choose three MapReduce benchmark applications, Matrix Multiplication, CloudBurst, and Integer Sort, for our study. For each benchmark, we vary the number of concurrent workers and the processor frequency and investigate how performance and energy efficiency scale.

We evaluate the energy and efficiency results within the context of energy-proportional computing. Energy-proportional computing [16] is an ideal computing environment in which server power is directly proportional to the level of server utilization and is zero at idle (i.e., with no active user workload). Energy-proportional computing has been promoted and used to guide hardware and architecture design. To emulate an energy-proportional computing system, we use work-induced power instead of total system power in our analysis.

That is, we exclude the system idle power and consider it zero. By excluding the idle power, we can better capture the direct cost of workload execution and the trend of energy change with runtime configurations.

The paper makes the following main contributions. First, we experimentally demonstrate that performance and energy inefficiencies are not uncommon in the MapReduce framework for computation intensive applications. Such inefficiency is due to the overhead of automatic parallelization and I/O accesses and cannot simply be reduced by application developers. Second, the degree of parallelism of applications significantly affects the performance and energy efficiency of MapReduce applications. To achieve higher overall efficiency on the MapReduce framework, we need to tailor the resource allocation (i.e., the number of processing cores) to the application's degree of parallelism. Third, we compare three DVFS scheduling policies and the resulting energy efficiency for the MapReduce framework. Overall, DVFS is effective for energy savings. In general, a low power DVFS scheduling policy is optimal for systems with small idle power, while a performance-constrained DVFS scheduling policy is optimal for systems with dominating idle power.

The remainder of this paper is organized as follows. Related work is discussed in the next section. We discuss the variables of energy efficiency in Section III and present our methodology in Section IV. The experimental results are presented in Section V. Section VI concludes the paper.

II. BACKGROUND AND RELATED WORK

A. Background

MapReduce is a programming model introduced by Google for processing large data sets in parallel [1]. In the MapReduce framework, large data files are stored across distributed storage devices in small, workable chunks. With this model, programmers specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. During MapReduce job execution, the map tasks, as well as the subsequent reduce tasks, execute in parallel on different key/value pairs. The initial file data chunks are input to the map tasks. At the end of the job, the output is compiled into one or more files.

Hadoop MapReduce [17] is an open source MapReduce framework implementation in Java. The Hadoop framework comprises the MapReduce libraries, the Hadoop distributed file system (HDFS), and supporting system services. A Hadoop MapReduce system normally consists of one JobTracker node and multiple TaskTracker nodes. The JobTracker initializes jobs, adds tasks to a queue, and holds job and task status. A TaskTracker fetches tasks from the JobTracker. HDFS contains one primary NameNode, which holds the file system metadata and keeps track of the placement of file chunks, and multiple instances of DataNode, which store the chunks of file data.

The automated parallel computing in MapReduce boosts software development productivity. However, it may also compromise performance and energy efficiency because of a large amount of extra disk and network I/O accesses, short-lived processes, and load imbalance.
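To make the programming model concrete, the following minimal Hadoop sketch (our illustration, not code from this paper) shows the two user-supplied functions for a simple word-count job; the class and variable names are our own:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal sketch of the two user-supplied functions in the MapReduce model.
public class WordCountSketch {
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Emit an intermediate (word, 1) pair for every token in the split.
            for (String token : line.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                ctx.write(word, ONE);
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            // Merge all intermediate values sharing the same intermediate key.
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(word, new IntWritable(sum));
        }
    }
}

The framework supplies everything around these two functions: input splitting, the shuffle that groups intermediate pairs by key, and the final output files.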
B. Related Work

Driven by ever-increasing operating costs and awareness of energy conservation, researchers have been actively developing technology to understand and improve energy efficiency in data centers. Existing work includes fine-grain power profiling [18], analytical power modeling [19], and power management [20, 21, 22]. The DVFS technology available on modern processors has been widely used for data center power management [20]. Other technologies, such as PowerNap [21] and varying the number of active servers [22], have also been investigated for systems where idle power dominates and the benefit from DVFS is thus limited.

Though significant research has been done on MapReduce systems and applications [5, 6], only a few studies have addressed the problem of MapReduce energy (in)efficiency. GreenHDFS [10] separated cluster servers into cold and hot zones and transitioned the servers in the cold zone to high energy saving power states. It has been shown that running a Hadoop cluster with a subset of the system nodes can save energy with some performance tradeoff for applications [13]. Instead of using a covering set of nodes, another independent study indicated that using all available nodes for workload execution and powering them off after job completion was favored in terms of energy cost [23]. More recently, Chen et al. analyzed how MapReduce operating parameters affect energy efficiency [11]. Power management for MapReduce systems has also been explored through data placement [24], virtual machine placement [25], and data compression [26].

[Fig. 1. A typical deployment of the Hadoop framework. The JobTracker and the HDFS NameNode may reside on the same physical node, and the TaskTrackers and DataNodes are distributed on the other nodes.]

While our work is close to [13], the two differ in at least two major aspects. First, our work studies how the energy efficiency of MapReduce varies with both system scale and DVFS scheduling. Second, it targets computation intensive applications, compared to the traditional MapReduce applications in [13]. With the increasing application of MapReduce to a wide range of high performance and data intensive problems, improving the energy efficiency of MapReduce for such applications is a necessity.

III. MAPREDUCE ENERGY EFFICIENCY

As one type of parallel program, MapReduce applications have performance that can be described by general parallel performance models. For simplicity, we abstract a computer cluster as a power-aware system characterized by three parameters (N, c, f), where N is the total number of compute nodes in the system, c is the total number of processor cores per node, and f is the operating frequency of the processor cores. By this abstraction we confine our work to a homogeneous environment. Changing the processor operating frequency also changes its voltage, as V ∝ f holds for DVFS processors. For computation intensive applications, we assume c is the number of physical cores (not virtual cores) on a node and that there is at most one worker per core at any time instance.

Let T(1) and T(N) be the execution times of a MapReduce application when running with 1 and N worker nodes respectively, and let w be the fraction of the workload that is parallelizable. T(N) can be calculated as

    T(N) = (1 - w) T(1) + w T(1)/N    (1)

Eq. (1) describes the execution time of an ideal parallel computing algorithm with no parallel overhead. In the MapReduce framework, parallel overhead results from initial and intermediate data distribution and possible load imbalance. Denoting the parallel overhead by T_o(N), Eq. (1) can be rewritten in a speedup form as follows:

    speedup(N) = T(1)/T(N) = 1 / [(1 - w) + w/N + T_o(N)/T(1)]    (2)

Denoting by P(1) and P(N) the average node power when the application runs with 1 and N worker nodes, and by E(1) and E(N) the corresponding energy consumption, we can combine Eq. (1) with the energy equation E(N) = N P(N) T(N) to obtain

    E(N) = (P(N)/P(1)) [(1 - w) N + w + N T_o(N)/T(1)] E(1)    (3)

For a perfectly parallelizable case, where w = 1 and T_o(N) = 0, increasing the number of worker nodes leads to proportionally improved performance, consistent average node power, and constant energy. For most applications, where 0 < w < 1 and T_o(N) > 0 hold, increasing the number of worker nodes improves performance but also costs more energy. The performance improvement diminishes due to the sequential part of the application and various overheads, while the energy cost always increases. Given these trends as the number of worker nodes increases, we expect there exists an optimal number of worker nodes that delivers the maximum performance to energy ratio for a given MapReduce application.

Eqs. (1) and (3) also indicate that the energy cost of a MapReduce application increases more slowly with the number of worker nodes for larger w or smaller T_o(N). While increasing the problem size will normally result in a larger w, our focus is on scheduling an optimal number of nodes for a fixed-size problem with a fixed w. We also confine our effort in this work to improving energy efficiency without modifying the MapReduce framework, though optimizing the MapReduce framework implementation has the potential to effectively reduce T_o(N) and thus improve energy efficiency.

In addition to resource allocation and MapReduce optimization, dynamic voltage and frequency scaling (DVFS) provides a further opportunity to improve energy efficiency. Previous studies have shown that workloads involve both on-chip and off-chip accesses, and changing the CPU frequency only affects the performance of on-chip accesses. For a given workload, if the on-chip access portion accounts for a fraction β of the total execution time T(f_base) at the base frequency f_base, then the total execution time at frequency f will be:

    T(f) = T(f_base) [β (f_base/f) + (1 - β)]    (4)

For a computation intensive workload, where β ≈ 1, decreasing the frequency increases the total execution time. However, for a communication and I/O intensive workload, where β ≈ 0, varying the processor frequency has only a small impact on the actual total execution time. On the other hand, the dominating dynamic power of microprocessors is proportional to f V^2. Thus, halving f will reduce the dynamic power to one-eighth for DVFS processors, as V scales roughly linearly with f. Therefore, DVFS is effective for reducing energy cost with minimal performance impact for workloads that are not computation intensive. Nonetheless, applying DVFS to MapReduce applications that are computation intensive is nontrivial given the mixture of CPU execution phases in the map and reduce functions and I/O phases due to data distribution across the network.
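As a worked illustration of the trade-off behind Eq. (4), using illustrative numbers rather than measurements from this paper, consider a workload with β = 0.25 run at half the base frequency:

    T(f_base/2) = T(f_base) × (0.25 × 2 + 0.75) = 1.25 T(f_base)
    P_d(f_base/2) ≈ P_d(f_base) × (1/2)^3 = P_d(f_base)/8
    E_d(f_base/2) ≈ 1.25 × (1/8) × E_d(f_base) ≈ 0.16 E_d(f_base)

Here P_d and E_d denote dynamic power and dynamic energy. The dynamic energy drops by roughly 84% for a 25% slowdown, but the idle and static power drawn over the longer runtime erodes the total saving, which is why the payoff from DVFS depends on how dominant the idle power is.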
IV. METHODOLOGY

In this work, we use an experimental approach to study how resource allocation and DVFS scheduling affect energy efficiency for MapReduce applications.

A. MapReduce Workloads

We include three MapReduce benchmark applications in our experiments. The Matrix Multiplication and CloudBurst benchmarks represent MapReduce applications that are both computation-intensive and data-intensive. To reveal the system behavior of the shuffle phase present in many MapReduce applications, we also include the Sort benchmark from the Hadoop distribution.

Matrix Multiplication. Matrix Multiplication calculates the product A × B of two input matrices A and B. The implementation used in this study is the blocking MapReduce algorithm from [27]. Blocking is a common performance optimization technique that takes advantage of memory cache locality. With blocking, the factor matrices A and B are split into smaller sub-matrices such that the latter fit into low latency memory caches. This MapReduce implementation consists of two jobs: the first job performs the block multiplications and the second job sums up the results. In job 1, the map tasks route a copy of each A or B sub-matrix to all the reduce tasks, and the reduce tasks perform the sub-matrix multiplications. Depending on the number of reduce tasks and the number of sub-matrices, a reduce task may calculate one or more product sub-matrices. This strategy makes good use of parallelism at the expense of network traffic. In job 2, an identity map task reads from an input split, which is the output of the reduce tasks in job 1, and a reduce task sums up the items for the same sub-matrix. To reduce the network traffic during the sort and shuffle phase, a Combiner is used in the implementation.

CloudBurst. CloudBurst [28] implements a MapReduce based parallel BLAST sequence alignment algorithm. It allows efficient mapping of reads to reference genomes with a small number of differences. The input of the program comprises two multi-fasta binary files in Hadoop SequenceFile format: one containing reads and the other containing one or more reference sequences. The output is all alignments for each read with up to a user-specified number of differences, including both mismatches and indels. The program has three phases: map, shuffle, and reduce. The map task emits k-mers as keys for every k-mer in the reference and all non-overlapping k-mers in the reads. During the shuffle phase, the k-mers shared by the reads and the references are grouped. The reduce task extends the seeds into end-to-end alignments, allowing for a fixed number of mismatches or indels.

Sort. The MapReduce Sort program performs a partial sort of its input data. This program simply uses the map/reduce framework to sort the input directory into the output directory. Each map task is the predefined IdentityMapper and each reduce task is the predefined IdentityReducer, both of which pass their inputs directly to the output. The full input dataset is transferred and sorted during the shuffle phase between the map and reduce tasks. Sort is a very useful benchmark for studying the shuffle phase, which exists in many MapReduce applications.

B. Energy Management Parameter Space

As discussed in Section III, the performance and energy of MapReduce applications are affected by two major factors: the number of concurrent workers, i.e., the number of worker nodes times the number of workers per node, and the processor frequency f on each worker node.

1) The number of concurrent workers

In this work, we execute each benchmark with multiple settings, where each setting is identified by a unique number of concurrent workers. The concurrency is determined by the number of worker nodes allocated and the number of concurrent tasks on each node. To maximize performance and efficiency, we use all 8 processor cores (i.e., c = 8) on each node during benchmark runs. The concurrency ranges from 8 with 1 worker node to 56 with 7 worker nodes. We use hadoop-daemon.sh to control the TaskTracker on each compute node and allow a delay of 15 minutes for Hadoop to recognize the active/inactive nodes.
We repeat the experiments 5 times for each setting and use the average performance and energy in the analysis. To ensure no extra disk and network I/O is introduced by the varying number of concurrent workers, data replication is set to 8 on our 8-node cluster. With this replica setting, each node has a copy of the required data on its local storage disk and accesses the data locally. For CloudBurst and Sort, the data is replicated prior to job execution. For Matrix Multiplication, the data is generated on the fly.

2) The processor frequency

The key to DVFS scheduling is to identify the workload phases and then adapt the processor frequency to match the computational demand of each phase. In this work, we analyze and identify the workload phases and the corresponding performance and energy use by tracing system activities. Specifically, we trace CPU utilization, memory access, disk I/O bandwidth, and network bandwidth on the worker nodes. We consider three DVFS scheduling policies:

Fixed policy: a single processor frequency is used for all cores across the worker nodes during the entire execution.

Adaptive I policy: based on workload phase heuristics observed from MapReduce application performance traces, we insert DVFS scheduling code into the MapReduce programs to adjust the processor frequency during execution. Specifically, this policy uses the maximum processor frequency inside the map and reduce functions, and the minimum processor frequency otherwise. Thus, the computations in the map and reduce tasks run on faster cores while I/O accesses run on slower cores for power reduction. The actual deployment of this policy on the Hadoop system is at the job level, because the Java based MapReduce framework lacks the capability to identify the specific physical core associated with a map/reduce task. In particular, we set the affinity of the TaskTracker daemon to core 0 on each node and fix that core's frequency at maximum speed. We then apply DVFS scaling to the remaining seven cores on each worker node.

Adaptive II policy: this policy is performance-constrained and bounds the performance loss within a user specified value. The performance loss is relative to the performance at the highest fixed processor frequency. In this work, we set the allowable performance loss to 5%. With this constraint, a low processor frequency might not be scheduled for an execution phase even if the resulting power reduction is much larger than the performance loss. CPUMiser [29] implements this policy. CPUMiser uses hardware performance counters to collect fine-grain CPU activity information, and uses this information to predict performance and identify the target processor speed periodically at runtime. CPUMiser runs on each node in the cluster and adapts the processor frequency of each core to the application's demand.
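On Linux, the per-core frequency changes that the Adaptive I policy performs can be issued through the cpufreq sysfs interface. The sketch below is our own illustration of that mechanism, not the instrumentation used in this paper; it assumes the userspace cpufreq governor is active, that the JVM has write permission on sysfs, and it hard-codes the core layout (core 0 pinned at maximum for the TaskTracker, cores 1-7 scaled) and the 0.8/2.5 GHz endpoints of our cluster:

import java.io.FileWriter;
import java.io.IOException;

// Sketch of Adaptive I-style DVFS control: raise the frequency of the worker
// cores around a compute phase and restore the minimum afterwards.
public class DvfsControl {
    private static final String SETSPEED =
        "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_setspeed";

    static void setFrequencyKHz(int core, long khz) throws IOException {
        // The userspace governor accepts a target frequency in kHz.
        try (FileWriter w = new FileWriter(String.format(SETSPEED, core))) {
            w.write(Long.toString(khz));
        }
    }

    // Run a map/reduce body at 2.5 GHz on cores 1-7, then drop them to
    // 0.8 GHz; core 0 is left at maximum speed for the TaskTracker daemon.
    static void runComputePhase(Runnable mapOrReduceBody) throws IOException {
        for (int core = 1; core <= 7; core++) setFrequencyKHz(core, 2500000L);
        try {
            mapOrReduceBody.run();
        } finally {
            for (int core = 1; core <= 7; core++) setFrequencyKHz(core, 800000L);
        }
    }
}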

C. Evaluation Metrics

We use execution time (T) as the performance metric and total system energy (E_total) as the energy metric. We also introduce two other metrics in our analysis. The first is work-induced energy (E_work), defined as:

    E_work = E_total - P_idle × T    (5)

where P_idle is the system idle power. The rationale for using work-induced energy in addition to total system energy lies in the fact that in today's data centers, idle power dominates system power consumption, accounting for up to 60% of the system power under load. Meanwhile, motivated by the concept of energy-proportional computing [16], which essentially assumes zero idle power, many techniques are being developed to significantly reduce the idle power. Thus, we believe work-induced energy provides a direct indication of the energy demanded by applications and workloads.

The second metric is energy-performance efficiency (EE), defined as the ratio of performance per Joule relative to a baseline configuration (in our experiments, the 8-worker configuration with index n_0):

    EE(n) = (T(n_0)/T(n)) / (E_work(n)/E_work(n_0))    (6)

The metric measures how performance per Joule scales with the number of processor cores within the context of energy-proportional computing. EE = 1 indicates constant performance per Joule, i.e., performance grows with the number of worker nodes at the same rate as energy consumption. EE > 1 indicates performance grows faster than energy consumption.
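Read this way, Eqs. (5) and (6) reduce to a few lines of arithmetic. The helper below is our own sketch (the class and method names are ours, and the EE formula encodes our reading of Eq. (6) as speedup divided by the work-induced energy ratio); the sample numbers reproduce the 3X speedup with 6.6% extra work-induced energy reported for Matrix Multiplication in Section V-B:

public final class EfficiencyMetrics {
    // Eq. (5): work-induced energy excludes the idle power drawn over the run.
    static double workInducedEnergy(double eTotalJoules, double pIdleWatts,
                                    double timeSeconds) {
        return eTotalJoules - pIdleWatts * timeSeconds;
    }

    // Eq. (6) as we read it: speedup over the baseline divided by the
    // work-induced energy ratio over the baseline. EE = 1 means performance
    // per Joule is unchanged; EE > 1 means it grew faster than energy.
    static double efficiency(double tBase, double eWorkBase,
                             double t, double eWork) {
        return (tBase / t) / (eWork / eWorkBase);
    }

    public static void main(String[] args) {
        // Illustrative numbers: a 3.0X speedup at 6.6% extra work-induced
        // energy, as reported for Matrix Multiplication with 48 workers.
        double ee = efficiency(1.0, 1.0, 1.0 / 3.0, 1.066);
        System.out.printf("EE = %.2f%n", ee); // prints EE = 2.81
    }
}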

V. EXPERIMENTAL RESULTS

A. Experimental Setup

The experiments are conducted on an 8-node power-aware cluster with Gigabit Ethernet interconnection. Each node has dual AMD Opteron quad-core 2380 processors running Fedora Core 10 Linux. Each core has a 64KB L1 instruction cache, a 64KB L1 data cache, and a unified 512KB L2 cache. The four cores on the same chip share one 6MB L3 cache. The cluster supports DVFS with 4 frequencies: 0.8 GHz, 1.3 GHz, 1.8 GHz, and 2.5 GHz. Each node has one WD1600AYPS RAID Edition 7200 rpm SATA hard drive. Hadoop, version , is running on the cluster. One of the nodes runs the NameNode and JobTracker, and the other seven nodes serve as DataNodes and perform map and reduce tasks. Unless explicitly stated, the number of concurrent workers on each node is eight.

For Matrix Multiplication, the input matrices A and B are 2560 by 2560 matrices, and the sub-matrix size is 512 by 512. This matrix size provides sufficient load for all cores with the configured 4MB Hadoop file block size (dfs.block.size in hdfs-site.xml). For CloudBurst, the input is the 7.9 million sequencing reads publicly available from the 1000 Genomes Project (accession SRR NA12878) and the chromosome 1 human genome (NCBI Build 36.1). For Sort, we use the randomwriter method in hadoop-*-examples.jar to create seven random 10GB files.

We use the PowerPack toolkit [18] to profile power and energy. We attach three Watts Up? Pro USB power meters to three worker nodes and measure the total power of each node. We report the energy consumption of all worker nodes by averaging the measured energy and multiplying by 8. We exclude the energy of the NameNode and leave that investigation to a future study. From the recorded power profiles, work-induced energy is calculated with Eq. (5).

[Fig. 2. The variations of performance and energy with the number of concurrent workers for Matrix Multiplication: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.]

[Fig. 3. The variations of performance and energy with the number of workers for CloudBurst: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.]

B. The Effects of the Number of Concurrent Workers

Matrix Multiplication: As shown in Fig. 2(a), the execution time decreases as the number of concurrent workers increases. Due to the overhead terms in Eq. (2), a maximum relative speedup of 3.3, instead of the ideal speedup of 7, is achieved at n = 56. While total system energy increases significantly when more worker nodes are used, due to system idle power, work-induced energy increases only slightly. The energy-performance efficiency increases with n and reaches its maximum at n = 48. By allocating 48 concurrent workers on 6 nodes, we can achieve a 3X speedup with 6.6% extra work-induced energy, or 2.8X efficiency using the metric defined in Eq. (6).

To explain the above observation, we trace the CPU utilization and the network and disk accesses during execution. Fig. 2(c) shows two apparent low CPU utilization phases during execution. The first matches the distribution of input data for the first MapReduce job, and the second corresponds to the finishing of the first MapReduce job and the setup of the second MapReduce job. There is a short period of low CPU utilization during the first job's execution when the map tasks finish and the shuffle occurs. As the reduce task is computation intensive, high CPU utilization is sustained during the second MapReduce job. Complementing the CPU utilization, three I/O intensive phases are observed in Fig. 2(b). The first phase corresponds to the job initialization, and the last two correspond to the first and second MapReduce jobs respectively.

The power trace in Fig. 2(c) highlights how the total power and idle power of a single node vary during execution. The work-induced power is the difference between the total power and the idle power. The idle power is about 160 Watts and dominates the total power even when CPU utilization is close to 100%; it is about twice the maximum work-induced power while the matrix multiplication program executes. This observation indicates that effective power reduction technologies should treat reducing system idle power as a top priority. The work-induced power curve follows the same trend as CPU utilization. The figure also implies that within this experimental environment, the majority of work-induced power comes from CPU activity, and memory and I/O activity only slightly change the total node power.

CloudBurst: As shown in Fig. 3(a), CloudBurst achieves super-linear speedup with the number of concurrent workers because, with a larger number of workers, more data can be accessed in memory rather than from disk. With 48 concurrent workers, CloudBurst achieves a maximum speedup of 12X at a minimum work-induced energy of 0.7X, resulting in its optimal efficiency value. In contrast to Matrix Multiplication, CloudBurst has better scalability in both performance and energy; thus, allocating more resources for CloudBurst is preferred. The system activity traces provided in Fig. 3(b)-(c) and the MapReduce log files indicate there are two MapReduce jobs in this benchmark, each consisting of a map, a shuffle, and a reduce phase.
The first job accounts for 90% of the total execution time, and CPU utilization is high during most of the map and reduce phases, except in the middle and at the end of the map tasks, where CPU utilization oscillates around 20%. The I/O traces further reveal that network traffic and disk I/O accesses are high within the map and reduce phases. In addition, there are short periods with low CPU and I/O activity between the two MapReduce jobs and between phases. These traces indicate that even though CloudBurst is computation intensive, its MapReduce implementation involves significant disk and network accesses and warrants energy efficiency optimization.

Sort: Unlike the above two benchmarks, Sort does not scale well with the number of cores. As shown in Fig. 4(a), while the execution time gradually decreases when more cores are used, the maximum speedup is still less than 2. On the other hand, work-induced energy gradually increases with the number of concurrent workers. Sort also delivers its best efficiency at n = 48. The system activity traces in Fig. 4(b)-(c) reveal that disk and network accesses are very active during most of the execution period.

[Fig. 4. The variations of performance and energy with the number of workers for Sort: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.]

These heavy I/O activities are responsible for a lower CPU utilization than in the previous two benchmarks.

C. The Effects of Processor Frequency

While the analysis in the previous section demonstrates that resource allocation is an effective approach to improve both performance and efficiency, it also points out that there are significant I/O activities within MapReduce applications. Given that DVFS is a practical energy saving technology for non-CPU-bound applications, we discuss in this section how the different DVFS scheduling policies presented in Section IV perform for MapReduce applications.

Fig. 5 shows the performance, energy, and efficiency when the three DVFS scheduling policies are applied to the benchmarks running with 56 concurrent workers. The first four groups correspond to the fixed policy with 4 different frequencies: {2.5 GHz, 1.8 GHz, 1.3 GHz, and 0.8 GHz}. Adaptive I inserts DVFS control into the benchmark source code. Adaptive II uses CPUMiser to schedule the core frequencies.

Fixed policy: Overall, for all three benchmarks, the best efficiency is observed at a fixed frequency below the maximum, though the optimal frequency differs from code to code. For Matrix Multiplication, the optimal frequency is 1.8 GHz, at which there is a 35% work-induced energy saving at the cost of 15% performance degradation, improving the efficiency number accordingly. For CloudBurst, 1.8 GHz also yields the best efficiency, 1.18, with 32% savings of work-induced energy at the cost of 24% performance loss. A more interesting result occurs for Sort. At 1.3 GHz, it achieves an efficiency number of 1.33, with a 35% work-induced energy saving and a 4% performance gain. A performance gain from a lower processor frequency was also observed for the NPB sorting benchmark IS in our earlier work [30]. We believe this is a result of better matching between the processor and system bus speeds; however, this explanation is not confirmed yet and we are still investigating it. While the results of the fixed policy are promising, there are two major issues with it. First, it requires extensive performance and energy profiling. Second, the performance decrease is usually significant, except in some rare cases such as the Sort benchmark.

Adaptive I policy: With sufficient internal information about the workload, we expected the Adaptive I policy to yield better efficiency improvements. However, the experiments show mixed results. For Matrix Multiplication, this policy reduces the work-induced energy by 19% at the expense of 17% performance degradation. For CloudBurst, it delivers performance similar to 2.5 GHz and reduces the work-induced energy by 5%, which is equivalent to a 3% total system energy saving. For Sort, the resulting performance and energy are similar to those achieved at 1.3 GHz.

Adaptive II policy: Unlike the Adaptive I policy, CPUMiser is implemented as system software and adapts the processor frequency automatically without requiring code changes or performance profiling. Another unique feature of CPUMiser is that its performance control prevents unacceptable cases such as a large energy saving at the cost of a significant performance slowdown. The experimental results match our expectations. For Matrix Multiplication, the Adaptive II policy reduces the work-induced energy by 23% with a 5% performance loss, improving the efficiency number by 23%. CPUMiser does not save energy for CloudBurst because lowering the processor frequency would adversely degrade its performance. For Sort, CPUMiser delivers the same performance as the 2.5 GHz fixed policy with a 4% work-induced energy reduction.
[Fig. 5. The effects of various DVFS policies for Matrix Multiplication (a), CloudBurst (b), and Sort (c).]

[Fig. 6. The power traces under the fixed 2.5 GHz and Adaptive II DVFS scheduling policies for Matrix Multiplication (a), CloudBurst (b), and Sort (c).]

Fig. 6 presents the power traces of the three benchmarks under the fixed 2.5 GHz and Adaptive II policies. For Matrix Multiplication and CloudBurst, the power traces with the Adaptive II policy are nearly identical to those at 2.5 GHz, except for some downward shift due to lower processor frequency and lower power consumption during idle or non-CPU-intensive phases. For Sort, CPUMiser schedules the processor frequency to lower values to save energy. The traces also reveal that because CPUMiser seeks performance oriented energy savings, it works best for current systems with large idle power but might not be the best for future energy-proportional computing systems.

VI. SUMMARY

In this work, we use an experimental approach to study the scalability of performance, energy, and efficiency of MapReduce for computation intensive workloads. Various system activity traces indicate that MapReduce involves significant I/O accesses and that CPU underutilization is not uncommon for MapReduce applications, due to the demand for intensive disk and network I/O accesses as well as the separation of map and reduce tasks. By analyzing how efficiency changes with the number of concurrent MapReduce workers and with DVFS scheduling policies, we found that judicious resource allocation (i.e., node counts) and DVFS scheduling can effectively improve efficiency. During our studies, we also observed that performance constrained DVFS scheduling strategies work well on systems with dominating idle power. Nevertheless, they need to be re-evaluated on energy-proportional computing systems, where performance and power are treated equally.

REFERENCES

[1] J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," in Proceedings of the 6th Symposium on Operating Systems Design & Implementation (OSDI 2004), San Francisco, CA, 2004.
[2] P. Dubey, "Recognition, Mining and Synthesis Moves Computers to the Era of Tera," Technology@Intel.
[3] EPA, "Report to Congress on Server and Data Center Energy Efficiency," Public Law, U.S.
[4] Intel, "Increasing Data Center Density While Driving Down Power and Cooling Costs," ftp://download.intel.com/design/servers/technologies/thermal.pdf, 2006.
[5] J. Pan, Y. L. Biannic, and F. Magoulès, "Parallelizing Multiple Group-by Query in Share-Nothing Environment: a MapReduce Study Case," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010.
[6] S. Chen and S. Schlosser, "Map-Reduce Meets Wider Varieties of Applications," Intel.
[7] S. Leo and G. Zanetti, "Pydoop: a Python MapReduce and HDFS API for Hadoop," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010.
[8] D. Huang, X. Shi, S. Ibrahim et al., "MR-Scope: a Real-Time Tracing Tool for MapReduce," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010.
[9] J. Ekanayake, H. Li, B. Zhang et al., "Twister: a Runtime for Iterative MapReduce," in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010.
[10] R. T. Kaushik and M. Bhandarkar, "GreenHDFS: Towards an Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster," in Proceedings of the 2010 International Conference on Power Aware Computing and Systems, Vancouver, BC, Canada, 2010.
[11] Y. Chen, L. Keys, and R. H. Katz, "Towards Energy Efficient MapReduce," Technical Report UCB/EECS-2009-109, EECS Department, University of California, Berkeley, 2009.
[12] Y. Chen, A. Ganapathi, A. Fox et al., "Statistical Workloads for Energy Efficiency MapReduce," Technical Report UCB/EECS, University of California, Berkeley.
[13] J. Leverich and C. Kozyrakis, "On the Energy (In)efficiency of Hadoop Clusters," SIGOPS Oper. Syst. Rev., vol. 44, no. 1.
[14] T. Hoefler, A. Lumsdaine, and J. Dongarra, "Towards Efficient MapReduce Using MPI," in Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, Espoo, Finland, 2009.
[15] C. Ranger, R. Raghuraman, A. Penmetsa et al., "Evaluating MapReduce for Multi-Core and Multiprocessor Systems," in Proceedings of the 13th IEEE International Symposium on High Performance Computer Architecture, USA, 2007.
[16] L. A. Barroso and U. Hölzle, "The Case for Energy-Proportional Computing," Computer, vol. 40, no. 12.
[17] "Hadoop webpage."
[18] R. Ge, X. Feng, S. Song et al., "PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications," IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5.
[19] D. Economou, S. Rivoire, C. Kozyrakis et al., "Full-System Power Analysis and Modeling for Server Environments," in Workshop on Modeling, Benchmarking, and Simulation (MoBS).
[20] P. Bohrer, E. N. Elnozahy, T. Keller et al., "The Case for Power Management in Web Servers," in Power Aware Computing, Kluwer Academic Publishers.
[21] D. Meisner, B. T. Gold, and T. F. Wenisch, "PowerNap: Eliminating Server Idle Power," in Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2009), Washington, DC, USA, 2009.
[22] B. M. Oppenheim, "Reducing Cluster Power Consumption by Dynamically Suspending Idle Nodes," DigitalCommons@CalPoly.
[23] W. Lang and J. M. Patel, "Energy Management for MapReduce Clusters," Proc. VLDB Endow., vol. 3, no. 1-2.
[24] J. Xie, S. Yin, X. Ruan et al., "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters," in Proceedings of the 19th International Heterogeneity in Computing Workshop, Atlanta, Georgia.
[25] M. Cardosa, A. Singh, H. Pucha et al., "Exploiting Spatio-Temporal Tradeoffs for Energy Efficient MapReduce in the Cloud," Department of Computer Science and Engineering, University of Minnesota.
[26] Y. Chen, A. Ganapathi, and R. H. Katz, "To Compress or Not to Compress - Compute vs. IO Tradeoffs for MapReduce Energy Efficiency," in Proceedings of the First ACM SIGCOMM Workshop on Green Networking, New Delhi, India, 2010.
[27] J. Norstad, "A MapReduce Algorithm for Matrix Multiplication," 2010.
[28] M. C. Schatz, "CloudBurst: Highly Sensitive Read Mapping with MapReduce," Bioinformatics, vol. 25, no. 11.
[29] R. Ge, X. Feng, W.-c. Feng et al., "CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters," in Proceedings of the International Conference on Parallel Processing (ICPP 2007), 2007.
[30] R. Ge, X. Feng, and K. W. Cameron, "Performance-Constrained Distributed DVS Scheduling for Scientific Applications on Power-Aware Clusters," in Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, 2005.


More information

Jumbo: Beyond MapReduce for Workload Balancing

Jumbo: Beyond MapReduce for Workload Balancing Jumbo: Beyond Reduce for Workload Balancing Sven Groot Supervised by Masaru Kitsuregawa Institute of Industrial Science, The University of Tokyo 4-6-1 Komaba Meguro-ku, Tokyo 153-8505, Japan sgroot@tkl.iis.u-tokyo.ac.jp

More information

LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud

LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud LEEN: Locality/Fairness- Aware Key Partitioning for MapReduce in the Cloud Shadi Ibrahim, Hai Jin, Lu Lu, Song Wu, Bingsheng He*, Qi Li # Huazhong University of Science and Technology *Nanyang Technological

More information

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS

MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS MEMORY/RESOURCE MANAGEMENT IN MULTICORE SYSTEMS INSTRUCTOR: Dr. MUHAMMAD SHAABAN PRESENTED BY: MOHIT SATHAWANE AKSHAY YEMBARWAR WHAT IS MULTICORE SYSTEMS? Multi-core processor architecture means placing

More information

ptop: A Process-level Power Profiling Tool

ptop: A Process-level Power Profiling Tool ptop: A Process-level Power Profiling Tool Thanh Do, Suhib Rawshdeh, and Weisong Shi Wayne State University {thanh, suhib, weisong}@wayne.edu ABSTRACT We solve the problem of estimating the amount of energy

More information

CLOUD-SCALE FILE SYSTEMS

CLOUD-SCALE FILE SYSTEMS Data Management in the Cloud CLOUD-SCALE FILE SYSTEMS 92 Google File System (GFS) Designing a file system for the Cloud design assumptions design choices Architecture GFS Master GFS Chunkservers GFS Clients

More information

High Throughput WAN Data Transfer with Hadoop-based Storage

High Throughput WAN Data Transfer with Hadoop-based Storage High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San

More information

CS Project Report

CS Project Report CS7960 - Project Report Kshitij Sudan kshitij@cs.utah.edu 1 Introduction With the growth in services provided over the Internet, the amount of data processing required has grown tremendously. To satisfy

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

Performance Modeling and Analysis of Flash based Storage Devices

Performance Modeling and Analysis of Flash based Storage Devices Performance Modeling and Analysis of Flash based Storage Devices H. Howie Huang, Shan Li George Washington University Alex Szalay, Andreas Terzis Johns Hopkins University MSST 11 May 26, 2011 NAND Flash

More information

Mitigating Data Skew Using Map Reduce Application

Mitigating Data Skew Using Map Reduce Application Ms. Archana P.M Mitigating Data Skew Using Map Reduce Application Mr. Malathesh S.H 4 th sem, M.Tech (C.S.E) Associate Professor C.S.E Dept. M.S.E.C, V.T.U Bangalore, India archanaanil062@gmail.com M.S.E.C,

More information

Managing Performance Variance of Applications Using Storage I/O Control

Managing Performance Variance of Applications Using Storage I/O Control Performance Study Managing Performance Variance of Applications Using Storage I/O Control VMware vsphere 4.1 Application performance can be impacted when servers contend for I/O resources in a shared storage

More information

LITERATURE SURVEY (BIG DATA ANALYTICS)!

LITERATURE SURVEY (BIG DATA ANALYTICS)! LITERATURE SURVEY (BIG DATA ANALYTICS) Applications frequently require more resources than are available on an inexpensive machine. Many organizations find themselves with business processes that no longer

More information

Energy-Efficient Cloud Computing: Techniques &

Energy-Efficient Cloud Computing: Techniques & Energy-Efficient Cloud Computing: Techniques & Tools Thomas Knauth 1 Energy-Efficiency in Data Centers Report to Congress on Server and Data Center Energy Efficiency Public Law 109-431 2 Cloud Land 5th

More information

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University

Multiprocessing and Scalability. A.R. Hurson Computer Science and Engineering The Pennsylvania State University A.R. Hurson Computer Science and Engineering The Pennsylvania State University 1 Large-scale multiprocessor systems have long held the promise of substantially higher performance than traditional uniprocessor

More information

Chapter 5. The MapReduce Programming Model and Implementation

Chapter 5. The MapReduce Programming Model and Implementation Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing

More information

MapReduce for Data Intensive Scientific Analyses

MapReduce for Data Intensive Scientific Analyses apreduce for Data Intensive Scientific Analyses Jaliya Ekanayake Shrideep Pallickara Geoffrey Fox Department of Computer Science Indiana University Bloomington, IN, 47405 5/11/2009 Jaliya Ekanayake 1 Presentation

More information

A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop

A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop Myoungjin Kim 1, Seungho Han 1, Jongjin Jung 3, Hanku Lee 1,2,*, Okkyung Choi 2 1 Department of Internet and Multimedia Engineering,

More information

Storage Hierarchy Management for Scientific Computing

Storage Hierarchy Management for Scientific Computing Storage Hierarchy Management for Scientific Computing by Ethan Leo Miller Sc. B. (Brown University) 1987 M.S. (University of California at Berkeley) 1990 A dissertation submitted in partial satisfaction

More information

Clustering Lecture 8: MapReduce

Clustering Lecture 8: MapReduce Clustering Lecture 8: MapReduce Jing Gao SUNY Buffalo 1 Divide and Conquer Work Partition w 1 w 2 w 3 worker worker worker r 1 r 2 r 3 Result Combine 4 Distributed Grep Very big data Split data Split data

More information

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD

EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD EFFICIENT ALLOCATION OF DYNAMIC RESOURCES IN A CLOUD S.THIRUNAVUKKARASU 1, DR.K.P.KALIYAMURTHIE 2 Assistant Professor, Dept of IT, Bharath University, Chennai-73 1 Professor& Head, Dept of IT, Bharath

More information

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency

Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu

More information

Global Journal of Engineering Science and Research Management

Global Journal of Engineering Science and Research Management A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV

More information

Google File System (GFS) and Hadoop Distributed File System (HDFS)

Google File System (GFS) and Hadoop Distributed File System (HDFS) Google File System (GFS) and Hadoop Distributed File System (HDFS) 1 Hadoop: Architectural Design Principles Linear scalability More nodes can do more work within the same time Linear on data size, linear

More information

Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters

Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters Experiences with the Parallel Virtual File System (PVFS) in Linux Clusters Kent Milfeld, Avijit Purkayastha, Chona Guiang Texas Advanced Computing Center The University of Texas Austin, Texas USA Abstract

More information

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters

On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters 1 On the Comparative Performance of Parallel Algorithms on Small GPU/CUDA Clusters N. P. Karunadasa & D. N. Ranasinghe University of Colombo School of Computing, Sri Lanka nishantha@opensource.lk, dnr@ucsc.cmb.ac.lk

More information

Enosis: Bridging the Semantic Gap between

Enosis: Bridging the Semantic Gap between Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models Anthony Kougkas - akougkas@hawk.iit.edu, Hariharan Devarajan, Xian-He Sun Outline Introduction Background Approach Evaluation

More information

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS

I. INTRODUCTION FACTORS RELATED TO PERFORMANCE ANALYSIS Performance Analysis of Java NativeThread and NativePthread on Win32 Platform Bala Dhandayuthapani Veerasamy Research Scholar Manonmaniam Sundaranar University Tirunelveli, Tamilnadu, India dhanssoft@gmail.com

More information

A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud

A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the Cloud Calhoun: The NPS Institutional Archive Faculty and Researcher Publications Faculty and Researcher Publications 2013-03 A Multilevel Secure MapReduce Framework for Cross-Domain Information Sharing in the

More information

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING

CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING CATEGORIZATION OF THE DOCUMENTS BY USING MACHINE LEARNING Amol Jagtap ME Computer Engineering, AISSMS COE Pune, India Email: 1 amol.jagtap55@gmail.com Abstract Machine learning is a scientific discipline

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe

More information

Evaluating Private Information Retrieval on the Cloud

Evaluating Private Information Retrieval on the Cloud Evaluating Private Information Retrieval on the Cloud Casey Devet University ofwaterloo cjdevet@cs.uwaterloo.ca Abstract The goal of Private Information Retrieval (PIR) is for a client to query a database

More information

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark

Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark Analysis of Extended Performance for clustering of Satellite Images Using Bigdata Platform Spark PL.Marichamy 1, M.Phil Research Scholar, Department of Computer Application, Alagappa University, Karaikudi,

More information

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis

A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis A Lost Cycles Analysis for Performance Prediction using High-Level Synthesis Bruno da Silva, Jan Lemeire, An Braeken, and Abdellah Touhafi Vrije Universiteit Brussel (VUB), INDI and ETRO department, Brussels,

More information

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System

Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Microsoft Exchange Server 2010 workload optimization on the new IBM PureFlex System Best practices Roland Mueller IBM Systems and Technology Group ISV Enablement April 2012 Copyright IBM Corporation, 2012

More information

MapReduce. U of Toronto, 2014

MapReduce. U of Toronto, 2014 MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters

Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters Krishna Kandalla, Emilio P. Mancini, Sayantan Sur, and Dhabaleswar. K. Panda Department of Computer Science & Engineering,

More information

PRELIMINARY RESULTS: MODELING RELATION BETWEEN TOTAL EXECUTION TIME OF MAPREDUCE APPLICATIONS AND NUMBER OF MAPPERS/REDUCERS

PRELIMINARY RESULTS: MODELING RELATION BETWEEN TOTAL EXECUTION TIME OF MAPREDUCE APPLICATIONS AND NUMBER OF MAPPERS/REDUCERS SCHOOL OF INFORMATION TECHNOLOGIES PRELIMINARY RESULTS: MODELING RELATION BETWEEN TOTAL EXECUTION TIME OF MAPREDUCE APPLICATIONS AND NUMBER OF MAPPERS/REDUCERS TECHNICAL REPORT 679 NIKZAD BABAII RIZVANDI,

More information

A priority based dynamic bandwidth scheduling in SDN networks 1

A priority based dynamic bandwidth scheduling in SDN networks 1 Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems

More information

Modification and Evaluation of Linux I/O Schedulers

Modification and Evaluation of Linux I/O Schedulers Modification and Evaluation of Linux I/O Schedulers 1 Asad Naweed, Joe Di Natale, and Sarah J Andrabi University of North Carolina at Chapel Hill Abstract In this paper we present three different Linux

More information

The Use of Cloud Computing Resources in an HPC Environment

The Use of Cloud Computing Resources in an HPC Environment The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes

More information

Map Reduce Group Meeting

Map Reduce Group Meeting Map Reduce Group Meeting Yasmine Badr 10/07/2014 A lot of material in this presenta0on has been adopted from the original MapReduce paper in OSDI 2004 What is Map Reduce? Programming paradigm/model for

More information