Improving MapReduce Energy Efficiency for Computation Intensive Workloads
Improving MapReduce Energy Efficiency for Computation Intensive Workloads

Thomas Wirtz and Rong Ge
Department of Mathematics, Statistics and Computer Science
Marquette University, Milwaukee, WI
{thomas.wirtz,

Abstract — MapReduce is a programming model for data intensive computing on large-scale distributed systems. With its wide acceptance and deployment, improving the energy efficiency of MapReduce will lead to significant energy savings for data centers and computational grids. In this paper, we study the performance and energy efficiency of the Hadoop implementation of MapReduce in the context of energy-proportional computing. We consider how MapReduce efficiency varies with two runtime configurations: resource allocation, which changes the number of available concurrent workers, and DVFS (Dynamic Voltage and Frequency Scaling), which adjusts the processor frequency to the workload's computational needs. Our experimental results indicate that significant energy savings can be achieved through judicious resource allocation and intelligent DVFS scheduling for computation intensive applications, though the level of improvement depends on both the workload characteristics of the MapReduce application and the resource and DVFS scheduling policies.

I. INTRODUCTION

MapReduce [1] is a programming model for data intensive computing on large-scale distributed systems that supports automatic parallel processing of large data sets. With the MapReduce framework, programmers can focus on application algorithm design without dealing with low-level workload distribution and management. Today, MapReduce-based applications are widely deployed in many business and educational data centers. With data volume doubling every three years [2], MapReduce will potentially become a major computing paradigm in future data centers. Energy efficient MapReduce is critical for green data centers. It is estimated that data centers account for 1.5% of the overall U.S. electricity use [3].
Electricity costs are already the second highest expense after labor costs in data centers [4]. Nevertheless, efficiency is not among the top MapReduce design constraints. In a MapReduce-based application, computations are broken into many short-lived map and reduce tasks. Map tasks communicate with reduce tasks via intermediate results stored on distributed storage. Process management and local and remote disk I/O accesses are likely to cause both performance and energy inefficiencies for MapReduce applications. There is a large body of work on applications of the MapReduce programming model [5, 6], library support for various programming languages [7], and debugging and tracing tools for the MapReduce framework [8, 9]. Nevertheless, little work has studied MapReduce energy efficiency in depth. GreenHDFS [10] separated cluster servers into cold and hot zones and placed data in these zones according to data classification. GreenHDFS conserved energy by transitioning the servers in the cold zone to high energy-saving power states. Chen et al. [11, 12] and Leverich et al. [13] studied MapReduce energy efficiency with a varying number of worker nodes and found energy-saving potential for MapReduce applications. Our focus is on MapReduce energy efficiency for data- and computation-intensive applications. This work is motivated by two trends: (1) a vast number of scientific applications are becoming data intensive as available data grow explosively; and (2) many of these applications and their supporting software are being ported to the MapReduce framework [6, 14, 15]. Unlike traditional MapReduce applications, such applications have a larger number of operations per byte and a higher demand for computational power. As with other parallel applications, the performance and energy efficiency of MapReduce applications are affected by the degree of parallelism and the computational intensity (i.e., the ratio of on-chip CPU computation to off-chip memory and I/O access).
Given an application, optimal efficiency is achieved when resource allocation matches application characteristics: the number of allocated processing cores matches the degree of parallelism of the application, and the processor performance state matches the application's computational intensity. In this work, we use an experimental approach to validate this concept. We choose three MapReduce benchmark applications, Matrix Multiplication, CloudBurst, and Integer Sort, in our study. For each benchmark, we vary the number of concurrent workers and the processor frequency and investigate how performance and energy efficiency scale. We evaluate the energy and efficiency results within the context of energy-proportional computing. Energy-proportional computing [16] is an ideal computing environment in which server power is directly proportional to the level of server utilization and is zero at idle (i.e., with no active user workload). Energy-proportional computing has been promoted and used to guide hardware and architecture design. To emulate an energy-proportional computing system, we use work-induced power instead of total system power in our analysis. That is, we exclude the
system idle power and consider it zero. By excluding the idle power, we are able to better capture the direct cost of workload execution and the trend of energy change with runtime configurations.

The paper makes the following main contributions. First, we experimentally demonstrate that performance and energy inefficiency is not uncommon in the MapReduce framework for computation intensive applications. Such inefficiency is due to the overhead of automatic parallelization and I/O accesses and cannot simply be reduced by application developers. Second, the degree of parallelism of an application significantly affects the performance and energy efficiency of MapReduce applications. To achieve higher overall efficiency on the MapReduce framework, we need to tailor the resource allocation (i.e., the number of processing cores) to the application's degree of parallelism. Third, we compare three DVFS scheduling policies and the resulting energy efficiency for the MapReduce framework. Overall, DVFS is effective for energy savings. In general, a low-power DVFS scheduling policy is optimal for systems with small idle power, while a performance-constrained DVFS scheduling policy is optimal for systems with dominating idle power.

The remainder of this paper is organized as follows. Related work is discussed in the next section. We discuss the variables of energy efficiency in Section 3 and present our methodology in Section 4. The experimental results are presented in Section 5. Section 6 concludes the paper.

II. BACKGROUND AND RELATED WORK

A. Background

MapReduce is a programming model introduced by Google for processing large data sets in parallel [1]. In the MapReduce framework, large data files are stored across distributed storage devices in small, workable chunks.
With this model, programmers specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. During a MapReduce job execution, the map tasks, as well as the subsequent reduce tasks, execute in parallel on different key/value pairs. The initial file data chunks are input to the map tasks. At the end of the job, the output is compiled into one or more files.

Hadoop MapReduce [17] is an open-source MapReduce framework implementation in Java. The Hadoop framework comprises the MapReduce libraries, the Hadoop distributed file system (HDFS), and supporting system services. A Hadoop MapReduce system normally consists of one JobTracker node and multiple TaskTracker nodes. The JobTracker initializes jobs, adds tasks to a queue, and holds job and task status. A TaskTracker fetches tasks from the JobTracker. HDFS contains one primary NameNode, which holds the file system metadata and keeps track of the placement of file chunks, and multiple instances of DataNodes, which store the chunks of file data. The automated parallel computing in MapReduce boosts software development productivity. However, it may also compromise performance and energy efficiency because of a large amount of extra disk and network I/O accesses, short-lived processes, and load imbalance.

B. Related Work

Driven by ever-increasing operating costs and awareness of energy conservation, researchers have been actively developing technology to understand and improve energy efficiency in data centers. Existing work includes fine-grained power profiling [18], analytical power modeling [19], and power management [20, 21, 22]. The DVFS technology available on modern processors has been widely used for data center power management [20].
Other technologies, such as PowerNap [21] and varying the number of active servers [22], have also been investigated for systems where idle power dominates and the benefit from DVFS is thus limited. Though significant research has been done on MapReduce systems and applications [5, 6], only a few studies have addressed the problem of MapReduce energy (in)efficiency. GreenHDFS [10] separated cluster servers into cold and hot zones and transitioned the servers in the cold zone to high energy-saving power states. It was shown that running a Hadoop cluster with a subset of the system nodes could save energy with some performance tradeoffs for applications [13]. Instead of using a covering set of nodes, another independent study indicated that using all available nodes for workload execution and powering them off after job completion was favorable in terms of energy cost [23]. More recently, Chen et al. analyzed how MapReduce operating parameters affect energy efficiency [11]. Power management for MapReduce systems has also been explored through data placement [24], virtual machine placement [25], and data compression [26].

Fig. 1. A typical deployment of the Hadoop framework. The JobTracker and the HDFS NameNode may reside on the same physical node, while the TaskTrackers and DataNodes are distributed across the other nodes.
While our work is close to [13], the two differ in at least two major aspects. First, our work studies how the energy efficiency of MapReduce varies with both system scale and DVFS scheduling. Second, it targets computation intensive applications, compared to the traditional MapReduce applications in [13]. With the increasing application of MapReduce to a wide range of high performance and data intensive problems, improving the energy efficiency of MapReduce for such applications is a necessity.

III. MAPREDUCE ENERGY EFFICIENCY

As one type of parallel program, MapReduce applications can be described by general parallel performance models. For simplicity, we abstract a computer cluster as a power-aware system characterized by three parameters (N, C, f), where N is the total number of compute nodes in the system, C is the number of processor cores per node, and f is the operating frequency of the processor cores. By this abstraction we confine our work to a homogeneous environment. Changing the processor operating frequency also changes its voltage, as V ∝ f holds for DVFS processors. For computation intensive applications, we assume C is the number of physical cores (not virtual cores) on a node and that there is at most one worker per core at any time instance.

Let T(1) and T(N) be the execution times of a MapReduce application when running with 1 and N worker nodes respectively, and let α be the fraction of the workload that is parallelizable. T(N) can be calculated as

T(N) = (1 − α)·T(1) + (α/N)·T(1)    (1)

Eq. (1) describes the execution time of an ideal parallel algorithm with no parallel overhead. In the MapReduce framework, parallel overhead T_o(N) results from initial and intermediate data distribution and possible load imbalance. Considering the parallel overhead, Eq. (1) can be rewritten in speedup form as follows:

S(N) = T(1)/T(N) = 1 / ((1 − α) + α/N + T_o(N)/T(1))    (2)

Denoting by P(1) and P(N) the average node power when the application runs with 1 and N worker nodes, and by E(1) and E(N) the corresponding energy consumption, we can combine Eq.
(1) with the energy equation E(N) = P(N)·N·T(N) and obtain

E(N)/E(1) = (P(N)/P(1)) · ((1 − α)·N + α)    (3)

For a perfectly parallelizable case where α = 1 and T_o = 0, increasing the number of worker nodes leads to proportionally improved performance, consistent average node power, and constant energy. For most applications, where 0 < α < 1 and T_o > 0 hold, increasing the number of worker nodes improves performance but also increases the energy cost. The performance improvement diminishes due to the sequential part of the application and various overheads, while the energy cost always increases. Given these trends of performance and energy cost as the number of worker nodes increases, we expect there exists an optimal number of worker nodes that delivers a maximum performance-to-energy ratio for a given MapReduce application. Eqs. (1) and (3) also indicate that the energy cost of a MapReduce application increases more slowly with the number of worker nodes for larger α or smaller T_o. While increasing the problem size will normally result in a larger α, our focus is on scheduling an optimal number of nodes for a fixed-size problem with a fixed α. We also confine our effort in this work to improving energy efficiency without modifying the MapReduce framework, though optimizing the MapReduce framework implementation has the potential to effectively reduce T_o and thus improve energy efficiency.

In addition to resource allocation and MapReduce optimization, dynamic voltage and frequency scaling (DVFS) provides a further opportunity to improve energy efficiency. Previous studies have shown that workloads involve both on-chip and off-chip accesses, and changing the CPU frequency only affects the performance of on-chip accesses. For a given workload, if the on-chip access portion accounts for a fraction β of the total execution time at base frequency f_base, then the total execution time at frequency f will be:

T(f) = (β·(f_base/f) + (1 − β))·T(f_base)    (4)

For computation intensive workloads where β ≈ 1, decreasing the frequency increases the total execution time.
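To make these models concrete, the following Python sketch evaluates Eqs. (1)-(4) for a toy workload; all parameter values (α, the overhead term, β, and the base times) are illustrative assumptions rather than measured numbers:

```python
# Toy evaluation of the models in Eqs. (1)-(4). All parameter values
# (alpha, overhead, beta, base times) are illustrative assumptions.

def exec_time(n, t1=100.0, alpha=0.95, overhead=1.0):
    """Eq. (1) extended with a linear parallel overhead term T_o(N)."""
    return (1 - alpha) * t1 + alpha * t1 / n + overhead * (n - 1)

def perf_per_joule(n, t1=100.0):
    """Relative performance per Joule vs. 1 node, assuming constant node
    power: speedup (Eq. (2)) divided by the energy ratio of Eq. (3)."""
    s = t1 / exec_time(n)                   # speedup S(N)
    energy_ratio = n * exec_time(n) / t1    # E(N)/E(1) when P(N) == P(1)
    return s / energy_ratio                 # equals S(N)^2 / N

def time_at_freq(f, f_base=2.5, beta=0.8, t_base=100.0):
    """Eq. (4): only the on-chip fraction beta scales with frequency."""
    return (beta * (f_base / f) + (1 - beta)) * t_base

best_n = max(range(1, 8), key=perf_per_joule)
print("node count with best performance per Joule:", best_n)
print("slowdown at 1.3 GHz:", round(time_at_freq(1.3) / time_at_freq(2.5), 2))
```

Under these assumed parameters the model reproduces the qualitative trends discussed above: performance per Joule peaks at an intermediate node count rather than at the largest one, and lowering the frequency substantially slows a mostly on-chip workload.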
However, for communication- and I/O-intensive workloads where β ≈ 0, varying the processor frequency has only a small impact on the total execution time. On the other hand, the dominant dynamic power of microprocessors is proportional to V²·f. Thus, halving f reduces the dynamic power to one-eighth for DVFS processors, as V ∝ f. Therefore, DVFS is effective for reducing energy cost with minimal performance impact for workloads that are not computation intensive. Nonetheless, applying DVFS to MapReduce applications that are computation intensive is nontrivial, given the mixture of CPU execution phases in the map and reduce functions and I/O phases due to the data distribution across the network.

IV. METHODOLOGY

In this work, we use an experimental approach to study how resource allocation and DVFS scheduling affect energy efficiency for MapReduce applications.

A. MapReduce Workloads

We include three MapReduce benchmark applications in our experiments. The Matrix Multiplication and CloudBurst benchmarks represent MapReduce applications that are both computation-intensive and data-intensive. To reveal the system behavior of the shuffle phase present in many MapReduce applications, we also include the Sort benchmark from the Hadoop distribution.

Matrix Multiplication. Matrix Multiplication calculates the product C = A × B, where A and B are two matrices.
The implementation used in this study is the blocking MapReduce algorithm in [27]. Blocking is a common performance optimization technique that takes advantage of memory cache locality. With blocking, the factor matrices A and B are split into smaller sub-matrices such that the latter fit into low-latency memory caches. This MapReduce implementation consists of two jobs: the first job performs the block multiplications and the second job sums up the results. In job 1, the map tasks route a copy of each A or B sub-matrix to all the reduce tasks, and the reduce tasks perform the sub-matrix multiplications. Depending on the number of reduce tasks and the number of sub-matrices, a reduce task may calculate one or more product sub-matrices. This strategy makes good use of parallelism at the expense of network traffic. In job 2, an identity map task reads from an input split, which is the output of the reduce tasks in job 1, and a reduce task sums up the items for the same sub-matrix. To reduce the network traffic during the sort and shuffle phase, a Combiner is used in the implementation.

CloudBurst. CloudBurst [28] implements a MapReduce-based parallel BLAST sequence alignment algorithm. It allows efficient mapping of reads to reference genomes with a small number of differences. The input of the program comprises two multi-fasta binary files in Hadoop SequenceFile format: one containing the reads and the other containing one or more reference sequences. The output is all alignments for each read with up to a user-specified number of differences, including both mismatches and indels. The program has three phases: map, shuffle, and reduce. The map task emits k-mers as keys for every k-mer in the reference and all non-overlapping k-mers in the reads. During the shuffle phase, the k-mers shared by the reads and the references are grouped. The reduce task extends the seeds into end-to-end alignments, allowing for a fixed number of mismatches or indels.

Sort.
The MapReduce Sort program performs a partial sort of its input data. This program simply uses the map/reduce framework to sort the input directory into the output directory. Each map task is the predefined IdentityMapper and each reduce task is the predefined IdentityReducer, both of which pass their inputs directly to the output. The full input dataset is transferred and sorted during the shuffle phase between the map and reduce tasks. Sort is a very useful benchmark for studying the shuffle phase, which exists in many MapReduce applications.

B. Energy Management Parameter Space

As discussed in Section III, the performance and energy of MapReduce applications are affected by two major factors: n, the number of concurrent workers (i.e., the number of worker nodes times the number of workers per node), and f, the processor frequency on each worker node.

1) The number of concurrent workers

In this work, we execute each benchmark with multiple settings, where each setting is identified by a unique number of concurrent workers. The concurrency is determined by the number of worker nodes allocated and the number of concurrent tasks on each node. To maximize performance and efficiency, we use all 8 processor cores (i.e., C = 8) on each node during benchmark runs. The concurrency ranges from 8 with 1 worker node to 56 with 7 worker nodes. We use hadoop-daemon.sh to control the TaskTracker on each compute node and allow a delay of 15 minutes for Hadoop to recognize the active/inactive nodes. We repeat the experiments 5 times in each setting and use the average performance and energy in the analysis. To ensure that no extra disk and network I/O is introduced by the varying number of concurrent workers, the data replication factor is set to 8 on our 8-node cluster. With this replica setting, each node has a copy of the required data on its local storage disk and accesses the data locally. For CloudBurst and Sort, the data is replicated prior to the job execution.
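A replication setting like the one described above would be expressed in hdfs-site.xml roughly as follows (a sketch; dfs.replication is the standard HDFS property name, and the value simply mirrors the 8-node cluster used here):

```xml
<configuration>
  <!-- Replicate each block to all 8 nodes so every worker reads locally -->
  <property>
    <name>dfs.replication</name>
    <value>8</value>
  </property>
</configuration>
```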
For Matrix Multiplication, the data is generated on the fly.

2) The processor frequency

The key to DVFS scheduling is to identify the workload phases and then adapt the processor frequency to match the computational demand of each phase. In this work, we analyze and identify the workload phases and the corresponding performance and energy use by tracing system activities. Specifically, we trace CPU utilization, memory accesses, disk I/O bandwidth, and network bandwidth on the worker nodes. We consider three DVFS scheduling policies:

Fixed policy: a single processor frequency is used for all cores across the worker nodes during the entire execution.

Adaptive I policy: based on workload phase heuristics observed from MapReduce application performance traces, we insert DVFS scheduling code into the MapReduce programs to adjust the processor frequency during execution. Specifically, this policy uses the maximum processor frequency inside the map and reduce functions, and the minimum processor frequency otherwise. Thus, the computations in the map and reduce tasks run on faster cores while I/O accesses run on slower cores for power reduction. The actual deployment of this policy on the Hadoop system is at the job level, because the Java-based MapReduce framework lacks the capability to identify the specific physical core associated with a map/reduce task. In particular, we set the affinity of the TaskTracker daemons to core 0 on each node and fix its frequency at the maximum speed. We then apply DVFS scaling to the remaining seven cores on each worker node.

Adaptive II policy: this policy is performance-constrained and bounds the performance loss within a user-specified value. The performance loss is relative to the performance at the highest fixed processor frequency. In this work, we set the allowable performance loss to 5%.
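The selection logic behind such a performance-constrained policy can be sketched with the Eq. (4) model (an illustration only, not the actual CPUMiser algorithm; in a real system the on-chip fraction β would be estimated from hardware performance counters):

```python
# Sketch of performance-constrained frequency selection in the spirit of the
# Adaptive II policy: pick the lowest frequency whose predicted slowdown
# under the Eq. (4) model stays within the allowed loss. Illustrative only;
# beta would come from hardware performance counters in a real scheduler.

FREQS_GHZ = [0.8, 1.3, 1.8, 2.5]   # steps available on the test cluster
F_BASE = 2.5                        # highest frequency (baseline)

def predicted_slowdown(f, beta):
    """Eq. (4) execution-time ratio relative to the base frequency."""
    return beta * (F_BASE / f) + (1 - beta)

def pick_frequency(beta, max_loss=0.05):
    """Lowest frequency keeping the predicted performance loss <= max_loss."""
    for f in FREQS_GHZ:             # ascending: prefer the lowest frequency
        if predicted_slowdown(f, beta) <= 1 + max_loss:
            return f
    return F_BASE

print(pick_frequency(beta=0.95))    # CPU-bound phase: stays at 2.5 GHz
print(pick_frequency(beta=0.02))    # I/O-bound phase: drops to 0.8 GHz
```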
With this constraint, a low processor frequency might not be scheduled for an execution phase even if the resulting power reduction far exceeds the performance loss. CPUMiser [29] implements this
policy. CPUMiser uses hardware performance counters to collect fine-grained CPU activity information, and uses this information to predict performance and identify the target processor speed periodically at runtime. CPUMiser runs on each node in the cluster and adapts the processor frequency of each core to the application's demand.

C. Evaluation Metrics

We use execution time T as the performance metric and total system energy E_total as the energy metric. We also introduce two other metrics in our analysis. The first is work-induced energy E_work, defined as:

E_work = E_total − P_idle · T    (5)

The rationale for using work-induced energy in addition to total system energy lies in the fact that in today's data centers, idle power dominates system power consumption, accounting for up to 60% of the system power under load. Meanwhile, motivated by the concept of energy-proportional computing [16], which essentially assumes zero idle power, many techniques are being developed to significantly reduce idle power. Thus, we believe work-induced energy provides a direct indication of the energy demanded by the applications and workloads. The second metric is energy-performance efficiency (EPE), defined as the ratio of performance per Joule relative to the baseline configuration:

EPE(N) = (Perf(N)/E(N)) / (Perf(1)/E(1))    (6)

This metric measures how performance per Joule scales with the number of processor cores within the context of energy-proportional computing. EPE = 1 indicates constant performance per Joule, i.e., performance grows with the number of worker nodes at the same speed as energy consumption. EPE > 1 indicates performance grows faster than energy consumption.

V. EXPERIMENTAL RESULTS

A. Experimental Setup

The experiments are conducted on an 8-node power-aware cluster with Gigabit Ethernet interconnect. Each node has dual quad-core AMD Opteron 2380 processors running Fedora Core 10 Linux. Each core has a 64KB L1 instruction cache, a 64KB L1 data cache, and a unified 512KB L2 cache. The four cores on the same chip share one 6MB L3 cache.
The cluster supports DVFS with four frequencies: 0.8 GHz, 1.3 GHz, 1.8 GHz, and 2.5 GHz. Each node has one WD1600AYPS Raid Edition 7200 rpm SATA hard drive. Hadoop runs on the cluster. One of the nodes runs the NameNode and JobTracker, and the other seven nodes serve as DataNodes and perform the map and reduce tasks. Unless explicitly stated, the number of concurrent workers on each node is eight. For Matrix Multiplication, the input matrices A and B are 2560 by 2560, and the sub-matrix size is 512 by 512. This matrix size provides sufficient load for all cores with the configured 4MB Hadoop file block size (dfs.block.size in hdfs-site.xml). For CloudBurst, the input is the 7.9 million sequencing reads publicly available from the 1000 Genomes Project (accession SRR NA12878) and the chromosome 1 human genome (NCBI Build 36.1). For Sort, we use the randomwriter method in hadoop-*-examples.jar to create seven random 10GB files.

Fig. 2. The variations of performance and energy with the number of concurrent workers for Matrix Multiplication: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.

Fig. 3. The variations of performance and energy with the number of workers for CloudBurst: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.

We use the PowerPack toolkit [18] to profile power and
energy. We attach three Watts Up? Pro USB power meters to three worker nodes and measure the total power of each node. We report the energy consumption of all worker nodes by averaging the measured energy and multiplying by 8. We exclude the energy of the NameNode and leave that investigation to a future study. From the recorded power profiles, the work-induced energy is calculated with Eq. (5).

B. The Effects of the Number of Concurrent Workers

Matrix Multiplication: As shown in Fig. 2(a), the execution time decreases as the number of concurrent workers increases. Due to the overhead terms in Eq. (2), a maximum relative speedup of 3.3, instead of the ideal speedup of 7, is achieved when n = 56. While the total system energy increases significantly when more worker nodes are used, due to system idle power, the work-induced energy increases only slightly. The energy-performance efficiency increases with n and achieves its maximum when n = 48. By allocating 48 concurrent workers on 6 nodes, we can achieve a 3X speedup with 6.6% extra work-induced energy, or 2.8X efficiency using the metric defined in Eq. (6). To explain these observations, we trace the CPU utilization and the network and disk accesses during execution. Fig. 2(c) shows two apparent low-CPU-utilization phases during the execution. The first matches the distribution of input data for the first MapReduce job, and the second corresponds to the finishing of the first MapReduce job and the setup for the second. There is also a short period of low CPU utilization during the first job when the map tasks finish and the shuffle occurs. As the reduce task is computation intensive, high CPU utilization is sustained during the second MapReduce job. Complementing the CPU utilization, three I/O-intensive phases are observed in Fig. 2(b). The first phase corresponds to the job initialization, and the last two correspond to the first and second MapReduce jobs respectively. The power trace in Fig.
2(c) highlights how the total power and idle power of a single node vary during the execution. The work-induced power is the difference between the total power and the idle power. The idle power is about 160 Watts and dominates the total power even when the CPU utilization is close to 100%; it is about twice the maximum work-induced power while the matrix multiplication program executes. This observation indicates that effective power reduction technologies should treat reducing system idle power as a top priority. The work-induced power curve follows the same trend as the CPU utilization. The figure also implies that within this experimental environment, the majority of the work-induced power comes from CPU activity, and the memory and I/O activity only slightly change the total node power.

CloudBurst: As shown in Fig. 3(a), CloudBurst achieves super-linear speedup with the number of concurrent workers because, with a larger number of workers, more data can be accessed in memory rather than from disk. With 48 concurrent workers, CloudBurst achieves a maximum speedup of 12X and a minimum work-induced energy of 0.7X, resulting in the optimal efficiency value. In contrast to Matrix Multiplication, CloudBurst has better scalability in both performance and energy, so allocating more resources to CloudBurst is preferred. The system activity traces provided in Fig. 3(b)-(c) and the MapReduce log files indicate there are two MapReduce jobs in this benchmark, each consisting of a map, a shuffle, and a reduce phase. The first job accounts for 90% of the total execution time, and the CPU utilization is high during most of the map and reduce phases, except in the middle and at the end of the map tasks, where CPU utilization oscillates around 20%. The I/O traces further reveal that network traffic and disk I/O accesses are high within the map and reduce phases. In addition, there are short periods with low CPU and I/O activity between the two MapReduce jobs and between different phases.
These traces indicate that even though CloudBurst is computation intensive, its MapReduce implementation involves significant disk and network accesses and warrants energy efficiency optimization.

Sort: Unlike the above two benchmarks, Sort does not scale well with the number of cores. As shown in Fig. 4(a), while the execution time gradually decreases when more cores are used, the maximum speedup is still less than 2. On the other hand, the work-induced energy gradually increases with the number of concurrent workers. Sort also delivers its best efficiency at n = 48. The system activity traces in Fig. 4(b)-(c) reveal that disk and network accesses are very active during most of the execution period.

Fig. 4. The variations of performance and energy with the number of workers for Sort: (a) the normalized performance, energy, and efficiency against 8 workers; (b) the I/O traces; and (c) the power traces and CPU utilization when n = 48 and f = 2.5 GHz.

These heavy I/O activities are responsible
for a lower CPU utilization than in the previous two benchmarks.

C. The Effects of Processor Frequency

While the analysis in the previous section demonstrates that resource allocation is an effective approach for improving both performance and efficiency, it also points out that there are significant I/O activities within MapReduce applications. Given that DVFS is a practical energy saving technology for non-CPU-bound applications, in this section we discuss how the different DVFS scheduling policies presented in Section IV perform for MapReduce applications. Fig. 5 shows the performance, energy, and efficiency when the three DVFS scheduling policies are applied to the benchmarks running with 56 concurrent workers. The first four groups correspond to the fixed policy at 4 different frequencies: {2.5 GHz, 1.8 GHz, 1.3 GHz, and 0.8 GHz}. Adaptive I inserts DVFS control into the benchmark source code. Adaptive II uses CPUMiser to schedule the core frequencies.

Fixed policy: Overall, for all three benchmarks, the best efficiency is observed when running the benchmarks at a fixed frequency, though the optimal frequency differs from code to code. For Matrix Multiplication, the optimal frequency is 1.8 GHz, at which there is a 35% work-induced energy saving at the cost of 15% performance degradation, resulting in an improved efficiency number. For CloudBurst, 1.8 GHz also yields the best efficiency, 1.18, with 32% savings of work-induced energy at the cost of 24% performance loss. A more interesting result occurs for Sort. At 1.3 GHz, it achieves an efficiency number of 1.33 with a 35% work-induced energy saving and a 4% performance gain. A performance gain from a lower processor frequency has also been observed for the NPB sorting benchmark IS in our earlier work [30]. We believe this is a result of better matching between the processor and system bus speeds. However, this explanation is not yet confirmed and we are still investigating it.
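Such efficiency numbers follow from the relative performance and energy: performance per Joule at the reduced frequency divided by performance per Joule at the 2.5 GHz baseline. A quick sketch, using the reported CloudBurst figures as inputs (small rounding differences against the reported value are expected):

```python
# Efficiency relative to the 2.5 GHz baseline, computed from a relative
# performance loss and a relative work-induced energy saving (cf. Eq. (6)).

def relative_efficiency(perf_loss, energy_saving):
    """(perf per Joule at low frequency) / (perf per Joule at baseline)."""
    perf_ratio = 1.0 / (1.0 + perf_loss)   # e.g. 24% slower -> 1/1.24
    energy_ratio = 1.0 - energy_saving     # e.g. 32% saved  -> 0.68
    return perf_ratio / energy_ratio

# CloudBurst at fixed 1.8 GHz: 32% energy saving, 24% performance loss.
print(round(relative_efficiency(0.24, 0.32), 2))  # close to the reported 1.18
```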
While the results of the fixed policy are promising, there are two major issues with it. First, it requires extensive performance and energy profiling. Second, the performance decrease is usually significant, except in some rare cases such as the Sort benchmark.

Adaptive I policy: With sufficient internal information about the workload, we expect the Adaptive I policy to yield better efficiency improvements. However, the experiments show mixed results. For Matrix Multiplication, this policy reduces the work-induced energy by 19% at the expense of 17% performance degradation. For CloudBurst, it delivers performance similar to 2.5 GHz and reduces the work-induced energy by 5%, which is equivalent to a 3% total system energy saving. For Sort, the resulting performance and energy are similar to those achieved at 1.3 GHz.

Adaptive II policy: Unlike the Adaptive I policy, CPUMiser is implemented as system software and adapts the processor frequency automatically, without requiring code changes or performance profiling. Another unique feature of CPUMiser is that its performance control prevents unacceptable cases such as a large energy saving at the cost of a significant performance slowdown. The experimental results match our expectations. For Matrix Multiplication, the Adaptive II policy reduces the work-induced energy by 23% with a 5% performance loss, improving the efficiency number by 23%. CPUMiser does not save energy for CloudBurst because lowering the processor frequency would adversely degrade performance.

Fig. 5. The effects of various DVFS policies for Matrix Multiplication (a), CloudBurst (b), and Sort (c).

Fig. 6. The power traces under the fixed 2.5 GHz and Adaptive II DVFS scheduling policies for Matrix Multiplication (a), CloudBurst (b), and Sort (c).

For Sort, CPUMiser delivers the same performance as
8 2.5GHZ fixed policy with 4% induced energy reduction. Fig. 6 presents power traces of the three benchmarks with fixed 2.5 GHz and Adaptive II policies. The power traces with Adaptive II policy are identical to those at 2.5 GHz for Matrix Multiplication and CloudBurst, except some shift due to lower processor frequency and lower power consumption for idle or non-cpu intensive phases. For Sort, CPUMiser schedules processor frequency to lower values to save energy. The traces also reveal that as CPUMiser seeks performance oriented energy savings, it works best for current systems with large idle power but might not the best for future energy-proportional computing systems. VI. SUMMARY In this work, we use an experimental approach to study the scalability of performance, energy, and efficiency of MapReduce for computation intensive workloads. Various system activity traces indicate that MapReduce involves significant I/O accesses and CPU underutilization is not uncommon for MapReduce applications, due to the demand of intensive disk and network I/O accesses, as well as the separation of map and reduce tasks. By analyzing how efficiency changes with the number of concurrent MapReduce workers and DVFS scheduling policies, we found that judicious resource allocation (i.e., node counts) and DVFS scheduling could effectively improve efficiency. During our studies, we also observed that performance constrained DVFS scheduling strategies work well on systems with dominating idle power. Nevertheless, they need to be re-evaluated on energy-proportional computing systems where performance and power are treated equally. REFERENCES [1] J. Dean, and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation (OSDI 2004), San Francisco, CA, 2004, pp [2] P. 
Dubey, "Recognition, Mining and Synthesis Moves Computers to the Era of Tera," Technology@Intel, [3] EPA, Report to Congress on Server and Data Center Energy Efficiency, Public Law , U.S., [4] Intel, "Increasing Data Center Density While Driving Down Power and Cooling Costs," ftp://download.intel.com/design/servers/technologies/thermal.pdf, 2006]. [5] J. Pan, Y. L. Biannic, and F. Magoulès, Parallelizing Multiple Group-by Query in Share-Nothing Environment: a MapReduce Study Case, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010, pp [6] S. Chen, and S. Schlosser, Map-Reduce Meets Wider Varieties of Applications, Intel, [7] S. Leo, and G. Zanetti, Pydoop: a Python MapReduce and HDFS API for Hadoop, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010, pp [8] D. Huang, X. Shi, S. Ibrahim et al., MR-Scope: a Real-Time Tracing Tool for MapReduce, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010, pp [9] J. Ekanayake, H. Li, B. Zhang et al., Twister: a Runtime for Iterative MapReduce, in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, Chicago, Illinois, 2010, pp [10] R. T. Kaushik, and M. Bhandarkar, GreenHDFS: towards an Energy- Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster, in Proceedings of the 2010 international conference on Power Aware Computing and Systems, Vancouver, BC, Canada, 2010, pp [11] Y. Chen, L. Keys, and R. H. Katz, Towards Energy Efficient MapReduce, UCB/EECS , EECS Department, University of California, Berkeley, [12] Y. Chen, A. Ganapathi, A. Fox et al., Statistical Workloads for Energy Efficiency MapReduce, UCB/EECS , University of California, Berkeley, [13] J. Leverich, and C. Kozyrakis, On the Energy (in)efficiency of Hadoop Clusters, SIGOPS Oper. Syst. Rev., vol. 44, no. 
1, pp , [14] T. Hoefler, A. Lumsdaine, and J. Dongarra, Towards Efficient MapReduce Using MPI, in Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, Espoo, Finland, 2009, pp [15] C. Ranger, R. Raghuraman, A. Penmetsa et al., Evaluating MapReduce for Multi-Core and Multiprocessor Systems, in Proceedings of the 13th IEEE International Symposium on High Performance Computer Architecture, USA, 2007, pp. 12. [16] L. A. Barroso, and U. Hölzle, The Case for Energy-Proportional Computing, Computer, vol. 40, no. 12, pp , [17] "Hadoop webpage," [18] R. Ge, X. Feng, S. Song et al., PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications, IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp , [19] D. Economou, S. Rivoire, C. Kozyrakis et al., "Full-System Power Analysis and Modeling for Server Environments," Workshop on Modeling, Benchmarking, and Simulation (MoBS), [20] P. Bohrer, E. N. Elnozahy, T. Keller et al., "The Case for Power Management in Web Servers," Power aware computing, pp : Kluwer Academic Publishers, [21] D. Meisner, B. T. Gold, and T. F. Wenisch, PowerNap: Eliminating Server Idle Power, in Proceeding of the 14th international conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2009), Washington, DC, USA, 2009, pp [22] B. M. Oppenheim, "Reducing Cluster Power Consumption by Dynamically Suspending Idle Nodes," DigitalCommons@CalPoly, [23] W. Lang, and J. M. Patel, Energy Management for MapReduce Clusters, Proc. VLDB Endow., vol. 3, no. 1-2, pp , [24] J. Xie, S. Yin, X. Ruan et al., Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters, in Proceedings of the 19th International Heterogeneity in Computing Workshop, Atlanta, Georgia, [25] M. Cardosa, A. Singh, H. 
Pucha et al., "Exploiting Spatio-Temporal Tradeoffs for Energy Efficient MapReduce in the Cloud," Department of Computer Science and Engineering, University of Minnesota, [26] Y. Chen, A. Ganapathi, and R. H. Katz, To Compress or Not to Compress - Compute vs. IO Tradeoffs for Mapreduce Energy Efficiency, in Proceedings of the first ACM SIGCOMM workshop on Green networking, New Delhi, India, 2010, pp [27] J. Norstad. "A MapReduce Algorithm for Matrix Multiplication," 2010; [28] M. C. Schatz, CloudBurst: Highly Sensitive Read Mapping with MapReduce, Bioinformatics, vol. 25, no. 11, pp , [29] R. Ge, X. Feng, W.-c. Feng et al., CPU MISER: A Performance- Directed, Run-Time System for Power-Aware Clusters, in Proceedings of International Conference on Parallel Processing (ICPP 2007), 2007, pp [30] R. Ge, X. Feng, and K. W. Cameron, Performance-constrained Distributed DVS Scheduling for Scientific Applications on Poweraware Clusters, in Proceedings of the 2005 ACM/IEEE conference on Supercomputing, 2005, pp
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters Krishna Kandalla, Emilio P. Mancini, Sayantan Sur, and Dhabaleswar. K. Panda Department of Computer Science & Engineering,
More informationPRELIMINARY RESULTS: MODELING RELATION BETWEEN TOTAL EXECUTION TIME OF MAPREDUCE APPLICATIONS AND NUMBER OF MAPPERS/REDUCERS
SCHOOL OF INFORMATION TECHNOLOGIES PRELIMINARY RESULTS: MODELING RELATION BETWEEN TOTAL EXECUTION TIME OF MAPREDUCE APPLICATIONS AND NUMBER OF MAPPERS/REDUCERS TECHNICAL REPORT 679 NIKZAD BABAII RIZVANDI,
More informationA priority based dynamic bandwidth scheduling in SDN networks 1
Acta Technica 62 No. 2A/2017, 445 454 c 2017 Institute of Thermomechanics CAS, v.v.i. A priority based dynamic bandwidth scheduling in SDN networks 1 Zun Wang 2 Abstract. In order to solve the problems
More informationModification and Evaluation of Linux I/O Schedulers
Modification and Evaluation of Linux I/O Schedulers 1 Asad Naweed, Joe Di Natale, and Sarah J Andrabi University of North Carolina at Chapel Hill Abstract In this paper we present three different Linux
More informationThe Use of Cloud Computing Resources in an HPC Environment
The Use of Cloud Computing Resources in an HPC Environment Bill, Labate, UCLA Office of Information Technology Prakashan Korambath, UCLA Institute for Digital Research & Education Cloud computing becomes
More informationMap Reduce Group Meeting
Map Reduce Group Meeting Yasmine Badr 10/07/2014 A lot of material in this presenta0on has been adopted from the original MapReduce paper in OSDI 2004 What is Map Reduce? Programming paradigm/model for
More information