vhadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration
2012 IEEE International Conference on Cluster Computing Workshops

Kejiang Ye, Xiaohong Jiang, Yanzhang He, Xiang Li, Haiming Yan, Peng Huang
College of Computer Science, Zhejiang University, Hangzhou, China
{yekejiang,jiangxh,heyanzhang,lixiang,yanhaiming,huangpeng}@zju.edu.cn

Abstract

Big data processing is becoming increasingly important in the modern era because of the continuous growth of the data generated in fields such as particle physics, human genomics, and earth observation. However, the efficiency of processing large-scale data on modern virtual infrastructure, especially virtualized cloud computing infrastructure, is not well understood. This paper focuses on the performance of hadoop virtual clusters and proposes vhadoop, a scalable hadoop virtual cluster platform for large-scale MapReduce-based parallel data processing. We first describe the design and implementation of the vhadoop platform. We then perform a series of experiments to investigate both the static and dynamic performance of vhadoop, including the performance characterization of cross-domain hadoop virtual clusters and the live migration of hadoop virtual clusters. Finally, we use vhadoop to run six typical parallel clustering algorithms (Canopy, Dirichlet, Fuzzy k-means, k-means, MeanShift, and MinHash) on two typical datasets. Experimental results verify that vhadoop processes MapReduce-based parallel machine learning applications efficiently.

Keywords: Hadoop; MapReduce; Virtual Cluster; Cloud Computing; Machine Learning; Big Data

I. INTRODUCTION

Big data [1] has recently received considerable attention due to the continuous growth of the data generated in fields such as particle physics, human genomics, and earth observation.
How to compute, transfer, and store such huge volumes of data is a prominent challenge that will greatly affect the traditional architectures and methods of computation, networking, and storage. MapReduce [2] is an efficient parallel programming model for data-intensive applications, offering simplicity, fault tolerance, and scalability. Hadoop [3] is the open-source implementation of MapReduce; it can process hundreds of terabytes of data on at least 10,000 cores. This programming model also benefits machine learning algorithms such as clustering, classification, and recommendation, improving their processing efficiency on big data sets.

Meanwhile, with the rapid development of virtualization technology [4] and cloud computing technology [5], the virtual machine (VM) is becoming the basic unit of computation in the cloud computing era. Virtualization provides an abstraction of hardware resources that allows multiple operating system instances to run simultaneously on a single physical machine; this server consolidation improves resource utilization [6]. Another prominent advantage of virtualization is live migration, which moves a running virtual machine from one physical machine to another while it continues to execute. Live migration is an effective means of improving the dynamic manageability of virtualized cloud data centers [7, 8].

Although virtualization and MapReduce have each been widely studied for several years, relatively few studies combine the two technologies, i.e., run MapReduce applications in a hadoop virtual cluster environment. As cloud computing matures, big data processing on virtual infrastructure will become increasingly common.
There are several reasons for this trend: (i) Processing big data efficiently is a major challenge that requires parallel execution on distributed platforms, and MapReduce is a popular parallel computing framework for this purpose. (ii) In the cloud era, resource virtualization is a defining feature, and most tasks will be executed on virtual infrastructure. For example, users can simply rent a hadoop virtual cluster from the Amazon EC2 cloud to run MapReduce tasks instead of purchasing expensive physical servers. (iii) Virtualization offers many other benefits, such as rapid startup, dynamic configuration, and high scalability, all of which a hadoop virtual cluster can exploit. (iv) Because transferring large amounts of data incurs high overhead, moving data to computing resources is more expensive than moving computing resources (such as VMs) to the data; a virtual machine, by contrast, can be transferred (migrated) from one physical machine to another with relatively low overhead.

In this paper, we propose vhadoop, a scalable hadoop virtual cluster platform for large-scale MapReduce-based parallel data processing with performance consideration. We first describe the design and implementation of the vhadoop platform. We then perform a series of experiments to investigate both the static and dynamic performance of vhadoop, including the performance characterization of cross-domain virtual clusters and virtual cluster migration. After that, we use the vhadoop platform to run several typical parallel clustering algorithms, including Canopy, Dirichlet, Fuzzy k-means, k-means, MeanShift, and MinHash, on two typical datasets. Experimental results verify that vhadoop processes MapReduce-based parallel machine learning applications efficiently.

The rest of the paper is structured as follows. Section II describes the design and implementation of vhadoop, a platform for parallel machine learning on hadoop virtual clusters. Section III studies both the static and dynamic performance of vhadoop. Section IV uses real parallel machine learning applications to verify the efficiency of the vhadoop platform. Section V presents related work. Finally, Section VI concludes the paper and outlines future work.

II. VHADOOP PLATFORM

In this section, we present vhadoop, a platform for large-scale parallel machine learning on hadoop virtual clusters.

A. System Architecture & Flow

[Figure 1. vhadoop Platform for the Parallel Machine Learning with Performance Consideration. Diagram components: nmon Monitor, Machine Learning Algorithm Library, MapReduce Tuner, Master (assign maps, assign reduces), Physical Machine A, Physical Machine B, Input Data, Output Data, Map Phase, Reduce Phase.]

Figure 1 illustrates the vhadoop architecture for parallel machine learning. It consists of five main modules: the Virtualization Module, the Hadoop Module, the Machine Learning Algorithm Library, the nmon Monitor, and the MapReduce Tuner. The five modules cooperate to provide a scalable hadoop virtual cluster platform for parallel machine learning.

The vhadoop execution flow is as follows:
1) The Machine Learning Algorithm Library triggers and sends a hadoop virtual cluster request.
2) The Virtualization Module launches a hadoop virtual cluster.
3) The Hadoop Module configures the hadoop parameters, such as the master and worker virtual machines.
4) The input data is prepared by uploading it to the Hadoop Distributed File System (HDFS).
5) The master virtual machine assigns map and reduce tasks to the worker virtual machines.
6) The map operation is performed.
7) The reduce operation is performed.
8) The output data is collected and analyzed.
9) Throughout the whole process, both the master and worker virtual machines are monitored by the nmon Monitor, and the MapReduce Tuner adjusts vhadoop performance based on the monitoring data.

B. Platform Design & Implementation

Virtualization Module: the basic module that implements resource virtualization. With virtualization, one physical machine can be shared by several virtual machines. We currently use Xen [4] as the infrastructure virtualization layer. Xen supports live migration of virtual machines, which is often used for load balancing, energy saving, and online maintenance.

Hadoop Module: responsible for the initial configuration of the hadoop virtual cluster. The parameters include the names of the master and worker nodes, dfs.replication, dfs.block.size, map.tasks.maximum, reduce.tasks.maximum, etc. We currently preconfigure Hadoop in the vhadoop VM images.

Machine Learning Algorithm Library: the library of MapReduce-based parallel machine learning algorithms, covering clustering, classification, and recommendation. Various algorithms fall into these three categories; for example, Canopy, Dirichlet, Fuzzy k-means, k-means, MeanShift, and MinHash, which are used in this paper, are clustering algorithms. We build the library on Mahout, an open-source machine learning library on hadoop.

nmon Monitor: responsible for monitoring the resource status of both the master and worker virtual machines. CPU, memory, disk, and network utilization are all monitored, and performance bottlenecks can be identified by analyzing the monitoring data. nmon is an open-source performance monitor for traditional Linux systems that reports comprehensive system performance. We extend it to our distributed vhadoop platform to monitor node performance in parallel. nmon analyser is a companion tool that generates graphs from the nmon output files.

MapReduce Tuner: responsible for tuning the configuration parameters of the hadoop virtual cluster according to the results produced by the nmon Monitor. Tuning can be done by reconfiguring the parameters of the vhadoop platform or by using live migration to adjust the vhadoop configuration dynamically.

III. PERFORMANCE ANALYSIS OF HADOOP VIRTUAL CLUSTER

In this section, we study both the static and dynamic performance of hadoop virtual clusters. The static analysis mainly covers the performance of cross-domain hadoop virtual clusters and the scalability of hadoop virtual clusters, while the dynamic analysis investigates the live migration performance of hadoop virtual clusters.

A. Experimental Configuration

1) Hadoop Virtual Cluster Configuration: All experiments are performed on Dell T710 servers, each with two quad-core 64-bit Xeon E5620 processors at 2.40 GHz and 32 GB of DRAM. We use CentOS 5.6 with kernel version e15xen in Domain 0 and Xen as the virtualization hypervisor.
Each virtual machine runs Ubuntu 8.10 as the guest OS and is configured with 1 VCPU and 1024 MB of memory. We use Hadoop and Mahout 0.6. All virtual machine images are stored on a separate NFS server.

2) MapReduce-based Benchmarks: We choose four typical MapReduce-based benchmarks to test the MapReduce and HDFS performance of the hadoop virtual cluster. Table I describes the four benchmarks.

Table I. MAPREDUCE-BASED PARALLEL BENCHMARKS
Name | Category | Description
Wordcount | MapReduce | Reads text files and counts how often words occur
MRBench | MapReduce | Checks whether small jobs are responsive and run efficiently on the cluster
TeraSort | MapReduce & HDFS | Sorts data as fast as possible, testing both the HDFS and MapReduce layers
TestDFSIO | HDFS | Read and write test for HDFS

[Figure 2. Performance Comparison of Wordcount Benchmark between Normal and Cross-Domain Hadoop Virtual Cluster.]

The Wordcount benchmark reads text files and counts how often words occur. Each mapper takes a line as input and breaks it into words, then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value pair with the word and its sum.

The MRBench benchmark [9] checks whether small jobs are responsive and run efficiently on the cluster. It focuses on the MapReduce layer, since its impact on the HDFS layer is very limited.

The TeraSort benchmark sorts 1 TB of data (or any other amount) as fast as possible, testing both the HDFS and MapReduce layers of a hadoop cluster. A full TeraSort run consists of three steps: (i) generating the input data with TeraGen; (ii) running the actual TeraSort on the input data; (iii) validating the sorted output with TeraValidate.

The TestDFSIO benchmark is a read and write test for HDFS. It is helpful for tasks such as stress testing HDFS and discovering performance bottlenecks in the network.
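The Wordcount map and reduce steps described above can be illustrated with a minimal single-process simulation. This is only a sketch of the data flow (map, group by key, reduce); it is not Hadoop's Java Mapper/Reducer API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def wordcount_map(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: break one input line into words and emit (word, 1) for each."""
    for word in line.split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, standing in for Hadoop's shuffle/sort."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def wordcount_reduce(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts for one word and emit a single (word, sum)."""
    return word, sum(counts)


def run_wordcount(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce sequentially in a single process."""
    intermediate = (pair for line in lines for pair in wordcount_map(line))
    grouped = shuffle(intermediate)
    return dict(wordcount_reduce(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_wordcount(sample))
```

In a real cluster the map calls run in parallel on different input splits and the shuffle moves intermediate pairs across the network, which is where the cross-domain placement studied in this section adds cost.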
3) Live Migration Benchmark: To measure the migration performance and overhead of the hadoop virtual cluster, we extend our earlier Virt-LM benchmark [10] from single virtual machine migration to the migration of multiple virtual machines (a virtual cluster). The extended benchmark records the migration time and downtime of each virtual machine and of the whole virtual cluster.
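Rolling per-VM records up into cluster-level figures requires a definition of "overall" migration time and downtime, which the text does not spell out. The sketch below assumes one plausible choice: overall migration time is the span from the first VM's migration start to the last VM's completion, and overall downtime is the sum of per-VM downtimes. The record schema and function names are hypothetical and do not reflect Virt-LM's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VMMigrationRecord:
    """Per-VM measurement (hypothetical schema, not Virt-LM's actual format)."""
    vm: str
    start_s: float      # wall-clock time this VM's migration began (seconds)
    end_s: float        # wall-clock time the VM resumed on the destination (seconds)
    downtime_ms: float  # stop-and-copy pause observed for this VM (milliseconds)

    @property
    def migration_time_s(self) -> float:
        return self.end_s - self.start_s


def summarize_cluster(records: List[VMMigrationRecord]) -> Dict[str, float]:
    """Aggregate per-VM records into cluster-level figures.

    Assumed definitions: overall migration time spans from the first VM's start
    to the last VM's end; overall downtime is the sum of per-VM downtimes.
    """
    if not records:
        raise ValueError("no migration records")
    first_start = min(r.start_s for r in records)
    last_end = max(r.end_s for r in records)
    return {
        "overall_migration_time_s": last_end - first_start,
        "overall_downtime_ms": sum(r.downtime_ms for r in records),
        "max_vm_downtime_ms": max(r.downtime_ms for r in records),
        "mean_vm_migration_time_s": sum(r.migration_time_s for r in records) / len(records),
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from this paper.
    recs = [
        VMMigrationRecord("vm01", 0.0, 10.0, 120.0),
        VMMigrationRecord("vm02", 10.0, 21.0, 300.0),
        VMMigrationRecord("vm03", 21.0, 30.0, 90.0),
    ]
    print(summarize_cluster(recs))
```

Reporting the maximum per-VM downtime alongside the sum is a design choice: with sequential migration the sum reflects total lost service time, whereas the maximum shows the worst single-node pause.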
[Figure 3. Performance Comparison of MRBench Benchmark between Normal and Cross-Domain Hadoop Virtual Cluster. Panels: (a) Map Scales; (b) Reduce Scales.]
[Figure 4. Performance Comparison of TeraSort and DFSIO Benchmarks between Normal and Cross-Domain Hadoop Virtual Cluster. Panels: (a) TeraSort Test; (b) DFSIO Test.]

4) Experimental Precision: To ensure data precision, each reported result is the average of three benchmark runs with the same configuration.

B. Static Performance Analysis

Because of the large size of a virtual cluster and the limited resources of a physical machine, a virtual cluster may span multiple domains (physical machines). We create a 16-node hadoop virtual cluster (1 namenode and 15 datanodes) to compare the performance of a cross-domain hadoop virtual cluster with that of a normal hadoop virtual cluster. In the cross-domain case, the 16 virtual machines are distributed equally across two physical machines, while in the normal case all 16 virtual machines are placed on a single physical machine.

Figure 2 shows the Wordcount performance on the normal and cross-domain 16-node hadoop virtual clusters. The input data is drawn from TOEFL (Test of English as a Foreign Language) reading materials. The running time clearly increases as the input data size grows. Furthermore, the cross-domain hadoop virtual cluster performs worse than the normal case, which indicates that MapReduce performance is noticeably affected by the cross-domain configuration because of increased network I/O delay.

Figure 3 shows the MRBench performance. In Figure 3(a), we set the number of reduces to 1 and scale the number of maps from 1 to 6, while in Figure 3(b), we set the number of maps to 15 and scale the number of reduces from 1 to 6. The running time increases quickly as the number of maps or reduces grows, because concurrently running tasks cause network congestion and thus longer execution times. As in Figure 2, the cross-domain hadoop virtual cluster performs worse than the normal case.

Figure 4(a) shows both the data generation time and the sort time of the TeraSort benchmark. When the data size is small, both times are relatively small; however, once the data size exceeds 400 MB, the running time increases quickly. The cross-domain hadoop virtual cluster again performs relatively worse.

Figure 4(b) shows the DFS performance measured with the TestDFSIO benchmark. Read throughput is higher than write throughput, and the cross-domain hadoop virtual cluster performs worse than the normal case.

Discussion: The above analysis shows that when the data size and the number of concurrent tasks are small, the cross-domain and normal cases perform very similarly. The gap becomes increasingly evident as the data size or concurrency grows, because network communication overhead then becomes the main bottleneck, and distributing virtual machines across multiple domains further degrades network communication performance and thereby the performance of MapReduce applications.

C. Dynamic Performance Analysis

Live migration is a key ingredient of cloud management activities such as load balancing, energy saving, failure recovery, and system maintenance.

[Figure 5. The Migration Overheads of Idle and Wordcount Hadoop Virtual Cluster with Different DRAM Configurations. Panels: (a) Migration Time; (b) Downtime.]
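Xen's live migration is based on the iterative pre-copy approach of Clark et al. [7]: memory is copied while the VM keeps running, and the VM is paused only for a final stop-and-copy of the pages still dirty. A deliberately simplified model helps interpret migration measurements. It assumes a constant migration bandwidth, a constant page-dirtying rate, and a fixed stop threshold; Xen's real termination heuristics differ, and all parameter names and values are hypothetical.

```python
from typing import Tuple


def precopy_migration(mem_mb: float, bw_mb_s: float, dirty_mb_s: float,
                      stop_threshold_mb: float = 8.0,
                      max_rounds: int = 30) -> Tuple[float, float, int]:
    """Simplified iterative pre-copy model of one VM's live migration.

    Returns (total_migration_time_s, downtime_s, pre_copy_rounds).
    Round 1 copies all guest memory while the VM keeps running; each later
    round copies the memory dirtied during the previous round. Once the
    remaining dirty set is at or below stop_threshold_mb (or max_rounds is
    reached), the VM is paused and the remainder is copied: that
    stop-and-copy phase is the downtime.
    """
    if bw_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    to_send = mem_mb
    elapsed = 0.0
    rounds = 0
    while rounds < max_rounds:
        round_time = to_send / bw_mb_s
        elapsed += round_time
        rounds += 1
        # Memory dirtied while this round was being copied (cannot exceed the VM size).
        to_send = min(mem_mb, dirty_mb_s * round_time)
        if to_send <= stop_threshold_mb:
            break
    downtime = to_send / bw_mb_s  # stop-and-copy phase, VM paused
    return elapsed + downtime, downtime, rounds


if __name__ == "__main__":
    # Hypothetical inputs: 100 MB/s migration link, idle vs. busy dirtying rates.
    for label, mem, dirty in [("idle.512mb", 512, 1), ("idle.1024mb", 1024, 1),
                              ("busy.512mb", 512, 40), ("busy.1024mb", 1024, 40)]:
        total, down, rounds = precopy_migration(mem, 100, dirty)
        print(f"{label}: total={total:.2f}s downtime={down * 1000:.1f}ms rounds={rounds}")
```

Under this model, total migration time grows with memory size because the first round must copy all of memory, while a higher dirtying rate (as in a busy Wordcount node) adds pre-copy rounds; downtime depends on the dirty set left at the stop point rather than directly on memory size.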
Figure 5 shows the migration time and downtime of each node in the 16-node hadoop virtual cluster as it migrates from one physical machine to the other.

[Table II. OVERALL MIGRATION TIME AND DOWNTIME OF 16-NODE HADOOP VIRTUAL CLUSTER. Columns: Overall Migration Time (s), Overall Downtime (ms). Rows: idle.1024mb, idle.512mb, wordcount.1024mb, wordcount.512mb.]

The figure leads to the following observations: (i) A larger memory size leads to a longer migration time, whereas downtime shows no causal relationship with memory size. (ii) The migration time of the hadoop virtual cluster running the Wordcount benchmark is slightly longer than that of the idle hadoop virtual cluster, but its downtime is much longer. (iii) The downtime of individual nodes in the hadoop virtual cluster running Wordcount varies widely because of load imbalance among the nodes.

Table II shows the overall migration time and downtime of the whole hadoop virtual cluster. The migration time of the hadoop virtual cluster running Wordcount is about three times that of the idle hadoop virtual cluster, while its downtime is about 13 times that of the idle cluster.

Discussion: The above analysis shows that live migration of a hadoop virtual cluster incurs some overhead, especially in downtime. Fortunately, this is tolerable thanks to hadoop's efficient built-in fault tolerance mechanism: service that is unavailable during the downtime can be restored by re-sending requests or by reading other available copies of the data blocks. Despite the long downtime, the MapReduce workloads complete successfully.

[Figure 6. Parallel Clustering on Synthetic Control Data Set with Different Hadoop Virtual Cluster Scales.]

IV. PARALLEL MACHINE LEARNING ON HADOOP VIRTUAL CLUSTER

In this section, we run several typical parallel clustering algorithms on two data sets to illustrate the efficiency of parallel machine learning on the vhadoop platform.

A. MapReduce-based Clustering Algorithms

Canopy Clustering is a very simple, fast, and accurate method for grouping objects into clusters. Each object is represented as a point in a multidimensional feature space. Canopy Clustering is often used as an initial step for more rigorous clustering techniques such as k-means clustering.

k-means Clustering is a simple but well-known algorithm for grouping objects. All objects must be represented as a set of numerical features, and the user must specify the number of groups (referred to as k) to identify.

Fuzzy k-means Clustering is an extension of k-means, the popular simple clustering technique. Whereas k-means discovers hard clusters (each point belongs to exactly one cluster), Fuzzy k-means is a more statistically formalized method that discovers soft clusters, in which a point can belong to more than one cluster with a certain probability.

Mean Shift Clustering produces arbitrarily shaped clusters that depend on the topology of the data, without a priori knowledge of the number of clusters (which k-means requires).

Dirichlet Process Clustering performs Bayesian mixture modeling.

MinHash Clustering performs probabilistic dimension reduction of high-dimensional data. The essence of the technique is to hash each item with multiple independent hash functions such that similar items are more likely to collide.
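The MinHash idea can be sketched in a few lines. This is a generic textbook MinHash with locality-sensitive banding, not Mahout's MinHash clustering implementation; the hash family h(x) = (a*x + b) mod p, the use of CRC32 to turn tokens into integers, the function names, and the parameter values (200 hash functions, 50 bands) are all our assumptions.

```python
import random
import zlib
from typing import Iterable, List, Set, Tuple

# A Mersenne prime (2^61 - 1), larger than any 32-bit CRC token id.
PRIME = (1 << 61) - 1


def make_hash_params(num_hashes: int, seed: int = 42) -> List[Tuple[int, int]]:
    """Draw (a, b) for num_hashes hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(tokens: Iterable[str], params: List[Tuple[int, int]]) -> List[int]:
    """For each hash function, keep the minimum hash value over the item's tokens."""
    ids = {zlib.crc32(t.encode("utf-8")) for t in tokens}
    if not ids:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    return [min((a * x + b) % PRIME for x in ids) for a, b in params]


def estimated_jaccard(sig1: List[int], sig2: List[int]) -> float:
    """The fraction of positions where two signatures agree estimates Jaccard similarity."""
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have the same length")
    return sum(1 for u, v in zip(sig1, sig2) if u == v) / len(sig1)


def exact_jaccard(s1: Set[str], s2: Set[str]) -> float:
    return len(s1 & s2) / len(s1 | s2)


def band_keys(signature: List[int], bands: int) -> List[Tuple[int, Tuple[int, ...]]]:
    """Split a signature into equal bands; each band keys one hash table (LSH)."""
    if bands <= 0 or len(signature) % bands != 0:
        raise ValueError("bands must evenly divide the signature length")
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


if __name__ == "__main__":
    params = make_hash_params(200)
    a, b = set("abcdefghij"), set("fghijklmno")
    sig_a, sig_b = minhash_signature(a, params), minhash_signature(b, params)
    print(exact_jaccard(a, b), estimated_jaccard(sig_a, sig_b))
    shared = set(band_keys(sig_a, 50)) & set(band_keys(sig_b, 50))
    print("candidate pair" if shared else "not a candidate")
```

Items that share at least one band key land in the same bucket of that band's hash table and become candidate near neighbors, so only those candidates need an exact comparison.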
Multiple such hash tables can then be constructed to answer near-neighbor queries efficiently.

B. Clustering on Synthetic Control Chart Time Series Data Set

The Synthetic Control Chart Time Series Data Set contains 600 examples of control charts synthetically generated by the process described by Alcock and Manolopoulos. There are six classes of control charts: (i) Normal, (ii) Cyclic, (iii) Increasing trend, (iv) Decreasing trend, (v) Upward shift, and (vi) Downward shift. We use this data set to perform MapReduce-based parallel machine learning on the vhadoop platform.

Figure 6 shows the parallel clustering results on the synthetic control chart time series data set for different hadoop virtual cluster sizes. The running time of all three clustering algorithms shown (Canopy, Dirichlet, and MeanShift) increases as the hadoop virtual cluster scales from 2 nodes (1 namenode and 1 datanode) to 16 nodes (1 namenode and 15 datanodes). Because the data set size is fixed, a larger virtual cluster incurs more data communication among the nodes of the hadoop virtual cluster.

C. Visualizing Sample Clustering

We use DisplayClustering to generate 1000 samples from three symmetric distributions; the data set can also be used by the other clustering programs. DisplayClustering displays the points on a screen and superimposes the model parameters that were used to generate them.

Figure 7 shows the visualized sample clustering results on the vhadoop platform for different cluster sizes. Compared with Figure 6, the running time of visualizing sample clustering stays relatively flat as the hadoop virtual cluster scales from 2 to 16 nodes, because this workload is relatively light and finishes quickly, and thus does not put much pressure on the network.

Figure 8(a)-(f) show screenshots of the sample points and the clustering results of the different clustering algorithms. Each displays the sample points and then superimposes the clusters from every iteration. The clusters from the last iteration are drawn in bold red, those from the preceding iterations are colored orange, yellow, green, blue, and magenta in order, and all earlier clusters are drawn in light grey. This helps visualize how the clusters converge upon a solution over multiple iterations.

V. RELATED WORK

Virtualization is becoming increasingly popular as a core technology for implementing the cloud computing paradigm. Many efforts have studied the performance characterization of virtualization, including performance evaluation [11, 12], performance modeling [13-15], and performance optimization [16, 17]. Server consolidation [6] is one of the most important application scenarios of virtualization for improving resource utilization, while live migration [18-21] is often used for load balancing, energy saving [22], online maintenance, etc., in cloud computing environments.

MapReduce is an efficient technique for processing huge amounts of data in parallel. Kambatla et al. [23] optimized hadoop provisioning in the cloud to reduce cost and improve performance. Ibrahim et al. compared the performance of hadoop clusters on virtual machines and on physical machines, and found that running MapReduce applications on virtual machines incurs additional performance degradation compared with running them on physical machines [24]. They also discussed the issues of implementing MapReduce on virtual machines, decoupling the storage unit from the computation unit to reduce disk I/O overhead [25, 26]. Zaharia et al. pointed out that virtual machine interference, especially network I/O interference, is the main cause of MapReduce performance degradation [27]. However, these works focus only on static performance analysis and do not address dynamic performance, i.e., the live migration performance of hadoop virtual clusters. Furthermore, they do not address parallel machine learning on hadoop virtual clusters, which is becoming increasingly important for big data processing on virtualized cloud computing infrastructures.

VI. CONCLUSION

In this paper, we study the performance and efficiency of running MapReduce-based parallel machine learning applications on hadoop virtual clusters. We first propose vhadoop, a scalable hadoop virtual cluster platform for parallel machine learning with performance consideration, which binds the nmon performance monitor, the Mahout machine learning library, and the MapReduce tuner on the Xen virtualization platform. We then perform a series of experiments to investigate both the static and dynamic performance of hadoop virtual clusters, such as the performance characterization of cross-domain virtual clusters and virtual cluster migration, which helps improve the performance of real hadoop virtual clusters. Finally, we verify the performance and efficiency of running MapReduce-based parallel machine learning applications, such as Canopy, Dirichlet, Fuzzy k-means, k-means, MeanShift, and MinHash, on our vhadoop platform.
Experimental results show that: (i) Network I/O and NFS disk I/O are the two main bottlenecks of the vhadoop platform because of shared resource contention and interference; the poor I/O performance of virtualized systems combined with the heavy network communication of hadoop makes the network the main performance bottleneck. (ii) Performance degrades as the data size or cluster scale increases, and the cross-domain distribution of a hadoop virtual cluster also affects vhadoop's communication performance. (iii) vhadoop can successfully perform live migration of a hadoop virtual cluster; although service is unavailable during the downtime, hadoop's fault tolerance mechanism re-runs the job or restores data from other available replicas. (iv) The vhadoop platform is efficient enough to run MapReduce-based parallel machine learning algorithms on real data sets.

Future work includes integrating vhadoop with open-source cloud computing systems to provide scalable, on-demand computation services for data-intensive (big data) applications that use parallel machine learning algorithms.

ACKNOWLEDGMENT

This work is supported by the National High Technology Research 863 Major Program of China (No. 2011AA01A207), the National Natural Science Foundation of China, and the MOE-Intel Information Technology Foundation (No. MOE-INTEL-11-06).

REFERENCES

[1] C. Lynch, "Big data: How do your data grow?" Nature, vol. 455, no. 7209.
[2] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Communications of the ACM, vol. 51, no. 1.
[Figure 7. Parallel Visualizing Sample Clustering with Different Hadoop Virtual Cluster Scales. Panels: (a) Canopy; (b) Dirichlet; (c) Fuzzy k-means; (d) k-means; (e) MeanShift; (f) MinHash.]
[Figure 8. The Screenshot of Clustering Results with Different Clustering Algorithms. Panels: (a) Sample Data; (b) Canopy; (c) Dirichlet; (d) Fuzzy k-means; (e) k-means; (f) Mean Shift.]
[3] T. White, Hadoop: The Definitive Guide. Yahoo Press.
[4] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 2003.
[5] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4.
[6] P. Apparao, R. Iyer, X. Zhang, D. Newell, and T. Adelmeyer, "Characterization & analysis of a server consolidation benchmark," in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2008.
[7] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, Volume 2, 2005.
[8] M. Nelson, B. Lim, and G. Hutchins, "Fast transparent migration for virtual machines," in Proceedings of the USENIX Annual Technical Conference, 2005, p. 25.
[9] K. Kim, K. Jeon, H. Han, S. Kim, H. Jung, and H. Yeom, "MRBench: A benchmark for MapReduce framework," in IEEE International Conference on Parallel and Distributed Systems (ICPADS). IEEE, 2008.
[10] D. Huang, D. Ye, Q. He, J. Chen, and K. Ye, "Virt-LM: a benchmark for live migration of virtual machine," in Proceedings of the Second ACM/SPEC International Conference on Performance Engineering (ICPE), 2011.
[11] K. Ye, J. Che, Q. He, D. Huang, and X. Jiang, "Performance combinative evaluation from single virtual machine to multiple virtual machine systems," International Journal of Numerical Analysis and Modeling, vol. 9, no. 2.
[12] L. Cherkasova and R. Gardner, "Measuring CPU overhead for I/O processing in the Xen virtual machine monitor," in Proceedings of the USENIX Annual Technical Conference. USENIX Association, 2005.
[13] K. Ye, X. Jiang, S. Chen, D. Huang, and B. Wang, "Analyzing and modeling the performance in Xen-based virtual cluster environment," in IEEE International Conference on High Performance Computing and Communications (HPCC), 2010.
[14] O. Tickoo, R. Iyer, R. Illikkal, and D. Newell, "Modeling virtual machine performance: challenges and approaches," ACM SIGMETRICS Performance Evaluation Review, vol. 37, no. 3.
[15] S. Kundu, R. Rangaswami, K. Dutta, and M. Zhao, "Application performance modeling in a virtualized environment," in IEEE 16th International Symposium on High Performance Computer Architecture (HPCA). IEEE, 2010.
[16] A. Menon, A. Cox, and W. Zwaenepoel, "Optimizing network virtualization in Xen," in Proc. USENIX Annual Technical Conference (USENIX 2006), 2006.
[17] D. Ongaro, A. Cox, and S. Rixner, "Scheduling I/O in virtual machine monitors," in Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM, 2008.
[18] K. Ye, X. Jiang, D. Huang, J. Chen, and B. Wang, "Live migration of multiple virtual machines with resource reservation in cloud computing environments," in 2011 IEEE International Conference on Cloud Computing (CLOUD), 2011.
[19] U. Deshpande, X. Wang, and K. Gopalan, "Live gang migration of virtual machines," in Proceedings of the 20th International Symposium on High Performance Distributed Computing (HPDC), 2011.
[20] S. Al-Kiswany, D. Subhraveti, P. Sarkar, and M. Ripeanu, "VMFlock: virtual machine co-migration for the cloud," in Proceedings of the 20th International Symposium on High Performance Distributed Computing (HPDC), 2011.
[21] W. Voorsluys, J. Broberg, S. Venugopal, and R. Buyya, "Cost of virtual machine live migration in clouds: A performance evaluation," in 1st International Conference on Cloud Computing (CloudCom), 2009.
[22] K. Ye, D. Huang, X. Jiang, H. Chen, and S. Wu, "Virtual machine based energy-efficient data center architecture for cloud computing: a performance perspective," in Proceedings of the 2010 IEEE/ACM International Conference on Green Computing and Communications (GreenCom), 2010.
[23] K. Kambatla, A. Pathak, and H. Pucha, "Towards optimizing Hadoop provisioning in the cloud," in Proc. of the First Workshop on Hot Topics in Cloud Computing.
[24] S. Ibrahim, H. Jin, L. Lu, L. Qi, S. Wu, and X. Shi, "Evaluating MapReduce on virtual machines: The Hadoop case," Cloud Computing.
[25] S. Ibrahim, H. Jin, B. Cheng, H. Cao, S. Wu, and L. Qi, "Cloudlet: towards MapReduce implementation on virtual machines," in Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing. ACM, 2009.
[26] S. Ibrahim, H. Jin, L. Lu, B. He, and S. Wu, "Adaptive disk I/O scheduling for MapReduce in virtualized environment," in 2011 International Conference on Parallel Processing (ICPP). IEEE, 2011.
[27] M. Zaharia, A. Konwinski, A. Joseph, R. Katz, and I. Stoica, "Improving MapReduce performance in heterogeneous environments," in Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation. USENIX Association, 2008.