Autonomic Data Replication in Cloud Environment
International Journal of Electronics and Computer Science Engineering
Available Online at ISSN

Dhananjaya Gupt, Mrs. Anju Bala
Computer Science and Engineering, Thapar University, Patiala, India

Abstract-- Cloud computing is an emerging practice that offers more flexible infrastructure at lower cost than traditional computing models. Cloud providers offer everything from access to raw storage capacity to complete application services. Cloud services can be accessed from anywhere, and data flows from one place to another. Because the data moves over a network, there is a risk of data loss, so multiple copies must be kept; data replication is therefore one of the main issues in cloud computing. In this paper we implement automatic replication of data from a local host to a cloud environment. Replication is implemented using Hadoop, which stores the data on several nodes. If one node goes down, the data can be retrieved seamlessly from another node.

Keywords - Cloud Computing, Fault tolerance, Data Replication.

I. INTRODUCTION

Cloud computing is an emerging practice that offers more flexible infrastructure at lower cost than traditional computing models. Cloud computing software frameworks manage cloud resources and provide scalable and fault-tolerant computing utilities with globally uniform and hardware-transparent user interfaces [1]. The cloud provider takes responsibility for managing infrastructure issues. Today, cloud providers offer everything from access to raw storage capacity to complete application services in areas such as payroll and customer relationship management. Data flows through the network from one location to another while the services provided by the cloud are in use. Securing and maintaining copies of the data as it flows through the network therefore becomes a critical task.
Fault tolerance techniques are available that replicate data at different locations to tolerate data loss and ensure continued service. Replication is a key mechanism for achieving scalability, availability and fault tolerance. It can be used to create and maintain copies of data at different sites [13]. When an event affects the primary location where the data resides, the data can be recovered from a secondary location, providing continued service, fault tolerance and higher availability. Recovery carries a performance overhead, because it takes time to fetch the data from another site and restart the service, but the fault is tolerated and availability increases.

The aim of our research work is to implement data replication from a local machine to a cloud environment. Hadoop is used to replicate the data across different sites. Hadoop is an Apache project; all of its components are available under the Apache open source license. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant: HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets [2].
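The recovery path described above (read from a secondary copy when the primary location fails) can be sketched in a few lines. This is only a toy in-memory model under stated assumptions, not Hadoop code and not the authors' implementation: the class `ReplicatedStore`, its methods and the site names are all hypothetical, and each "site" is just a Python dictionary.

```python
"""Toy model of replication across sites with fallback reads.

Not the HDFS API: every name below is hypothetical and each site is
modelled as an in-memory dictionary.
"""


class ReplicatedStore:
    def __init__(self, site_names, replication_factor=2):
        if replication_factor > len(site_names):
            raise ValueError("replication factor exceeds number of sites")
        self.replication_factor = replication_factor
        # site name -> {key: data}; insertion order defines read preference.
        self.sites = {name: {} for name in site_names}
        self.failed = set()

    def write(self, key, data):
        """Copy data to the first `replication_factor` healthy sites."""
        healthy = [s for s in self.sites if s not in self.failed]
        if len(healthy) < self.replication_factor:
            raise RuntimeError("not enough healthy sites for the requested copies")
        targets = healthy[: self.replication_factor]
        for site in targets:
            self.sites[site][key] = data
        return targets

    def fail(self, site):
        """Mark a site as unavailable (simulated fault)."""
        self.failed.add(site)

    def read(self, key):
        """Return (site, data) from the first healthy site holding the key."""
        for site, contents in self.sites.items():
            if site not in self.failed and key in contents:
                return site, contents[key]
        raise KeyError(f"no healthy replica holds {key!r}")


if __name__ == "__main__":
    store = ReplicatedStore(["primary", "secondary"], replication_factor=2)
    store.write("report.txt", b"hello")
    store.fail("primary")
    # With the primary marked as failed, the read is served by the secondary.
    print(store.read("report.txt"))
```

The sketch keeps the copies identical by writing to every target before returning, which sidesteps the consistency questions that a real replicated system has to answer.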
IJECSE, Volume 2, Number 2

Figure-1: HDFS Architecture [4].

An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage the storage attached to the nodes they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes [4].

The rest of the paper is organized as follows. Section II describes work related to our research and the challenges of a replicated environment. Section III presents the Hybrid Virtualized Architecture. Section IV covers the implementation of this architecture, in which data is replicated on two different sites using Hadoop's HDFS, together with the experimental results. Section V concludes the paper.

II. RELATED WORK

As mentioned in the introduction, data flows through the network, so it becomes critical to secure and maintain data at multiple sites so that any lost data can be recovered without much overhead. Persistent data stored in distributed file systems ranges in size from small to large, is likely to be read multiple times, and is typically long-lived. In comparison, intermediate data generated in cloud programming paradigms has uniquely contrasting characteristics [6]. Many fault tolerance techniques deal with virtual machine (VM) migration, process migration and application migration to overcome the impact of a fault [9][10]. The time-series based precopy technique migrates a VM from a source host to a target host [11]; in this approach data is transferred in the form of pages. The proactive live process migration mechanism [12] migrates a process before the fault occurs. To tolerate a fault, migration has to take place, which involves a large amount of data transfer.
This is a performance hit, because time is consumed in data transfer and the system is unavailable during that period. Performance and availability can be increased if data is placed at more than one site. Replication is one of the most widely studied phenomena in distributed environments [13]. Replication is a strategy in which multiple copies of some data are stored at multiple sites. When required, data is fetched from the nearest available replica to avoid delay and increase performance. The cloud computing paradigm demands high availability, which makes replicating data in a cloud environment a challenging task. The difficulty in providing efficient and correct wide-area database replication is that it requires integrating techniques from several fields, including distributed systems, databases, network protocols and operating systems [16]. Data replication schemes over storage providers with a KVS (key-value store) interface are inherently more difficult to realize than replication schemes over providers with richer interfaces [15].

The following are some of the challenges in a replicated environment:

Data Consistency: Maintaining data integrity and consistency in a replicated environment is of prime importance. High-precision applications may require strict consistency (e.g. one-copy serializability, 1SR) of the updates made by transactions [14].

Downtime during new replica creation: If strict data consistency is to be maintained, performance is severely affected while a new replica is being created, because sites cannot fulfill requests due to the consistency requirements.
Maintenance overhead: If files are replicated at more than one site, they occupy storage space and have to be administered. Thus, there are overheads in storing multiple copies.

Lower write performance: The performance of write operations can be dramatically lower for update-heavy applications in a replicated environment, because a transaction may need to update multiple copies [14].

III. HYBRID VIRTUALIZED ARCHITECTURE

Figure-2: Hybrid Virtualized Architecture

Figure 2 shows the Hybrid Virtualized Architecture, which includes a virtual machine workstation (VMware Fusion) to provide virtualization and the Hadoop framework to provide HDFS functionality. Eclipse is used as the Java integrated development environment for writing application code. The virtual environment helped us analyze the cloud environment for different types of applications on a single machine. This is a master-slave architecture in which the master node coordinates the slave node and provides fault tolerance: if one node fails, the data can be retrieved from the other node. A cluster is set up between the two machines. The architecture consists of a hosting server on which VMware is installed and two hosted VMs (master and slave), each running an Ubuntu OS. The components of the Hybrid Architecture are as follows:

Local Machine: An application is executed on the local machine, which runs the Windows 7 32-bit platform. The application is developed in Java using the Eclipse API. The data generated by this application is sent to HDFS, which stores it at multiple locations. In our experiments we kept the replication factor at 2. The application accesses the file system through the HDFS client, which exports the HDFS file system interface.

Master VM: It runs on the Ubuntu platform. Hadoop is set up on this VM. The NameNode and a DataNode run on this VM.
The NameNode receives the write request and directs replication of the data to the DataNodes according to the replication factor. To increase reliability and availability, the replication factor can be increased. The NameNode keeps track of which DataNodes are live, so that when one DataNode is down the data can be fetched from the other one.

Slave VM: It runs on the same Ubuntu platform. The same Hadoop setup is installed on this VM, and it runs the second DataNode. This node receives data from the master VM. Whenever the DataNode on the master VM fails, the NameNode automatically directs reads to this DataNode.

IV. IMPLEMENTATION AND EXPERIMENTAL RESULTS

We have implemented an application in Java using the Eclipse API. Data required by the application is retrieved from HDFS. Operations are performed on the data after retrieving it to the local machine. When the transactions are done, the data is sent back to HDFS. We have implemented an asynchronous mechanism to replicate data in the virtualized cloud environment. In this architecture, Hadoop is set up on both virtual machines. On the first machine we create a NameNode and a DataNode; on the second machine we create only a DataNode. Hadoop is thus configured to have one NameNode and two DataNodes. Data generated by the application is pumped into HDFS, where it is replicated on the two DataNodes. When data is required, it is retrieved from HDFS. At that moment, if either of the DataNodes
fails, data is automatically recovered from the other one. The experimental platform and software packages used in this system are as follows:

Table-1: Platform Configuration

TYPE               SPECIFICATION
Processor          Intel i5
CPU Speed          2.5GHz
Memory             4.00 GB
Platform           32-bit OS
Operating System   Ubuntu

Table-2: Software Package versions

SOFTWARE PACKAGES  VERSIONS
JDK                1.6
Eclipse
Hadoop
VMware Fusion

Hadoop provides a web interface for statistics. Using this interface we can see the status of all the nodes. Based on some failure cases, we are able to determine how our data is fetched from the other nodes when the requested node goes down.

Case 1: Figure-3 shows the overall status monitoring of the system. Here we can browse the system to determine which DataNodes are live. In this case both DataNodes are live, and data can be retrieved from either of them, as decided by HDFS. The numbers of live and dead nodes are reported through this interface.

Case 2: Figure-4 shows the details of all the DataNodes on which the data is present. Data is automatically retrieved from the nearest available node.

Case 3: Figure-5 shows access to files stored in HDFS via any of the DataNodes. Data is replicated on two different nodes. If one node goes down, the data is retrieved from the other node, which increases availability and fault tolerance.
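The three cases above rely on the NameNode knowing which DataNodes are live. As a rough illustration, the following toy tracker classifies nodes as live or dead and picks a live replica for a read. It assumes liveness is judged from periodic heartbeat messages with a timeout; the class `LivenessTracker`, the node names and the 30-second timeout are hypothetical choices for this sketch and are not taken from the paper or from Hadoop's source.

```python
"""Toy liveness tracker for a two-DataNode cluster.

Hypothetical names throughout; timestamps are passed in explicitly so the
behaviour is deterministic. This is a sketch, not Hadoop code.
"""


class LivenessTracker:
    def __init__(self, timeout_seconds=30.0):
        # Arbitrary illustrative timeout, not a Hadoop default.
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = {}   # node name -> time of last heartbeat
        self.replicas = {}         # file name -> list of node names

    def heartbeat(self, node, now):
        """Record that `node` reported in at time `now`."""
        self.last_heartbeat[node] = now

    def add_replica(self, filename, node):
        """Record that `node` holds a copy of `filename`."""
        self.replicas.setdefault(filename, []).append(node)

    def is_live(self, node, now):
        seen = self.last_heartbeat.get(node)
        return seen is not None and now - seen <= self.timeout_seconds

    def status(self, now):
        """Summarize live and dead nodes, like a status page would."""
        live = sorted(n for n in self.last_heartbeat if self.is_live(n, now))
        dead = sorted(n for n in self.last_heartbeat if not self.is_live(n, now))
        return {"live": live, "dead": dead}

    def pick_replica(self, filename, now):
        """Return the first live node holding the file, or None."""
        for node in self.replicas.get(filename, []):
            if self.is_live(node, now):
                return node
        return None


if __name__ == "__main__":
    tracker = LivenessTracker(timeout_seconds=30.0)
    for node in ("master-datanode", "slave-datanode"):
        tracker.heartbeat(node, now=0.0)
        tracker.add_replica("data.txt", node)

    print(tracker.status(now=10.0))            # both nodes still within the timeout
    tracker.heartbeat("slave-datanode", now=40.0)
    print(tracker.status(now=50.0))            # master-datanode has gone silent
    print(tracker.pick_replica("data.txt", now=50.0))
```

With a replication factor of 2, the read succeeds as long as at least one of the two nodes has reported within the timeout; `pick_replica` returns `None` only when both are considered dead.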
Figure-3: Web interface of the NameNode

Figure-4: Accessing a live DataNode containing data
Figure-5: Accessing data present at the DataNode.

V. CONCLUSION AND FUTURE SCOPE

In this paper, we have proposed a virtualized cloud system architecture based on Hadoop. We have presented a highly reliable system that provides data replication in a virtualized cloud environment. Data is replicated on multiple VMs. An application was developed and executed, and the experimental results validate the system's fault tolerance and its replication at multiple nodes: when one node fails, data is recovered via the other node. Some future extensions are possible: performance can be improved by replicating data in real time with a higher replication factor to ensure much higher availability and fault tolerance. This data replication mechanism can also be combined with other fault tolerance techniques to achieve greater reliability.

REFERENCES

[1] Application Architecture for Cloud Computing, white paper.
[2] Apache Hadoop.
[3] Golam Moktader Nayeem, Mohammad Jahangir Alam, Analysis of Different Software Fault Tolerance Techniques.
[4] HDFS (Hadoop Distributed File System) architecture, design.html.
[5] Alain Tchana, Laurent Broto, Daniel Hagimont, Fault Tolerant Approaches in Cloud Computing Infrastructures, The Eighth International Conference on Autonomic and Autonomous System, ICAS.
[6] Steven Y. Ko, Imranul Hoque, Brian Cho and Indranil Gupta, On Availability of Intermediate Data in Cloud Computations.
[7] Geoffroy Vallee, Kulathep Charoenpornwattana, Christian Engelmann, Anand Tikotekar, Stephen L. Scott, A Framework for Proactive Fault Tolerance.
[8] Julia Myint, Thinn Thu Naing, Management of Data Replication for PC Cluster-based Cloud Storage System, International Journal on Cloud Computing: Services and Architecture (IJCCSA), Vol.1, No.3, 31-41, November.
[9] Chao Wang, Frank Mueller, Christian Engelmann, Stephen L. Scott, Proactive Process-Level Live Migration in HPC Environments.
[10] Gang Chen, Hai Jin, Deqing Zou, Bing Bing Zhou, Weizhong Qiang, Gang Hu, SHelp: Automatic Self-healing for Multiple Application Instances in a Virtual Machine Environment, IEEE International Conference on Cluster Computing.
[11] Bolin Hu, Zhou Lei, Yu Lei, Dong Xu, Jiandun Li, A Time-Series Based Precopy Approach for Live Migration of Virtual Machines, IEEE 17th International Conference on Parallel and Distributed Systems.
[12] Chao Wang, Frank Mueller, Christian Engelmann, Proactive Process Level Live Migration and Back Migration in HPC Environments.
[13] Sushant Goel, Rajkumar Buyya, Data Replication Strategies in Wide Area Distributed Systems.
[14] Yu, H., and Vahdat, A., Consistent and Automatic Replica Regeneration, Trans. Storage 1, 1 (2005).
[15] Christian Cachin, Birgit Junker, Alessandro Sorniotti, On Limitations of Using Cloud Storage for Data Replication.
[16] Yair Amir, Claudiu Danilov, Michal Miskin-Amir, Jonathan Stanton, Ciprian Tutu, Practical Wide-Area Database Replication, CNDS, Johns Hopkins University.
International Conference on Computer, Networks and Communication Engineering (ICCNCE 2013) Research on Availability of Virtual Machine Hot Standby based on Double Shadow Page Tables Zhiyun Zheng, Huiling
More informationIntroduction to MapReduce
Basics of Cloud Computing Lecture 4 Introduction to MapReduce Satish Srirama Some material adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed
More informationTwo-Level Cooperation in Autonomic Cloud Resource Management
Two-Level Cooperation in Autonomic Cloud Resource Management Giang Son Tran a, Alain Tchana b, Laurent Broto a, Daniel Hagimont a a ENSEEIHT University of Toulouse, Toulouse, France Email: {giang.tran,
More informationChapter 11: Implementing File Systems
Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation
More informationA Novel Architecture to Efficient utilization of Hadoop Distributed File Systems for Small Files
A Novel Architecture to Efficient utilization of Hadoop Distributed File Systems for Small Files Vaishali 1, Prem Sagar Sharma 2 1 M. Tech Scholar, Dept. of CSE., BSAITM Faridabad, (HR), India 2 Assistant
More informationSoftware-defined Storage: Fast, Safe and Efficient
Software-defined Storage: Fast, Safe and Efficient TRY NOW Thanks to Blockchain and Intel Intelligent Storage Acceleration Library Every piece of data is required to be stored somewhere. We all know about
More informationCS3600 SYSTEMS AND NETWORKS
CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection
More informationHDFS Federation. Sanjay Radia Founder and Hortonworks. Page 1
HDFS Federation Sanjay Radia Founder and Architect @ Hortonworks Page 1 About Me Apache Hadoop Committer and Member of Hadoop PMC Architect of core-hadoop @ Yahoo - Focusing on HDFS, MapReduce scheduler,
More informationImproving Hadoop MapReduce Performance on Supercomputers with JVM Reuse
Thanh-Chung Dao 1 Improving Hadoop MapReduce Performance on Supercomputers with JVM Reuse Thanh-Chung Dao and Shigeru Chiba The University of Tokyo Thanh-Chung Dao 2 Supercomputers Expensive clusters Multi-core
More informationData Storage Infrastructure at Facebook
Data Storage Infrastructure at Facebook Spring 2018 Cleveland State University CIS 601 Presentation Yi Dong Instructor: Dr. Chung Outline Strategy of data storage, processing, and log collection Data flow
More informationChapter 12: File System Implementation
Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Spring 2018 Lecture 25 RAIDs, HDFS/Hadoop Slides based on Text by Silberschatz, Galvin, Gagne (not) Various sources 1 1 FAQ Striping:
More informationBigData and Map Reduce VITMAC03
BigData and Map Reduce VITMAC03 1 Motivation Process lots of data Google processed about 24 petabytes of data per day in 2009. A single machine cannot serve all the data You need a distributed system to
More informationMapReduce. U of Toronto, 2014
MapReduce U of Toronto, 2014 http://www.google.org/flutrends/ca/ (2012) Average Searches Per Day: 5,134,000,000 2 Motivation Process lots of data Google processed about 24 petabytes of data per day in
More informationChapter 10: File System Implementation
Chapter 10: File System Implementation Chapter 10: File System Implementation File-System Structure" File-System Implementation " Directory Implementation" Allocation Methods" Free-Space Management " Efficiency
More informationIN organizations, most of their computers are
Provisioning Hadoop Virtual Cluster in Opportunistic Cluster Arindam Choudhury, Elisa Heymann, Miquel Angel Senar 1 Abstract Traditional opportunistic cluster is designed for running compute-intensive
More informationImproving CPU Performance of Xen Hypervisor in Virtualized Environment
ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 5 Issue 3; May-June 2018; Page No. 14-19 Improving CPU Performance of
More informationData Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros
Data Clustering on the Parallel Hadoop MapReduce Model Dimitrios Verraros Overview The purpose of this thesis is to implement and benchmark the performance of a parallel K- means clustering algorithm on
More informationBig Data Analytics. Izabela Moise, Evangelos Pournaras, Dirk Helbing
Big Data Analytics Izabela Moise, Evangelos Pournaras, Dirk Helbing Izabela Moise, Evangelos Pournaras, Dirk Helbing 1 Big Data "The world is crazy. But at least it s getting regular analysis." Izabela
More informationCloud Computing CS
Cloud Computing CS 15-319 Distributed File Systems and Cloud Storage Part II Lecture 13, Feb 27, 2012 Majd F. Sakr, Mohammad Hammoud and Suhail Rehman 1 Today Last session Distributed File Systems and
More informationDistributed Systems CS6421
Distributed Systems CS6421 Intro to Distributed Systems and the Cloud Prof. Tim Wood v I teach: Software Engineering, Operating Systems, Sr. Design I like: distributed systems, networks, building cool
More informationOPERATING SYSTEM. Chapter 12: File System Implementation
OPERATING SYSTEM Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management
More informationData Analysis Using MapReduce in Hadoop Environment
Data Analysis Using MapReduce in Hadoop Environment Muhammad Khairul Rijal Muhammad*, Saiful Adli Ismail, Mohd Nazri Kama, Othman Mohd Yusop, Azri Azmi Advanced Informatics School (UTM AIS), Universiti
More informationDatabase Applications (15-415)
Database Applications (15-415) Hadoop Lecture 24, April 23, 2014 Mohammad Hammoud Today Last Session: NoSQL databases Today s Session: Hadoop = HDFS + MapReduce Announcements: Final Exam is on Sunday April
More informationUbiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses
Ubiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses Pengfei Tang Computer Science Dept. Worcester Polytechnic Institute (WPI) Introduction:
More informationAN APPROACH FOR FAULT TOLERANCE IN CLOUD COMPUTING USING MACHINE LEARNING TECHNIQUE
Volume 117 No. 22 2017, 345-351 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu AN APPROACH FOR FAULT TOLERANCE IN CLOUD COMPUTING USING MACHINE LEARNING
More informationHadoop File Management System
Volume-6, Issue-5, September-October 2016 International Journal of Engineering and Management Research Page Number: 281-286 Hadoop File Management System Swaraj Pritam Padhy 1, Sashi Bhusan Maharana 2
More informationGoogle File System and BigTable. and tiny bits of HDFS (Hadoop File System) and Chubby. Not in textbook; additional information
Subject 10 Fall 2015 Google File System and BigTable and tiny bits of HDFS (Hadoop File System) and Chubby Not in textbook; additional information Disclaimer: These abbreviated notes DO NOT substitute
More informationCloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ]
s@lm@n Cloudera Exam CCA-410 Cloudera Certified Administrator for Apache Hadoop (CCAH) Version: 7.5 [ Total Questions: 97 ] Question No : 1 Which two updates occur when a client application opens a stream
More informationOPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD.
OPERATING SYSTEMS II DPL. ING. CIPRIAN PUNGILĂ, PHD. File System Implementation FILES. DIRECTORIES (FOLDERS). FILE SYSTEM PROTECTION. B I B L I O G R A P H Y 1. S I L B E R S C H AT Z, G A L V I N, A N
More informationA SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING
Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s
More informationVirtualization of the MS Exchange Server Environment
MS Exchange Server Acceleration Maximizing Users in a Virtualized Environment with Flash-Powered Consolidation Allon Cohen, PhD OCZ Technology Group Introduction Microsoft (MS) Exchange Server is one of
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve
More informationIntroduction to Cloud Computing
Introduction to Cloud Computing Distributed File Systems 15 319, spring 2010 12 th Lecture, Feb 18 th Majd F. Sakr Lecture Motivation Quick Refresher on Files and File Systems Understand the importance
More information