A Robust Cloud-based Service Architecture for Multimedia Streaming Using Hadoop
Myoungjin Kim 1, Seungho Han 1, Jongjin Jung 3, Hanku Lee 1,2,*, Okkyung Choi 2

1 Department of Internet and Multimedia Engineering, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul, Republic of Korea
{touhg105, shhan87, hlee}@konkuk.ac.kr
2 Center for Social Media Cloud Computing, Konkuk University, 1 Hwayang-dong, Gwangjin-gu, Seoul, Republic of Korea
hlee@konkuk.ac.kr, okwow2@gmail.com
3 Digital Media Research Center, Korea Electronics Technology Institute, Electronics Center #1599, Sangam-dong, Seoul, Republic of Korea
mozzalt@keti.re.kr

Abstract. Delivering scalable rich multimedia applications and services on the Internet requires sophisticated technologies for transcoding, distributing, and streaming content. Although cloud computing provides an infrastructure for such technologies, the specific challenges of task management, load balancing, and fault tolerance remain. To address these issues, we propose a cloud-based distributed multimedia streaming service, or CloudDMSS. The system is designed to run on all major cloud computing services, and is highly adapted to the structure and policies of Hadoop, which give it additional capabilities for transcoding, task distribution, load balancing, and content replication and distribution.

Keywords: Streaming Service, Mobile Media Service, Cloud Computing, Media Transcoding

1 Introduction

With the recent proliferation of rich social media across a variety of personal devices, considerable attention has shifted to the challenge of adaptively distributing and streaming multimedia content over the Internet. Among the technologies that have emerged, cloud-based media streaming, transcoding, and distributed storage have been the most noteworthy and influential.
The emergence of cloud-based technologies in this area can be explained by four features of recent multimedia services: media heterogeneity, Quality of Service (QoS) heterogeneity, network heterogeneity, and device heterogeneity [1]. To support these features, media streaming, transcoding, and distribution must draw on massive, highly scalable computational resources, i.e., CPUs, memory, network bandwidth, and storage.

* Corresponding author
Although cloud computing can provide these resources, doing so places a heavy burden on existing Internet infrastructure and cloud resources and introduces a host of new challenges (e.g., cluster rebalancing, namespace management, data distribution/replication, auto-recovery, and fault tolerance), all of which are intensified by the massive swings in traffic associated with rich media streaming. These challenges have proven difficult for developers and service vendors alike, and they continue to trouble current media delivery systems.

To address these challenges, we propose a cloud-based distributed multimedia streaming service (CloudDMSS) system designed to run on current cloud computing infrastructure. CloudDMSS capabilities include:
(1) Transcoding of large amounts of media into the MPEG-4 video format for delivery to a variety of devices, including PCs, smart pads, and phones
(2) Significant reduction in transcoding time by using the Hadoop Distributed File System (HDFS) to store multimedia data and MapReduce for distributed parallel processing
(3) Reduction in content delays and traffic bottlenecks using a streaming job distribution algorithm
(4) Improvement in overall performance by running dual Hadoop clusters within one physical cluster
(5) Efficient content distribution and improved scalability through adherence to Hadoop policies

The remainder of this paper is organized as follows: Section 2 discusses relevant research on cloud-based streaming services; Section 3 describes the core architecture of CloudDMSS with respect to transcoding, job distribution, and content replication and distribution; Section 4 presents our prototype of the proposed system and its configuration; and Section 5 offers concluding remarks and plans for future work.

2 Related Work

In recent years, many researchers have applied cloud computing technologies to rich media services in response to the explosion of demand for such services.
This section presents the research most relevant to our CloudDMSS system. Hui et al. [5] proposed MediaCloud, a layered architecture that defines a new paradigm for dealing with multimedia applications and services. The architecture comprises three layers (a Media Service Layer, a Media Overlay Layer, and a Resource Management Layer) and addresses such key challenges as heterogeneity, scalability, and QoS provisioning. However, the architecture is treated mainly at the conceptual level, leaving most of the challenges of real-world implementation to future work [5]. In contrast, Luo et al. addressed the implementation challenge of QoS provisioning over virtualized infrastructure by presenting a practical architecture and mechanism for a private media cloud [3]. They describe their system in terms of four major components: monitoring, load balancing, traffic management, and security.
In regard to cloud-based streaming, Lee et al. [4] proposed a configuration scheme for connectivity-aware mobile P2P networks, with algorithms for connectivity-aware mobile P2P network configuration and reconfiguration. In [6], Chang et al. described a cloud-based media streaming architecture that dynamically adjusts streaming services in response to mobile device resources, multimedia codec features, and the network environment. They also presented a design for the stream dispatcher component, including real-time codec adaptation based on client device profiling, and a dynamic adjustment of multimedia streaming (DAMS) algorithm. In [9], Huang et al. presented CloudStream, a cloud-based video proxy that delivers high-quality video streams by transcoding the original video in real time into a scalable codec, which in turn allows the stream to adapt to varying network dynamics. They also proposed a multi-level transcoding parallelization framework with two mapping options: HallSh-based mapping and lateness-first mapping.

3 Proposed System Architecture

Fig. 1 shows an overview of the CloudDMSS architecture, highlighting its three main modules: the Hadoop-based distributed multimedia transcoding module (HadoopDMT), the Hadoop-based distributed multimedia streaming module (HadoopDMS), and the cloud multimedia management module (CMM). HadoopDMT transcodes a variety of multimedia data into MPEG-4, a standard format that can be streamed and played on a variety of devices. Quality and speed are improved by adopting the Hadoop Distributed File System (HDFS) [2] for storing video data from many sources, MapReduce [2] for distributed parallel processing of this data, and Xuggler [7] for transcoding. Once transcoded, media contents are automatically moved to and stored in the HDFS of the HadoopDMS module. The contents are split into blocks of configurable size and distributed across the system.
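The block splitting and replication that HadoopDMS inherits from HDFS can be illustrated with a minimal sketch. It assumes the Hadoop 1.x default block size of 64 MB (the paper only says the size is configurable) and three replicas per block. The placement function is a hypothetical round-robin stand-in chosen for clarity; real HDFS uses a rack-aware placement policy, and the DataNode names are invented.

```python
from typing import Dict, List, Tuple

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # assumed: Hadoop 1.x default block size
DEFAULT_REPLICATION = 3                # replicas per block, as described for HadoopDMS


def split_into_blocks(file_size: int,
                      block_size: int = DEFAULT_BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of `file_size` bytes."""
    if block_size <= 0 or file_size < 0:
        raise ValueError("block_size must be positive and file_size non-negative")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # the last block may be short
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = DEFAULT_REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes in round-robin order.

    Hypothetical simplification: HDFS itself uses rack-aware placement.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        placement[block_id] = [datanodes[(start + r) % len(datanodes)]
                               for r in range(replication)]
    return placement


if __name__ == "__main__":
    MB = 1024 * 1024
    blocks = split_into_blocks(150 * MB)            # a hypothetical 150 MB MPEG-4 file
    nodes = [f"dn{i}" for i in range(1, 11)]        # hypothetical DataNode names
    for block_id, replicas in place_replicas(len(blocks), nodes).items():
        print(block_id, blocks[block_id], replicas)
```

With these assumptions, a 150 MB transcoded file becomes two full 64 MB blocks plus one 22 MB block, and each block is stored on three different nodes, so losing any single DataNode still leaves two copies of every block.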
When a block is distributed, it is also replicated on three DataNodes managed by the NameNode, which maintains a directory tree of all transcoded content according to the Hadoop distribution policy. This replication keeps content available even when individual nodes fail. Furthermore, by conforming to Hadoop policy, HadoopDMS automatically benefits from Hadoop's distributed processing capabilities, as well as its facilities for data replication, file splitting and merging, load balancing, and fault tolerance. The role of the CMM module is to manage jobs such as streaming and transcoding tasks, and to balance the load on the streaming servers through media stream scheduling. The module's streaming job distribution algorithm, called streaming resource-based connection (SRC), assigns streaming jobs to the streaming servers in HadoopDMS based on their CPU usage rates and current streaming traffic.
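The idea behind SRC, choosing a streaming server from its CPU usage and current streaming traffic, can be sketched as a least-loaded selection. The paper does not give SRC's formula, so the combined score below (equal weights on CPU usage and link utilization) is a hypothetical choice for illustration; the 100 Mbps default link capacity follows the prototype's Ethernet adapters, and the server names are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamingServer:
    name: str
    cpu_usage: float          # fraction of CPU in use, 0.0 to 1.0
    traffic_mbps: float       # streaming traffic currently being served
    link_mbps: float = 100.0  # NIC capacity; the prototype used 100 Mbps Ethernet


def load_score(server: StreamingServer, w_cpu: float = 0.5, w_net: float = 0.5) -> float:
    """Hypothetical combined load: weighted CPU usage plus link utilization."""
    link_utilization = server.traffic_mbps / server.link_mbps
    return w_cpu * server.cpu_usage + w_net * link_utilization


def select_server(servers: List[StreamingServer]) -> Optional[StreamingServer]:
    """Return the server with the lowest load score (the first one wins ties)."""
    if not servers:
        return None
    return min(servers, key=load_score)


if __name__ == "__main__":
    servers = [
        StreamingServer("nginx-1", cpu_usage=0.70, traffic_mbps=40.0),
        StreamingServer("nginx-2", cpu_usage=0.30, traffic_mbps=80.0),
        StreamingServer("nginx-3", cpu_usage=0.40, traffic_mbps=30.0),
    ]
    print(select_server(servers).name)  # nginx-3 (score 0.35 vs 0.55 for the others)
```

Dividing traffic by link capacity puts both terms on a 0 to 1 scale, so the weights are directly comparable. A deployed scheduler would also refresh these metrics continuously and account for the bitrate of the stream being assigned, neither of which this sketch models.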
Fig. 1. Architectural overview of the CloudDMSS system

4 Implementation and Prototype

For our prototype implementation of CloudDMSS, we built our own cloud computing cluster of 28 nodes in total. Each node ran Linux (Ubuntu LTS) on two Intel Xeon quad-core 2.13 GHz processors with 4 GB of registered ECC DDR memory and 1 TB of SATA-2 disk storage. All nodes were interconnected through 100 Mbps Ethernet adapters. To implement the CMM module, one node was designated as the management server, running Tomcat, and a second node as the content DB server, running MySQL. The HadoopDMT module was composed of 1 NameNode and 12 DataNodes running HDFS. The HadoopDMS module consisted of 3 streaming servers based on Nginx and 10 content storage servers running HDFS. We used dual Hadoop clusters within one physical cluster to distribute the load between transcoding and streaming tasks. Our software specification included Java 1.6.0_39 (64-bit), Hadoop-1.0.4, Xuggler (64-bit) for video transcoding, H.264 streaming module-2.2.7, and fuse_dfs_
Fig. 2. (a) Web-based dashboard for streaming transcoded contents; (b) web-based dashboard for transcoding tasks and resources

The output of the prototype system is shown in Fig. 2. Fig. 2(a) shows a selection of streamed content offered through a web-based dashboard, running on commodity PC hardware. Fig. 2(b) shows a screenshot of the web page for managing transcoding tasks. Using this page, users and administrators can upload original content, select transcoding options (e.g., resolution, format, and codec), and stream the content to other users. Users can also monitor MapReduce-based transcoding processes and the remaining HDFS storage capacity.
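Because HadoopDMS stores every block three times, the raw free space summed across DataNodes overstates how much new content still fits. The minimal sketch below converts raw free space into usable capacity under that three-way replication; the 6 TB free and 500 MB file size are hypothetical dashboard values, and the calculation ignores reserved non-DFS space and uneven distribution across nodes.

```python
DEFAULT_REPLICATION = 3  # replicas per block, as in the HadoopDMS description


def effective_free_bytes(raw_free_bytes: int, replication: int = DEFAULT_REPLICATION) -> int:
    """New file data that still fits when each block is stored `replication` times."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return raw_free_bytes // replication


def files_that_fit(raw_free_bytes: int, file_size: int,
                   replication: int = DEFAULT_REPLICATION) -> int:
    """How many more files of `file_size` bytes fit (ignores non-DFS reserved space)."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    return effective_free_bytes(raw_free_bytes, replication) // file_size


if __name__ == "__main__":
    TB, MB = 10**12, 10**6
    # Hypothetical dashboard reading: 6 TB of raw space still free across DataNodes.
    print(files_that_fit(6 * TB, 500 * MB))  # 4000 transcoded files of 500 MB
```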
5 Conclusion and Future Work

In this paper, we proposed the CloudDMSS system for efficient cloud-based streaming of rich social media. Our system addresses a number of pressing issues in distributed media streaming, including transcoding for heterogeneous devices, job distribution, and content replication/distribution under HDFS. We plan to implement a fully functional CloudDMSS system and to conduct a thorough quantitative performance analysis of the system on a variety of cloud computing infrastructures, including Amazon EC2 and Rackspace Compute.

Acknowledgements. This research was supported by the MSIP (Ministry of Science, ICT & Future Planning) of Korea under the C-ITRC (Convergence Information Technology Research Center) support program (NIPA-2013-H ) supervised by the NIPA (National IT Industry Promotion Agency).

References
1. Zhu, W., Luo, C., Wang, J., Li, S.: Multimedia Cloud Computing. IEEE Signal Processing Magazine 28, (2011)
2. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51, (2008)
3. Luo, H., Egbert, A., Stahlhut, T.: QoS Architecture for Cloud-based Media Computing. In: IEEE 3rd International Conference on Software Engineering and Service Science, pp. IEEE Press, Beijing (2012)
4. Lee, H.S., Lim, K.H., Kim, S.J.: A Configuration Scheme for Connectivity-aware Mobile P2P Networks for Efficient Mobile Cloud-based Video Streaming Services. Cluster Computing (2013)
5. Hui, W., Lin, C., Yang, Y.: MediaCloud: A New Paradigm of Multimedia Computing. KSII Transactions on Internet and Information Systems 6, (2012)
6. Chang, S.Y., Lai, C.F., Huang, Y.M.: Dynamic Adjustable Multimedia Streaming Service Architecture over Cloud Computing. Computer Communications 35, (2012)
7. Xuggler Java Library,
More informationCloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationPACM: A Prediction-based Auto-adaptive Compression Model for HDFS. Ruijian Wang, Chao Wang, Li Zha
PACM: A Prediction-based Auto-adaptive Compression Model for HDFS Ruijian Wang, Chao Wang, Li Zha Hadoop Distributed File System Store a variety of data http://popista.com/distributed-filesystem/distributed-file-system:/125620
More informationIntroduction to Hadoop. Owen O Malley Yahoo!, Grid Team
Introduction to Hadoop Owen O Malley Yahoo!, Grid Team owen@yahoo-inc.com Who Am I? Yahoo! Architect on Hadoop Map/Reduce Design, review, and implement features in Hadoop Working on Hadoop full time since
More informationImage Filtering with MapReduce in Pseudo-Distribution Mode
Image Filtering with MapReduce in Pseudo-Distribution Mode Tharindu D. Gamage, Jayathu G. Samarawickrama, Ranga Rodrigo and Ajith A. Pasqual Department of Electronic & Telecommunication Engineering, University
More informationA New HadoopBased Network Management System with Policy Approach
Computer Engineering and Applications Vol. 3, No. 3, September 2014 A New HadoopBased Network Management System with Policy Approach Department of Computer Engineering and IT, Shiraz University of Technology,
More informationIBM InfoSphere Streams v4.0 Performance Best Practices
Henry May IBM InfoSphere Streams v4.0 Performance Best Practices Abstract Streams v4.0 introduces powerful high availability features. Leveraging these requires careful consideration of performance related
More informationDynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce
Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Shiori KURAZUMI, Tomoaki TSUMURA, Shoichi SAITO and Hiroshi MATSUO Nagoya Institute of Technology Gokiso, Showa, Nagoya, Aichi,
More informationHigh Throughput WAN Data Transfer with Hadoop-based Storage
High Throughput WAN Data Transfer with Hadoop-based Storage A Amin 2, B Bockelman 4, J Letts 1, T Levshina 3, T Martin 1, H Pi 1, I Sfiligoi 1, M Thomas 2, F Wuerthwein 1 1 University of California, San
More informationPARALLELIZATION OF BACKWARD DELETED DISTANCE CALCULATION IN GRAPH BASED FEATURES USING HADOOP JAYACHANDRAN PILLAMARI. B.E., Osmania University, 2009
PARALLELIZATION OF BACKWARD DELETED DISTANCE CALCULATION IN GRAPH BASED FEATURES USING HADOOP by JAYACHANDRAN PILLAMARI B.E., Osmania University, 2009 A REPORT submitted in partial fulfillment of the requirements
More informationA novel test access mechanism for parallel testing of multi-core system
LETTER IEICE Electronics Express, Vol.11, No.6, 1 6 A novel test access mechanism for parallel testing of multi-core system Taewoo Han, Inhyuk Choi, and Sungho Kang a) Dept of Electrical and Electronic
More informationA hybrid cloud-based distributed data management infrastructure for bridge monitoring
A hybrid cloud-based distributed data management infrastructure for bridge monitoring *Seongwoon Jeong 1), Rui Hou 2), Jerome P. Lynch 3), Hoon Sohn 4) and Kincho H. Law 5) 1),5) Dept. of Civil and Environ.
More informationParallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce
Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce Huayu Wu Institute for Infocomm Research, A*STAR, Singapore huwu@i2r.a-star.edu.sg Abstract. Processing XML queries over
More informationE-Training Content Delivery Networking System for Augmented Reality Car Maintenance Training Application
E-Training Content Delivery Networking System for Augmented Reality Car Maintenance Training Application Yu-Doo Kim and Il-Young Moon Korea University of Technology and Education kydman@koreatech.ac.kr
More informationABSTRACT I. INTRODUCTION
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISS: 2456-3307 Hadoop Periodic Jobs Using Data Blocks to Achieve
More informationFalling Out of the Clouds: When Your Big Data Needs a New Home
Falling Out of the Clouds: When Your Big Data Needs a New Home Executive Summary Today s public cloud computing infrastructures are not architected to support truly large Big Data applications. While it
More informationCloud Programming. Programming Environment Oct 29, 2015 Osamu Tatebe
Cloud Programming Programming Environment Oct 29, 2015 Osamu Tatebe Cloud Computing Only required amount of CPU and storage can be used anytime from anywhere via network Availability, throughput, reliability
More information2. Cloud Storage Service
Indian Journal of Science and Technology, Vol 8(S8), 105 111, April 2015 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 DOI: 10.17485/ijst/2015/v8iS8/64230 A Performance Measurement Framework of Cloud
More informationHadoop and HDFS Overview. Madhu Ankam
Hadoop and HDFS Overview Madhu Ankam Why Hadoop We are gathering more data than ever Examples of data : Server logs Web logs Financial transactions Analytics Emails and text messages Social media like
More informationChapter 3 Virtualization Model for Cloud Computing Environment
Chapter 3 Virtualization Model for Cloud Computing Environment This chapter introduces the concept of virtualization in Cloud Computing Environment along with need of virtualization, components and characteristics
More informationFuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc
Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,
More informationMixing and matching virtual and physical HPC clusters. Paolo Anedda
Mixing and matching virtual and physical HPC clusters Paolo Anedda paolo.anedda@crs4.it HPC 2010 - Cetraro 22/06/2010 1 Outline Introduction Scalability Issues System architecture Conclusions & Future
More informationFACEBOOK BASED HOME APPLIANCES SECURITY CONTROL AND MONITORING USING MICROCONTROLLER. Department of EIE, Saveetha Engineering College, Chennai.
FACEBOOK BASED HOME APPLIANCES SECURITY CONTROL AND MONITORING USING MICROCONTROLLER P.Kabilan 1, H.Nafil Askar 2, P.Naresh Anand 3, S.Manimaran 4, G.Venkatesh 5 Department of EIE, Saveetha Engineering
More informationHiTune. Dataflow-Based Performance Analysis for Big Data Cloud
HiTune Dataflow-Based Performance Analysis for Big Data Cloud Jinquan (Jason) Dai, Jie Huang, Shengsheng Huang, Bo Huang, Yan Liu Intel Asia-Pacific Research and Development Ltd Shanghai, China, 200241
More information