SMCCSE: PaaS Platform for processing large amounts of social media
KSII The First International Conference on Internet (ICONI) 2011, December
Copyright © 2011 KSII

Myoungjin Kim¹, Hanku Lee² and Hyeokju Lee¹
¹ Division of Internet & Multimedia Engineering, Konkuk University, Seoul, Korea [tough105, hjlee09@konkuk.ac.kr]
² Center for Social Media Cloud Computing, Konkuk University, Seoul, Korea [hlee@konkuk.ac.kr]
*Corresponding author: Hanku Lee

Abstract
We briefly review the authors' SMCCSE (Social Media Cloud Computing Service Environment), including its model, features and architecture, which support social networking services based on large amounts of social media such as audio, video and images. The main purpose of this paper is to propose and describe in detail a social media cloud PaaS platform as the core platform of SMCCSE. In the experimental section, we present a simple benchmark of a partially implemented cloud distributed parallel data processing platform that converts large image datasets into specific image formats using HDFS and MapReduce. In other words, we evaluate the Hadoop-based image transmoding and transcoding modules.

Keywords: Cloud Computing, Cloud Services, MapReduce, Hadoop, Social Media, SNS and PaaS

1. Introduction
Recently, many SNS providers have begun to release SNSs based on social media that carry users' own thoughts, opinions and views. Developing an SNS based on large amounts of social media (audio, video and images) requires large-scale mass storage for the social media data that users create every day. In addition, delivering published and created social media data to end users requires large-scale computing power for transcoding and transmoding, which convert audio, video and image data into forms suitable for end users' various devices. For this purpose, we proposed and described SMCCSE (Social Media Cloud Computing Service Environment) in an earlier publication [1].
Our SMCCSE provides the enabling cloud computing technologies, computing resources and cloud services needed to develop social-media-based SNSs. In this paper, we describe in more detail the Social Media Cloud PaaS platform, the core platform of SMCCSE. This platform is composed of three parts: a social media data analysis platform for large-scale data analysis; a cloud distributed and parallel data processing platform for storing, distributing and processing social media data; and a cloud infra management platform for managing and monitoring computing resources. We also discuss the performance of the Hadoop-based cloud distributed and parallel data processing platform in SMCCSE when converting image files into specific formats under a variety of conditions.

This paper is organized as follows. Section 2 introduces SMCCSE. Section 3 describes the social media cloud computing PaaS platform in more detail. Section 4 presents our experimental results. Section 5 concludes the paper and discusses future directions.

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011 (C…)).

2. Social Media Cloud Computing Service Model
Our service model is a multiple service model, built on cloud computing, that supports the development of SNSs such as Twitter and Facebook, social media services such as YouTube, and social game services such as Facebook's social network games.

Fig. 1 Social Media Cloud Computing Model

First, our service model offers, in the form of SaaS, social media APIs, a Web-based social SDK and a service delivery platform for easily developing SNSs. Second, to provide social media data to users with reliable service, it offers, in the form of PaaS, a distributed and parallel data processing platform that stores, distributes and encodes/decodes large social media data (audio, video and pictures). Lastly, it provides virtualization-based IaaS to reduce the cost of building computing resources such as servers and storage. Figure 1 shows the concept of the Social Media Cloud Computing model.

Here we introduce the architecture of SMCCSE. The general idea behind its design is to establish an environment that supports the development of SNSs and addresses the needs of numerous SNSs, to provide approaches for processing big social media data, and to provide a set of mechanisms for managing infrastructure. SMCCSE is divided into three layers: the SaaS layer, the PaaS layer and the IaaS layer. These three layers are in turn composed of eight parts: Social Media Service Platform, Social Media Common Algorithms Library, Distributed Processing Platform, Cloud Security, Cloud QoS, Green IDC, Infra Management and Virtualization.

3. Social Media Cloud Computing PaaS Platform
This section explains the PaaS platform, which forms the core of the SMCC platform, and the IaaS that provides the physical computing environment. Fig. 2 shows the overall architecture of the PaaS platform and IaaS.

3.1 Social Media Data Analysis Platform
The role of the social media data analysis platform is to analyze social media data, including text, images, audio and video, and to provide various libraries that encode, decode, transcode and transmode video, images and audio. In SNSs based on social media, analyzing social media data is one of the most important elements in offering reliable services to users. To recommend and offer social media of specific types to users, our platform analyzes in advance the usage patterns, types and correlations of the social media that users share, create and publish. The other key function of the social media data analysis platform is to provide a user-friendly interface for transcoding and transmoding through the social media common algorithms libraries, so that users can easily create, share and upload social media, especially image and video content.

3.2 Cloud Distributed and Parallel Data Processing Platform
The main function of the cloud distributed and parallel data processing platform, the core platform of SMCCSE, is to store, distribute and
process large amounts of social media data in order to deliver them to users' devices such as mobile phones, smart pads, PCs and TVs. The distributed and parallel data processing system is composed of two systems: a distributed data system and a distributed parallel processing system. The distributed data system adopts HDFS (Hadoop Distributed File System) [2] as its distributed file system and HBase (Hadoop Database) as its distributed DB system. In addition, we select MapReduce [3] as the distributed parallel programming model. This platform actually carries out the functions provided by the social media common algorithms libraries in the social media data analysis platform.

First, created social media data (images, audio and video) and text-based social data are stored on HDFS. The stored data is then processed in two steps using MapReduce. In the first step, our platform performs the analysis needed to execute each piece of business logic defined by the social media APIs in the SaaS platform. For instance, if a social media API is defined to show the list of video clips that a particular group of users has watched most, MapReduce analyzes the social media data and returns the result to the social media API, which then provides the list to the requestors. In the second step, encoding, decoding, transcoding and transmoding are carried out to provide QoS to hundreds of heterogeneous smart devices.

Traditional approaches to image conversion waste a great deal of time. Our platform addresses this problem by using enabling cloud computing technologies and large-scale computing resources in a cloud computing environment. First, using a fixed-size policy [4, 5], a traditional file splitting technique, a single image content is split into small chunks that are stored in HDFS. Each chunk is then encoded in parallel, and the encoded chunks are combined into a single file again using the MapReduce programming model, which reduces the run time of the encoding work. Transcoding and transmoding are carried out using the same approach.

3.3 Cloud Infra Management Platform
Lastly, the Cloud Infra Management Platform comprises cloud QoS, Green IDC and Cloud Infra Management. Cloud Infra Management manages and monitors computing resources independently of any specific OS or platform. It includes resource scheduling, resource information management, resource monitoring and virtual machine management. These functions are provided as web services based on Eucalyptus.

Fig. 2 Social Media Cloud Computing PaaS Platform
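The first processing step in Section 3.2 (listing the video clips a particular group of users has watched most) follows the classic MapReduce counting pattern. Below is a minimal, single-process Python sketch of that pattern (map, shuffle, reduce), assuming a hypothetical view log of `(user_id, video_id)` pairs. The record format and the names `map_views`, `shuffle`, `reduce_counts` and `most_watched` are illustrative only; they are not part of SMCCSE or of the Hadoop API. On a real cluster, Hadoop would run the map and reduce functions as distributed tasks and perform the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical view-log record: (user_id, video_id).
# The actual SMCCSE data schema is not described in the paper.
ViewRecord = tuple[str, str]


def map_views(records: Iterable[ViewRecord], group: set[str]) -> Iterator[tuple[str, int]]:
    """Map phase: emit (video_id, 1) for each view made by a member of the group."""
    for user_id, video_id in records:
        if user_id in group:
            yield video_id, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: collect intermediate values by key (Hadoop does this between map and reduce)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce phase: sum the per-view counts for each video."""
    return {video_id: sum(ones) for video_id, ones in grouped.items()}


def most_watched(records: Iterable[ViewRecord], group: set[str], k: int) -> list[tuple[str, int]]:
    """Return the k most-watched videos for the group; ties are broken by video_id."""
    counts = reduce_counts(shuffle(map_views(records, group)))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    log = [("alice", "v1"), ("bob", "v2"), ("alice", "v2"),
           ("carol", "v1"), ("dave", "v3"), ("bob", "v2")]
    print(most_watched(log, {"alice", "bob"}, k=2))  # [('v2', 3), ('v1', 1)]
```

Because the reduce step here is a plain sum, a Hadoop job of this shape could also register the same logic as a combiner, pre-aggregating counts on each map node before the shuffle.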
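The fixed-size procedure of Section 3.2 (split into chunks, encode the chunks in parallel, merge them back into one file) can be sketched as below. This is a minimal Python illustration under stated assumptions: a thread pool stands in for Hadoop map tasks on different nodes, in-memory bytes stand in for chunks stored in HDFS, and Base64 stands in for the real encoder. Base64 is chosen only because, when every chunk except the last is a multiple of 3 bytes, the concatenated chunk encodings equal the encoding of the whole input, so the merge is easy to verify; real image and video codecs generally need format-aware split points. The function names are hypothetical.

```python
import base64
from concurrent.futures import ThreadPoolExecutor


def split_fixed_size(data: bytes, chunk_size: int) -> list[bytes]:
    """Fixed-size policy: cut the input into equal-sized chunks; the last one may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def encode_chunk(chunk: bytes) -> bytes:
    """Stand-in per-chunk encoder (Base64); a real system would run a media codec here."""
    return base64.b64encode(chunk)


def parallel_encode(data: bytes, chunk_size: int, workers: int = 4) -> bytes:
    """Split the input, encode the chunks concurrently, then merge them in original order."""
    # Base64 output of consecutive chunks concatenates cleanly only when every
    # chunk except the last is a multiple of 3 bytes (no padding mid-stream).
    if chunk_size % 3 != 0:
        raise ValueError("chunk_size must be a multiple of 3 for this Base64 stand-in")
    chunks = split_fixed_size(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, which gives an ordered merge.
        encoded = list(pool.map(encode_chunk, chunks))
    return b"".join(encoded)


if __name__ == "__main__":
    payload = bytes(range(256)) * 40                      # 10,240 bytes of sample data
    merged = parallel_encode(payload, chunk_size=3 * 1024)
    print(merged == base64.b64encode(payload))            # True: merge matches sequential encoding
```

The essential requirement is ordered reassembly: `Executor.map` returns results in input order even when chunks finish out of order, mirroring the step in which encoded chunks are combined into a single file. Threads here illustrate the structure rather than real speedup.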
4. Experiments and Results

4.1 Experimental Setup
The SMCCSE cloud server used in the experiments is a single enterprise-scale cluster consisting of 27 computational nodes (slave nodes) and 1 head node (master node). The cluster can be accessed only through the master node. All nodes run Linux (CentOS 5.5). Table 1 shows the setup environment of SMCCSE.

Table 1. Setup environments
Section   Specification
CPU       Intel Xeon 4-Core DP E… GHz (28 nodes × 2 EA)
RAM       4GB Registered ECC DDR memory (28 nodes × 4 EA)
HDD       1TB SATA-2, 7,200 RPM (28 EA)
Switch    NetGear JGS524 (1000 Mbps)
OS        Linux CentOS 5.5
Java      Java 1.6.0_23
Hadoop    Hadoop
JAI       JAI (Java Advanced Imaging)

4.2 Functionality Testing
In this section, we discuss the performance of the partially implemented MapReduce-based cloud distributed and parallel data processing platform when converting image files into specific formats in SMCCSE. Specifically, we measure the run time of a MapReduce-based image conversion function that converts large image datasets into image files suitable for a variety of mobile devices. Our functionality testing focuses on converting diverse image datasets (Table 2) into a specific format (from JPG to PNG). The datasets consist of 9 groups, and the average size of one image file is approximately 19.8MB. We distribute and store each dataset on HDFS for evaluation, and then measure image conversion run times in MapReduce under diverse conditions. The Hadoop default options used are: 1) a block replication factor of 3, and 2) a block size of 64MB. Java Advanced Imaging (JAI) APIs are used to implement the image conversion function.

In the first experiment, we measure the run times of the image conversion function while varying the cluster size. As Fig. 3 shows, the run times decrease as the number of nodes increases. In particular, the elapsed times decrease dramatically up to 8 nodes.
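The run times from this experiment feed the parallel speedup metric used for Fig. 4: speedup is the single-node run time divided by the N-node run time, and dividing speedup by N gives parallel efficiency, where 1.0 means speedup equal to the number of nodes (perfect scalability). The short Python sketch below computes both. The run times in the usage example are hypothetical placeholders; the 28-node speedups are the values reported later in this section, and the efficiencies are derived from them here rather than reported by the paper. The function names are illustrative.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Parallel speedup S(N) = T(1) / T(N): how many times faster N nodes are than one."""
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_parallel


def efficiency(s: float, nodes: int) -> float:
    """Parallel efficiency E(N) = S(N) / N; 1.0 corresponds to perfect (linear) scalability."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return s / nodes


if __name__ == "__main__":
    # Hypothetical run times in seconds, not measurements from the paper.
    print(speedup(800.0, 100.0))  # 8.0, i.e. ideal speedup on 8 nodes

    # Speedups reported for 28 nodes (the 40GB value is not preserved).
    reported = {"2GB": 15.3, "100GB": 14.2, "10GB": 23.1}
    for dataset, s in reported.items():
        print(f"{dataset}: S={s}, E={efficiency(s, 28):.3f}")
    # 2GB: S=15.3, E=0.546
    # 100GB: S=14.2, E=0.507
    # 10GB: S=23.1, E=0.825
```

Read this way, the 10GB dataset runs at roughly 83% efficiency on 28 nodes, while the 2GB and 100GB datasets sit near 50%, which is consistent with the observation that throughput drops for datasets that are too small or too large.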
From 8 nodes to 28 nodes, the run times decrease only gradually.

Fig. 3 Run times to convert images under varying cluster size

In addition, we measure parallel speedup. Experiments are run with different numbers of parallel nodes so that speedup can be calculated. Parallel speedup measures how many times faster the parallel and distributed execution is than running the same MapReduce image conversion program on a single node. A speedup greater than 1 means that there is at least some gain from carrying out the work in parallel. A speedup equal to the number of machines means that our cloud server and MapReduce program have perfect scalability, i.e., ideal performance. The calculated speedup is shown in Fig. 4. The results show near-ideal scalability at 2, 4 and 8 nodes. Although speedup falls below ideal from 10 nodes onward, our cloud server still achieves high throughput through distributed processing. Moreover, throughput decreases for datasets that are too large or too small: the calculated speedups for the 2GB, 100GB, 10GB and 40GB datasets on 28 nodes are 15.3, 14.2, 23.1 and …, respectively.

Fig. 4 Speedup for an image conversion function with different numbers of nodes

Table 2. Image datasets
Name              Flickr dataset
Format            JPG
Source            Keyword: Sun
Content size      1GB, 2GB, 4GB, 8GB, 10GB, 20GB, 40GB, 50GB, 100GB
Number of files   …

In the second experiment, to demonstrate the advantage of MapReduce programming for implementing an image conversion function, we compare our cloud server with two single-node machines. Machine A is equipped with an AMD Athlon II X… GHz processor, 4GB of memory and 600GB of storage, running CentOS 5.5; machine B has the same specification as the master node. We measure the run time on our cloud server using the MapReduce program, and the run times on machines A and B using a sequential program built only on the JAI libraries, without MapReduce.

Fig. 5 Run times for our SMCCSE and two different machines

Fig. 5 shows the results. The elapsed times on machines A and B are shorter than the run times of our cloud server with 1 and 2 nodes. Machines A and B outperform our server at 1 and 2 nodes because MapReduce-based distributed processing incurs overhead from map task creation and job scheduling, as well as from low disk performance.

5. Conclusion
In this paper, we describe the social media cloud computing PaaS platform as the core platform of SMCCSE. The main roles of our PaaS platform are to analyze social media data created by users, to store the data on HDFS, to carry out transcoding, transmoding and encoding/decoding of social media using the MapReduce programming model, and to manage and monitor large-scale computing resources in order to reduce wasted resources. To verify the performance of the social media cloud computing PaaS platform, we evaluated image conversion functions on large image datasets under a variety of conditions. The results indicate that the PaaS platform in SMCCSE performs well when processing large datasets. In future work, we plan to implement a fully functional SMCCSE that also converts video clips and audio files into appropriate file formats.

References
[1] Myoungjin Kim and Hanku Lee, "SMCC: Social Media Cloud Computing Model for Developing SNS based on Social Media," Communications in Computer and Information Science, vol. 206.
[2] /hdfs_design.pdf
[3] op
[4] Xiaofei Liao and Hai Jin, "A new distributed storage scheme for cluster video server," Journal of Systems Architecture, vol. 51, no. 2, pp. 79-94.
[5] P. J. Shenoy and H. M. Vin, "Efficient striping techniques for multimedia servers," Network and Operating System Support for Digital Audio and Video, pp. 25-36, 1997.
HikCentral V1.2 Software Requirements & Hardware Performance Contents Chapter 1 Software Requirements... 2 Chapter 2 Control Client Performance... 1 Chapter 3 Server Performance... 1 3.1 VSM Server (without
More informationResearch on Mass Image Storage Platform Based on Cloud Computing
6th International Conference on Sensor Network and Computer Engineering (ICSNCE 2016) Research on Mass Image Storage Platform Based on Cloud Computing Xiaoqing Zhou1, a *, Jiaxiu Sun2, b and Zhiyong Zhou1,
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationHuge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2
2nd International Conference on Materials Science, Machinery and Energy Engineering (MSMEE 2017) Huge Data Analysis and Processing Platform based on Hadoop Yuanbin LI1, a, Rong CHEN2 1 Information Engineering
More informationCamdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa
Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file
More informationAn Efficient Informal Data Processing Method by Removing Duplicated Data
An Efficient Informal Data Processing Method by Removing Duplicated Data Jaejeong Lee 1, Hyeongrak Park and Byoungchul Ahn * Dept. of Computer Engineering, Yeungnam University, Gyeongsan, Korea. *Corresponding
More informationISILON X-SERIES. Isilon X210. Isilon X410 ARCHITECTURE SPECIFICATION SHEET Dell Inc. or its subsidiaries.
SPECIFICATION SHEET Isilon X410 Isilon X210 ISILON X-SERIES The Dell EMC Isilon X-Series, powered by the Isilon OneFS operating system, uses a highly versatile yet simple scale-out storage architecture
More informationWelcome to the New Era of Cloud Computing
Welcome to the New Era of Cloud Computing Aaron Kimball The web is replacing the desktop 1 SDKs & toolkits are there What about the backend? Image: Wikipedia user Calyponte 2 Two key concepts Processing
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationEMC ISILON X-SERIES. Specifications. EMC Isilon X200. EMC Isilon X400. EMC Isilon X410 ARCHITECTURE
EMC ISILON X-SERIES EMC Isilon X200 EMC Isilon X400 The EMC Isilon X-Series, powered by the OneFS operating system, uses a highly versatile yet simple scale-out storage architecture to speed access to
More informationCluster Setup and Distributed File System
Cluster Setup and Distributed File System R&D Storage for the R&D Storage Group People Involved Gaetano Capasso - INFN-Naples Domenico Del Prete INFN-Naples Diacono Domenico INFN-Bari Donvito Giacinto
More informationTechnical Brief: Specifying a PC for Mascot
Technical Brief: Specifying a PC for Mascot Matrix Science 8 Wyndham Place London W1H 1PP United Kingdom Tel: +44 (0)20 7723 2142 Fax: +44 (0)20 7725 9360 info@matrixscience.com http://www.matrixscience.com
More informationInternational Journal of Advance Engineering and Research Development. A Study: Hadoop Framework
Scientific Journal of Impact Factor (SJIF): e-issn (O): 2348- International Journal of Advance Engineering and Research Development Volume 3, Issue 2, February -2016 A Study: Hadoop Framework Devateja
More informationHighly accurate simulations of big-data clusters for system planning and optimization
White Paper Highly accurate simulations of big-data clusters for system planning and optimization Intel CoFluent Technology for Big Data Intel Rack Scale Design Using Intel CoFluent Technology for Big
More informationLinux Software RAID Level 0 Technique for High Performance Computing by using PCI-Express based SSD
Linux Software RAID Level Technique for High Performance Computing by using PCI-Express based SSD Jae Gi Son, Taegyeong Kim, Kuk Jin Jang, *Hyedong Jung Department of Industrial Convergence, Korea Electronics
More informationParallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem
I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem Hema Dubey *, Nilay Khare *, Alind Khare **
More informationDo-It-Yourself 1. Oracle Big Data Appliance 2X Faster than
Oracle Big Data Appliance 2X Faster than Do-It-Yourself 1 Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such
More informationReal-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b
4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1
More informationDistributed Filesystem
Distributed Filesystem 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributing Code! Don t move data to workers move workers to the data! - Store data on the local disks of nodes in the
More informationGlobal Journal of Engineering Science and Research Management
A FUNDAMENTAL CONCEPT OF MAPREDUCE WITH MASSIVE FILES DATASET IN BIG DATA USING HADOOP PSEUDO-DISTRIBUTION MODE K. Srikanth*, P. Venkateswarlu, Ashok Suragala * Department of Information Technology, JNTUK-UCEV
More informationEXTRACT DATA IN LARGE DATABASE WITH HADOOP
International Journal of Advances in Engineering & Scientific Research (IJAESR) ISSN: 2349 3607 (Online), ISSN: 2349 4824 (Print) Download Full paper from : http://www.arseam.com/content/volume-1-issue-7-nov-2014-0
More informationJuxtaposition of Apache Tez and Hadoop MapReduce on Hadoop Cluster - Applying Compression Algorithms
, pp.289-295 http://dx.doi.org/10.14257/astl.2017.147.40 Juxtaposition of Apache Tez and Hadoop MapReduce on Hadoop Cluster - Applying Compression Algorithms Dr. E. Laxmi Lydia 1 Associate Professor, Department
More informationHDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung
HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per
More informationProperly Sizing Processing and Memory for your AWMS Server
Overview This document provides guidelines for purchasing new hardware which will host the AirWave Wireless Management System. Your hardware should incorporate margin for WLAN expansion as well as future
More informationAn Indian Journal FULL PAPER ABSTRACT KEYWORDS. Trade Science Inc. The study on magnanimous data-storage system based on cloud computing
[Type text] [Type text] [Type text] ISSN : 0974-7435 Volume 10 Issue 11 BioTechnology 2014 An Indian Journal FULL PAPER BTAIJ, 10(11), 2014 [5368-5376] The study on magnanimous data-storage system based
More informationcalled Hadoop Distribution file System (HDFS). HDFS is designed to run on clusters of commodity hardware and is capable of handling large files. A fil
Parallel Genome-Wide Analysis With Central And Graphic Processing Units Muhamad Fitra Kacamarga mkacamarga@binus.edu James W. Baurley baurley@binus.edu Bens Pardamean bpardamean@binus.edu Abstract The
More informationLevel 2 Diploma Unit 3 Computer Systems
Level 2 Diploma Unit 3 Computer Systems You are an IT technician in a small company which creates web sites. The company has recently employed someone who is partially sighted and is also left handed.
More informationDynamic Data Placement Strategy in MapReduce-styled Data Processing Platform Hua-Ci WANG 1,a,*, Cai CHEN 2,b,*, Yi LIANG 3,c
2016 Joint International Conference on Service Science, Management and Engineering (SSME 2016) and International Conference on Information Science and Technology (IST 2016) ISBN: 978-1-60595-379-3 Dynamic
More informationSimulation of Cloud Computing Environments with CloudSim
Simulation of Cloud Computing Environments with CloudSim Print ISSN: 1312-2622; Online ISSN: 2367-5357 DOI: 10.1515/itc-2016-0001 Key Words: Cloud computing; datacenter; simulation; resource management.
More informationConstruction and Application of Cloud Data Center in University
International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2014) Construction and Application of Cloud Data Center in University Hong Chai Institute of Railway Technology,
More informationA SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING
Journal homepage: www.mjret.in ISSN:2348-6953 A SURVEY ON SCHEDULING IN HADOOP FOR BIGDATA PROCESSING Bhavsar Nikhil, Bhavsar Riddhikesh,Patil Balu,Tad Mukesh Department of Computer Engineering JSPM s
More informationDynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce
Dynamic processing slots scheduling for I/O intensive jobs of Hadoop MapReduce Shiori KURAZUMI, Tomoaki TSUMURA, Shoichi SAITO and Hiroshi MATSUO Nagoya Institute of Technology Gokiso, Showa, Nagoya, Aichi,
More informationDemystifying the Cloud With a Look at Hybrid Hosting and OpenStack
Demystifying the Cloud With a Look at Hybrid Hosting and OpenStack Robert Collazo Systems Engineer Rackspace Hosting The Rackspace Vision Agenda Truly a New Era of Computing 70 s 80 s Mainframe Era 90
More informationCA485 Ray Walshe Google File System
Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage
More information