Report on The Infrastructure for Implementing the Mobile Technologies for Data Collection in Egypt


Date: 10 Sep, 2017
Draft v 4.0

Table of Contents

1. Introduction
2. Infrastructure Reference Architecture
3. Current Status of CPI-Related Solutions
4. Targeted Data Management Continuum
5. Current Infrastructure Architecture
6. Targeted Solution Architecture
7. Recommendations for Applications and Data Management
8. Main Recommended Components
9. Estimated Hi-Level Sizing and Specifications
10. Conclusion and Next Actions
11. References

1. Introduction

Realizing the advantages of using mobile technology for data collection and statistical production, the United Nations Economic Commission for Africa (ECA) is implementing a series of pilot projects on strengthening the capacity of African countries to use mobile technologies to collect data for effective policy and decision making. The pilot projects are designed to be executed by the National Statistical Office (NSO) in collaboration with a Training and Research Institute (TRI) designated by the NSO. The main partner in the project is the NSO in Egypt, the Central Agency for Public Mobilization and Statistics (CAPMAS). CAPMAS has in turn designated Nile University as the TRI.

The main objectives of the pilot project are as follows:

- Strengthen the capacity of the country to collect data with mobile technology.
- Experiment with self-enumeration using mobile devices to collect data, and determine the suitability of such data for the production of statistics.
- Strengthen the working relationship between the NSO and the TRI in statistical development.

The focus of this report is to support CAPMAS in installing and/or upgrading its technical infrastructure, including computer servers and software, to receive data from the project and integrate it into standard statistical processes in Egypt, as well as to acquire handheld devices.

Based on several meetings and assessment events with the CAPMAS team, this report describes the current infrastructure and the targeted upgrades. It closes with sizing estimates and recommendations for Big Data components and platform.

The main infrastructure achievement at CAPMAS is the virtualized data center, which is recommended to be upgraded further to a Cloud Computing platform. The National Institute of Standards and Technology (NIST) Cloud reference architecture is recommended for building a private cloud computing platform for this purpose.

2. Infrastructure Reference Architecture

For the sake of standardizing the infrastructure design for the project, a suitable reference architecture needs to be used. Because cloud computing provides several benefits and the existing data center provides a solid foundation for such an approach, the National Institute of Standards and Technology (NIST) Cloud reference architecture will be used, as detailed in reference 2. The key points follow.

The architectural components of the NIST reference architecture describe the important aspects of service deployment and service orchestration. The overall service management of the cloud is acknowledged as an important element of the architecture. Business support mechanisms are in place to handle customer management issues such as contracts, accounting and pricing, and are vital to cloud computing.

The following figure presents an overview of the NIST cloud computing reference architecture, which identifies the major actors, their activities and functions in cloud computing. The diagram depicts a generic high-level architecture and is intended to facilitate the understanding of the requirements, uses, characteristics and standards of cloud computing.

The NIST cloud computing definition is widely accepted as a valuable contribution toward providing a clear understanding of cloud computing technologies and cloud services. It provides a simple and unambiguous taxonomy of three service models available to cloud consumers: cloud software as a service (SaaS), cloud platform as a service (PaaS), and cloud infrastructure as a service (IaaS). It also summarizes four deployment models describing how the computing infrastructure that delivers these services can be shared: private cloud, community cloud, public cloud, and hybrid cloud. Finally, the NIST definition also provides a unifying view of five essential characteristics that all cloud services exhibit: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

The NIST cloud computing reference architecture defines five major actors: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in cloud computing. The following table briefly lists the actors defined in the NIST cloud computing reference architecture:

| Actor | Definition |
|---|---|
| Cloud Consumer | A person or organization that maintains a business relationship with, and uses service from, Cloud Providers |
| Cloud Provider | A person, organization, or entity responsible for making a service available to interested parties |
| Cloud Auditor | A party that can conduct independent assessment of cloud services, information system operations, performance and security of the cloud implementation |
| Cloud Broker | An entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers |
| Cloud Carrier | An intermediary that provides connectivity and transport of cloud services from Cloud Providers to Cloud Consumers |

Our focus in this solution will be on the Private Cloud model that needs to be in place at CAPMAS as the infrastructure for the mobile data collection applications as well as the back-end processing technologies. NIST defines a private cloud as one that gives a single Cloud Consumer's organization exclusive access to and usage of the infrastructure and computational resources. It may be managed either by the Cloud Consumer organization or by a third party, and may be hosted on the organization's premises (i.e. on-site private clouds) or outsourced to a hosting company (i.e. outsourced private clouds).

3. Current Status of CPI-Related Solutions

Currently, there is neither dedicated infrastructure for CPI-related processing at CAPMAS nor back-end processing components such as database engines or big data platforms to handle data processing, transformation and modeling. Most work is done either manually or collected into spreadsheets for processing and estimation of the CPI and intermediate statistics and KPIs. The following statistics provided by CAPMAS illustrate the workload of the CPI process in terms of the effort needed from the members involved:

| KPI | Description |
|---|---|
| Number of Researchers | Field persons assigned to collect data from the different markets |
| Number of Supervisors | Field persons assigned to manage the field operation of researchers |
| Number of Researchers per Supervisor | The average number of researchers being supervised by a supervisor |
| Overall number of governorates | Governorates where field operation takes place |
| Overall number of regions | Regions where markets are located for collecting prices |
| Overall number of markets | Markets where prices are being collected |
| Number of markets per region | Number of markets per region where operation takes place |
| Number of markets per researcher | Number of markets assigned during one month to a single researcher |
| Number of forms per researcher | Number of forms to be completed by a researcher in one month |
| Number of products per form | Number of products the researcher needs to get prices for on each single form |
| Number of branch reviewers | Number of reviewers assigned to review the collected prices for each branch office |
| Number of head office reviewers | Number of reviewers at the head office responsible for the final review of prices collected from all field operations |

Measure values that survive in this copy, in their original order: "About", "About", "Not specified", "One region to a one researcher", "products".

4. Targeted Data Management Continuum

The effectiveness of the mobile data collection solution for the CPI process requires the existence of an enterprise data management platform that is capable of handling collected data in an integrated, secured and accessible way, so that a collaborative model among researchers, supervisors, CAPMAS branches, central departments and CPI departments can be achieved.

The CPI process at CAPMAS currently lacks such an enterprise data management platform. Most of the process is therefore done manually through paper forms, except for the final analysis, which is conducted using Excel sheets or local desktop software and so forgoes the value of collaborative data models.

The target platform and infrastructure should fulfill the following main requirements, split by phase of the data management continuum:

- Data Collection: automate data sourcing, review, approval and consolidation through the workflow embedded into the mobile application for the field researchers and their supervisors.
- Data Aggregation: after review and approval, aggregate the data sourced from the mobile applications into the back-end database through a direct connection and predefined rules set by the CPI department.
- Data Matching: extract external data and maintain master data, while providing the ability to query data using predefined queries as well as ad-hoc queries. At the same time, enable augmenting CPI data with other data such as spatial and geolocation data.
- Data Quality: provide means for checking data quality and validation during the collection process and after collection, while reviewing in the back-office processing and applying standard CPI statistical analysis.
- Data Persistence: retain and organize data for as long as possible, while providing multi-structured data capabilities to reduce the cost of storage.
- Data Consolidation: assemble data entities integrated into the back-end systems with flexible metadata management to ensure accessibility by specific roles.
- Data Distribution: enable analysis tools to access, retrieve and communicate data in an intuitive way suitable to each level of CPI employees, and structured for branch access and top management reporting.

The new model proposed for the pilot project will address the above requirements for each area, targeting an integrated data management platform that enables data integration, collaboration and retention using recent big data management technologies. Data is transferred directly to secured servers managed internally by CAPMAS, with the following features:

- End-to-end encryption using the existing CAPMAS telecommunication infrastructure.
- Reliable simultaneous connections to CAPMAS datacentre servers.
- Online/offline synchronization.
- GIS integration.
- Multilanguage support.
- A cloud architecture that can be used by all surveys and by all statistical processes.
- A cloud architecture that can easily be used to handle the self-enumeration concept.
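As an illustration of the Data Quality requirement, the following is a minimal sketch of the kind of check that could run on the tablet before a price record is submitted and again in the back office. The record fields, the `validate` function and the 50% review threshold are all assumptions made for this example; the actual form layout and rules are defined by the CPI department and reference 4.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PriceObservation:
    # Hypothetical field names, chosen only for this sketch.
    product_code: str
    market_id: str
    price: float
    collected_on: date
    previous_price: Optional[float] = None


def validate(obs: PriceObservation, max_change: float = 0.5) -> List[str]:
    """Return a list of issues; an empty list means the record passes."""
    issues = []
    if not obs.product_code.strip():
        issues.append("missing product code")
    if not obs.market_id.strip():
        issues.append("missing market id")
    if obs.price <= 0:
        issues.append("price must be positive")
    if obs.collected_on > date.today():
        issues.append("collection date is in the future")
    # Large moves against the previous price are flagged for review,
    # not rejected. The threshold is an assumed value.
    if obs.previous_price and obs.price > 0:
        change = abs(obs.price - obs.previous_price) / obs.previous_price
        if change > max_change:
            issues.append(f"price changed by {change:.0%}, review required")
    return issues
```

Returning a list of issues rather than raising an error lets the same function serve both the researcher's form (show all problems at once) and the supervisor's review queue.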

5. Current Infrastructure Architecture

At CAPMAS, virtualized data center infrastructure is widely used for other applications and can be leveraged for the CPI project with some modifications and upgrades, as described in the next sections. The current infrastructure is based on VMware virtualization technologies, as detailed in reference 3; the main points follow.

VMware Infrastructure includes the following components, as shown in the figure above:

- VMware ESX Server: A production-proven virtualization layer run on physical servers that abstracts processor, memory, storage and networking resources to be provisioned to multiple virtual machines.
- VMware Virtual Machine File System (VMFS): A high-performance cluster file system for virtual machines.
- VMware Virtual Symmetric Multi-Processing (SMP): Enables a single virtual machine to use multiple physical processors simultaneously.
- VirtualCenter Management Server: The central point for configuring, provisioning and managing virtualized IT infrastructure.
- Virtual Infrastructure Client (VI Client): An interface that allows administrators and users to connect remotely to the VirtualCenter Management Server or individual ESX Server installations from any Windows PC.
- Virtual Infrastructure Web Access: A Web interface for virtual machine management and remote console access.
- VMware VMotion: Enables the live migration of running virtual machines from one physical server to another with zero downtime, continuous service availability and complete transaction integrity.
- VMware High Availability (HA): Provides easy-to-use, cost-effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers that have spare capacity.
- VMware Distributed Resource Scheduler (DRS): Intelligently allocates and balances computing capacity dynamically across collections of hardware resources for virtual machines.
- VMware Consolidated Backup: Provides an easy-to-use, centralized facility for agent-free backup of virtual machines. It simplifies backup administration and reduces the load on ESX Server installations.
- VMware Infrastructure SDK: Provides a standard interface for VMware and third-party solutions to access VMware Infrastructure.

6. Targeted Solution Architecture

While leveraging the current virtualized infrastructure through a cloud computing model is the designated approach, the target infrastructure has several roles to play for the mobile data collection solution to work smoothly as planned. As per reference 4, those roles include:

- Support the tablet mobile application communications for the field researcher and supervisor applications.
- Host and run the REST APIs and associated data services developed for the mobile application data interfacing.
- Provide Big Data capabilities for long-term data retention and high-performance computing.

For the tablet mobile application communications of the field researcher and supervisor applications, the following figure shows the communications topology:

Figure: System Communication Diagram

- The tablet devices are connected to a 4G broadband cellular network.
- The end-to-end communication between field devices and the back-end server goes through a Virtual Private Network (VPN) tunnel to ensure data security.
- Due to communication limitations, tablet devices should alternate between Online and Offline modes.
- In Offline mode, the tablet device can still gather data and store it in a local database that resides on the tablet.
- In Online mode, the device can synchronize the local and central databases, send and receive messages, and perform all other functions that require connectivity.

For hosting and running the REST APIs and associated data services developed for the mobile application data interfacing, the following figure shows the main tablet mobile application system components and data flow:

Figure: Mobile Tablet Applications System Modules Diagram

Big Data capabilities for long-term data retention and high-performance computing are covered in the next section.
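The alternation between Offline and Online modes described above can be sketched as a local store with a "synced" flag. This is only an illustration under stated assumptions: the table layout, the function names and the `upload` callable (which stands in for a call to the back-end REST API over the VPN) are hypothetical and are not taken from the application design.

```python
import sqlite3
from typing import Callable


def open_local_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the on-device database and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "synced INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def save_offline(conn: sqlite3.Connection, payload: str) -> None:
    """Offline mode: store the record locally, marked as not yet synced."""
    conn.execute("INSERT INTO observations (payload) VALUES (?)", (payload,))
    conn.commit()


def sync_pending(conn: sqlite3.Connection, upload: Callable[[str], bool]) -> int:
    """Online mode: push unsynced rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT id, payload FROM observations WHERE synced = 0 ORDER BY id"
    ).fetchall()
    sent = 0
    for row_id, payload in rows:
        if not upload(payload):
            break  # connectivity lost; remaining rows wait for the next attempt
        conn.execute("UPDATE observations SET synced = 1 WHERE id = ?", (row_id,))
        conn.commit()
        sent += 1
    return sent
```

Marking each row as synced only after its upload succeeds means an interrupted session resumes where it stopped. A real implementation would also need the server to tolerate a record being sent twice, since a connection can drop after the server accepts a row but before the tablet records that fact.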

7. Recommendations for Applications and Data Management

In the tablet mobile application system components shown in the previous section, the CAPMAS Backend Server is the landing space for data collected by the field researchers and supervisors. To provide Big Data capabilities for long-term data retention and high-performance computing, and to receive additional data such as self-enumeration and external source integration, additional services will be integrated beneath the backend server that receives the tablet data. The following features will be attained through the additional services:

| # | Feature | Description |
|---|---|---|
| 1 | Distributed Data Management | Data will be stored in distributed blocks on several nodes, enabling granular management, scalability and high-performance computing. |
| 2 | Distributed Processing | Aggregation, transformation, statistical analysis and data modeling will be implemented on a distributed application framework to enable high-performance, scalable, resilient computing. |
| 3 | Batch Loading | Enables ingestion of accumulated data in batches for low-frequency loads. |
| 4 | Streaming Loading | Enables ingesting data as small, frequent streams in the form of a pipeline of messages or transactions. |
| 5 | In-Memory Processing | Runs data analysis on a selected set of data in memory for faster processing and manipulation. |
| 6 | Data Science Modeling | Specialized libraries that implement machine learning, deep learning, statistical modeling, data mining and analysis operations atop the data platform. |
| 7 | Graph Analysis | Components that enable big graph implementation and network analysis models. |
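The difference between Batch Loading and Streaming Loading (features 3 and 4) can be illustrated in a few lines. This is a plain-Python sketch, not the API of any of the recommended components; `sink` stands in for whatever writes to the data platform, and all names are assumptions for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

Sink = Callable[[List[dict]], None]


def in_batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group an iterable of records into lists of at most `size` items."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_batch(records: Iterable[dict], sink: Sink, size: int = 1000) -> int:
    """Batch loading: one write per accumulated batch. Returns write count."""
    writes = 0
    for batch in in_batches(records, size):
        sink(batch)
        writes += 1
    return writes


def load_stream(records: Iterable[dict], sink: Sink) -> int:
    """Streaming loading: one write per record as it arrives."""
    writes = 0
    for record in records:
        sink([record])
        writes += 1
    return writes
```

Batching trades latency for fewer, larger writes, which suits month-end consolidation. Streaming keeps the back end current as researchers submit forms.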

8. Main Recommended Components

Based on the current status and targeted requirements described in the previous sections, several components need to be installed to achieve the needed upgrades of the existing infrastructure. The following list describes the recommended components, subject to review during the implementation of the infrastructure upgrades and setup:

- VMware vCloud Suite
Extends the current virtualized infrastructure into cloud management. vCloud Suite is an integrated offering that brings together VMware's vSphere hypervisor and the VMware vRealize Suite multi-vendor hybrid cloud management platform. VMware's portable licensing units allow vCloud Suite to build and manage vSphere-based private clouds. It accelerates application delivery across both traditional and container-based applications by giving developers the freedom to use the tools that make them most productive, while still ensuring that applications can be moved seamlessly from developer laptop to production.

- Apache Hadoop Distributed File System (HDFS)
A distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems, but the differences are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. It was originally built as infrastructure for the Apache Nutch web search engine project.

- Apache YARN
The fundamental idea of YARN is to split the functionalities of resource management and job scheduling/monitoring into separate daemons: a global Resource Manager (RM) and a per-application Application Master (AM). An application is either a single job or a DAG of jobs.
The Resource Manager and the Node Manager form the data-computation framework. The Resource Manager is the ultimate authority that arbitrates resources among all the applications in the system. The Node Manager is the per-machine framework agent responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting it to the Resource Manager/Scheduler. The per-application Application Master is, in effect, a framework-specific library tasked with negotiating resources from the Resource Manager and working with the Node Manager(s) to execute and monitor the tasks.

- Apache Spark
A fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

- Apache Hive
Data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and a JDBC driver are provided to connect users to Hive.

- Apache HBase
Provides random, real-time read/write access to Big Data. The project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's "Bigtable: A Distributed Storage System for Structured Data" by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

- Apache Oozie
A workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts). Oozie is a scalable, reliable and extensible system.

- Apache Tez
Aimed at building an application framework that allows for a complex directed acyclic graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.
It provides expressive dataflow definition APIs, a flexible Input-Processor-Output runtime model, data-type agnosticism, simplified deployment, performance gains over MapReduce, optimal resource management, plan reconfiguration at runtime, and dynamic physical data flow decisions.

- Apache Flume
A distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple, extensible data model that allows for online analytic applications.

- Apache Sqoop
A tool designed to transfer data between Hadoop and relational databases or mainframes. Sqoop can import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

- MongoDB
A document database that combines scalability and flexibility with querying and indexing. MongoDB stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can be changed over time. It will be used as a document store for unstructured data.

- PostgreSQL
A powerful SQL-based database engine that will be used for landing the data collected by the mobile tablet applications, working behind the data services of the REST APIs. It provides extensive high-performance processing as well as special capabilities such as GIS data handling.
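Oozie workflows and Tez jobs are both described above as directed acyclic graphs of actions or tasks. The idea can be illustrated with Python's standard `graphlib` module (Python 3.9 or later). This is not Oozie's workflow format or Tez's API, and the action names are hypothetical; it only shows how dependencies determine a valid execution order and how a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each key is an action; its value is the set of actions it depends on.
# The action names are made up for this sketch.
workflow: Dict[str, Set[str]] = {
    "import_prices": set(),
    "import_master_data": set(),
    "clean_and_match": {"import_prices", "import_master_data"},
    "compute_aggregates": {"clean_and_match"},
    "publish_report": {"compute_aggregates"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one order in which every action runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def is_valid_workflow(dag: Dict[str, Set[str]]) -> bool:
    """A workflow is valid only if its dependency graph has no cycle."""
    try:
        execution_order(dag)
        return True
    except CycleError:
        return False


order = execution_order(workflow)
```

The two import actions have no dependency on each other, so a scheduler is free to run them in parallel. Only the relative order implied by the edges is guaranteed.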

9. Estimated Hi-Level Sizing and Specifications

The following table lists the estimated sizing of the infrastructure required for deploying and running the aforementioned components. Sizing will be revised during the implementation, taking advantage of the cloud approach deployed on top of the virtualized infrastructure at the CAPMAS data center:

| # | VM Function | Estimated Node Sizing |
|---|---|---|
| 1 | 2 x Name Nodes | 4 Cores 3.0 GHz, 16 GB RAM, 200 GB Storage, Linux OS |
| 2 | 2 x Resource Scheduling Nodes | 4 Cores 3.0 GHz, 16 GB RAM, 200 GB Storage, Linux OS |
| 3 | 8 x Worker Nodes | 2 Cores 3.0 GHz, 8 GB RAM, 500 GB Storage, Linux OS |
| 4 | 2 x Document Services Nodes | 4 Cores 3.0 GHz, 16 GB RAM, 500 GB Storage, Linux OS |
| 5 | 2 x REST APIs Hosting Nodes | 4 Cores 3.0 GHz, 16 GB RAM, 100 GB Storage, Linux OS |
| 6 | 2 x Central Database Nodes | 4 Cores 3.0 GHz, 16 GB RAM, 500 GB Disk Space, Linux OS |
| 7 | 2 x Back Office Applications | 4 Cores 3.0 GHz, 8 GB RAM, 200 GB Disk Space, Windows Server |
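To see what the table implies for the underlying virtualized capacity, the per-node figures can be summed. The sketch below uses only the numbers from the table; the `NodeGroup` type and `totals` function are names introduced for this example, and the result ignores hypervisor overhead, replication and growth, which the revised sizing would need to account for.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeGroup:
    function: str
    count: int
    cores: int       # per node
    ram_gb: int      # per node
    storage_gb: int  # per node


# Figures copied from the sizing table above.
SIZING: List[NodeGroup] = [
    NodeGroup("Name Nodes", 2, 4, 16, 200),
    NodeGroup("Resource Scheduling Nodes", 2, 4, 16, 200),
    NodeGroup("Worker Nodes", 8, 2, 8, 500),
    NodeGroup("Document Services Nodes", 2, 4, 16, 500),
    NodeGroup("REST APIs Hosting Nodes", 2, 4, 16, 100),
    NodeGroup("Central Database Nodes", 2, 4, 16, 500),
    NodeGroup("Back Office Applications", 2, 4, 8, 200),
]


def totals(groups: List[NodeGroup]) -> Dict[str, int]:
    """Sum the per-node figures across all node groups."""
    return {
        "vms": sum(g.count for g in groups),
        "cores": sum(g.count * g.cores for g in groups),
        "ram_gb": sum(g.count * g.ram_gb for g in groups),
        "storage_gb": sum(g.count * g.storage_gb for g in groups),
    }


print(totals(SIZING))
```

With the table as given, this comes to 20 virtual machines, 64 cores, 240 GB of RAM and 7,400 GB of storage.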

10. Conclusion and Next Actions

The virtualized infrastructure already achieved at CAPMAS paves the way for building a solid foundation for the mobile data collection solution, as well as for other potential data solutions and integration with external data sources. To leverage this achievement, two main additional layers need to be built:

- Extending Virtualization to Cloud Platform
- Deploying Big Data Management Platform

The next actions are to commence an implementation plan for the two items above, for which an implementation team needs to be engaged. The plan should ensure complete know-how transfer to the CAPMAS team, especially on the Big Data management solutions, and should extend the backend capabilities to support the mobile data collection solution, which is the main focus of this pilot project.

11. References

1- UNECA CAPMAS Nile University Letter of Agreement (LoA).
2- Cloud Computing Reference Architecture: Recommendations of the National Institute of Standards and Technology.
3- VMware Virtualization Documentation.
4- CAPMAS Pricing Tablet Application Requirements and Design Document.
5- VMware vCloud Suite.
6- Apache Hadoop Main Page.

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Big Data Hadoop Developer Course Content Who is the target audience? Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours Complete beginners who want to learn Big Data Hadoop Professionals

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics

IBM Data Science Experience White paper. SparkR. Transforming R into a tool for big data analytics IBM Data Science Experience White paper R Transforming R into a tool for big data analytics 2 R Executive summary This white paper introduces R, a package for the R statistical programming language that

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

MapR Enterprise Hadoop








