Report on the Infrastructure for Implementing Mobile Technologies for Data Collection in Egypt
Date: 10 Sep 2017
Draft v4.0
Table of Contents

1. Introduction
2. Infrastructure Reference Architecture
3. Current Status of CPI-Related Solutions
4. Targeted Data Management Continuum
5. Current Infrastructure Architecture
6. Targeted Solution Architecture
7. Recommendations for Applications and Data Management
8. Main Recommended Components
9. Estimated High-Level Sizing and Specifications
10. Conclusion and Next Actions
11. References
1. Introduction

Realizing the advantages of using mobile technology for data collection and statistical production, the United Nations Economic Commission for Africa (ECA) is implementing a series of pilot projects on strengthening the capacity of African countries to use mobile technologies to collect data for effective policy and decision making. The pilot projects are designed to be executed by the National Statistical Office (NSO) in collaboration with a Training and Research Institute (TRI) designated by the NSO. The main partner in the project is the NSO in Egypt, the Central Agency for Public Mobilization and Statistics (CAPMAS). CAPMAS has in turn designated Nile University as the TRI.

The main objectives of the pilot project are as follows:

- Strengthen the capacity of the country to collect data with mobile technology;
- Experiment with self-enumeration using mobile devices to collect data and determine the suitability of such data for the production of statistics;
- Strengthen the working relationship between the NSO and the TRI in statistical development.

The focus of this report is to support CAPMAS in installing and/or upgrading technical infrastructure, including computer servers and software, to receive data from the project and integrate it into standard statistical processes in Egypt, as well as to acquire handheld devices. Based on several meetings and assessment events with the CAPMAS team, the current infrastructure and the targeted upgrades are illustrated in this report. At the end, sizing estimates along with recommendations for Big Data components and platform have been made.

The main infrastructure achievement at CAPMAS is the virtualized data center, which is recommended to be upgraded further to a cloud computing platform. The National Institute of Standards and Technology (NIST) Cloud reference architecture is recommended to be used to achieve a private cloud computing platform for this purpose.
2. Infrastructure Reference Architecture

For the sake of standardizing the infrastructure design for the project, a suitable reference architecture needs to be used. As cloud computing provides several benefits and, at the same time, the existing data center provides a solid foundation for such an approach, the National Institute of Standards and Technology (NIST) Cloud reference architecture will be used, as detailed in reference 2. The key points follow.

The architectural components of the NIST reference architecture describe the important aspects of service deployment and service orchestration. The overall service management of the cloud is acknowledged as an important element in the scheme of the architecture. Business support mechanisms are in place to recognize customer management issues like contracts, accounting and pricing, and are vital to cloud computing.

The following figure presents an overview of the NIST cloud computing reference architecture, which identifies the major actors and their activities and functions in cloud computing. The diagram depicts a generic high-level architecture and is intended to facilitate the understanding of the requirements, uses, characteristics and standards of cloud computing.
The NIST cloud computing definition is widely accepted as a valuable contribution toward providing a clear understanding of cloud computing technologies and cloud services. It provides a simple and unambiguous taxonomy of three service models available to cloud consumers: cloud software as a service (SaaS), cloud platform as a service (PaaS), and cloud infrastructure as a service (IaaS). It also summarizes four deployment models describing how the computing infrastructure that delivers these services can be shared: private cloud, community cloud, public cloud, and hybrid cloud. Finally, the NIST definition also provides a unifying view of five essential characteristics that all cloud services exhibit: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service.

The NIST cloud computing reference architecture defines five major actors: cloud consumer, cloud provider, cloud carrier, cloud auditor and cloud broker. Each actor is an entity (a person or an organization) that participates in a transaction or process and/or performs tasks in cloud computing. The following table briefly lists the actors defined in the NIST cloud computing reference architecture:

- Cloud Consumer: a person or organization that maintains a business relationship with, and uses services from, Cloud Providers.
- Cloud Provider: a person, organization, or entity responsible for making a service available to interested parties.
- Cloud Auditor: a party that can conduct independent assessment of cloud services, information system operations, performance and security of the cloud implementation.
- Cloud Broker: an entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers.
- Cloud Carrier: an intermediary that provides connectivity and transport of cloud services from Cloud Providers to Cloud Consumers.
Our focus in this solution will be on the private cloud model that needs to be in place at CAPMAS as the infrastructure for the mobile data collection applications as well as the back-end processing technologies. NIST defines a private cloud as giving a single Cloud Consumer's organization exclusive access to and usage of the infrastructure and computational resources. It may be managed either by the Cloud Consumer organization or by a third party, and may be hosted on the organization's premises (i.e. on-site private clouds) or outsourced to a hosting company (i.e. outsourced private clouds).
3. Current Status of CPI-Related Solutions

Currently, there is neither dedicated infrastructure for CPI-related processing at CAPMAS nor back-end processing components, such as database engines or big data platforms, to handle data processing, transformation and modeling. Most work is done either manually or collected into spreadsheets for processing and estimation of the CPI and intermediate statistics and KPIs. The following measures provided by CAPMAS illustrate the workload of the CPI process in terms of the effort needed by the people involved:

- Number of Researchers: field persons assigned to collect data from the different markets.
- Number of Supervisors: field persons assigned to manage the field operation of researchers.
- Number of Researchers per Supervisor: the average number of researchers being supervised by a supervisor (not specified).
- Overall number of governorates: governorates where the field operation takes place.
- Overall number of regions: regions where markets are located for collecting prices.
- Overall number of markets: markets where prices are being collected.
- Number of markets per region: number of markets per region where the operation takes place.
- Number of markets per researcher: number of markets assigned during one month to a single researcher (one region is assigned to one researcher).
- Number of forms per researcher: number of forms to be completed by a researcher in one month.
- Number of products per form: number of products the researcher needs to get prices for on each single form.
- Number of branch reviewers: number of reviewers assigned to review the collected prices for each branch office.
- Number of head office reviewers: number of reviewers at the head office responsible for the final review of prices collected from all field operations.
4. Targeted Data Management Continuum

The effectiveness of a mobile data collection solution for the CPI process requires the existence of an enterprise data management platform that is capable of handling collected data in an integrated, secure and accessible way, so that a collaborative model among researchers, supervisors, CAPMAS branches, central departments and CPI departments can be achieved. The current CPI process at CAPMAS lacks such an enterprise data management platform; most of the process is done manually through paper forms, except for the final analysis, which is conducted using Excel sheets or local desktop software, preventing collaborative data models from delivering value.

The target platform and infrastructure should fulfill the following main requirements, split by each phase of the data management continuum:

- Data Collection: enable automating the data sourcing, review, approval and consolidation through the workflow embedded in the mobile application for the field researchers and their supervisors.
- Data Aggregation: the data sourced from the mobile applications, after review and approval, needs to be aggregated properly into the back-end database through a direct connection and predefined rules defined by the CPI department.
- Data Matching: the ability to extract external data and maintain master data, while providing the ability to query data using predefined queries as well as ad-hoc queries. At the same time, enable augmenting CPI data with other data such as spatial and geolocation data.
- Data Quality: provide means for checking data quality and validation during the collection process and post-collection, during review in the back-office processing and while applying standard CPI statistical analysis.
- Data Persistence: retain and organize data for as long as possible, while providing multi-structured data capabilities to save storage costs.
- Data Consolidation: assemble data entities integrated into the back-end systems with flexible metadata management to ensure accessibility by specific roles.
- Data Distribution: enable analysis tools to access, retrieve and communicate data in a way intuitive to each level of CPI employees, as well as structured for branch access and top management reporting.

The new model proposed for the pilot project will address the above requirements for each area, targeting an integrated data management platform that enables data integration, collaboration and retention using the most recent big data management technologies. Data is transferred directly to secured servers managed internally by CAPMAS, with the following features:

- End-to-end encryption using the existing CAPMAS telecommunication infrastructure.
- Reliable simultaneous connections to the CAPMAS data centre servers.
- Online/offline synchronization.
- GIS integration.
- Multilanguage support.
- The architecture could be used by all surveys and by all statistical processes.
- The architecture could easily support the self-enumeration concept.
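To make the Data Quality requirement concrete, the following is a minimal Python sketch of the kind of validation rule that could run on the tablet during collection and again during back-office review. The field names (product_code, market_id, price, collected_at) and the review threshold are illustrative assumptions, not taken from the CAPMAS design:

```python
# Sketch of collection-time validation rules for a single price record.
# Field names and thresholds are illustrative assumptions, not CAPMAS specifications.

def validate_price_record(record, previous_price=None, max_change_ratio=0.5):
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    # Completeness: every mandatory field must be present and non-empty.
    for field in ("product_code", "market_id", "price", "collected_at"):
        if field not in record or record[field] in (None, ""):
            errors.append(f"missing field: {field}")
    price = record.get("price")
    if isinstance(price, (int, float)):
        if price <= 0:
            errors.append("price must be positive")
        # Plausibility: flag large jumps against the previous observation for review.
        elif previous_price and abs(price - previous_price) / previous_price > max_change_ratio:
            errors.append("price change exceeds review threshold")
    elif "price" in record:
        errors.append("price must be numeric")
    return errors
```

A record failing such checks could be flagged for the supervisor rather than rejected outright, which matches the review-and-approval workflow described above.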
5. Current Infrastructure Architecture

At CAPMAS, a virtualized data center infrastructure is already widely used for other applications, and it can be leveraged for the CPI project with some modifications and upgrades, as described in the next sections. The current infrastructure is based on VMware virtualization technologies, as detailed in reference 3; the main points follow.

VMware Infrastructure includes the following components, as shown in the figure above:

- VMware ESX Server: a production-proven virtualization layer that runs on physical servers and abstracts processor, memory, storage and networking resources to be provisioned to multiple virtual machines.
- VMware Virtual Machine File System (VMFS): a high-performance cluster file system for virtual machines.
- VMware Virtual Symmetric Multi-Processing (SMP): enables a single virtual machine to use multiple physical processors simultaneously.
- VirtualCenter Management Server: the central point for configuring, provisioning and managing virtualized IT infrastructure.
- Virtual Infrastructure Client (VI Client): an interface that allows administrators and users to connect remotely to the VirtualCenter Management Server or individual ESX Server installations from any Windows PC.
- Virtual Infrastructure Web Access: a web interface for virtual machine management and remote console access.
- VMware VMotion: enables the live migration of running virtual machines from one physical server to another with zero downtime, continuous service availability and complete transaction integrity.
- VMware High Availability (HA): provides easy-to-use, cost-effective high availability for applications running in virtual machines. In the event of server failure, affected virtual machines are automatically restarted on other production servers that have spare capacity.
- VMware Distributed Resource Scheduler (DRS): intelligently allocates and balances computing capacity dynamically across collections of hardware resources for virtual machines.
- VMware Consolidated Backup: provides an easy-to-use, centralized facility for agent-free backup of virtual machines. It simplifies backup administration and reduces the load on ESX Server installations.
- VMware Infrastructure SDK: provides a standard interface for VMware and third-party solutions to access VMware Infrastructure.
6. Targeted Solution Architecture

While leveraging the current virtualized infrastructure through a cloud computing model is the designated approach, the target infrastructure has several roles in allowing the mobile data collection solution to work smoothly as planned. As per reference 4, those roles include:

- Supporting the tablet mobile application communications for the field researcher and supervisor applications.
- Enabling hosting and running the REST APIs and associated data services developed for the mobile application data interfacing.
- Providing Big Data capabilities for long-term data retention and high-performance computing.

For supporting the tablet mobile application communications for the field researcher and supervisor applications, the following figure shows the communications topology:

System Communication Diagram
- The tablet devices are connected to a 4G broadband cellular network.
- The end-to-end communication between field devices and the back-end server goes through a Virtual Private Network (VPN) tunnel to ensure data security.
- Due to communication limitations, tablet devices should alternate between Online and Offline modes.
- In Offline mode, the tablet device can still gather data and save it in a local database that resides on the tablet.
- In Online mode, the device can synchronize the local and central databases, send and receive messages, and perform all other functions that require connectivity.

On the other hand, for enabling hosting and running the REST APIs and associated data services developed for the mobile application data interfacing, the following figure shows the main tablet mobile application system components and data flow:

Mobile Tablet Applications System Modules Diagram

Providing Big Data capabilities for long-term data retention and high-performance computing is covered in the next section.
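The Online/Offline alternation above can be sketched in a few lines of Python. This is a toy, in-memory sketch of the synchronization behaviour only; the real application would use the tablet's local database and a VPN-secured connection, and the class and method names here are illustrative assumptions:

```python
# Toy sketch of the tablet's online/offline alternation: observations are
# always stored locally first, and pushed to the central store only when online.

class TabletClient:
    def __init__(self):
        self.local_queue = []   # stands in for the tablet's local database
        self.online = False

    def record(self, observation):
        """Offline or online, observations are always persisted locally first."""
        self.local_queue.append(observation)

    def synchronize(self, central_store):
        """In Online mode, push queued observations to the central database."""
        if not self.online:
            return 0  # nothing is sent while offline; data stays on the device
        sent = len(self.local_queue)
        central_store.extend(self.local_queue)
        self.local_queue.clear()
        return sent

central = []
tablet = TabletClient()
tablet.record({"product": "bread", "price": 5.0})
tablet.synchronize(central)   # offline: sends nothing, central stays empty
tablet.online = True
tablet.synchronize(central)   # online: the queued observation is pushed
```

Storing locally first and treating the central push as a separate step is what allows field work to continue through coverage gaps.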
7. Recommendations for Applications and Data Management

In the previous section on the tablet mobile application system components, the CAPMAS Backend Server is the landing space for data collected by the field researchers and supervisors. To provide Big Data capabilities for long-term data retention and high-performance computing, and to receive additional data such as self-enumeration and external source integration, additional services will be integrated beneath the backend server receiving tablet data. The following features will be attained through the additional services:

1. Distributed Data Management: data will be stored in distributed blocks on several nodes, enabling granular management, scalability and high-performance computing.
2. Distributed Processing: aggregation, transformation, statistical analysis and data modeling will be implemented on a distributed application framework to enable high-performance, scalable, resilient computing.
3. Batch Loading: enables ingestion of accumulated data in batches for infrequent, large loads.
4. Streaming Loading: enables ingesting data in small, frequent streams in the form of pipelines of messages or transactions.
5. In-Memory Processing: running data analysis on a selected set of data in memory for faster processing and manipulation.
6. Data Science Modeling: specialized libraries that implement machine learning, deep learning, statistical modeling, data mining and analysis operations on top of the data platform.
7. Graph Analysis: components that enable big graph implementations and network analysis models.
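The contrast between features 3 and 4 can be illustrated with a short pure-Python sketch. In the recommended stack these roles would be played by tools such as Sqoop (batch) and Flume (streaming), not by this toy code; the function names and batch size are illustrative assumptions:

```python
# Illustrative contrast between batch loading (few large, infrequent loads)
# and streaming loading (many small, frequent loads).

def batch_load(records, sink, batch_size=100):
    """Accumulate records and hand them to the sink in large, infrequent batches."""
    for start in range(0, len(records), batch_size):
        sink.append(records[start:start + batch_size])

def stream_load(records, sink):
    """Hand each record to the sink as soon as it arrives, one small message at a time."""
    for record in records:
        sink.append([record])

batches, stream = [], []
data = list(range(250))
batch_load(data, batches, batch_size=100)   # 3 loads: 100 + 100 + 50 records
stream_load(data, stream)                   # 250 loads of 1 record each
```

The choice between the two is a latency/overhead trade-off: batch loading suits periodic consolidation from branch offices, while streaming suits continuous arrival of reviewed observations.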
8. Main Recommended Components

Based on the previous sections on the current status and targeted requirements, several components need to be installed to achieve the needed upgrades of the existing infrastructure. The following sections describe the recommended components, subject to review during the implementation of the infrastructure upgrades and setup:

- VMware vCloud Suite: leverages the current virtualized infrastructure into cloud management. vCloud Suite is an integrated offering that brings together VMware's industry-leading vSphere hypervisor and the VMware vRealize Suite multi-vendor hybrid cloud management platform. VMware's new portable licensing units allow vCloud Suite to build and manage vSphere-based private clouds. It accelerates application delivery across both traditional and container-based applications by giving developers the freedom to use the tools that make them most productive, while still ensuring that applications can be moved seamlessly from developer laptop to production.

- Apache Hadoop Distributed File System (HDFS): a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project.

- Apache YARN: the fundamental idea of YARN is to split up the functionalities of resource management and job scheduling/monitoring into separate daemons. The idea is to have a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job or a DAG of jobs.
The ResourceManager and the NodeManager form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (CPU, memory, disk, network) and reporting the same to the ResourceManager/Scheduler. The per-application ApplicationMaster is, in effect, a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
- Apache Spark: a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

- Apache Hive: data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command-line tool and JDBC driver are provided to connect users to Hive.

- Apache HBase: provides random, real-time read/write access to Big Data. The project's goal is the hosting of very large tables (billions of rows by millions of columns) atop clusters of commodity hardware. Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable ("A Distributed Storage System for Structured Data", Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, Apache HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

- Apache Oozie: a workflow scheduler system to manage Apache Hadoop jobs. Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions. Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java MapReduce, Streaming MapReduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts). Oozie is a scalable, reliable and extensible system.

- Apache Tez: a project building an application framework that allows for a complex directed acyclic graph of tasks for processing data. It is currently built atop Apache Hadoop YARN.
Tez provides expressive dataflow definition APIs, a flexible Input-Processor-Output runtime model, data type agnosticism, simplified deployment, performance gains over MapReduce, optimal resource management, plan reconfiguration at runtime and dynamic physical data flow decisions.
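As a concrete illustration of the kind of statistical aggregation Spark would run for the CPI process, the following is a pure-Python analogue of a grouped average; in the actual platform this would be a short Spark SQL or DataFrame job distributed across the worker nodes, and the column names here are illustrative assumptions:

```python
# Pure-Python analogue of the grouped aggregation that Spark's DataFrame API
# would execute in a distributed fashion, roughly:
#   df.groupBy("product").agg(avg("price"))
# Column names are illustrative, not the CAPMAS schema.
from collections import defaultdict

def average_price_by_product(observations):
    """Compute the mean collected price per product code."""
    totals = defaultdict(lambda: [0.0, 0])  # product -> [sum of prices, count]
    for obs in observations:
        entry = totals[obs["product"]]
        entry[0] += obs["price"]
        entry[1] += 1
    return {product: s / n for product, (s, n) in totals.items()}

observations = [
    {"product": "bread", "price": 5.0},
    {"product": "bread", "price": 7.0},
    {"product": "rice", "price": 12.0},
]
# average_price_by_product(observations) -> {"bread": 6.0, "rice": 12.0}
```

The point of the Spark recommendation is that exactly this logic, expressed declaratively, is partitioned and executed in parallel across the worker nodes sized in section 9.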
- Apache Flume: a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault-tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications.

- Apache Sqoop: a tool designed to transfer data between Hadoop and relational databases or mainframes. Sqoop can be used to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS. Sqoop automates most of this process, relying on the database to describe the schema of the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

- MongoDB: a document database offering scalability and flexibility together with rich querying and indexing. MongoDB stores data in flexible, JSON-like documents, meaning fields can vary from document to document and the data structure can be changed over time. It will be used as a document store for unstructured data.

- PostgreSQL: a powerful SQL-based database engine that will be used for landing the data collected by the mobile tablet applications, working behind the data services of the REST APIs. It provides extensive high-performance processing as well as special capabilities such as GIS data handling.
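To illustrate the PostgreSQL landing role behind the REST data services, the following sketch uses Python's standard-library sqlite3 as a stand-in for PostgreSQL (the SQL shown is portable between the two for this simple case). The table schema and field names are illustrative assumptions, not the CAPMAS design:

```python
# Sketch of the relational landing table behind the REST data services.
# sqlite3 (standard library) stands in for PostgreSQL here; the schema is
# an illustrative assumption, not the CAPMAS design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE collected_prices (
        id INTEGER PRIMARY KEY,
        product_code TEXT NOT NULL,
        market_id TEXT NOT NULL,
        price REAL NOT NULL CHECK (price > 0),
        collected_at TEXT NOT NULL
    )
""")

def land_record(record):
    """Insert one reviewed observation arriving from the REST API layer."""
    conn.execute(
        "INSERT INTO collected_prices (product_code, market_id, price, collected_at) "
        "VALUES (:product_code, :market_id, :price, :collected_at)",
        record,
    )

land_record({"product_code": "P1", "market_id": "M1",
             "price": 5.0, "collected_at": "2017-09-01"})
```

Note the CHECK constraint: enforcing basic validity at the landing layer complements the collection-time validation on the tablet, so bad records cannot silently enter the back-end processing.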
9. Estimated High-Level Sizing and Specifications

The following table lists the estimated sizing for the infrastructure required to deploy and run the aforementioned components. Sizing will be revised during the implementation, taking advantage of the cloud approach deployed on top of the virtualized infrastructure at the CAPMAS data center:

1. 2 x Name Nodes: 4 cores @ 3.0 GHz, 16 GB RAM, 200 GB storage, Linux OS
2. 2 x Resource Scheduling Nodes: 4 cores @ 3.0 GHz, 16 GB RAM, 200 GB storage, Linux OS
3. 8 x Worker Nodes: 2 cores @ 3.0 GHz, 8 GB RAM, 500 GB storage, Linux OS
4. 2 x Document Services Nodes: 4 cores @ 3.0 GHz, 16 GB RAM, 500 GB storage, Linux OS
5. 2 x REST APIs Hosting Nodes: 4 cores @ 3.0 GHz, 16 GB RAM, 100 GB storage, Linux OS
6. 2 x Central Database Nodes: 4 cores @ 3.0 GHz, 16 GB RAM, 500 GB disk space, Linux OS
7. 2 x Back Office Application Nodes: 4 cores @ 3.0 GHz, 8 GB RAM, 200 GB disk space, Windows Server
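The per-node figures above can be rolled up to total capacity, which is useful for cross-checking against the spare capacity available in the existing virtualized data center. The roll-up is simple arithmetic over the sizing table:

```python
# Roll-up of the sizing table: (vm_count, cores, ram_gb, storage_gb) per row.
nodes = [
    (2, 4, 16, 200),  # Name Nodes
    (2, 4, 16, 200),  # Resource Scheduling Nodes
    (8, 2, 8, 500),   # Worker Nodes
    (2, 4, 16, 500),  # Document Services Nodes
    (2, 4, 16, 100),  # REST APIs Hosting Nodes
    (2, 4, 16, 500),  # Central Database Nodes
    (2, 4, 8, 200),   # Back Office Application Nodes
]
total_vms = sum(n for n, *_ in nodes)
total_cores = sum(n * c for n, c, _, _ in nodes)
total_ram_gb = sum(n * r for n, _, r, _ in nodes)
total_storage_gb = sum(n * s for n, _, _, s in nodes)
# 20 VMs, 64 cores, 240 GB RAM, 7400 GB (about 7.4 TB) storage in total
```

These totals are modest for a virtualized data center, which supports the report's position that the upgrade is primarily a platform-layer effort rather than a hardware procurement.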
10. Conclusion and Next Actions

The achievement of a virtualized infrastructure at CAPMAS paves the way for building a solid foundation for the mobile data collection solution, as well as for other potential data solutions and integration with external data sources. To leverage this achievement, two main additional layers need to be built:

- Extending virtualization to a cloud platform
- Deploying a Big Data management platform

Next actions include commencing the implementation plan for the two items above, for which the implementation team needs to be engaged, while ensuring complete know-how transfer to the CAPMAS team, especially on the Big Data management solutions, as well as extending the back-end capabilities to support the mobile data collection solution as the main focus of this pilot project.
11. References

1. UNECA, CAPMAS and Nile University Letter of Agreement (LoA).
2. Cloud Computing Reference Architecture: Recommendations of the National Institute of Standards and Technology.
3. VMware Virtualization Documentation.
4. CAPMAS Pricing Tablet Application Requirements and Design Document.
5. VMware vCloud Suite.
6. Apache Hadoop Main Page.
More informationCLOUD COMPUTING. Lecture 4: Introductory lecture for cloud computing. By: Latifa ALrashed. Networks and Communication Department
1 CLOUD COMPUTING Networks and Communication Department Lecture 4: Introductory lecture for cloud computing By: Latifa ALrashed Outline 2 Introduction to the cloud comupting Define the concept of cloud
More informationVMware vsphere with ESX 6 and vcenter 6
VMware vsphere with ESX 6 and vcenter 6 Course VM-06 5 Days Instructor-led, Hands-on Course Description This class is a 5-day intense introduction to virtualization using VMware s immensely popular vsphere
More informationTopics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples
Hadoop Introduction 1 Topics Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples 2 Big Data Analytics What is Big Data?
More informationA Glimpse of the Hadoop Echosystem
A Glimpse of the Hadoop Echosystem 1 Hadoop Echosystem A cluster is shared among several users in an organization Different services HDFS and MapReduce provide the lower layers of the infrastructures Other
More informationBig Data Architect.
Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional
More informationWHITE PAPER SEPTEMBER VMWARE vsphere AND vsphere WITH OPERATIONS MANAGEMENT. Licensing, Pricing and Packaging
WHITE PAPER SEPTEMBER 2017 VMWARE vsphere AND vsphere WITH OPERATIONS MANAGEMENT Licensing, Pricing and Packaging Table of Contents Executive Summary 3 VMware vsphere with Operations Management Overview
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationDelving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture
Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases
More informationThe Future of Virtualization Desktop to the Datacentre. Raghu Raghuram Vice President Product and Solutions VMware
The Future of Virtualization Desktop to the Datacentre Raghu Raghuram Vice President Product and Solutions VMware Virtualization- Desktop to the Datacentre VDC- vcloud vclient With our partners, we are
More informationIntroduction to BigData, Hadoop:-
Introduction to BigData, Hadoop:- Big Data Introduction: Hadoop Introduction What is Hadoop? Why Hadoop? Hadoop History. Different types of Components in Hadoop? HDFS, MapReduce, PIG, Hive, SQOOP, HBASE,
More informationHortonworks Data Platform
Hortonworks Data Platform Workflow Management (August 31, 2017) docs.hortonworks.com Hortonworks Data Platform: Workflow Management Copyright 2012-2017 Hortonworks, Inc. Some rights reserved. The Hortonworks
More informationChapter 5. The MapReduce Programming Model and Implementation
Chapter 5. The MapReduce Programming Model and Implementation - Traditional computing: data-to-computing (send data to computing) * Data stored in separate repository * Data brought into system for computing
More informationHow Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,
How Apache Hadoop Complements Existing BI Systems Dr. Amr Awadallah Founder, CTO Cloudera, Inc. Twitter: @awadallah, @cloudera 2 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS
More informationCustomer Case Studies on Accelerating Their Path to Hybrid Cloud
Customer Case Studies on Accelerating Their Path to Hybrid Cloud Hitachi and VMware: Global Strategic Partners Committed to Success Sunny Sahajpal EMEA Strategic Alliances and OEM Mananger VMware Partner
More informationCisco Integration Platform
Data Sheet Cisco Integration Platform The Cisco Integration Platform fuels new business agility and innovation by linking data and services from any application - inside the enterprise and out. Product
More informationCOPYRIGHTED MATERIAL. Introducing VMware Infrastructure 3. Chapter 1
Mccain c01.tex V3-04/16/2008 5:22am Page 1 Chapter 1 Introducing VMware Infrastructure 3 VMware Infrastructure 3 (VI3) is the most widely used virtualization platform available today. The lineup of products
More informationExam Questions
Exam Questions 70-775 Perform Data Engineering on Microsoft Azure HDInsight (beta) https://www.2passeasy.com/dumps/70-775/ NEW QUESTION 1 You are implementing a batch processing solution by using Azure
More informationCloud Computing 3. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 3 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationCloud + Big Data Putting it all Together
Cloud + Big Data Putting it all Together Even Solberg 2009 VMware Inc. All rights reserved 2 Big, Fast and Flexible Data Big Big Data Processing Fast OLTP workloads Flexible Document Object Big Data Analytics
More informationCloud Services. Introduction
Introduction adi Digital have developed a resilient, secure, flexible, high availability Software as a Service (SaaS) cloud platform. This Platform provides a simple to use, cost effective and convenient
More informationIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce Antonino Virgillito THE CONTRACTOR IS ACTING UNDER A FRAMEWORK CONTRACT CONCLUDED WITH THE COMMISSION Large-scale Computation Traditional solutions for computing large
More informationAbstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight
ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More information7 Things ISVs Must Know About Virtualization
7 Things ISVs Must Know About Virtualization July 2010 VIRTUALIZATION BENEFITS REPORT Table of Contents Executive Summary...1 Introduction...1 1. Applications just run!...2 2. Performance is excellent...2
More informationCertified Big Data Hadoop and Spark Scala Course Curriculum
Certified Big Data Hadoop and Spark Scala Course Curriculum The Certified Big Data Hadoop and Spark Scala course by DataFlair is a perfect blend of indepth theoretical knowledge and strong practical skills
More informationBlended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)
Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance
More informationOPENSTACK PRIVATE CLOUD WITH GITHUB
OPENSTACK PRIVATE CLOUD WITH GITHUB Kiran Gurbani 1 Abstract Today, with rapid growth of the cloud computing technology, enterprises and organizations need to build their private cloud for their own specific
More informationEBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud
EBOOK: VMware Cloud on AWS: Optimized for the Next-Generation Hybrid Cloud Contents Introduction... 3 What is VMware Cloud on AWS?... 5 Customer Benefits of Adopting VMware Cloud on AWS... 6 VMware Cloud
More informationCOMP6511A: Large-Scale Distributed Systems. Windows Azure. Lin Gu. Hong Kong University of Science and Technology Spring, 2014
COMP6511A: Large-Scale Distributed Systems Windows Azure Lin Gu Hong Kong University of Science and Technology Spring, 2014 Cloud Systems Infrastructure as a (IaaS): basic compute and storage resources
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationCloud Computing 2. CSCI 4850/5850 High-Performance Computing Spring 2018
Cloud Computing 2 CSCI 4850/5850 High-Performance Computing Spring 2018 Tae-Hyuk (Ted) Ahn Department of Computer Science Program of Bioinformatics and Computational Biology Saint Louis University Learning
More informationHedvig as backup target for Veeam
Hedvig as backup target for Veeam Solution Whitepaper Version 1.0 April 2018 Table of contents Executive overview... 3 Introduction... 3 Solution components... 4 Hedvig... 4 Hedvig Virtual Disk (vdisk)...
More informationHow to Keep UP Through Digital Transformation with Next-Generation App Development
How to Keep UP Through Digital Transformation with Next-Generation App Development Peter Sjoberg Jon Olby A Look Back, A Look Forward Dedicated, data structure dependent, inefficient, virtualized Infrastructure
More informationCloud Computing introduction
Cloud and Datacenter Networking Università degli Studi di Napoli Federico II Dipartimento di Ingegneria Elettrica e delle Tecnologie dell Informazione DIETI Laurea Magistrale in Ingegneria Informatica
More informationTop 40 Cloud Computing Interview Questions
Top 40 Cloud Computing Interview Questions 1) What are the advantages of using cloud computing? The advantages of using cloud computing are a) Data backup and storage of data b) Powerful server capabilities
More informationOracle Big Data Fundamentals Ed 2
Oracle University Contact Us: 1.800.529.0165 Oracle Big Data Fundamentals Ed 2 Duration: 5 Days What you will learn In the Oracle Big Data Fundamentals course, you learn about big data, the technologies
More informationImproving Blade Economics with Virtualization
Improving Blade Economics with Virtualization John Kennedy Senior Systems Engineer VMware, Inc. jkennedy@vmware.com The agenda Description of Virtualization VMware Products Benefits of virtualization Overview
More informationThe Future of Virtualization. Jeff Jennings Global Vice President Products & Solutions VMware
The Future of Virtualization Jeff Jennings Global Vice President Products & Solutions VMware From Virtual Infrastructure to VDC- Windows Linux Future Future Future lication Availability Security Scalability
More informationTurning Relational Database Tables into Spark Data Sources
Turning Relational Database Tables into Spark Data Sources Kuassi Mensah Jean de Lavarene Director Product Mgmt Director Development Server Technologies October 04, 2017 3 Safe Harbor Statement The following
More informationVMware vsphere 4. The Best Platform for Building Cloud Infrastructures
Table of Contents Get the efficiency and low cost of cloud computing with uncompromising control over service levels and with the freedom of choice................ 3 Key Benefits........................................................
More information@Pentaho #BigDataWebSeries
Enterprise Data Warehouse Optimization with Hadoop Big Data @Pentaho #BigDataWebSeries Your Hosts Today Dave Henry SVP Enterprise Solutions Davy Nys VP EMEA & APAC 2 Source/copyright: The Human Face of
More informationIBM Cloud for VMware Solutions
Introduction 2 IBM Cloud IBM Cloud for VMware Solutions Zeb Ahmed Senior Offering Manager VMware on IBM Cloud Mehran Hadipour Director Business Development - Zerto Internal Use Only Do not distribute 3
More informationEMC Business Continuity for Microsoft Applications
EMC Business Continuity for Microsoft Applications Enabled by EMC Celerra, EMC MirrorView/A, EMC Celerra Replicator, VMware Site Recovery Manager, and VMware vsphere 4 Copyright 2009 EMC Corporation. All
More informationCSE 444: Database Internals. Lecture 23 Spark
CSE 444: Database Internals Lecture 23 Spark References Spark is an open source system from Berkeley Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Matei
More information2/26/2017. Originally developed at the University of California - Berkeley's AMPLab
Apache is a fast and general engine for large-scale data processing aims at achieving the following goals in the Big data context Generality: diverse workloads, operators, job sizes Low latency: sub-second
More information1V0-621.testking. 1V VMware Certified Associate 6 - Data Center Virtualization Fundamentals Exam
1V0-621.testking Number: 1V0-621 Passing Score: 800 Time Limit: 120 min 1V0-621 VMware Certified Associate 6 - Data Center Virtualization Fundamentals Exam Exam A QUESTION 1 An administrator needs to gracefully
More informationCitrix Workspace Cloud
Citrix Workspace Cloud Roger Bösch Citrix Systems International GmbH Workspace Cloud is a NEW Citrix Management and Delivery Platform Customers Now Have a Spectrum of Workspace Delivery Options Done By
More informationYOUR APPLICATION S JOURNEY TO THE CLOUD. What s the best way to get cloud native capabilities for your existing applications?
YOUR APPLICATION S JOURNEY TO THE CLOUD What s the best way to get cloud native capabilities for your existing applications? Introduction Moving applications to cloud is a priority for many IT organizations.
More informationMigration and Building of Data Centers in IBM SoftLayer
Migration and Building of Data Centers in IBM SoftLayer Advantages of IBM SoftLayer and RackWare Together IBM SoftLayer offers customers the advantage of migrating and building complex environments into
More informationDepartment of Digital Systems. Digital Communications and Networks. Master Thesis
Department of Digital Systems Digital Communications and Networks Master Thesis Study of technologies/research systems for big scientific data analytics Surname/Name: Petsas Konstantinos Registration Number:
More informationBack To The Future - VMware Product Directions. Andre Kemp Sr. Product Marketing Manager Asia - Pacific
Back To The Future - VMware Product Directions Andre Kemp Sr. Product Marketing Manager Asia - Pacific Disclaimer This session contains product features that are currently under development. This session/overview
More informationVMware Join the Virtual Revolution! Brian McNeil VMware National Partner Business Manager
VMware Join the Virtual Revolution! Brian McNeil VMware National Partner Business Manager 1 VMware By the Numbers Year Founded Employees R&D Engineers with Advanced Degrees Technology Partners Channel
More informationBig Data Hadoop Course Content
Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux
More informationINFS 214: Introduction to Computing
INFS 214: Introduction to Computing Session 13 Cloud Computing Lecturer: Dr. Ebenezer Ankrah, Dept. of Information Studies Contact Information: eankrah@ug.edu.gh College of Education School of Continuing
More informationDeveloping Enterprise Cloud Solutions with Azure
Developing Enterprise Cloud Solutions with Azure Java Focused 5 Day Course AUDIENCE FORMAT Developers and Software Architects Instructor-led with hands-on labs LEVEL 300 COURSE DESCRIPTION This course
More informationThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,
More informationUnderstanding Cloud Migration. Ruth Wilson, Data Center Services Executive
Understanding Cloud Migration Ruth Wilson, Data Center Services Executive rhwilson@us.ibm.com Migrating to a Cloud is similar to migrating data and applications between data centers with a few key differences
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More informationProcess Orchestrator Releases Hard or Soft Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y.
This document describes the version and compatibility requirements for installing and upgrading Cisco Process Orchestrator. This document also provides information about the hardware platforms and software
More informationWelcome. Jeremy Poon Territory Manager, VMware
Welcome Jeremy Poon Territory Manager, VMware Partner Recognition VMware Infrastructure The New Computing Platform Presented by: Yasser Elgammal Regional Director, VMware VMware: Who We Are World s leading
More informationBuild your own Cloud on Christof Westhues
Build your own Cloud on Christof Westhues chwe@de.ibm.com IBM Big Data & Elastic Storage Tour Software Defined Infrastructure Roadshow December 2 4, 2014 New applications and IT are being built for Cloud
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationAchieving Horizontal Scalability. Alain Houf Sales Engineer
Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches
More informationAccelerating Digital Transformation with InterSystems IRIS and vsan
HCI2501BU Accelerating Digital Transformation with InterSystems IRIS and vsan Murray Oldfield, InterSystems Andreas Dieckow, InterSystems Christian Rauber, VMware #vmworld #HCI2501BU Disclaimer This presentation
More informationHadoop Online Training
Hadoop Online Training IQ training facility offers Hadoop Online Training. Our Hadoop trainers come with vast work experience and teaching skills. Our Hadoop training online is regarded as the one of the
More information1V Number: 1V0-621 Passing Score: 800 Time Limit: 120 min. 1V0-621
1V0-621 Number: 1V0-621 Passing Score: 800 Time Limit: 120 min 1V0-621 VMware Certified Associate 6 - Data Center Virtualization Fundamentals Exam Exam A QUESTION 1 Which tab in the vsphere Web Client
More informationHDFS: Hadoop Distributed File System. CIS 612 Sunnie Chung
HDFS: Hadoop Distributed File System CIS 612 Sunnie Chung What is Big Data?? Bulk Amount Unstructured Introduction Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per
More informationTHE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES
1 THE ATLAS DISTRIBUTED DATA MANAGEMENT SYSTEM & DATABASES Vincent Garonne, Mario Lassnig, Martin Barisits, Thomas Beermann, Ralph Vigne, Cedric Serfon Vincent.Garonne@cern.ch ph-adp-ddm-lab@cern.ch XLDB
More information