Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

1 Big Data on AWS Peter-Mark Verwoerd Solutions Architect

2 What to get out of this talk
Non-technical:
- Big Data processing stages: ingest, store, process, visualize
- Hot vs. cold data
- Low-latency processing vs. high-latency processing
Technical:
- The concepts above
- Big Data reference architectures and design patterns

3 The World Is Producing Ever-Larger Volumes of Big Data
(Figure: data sources plotted against data volume, from GB through TB, PB, and EB to ZB.)
- IT/application server logs: IT infrastructure logs, metering, audit logs, change logs
- Web sites, mobile apps, and ads: clickstream, user engagement
- Sensor data: weather, smart grids, wearables
- Social media and user content: 450MM+ tweets/day

4 Big Data: Best Served Fresh
Big Data:
- Hourly server logs: how your systems were misbehaving an hour ago
- Weekly/monthly bill: what you spent this past billing cycle
- Daily customer-preferences report from your website's clickstream: tells you what deal or ad to try next time
- Daily fraud reports: tell you if there was fraud yesterday
Real-time Big Data:
- CloudWatch metrics: what just went wrong now
- Real-time spending alerts/caps: guarantee that you can't overspend
- Real-time analysis: tells you what to offer the current customer now
- Real-time detection: blocks fraudulent use now

5 The Challenge
Data → Big Data → Real-time Big Data = a plethora of tools

6 The Zoo
Hive, Pig, Shark, Impala, Apache Kafka, Storm, Hadoop/EMR, DynamoDB, Apache Spark, Amazon Kinesis, Apache Flume, HDFS, Redshift, Apache Spark Streaming, S3?

7 Partners
Flume, Sqoop, HParser

8 Simplify
Data → Ingest → Store → Process → Visualize → Answers
- Ingest: Kafka, Kinesis, Flume, Scribe
- Store: S3, DynamoDB, HDFS, Redshift
- Process: Hive/Pig, Hadoop/EMR, Shark, Spark, Storm, Spark Streaming
- Visualize: Tableau, Jaspersoft

9 Ingest
Data → Ingest

10 Ingest
Ingest: the act of collecting and storing data

11 Why Data Ingest Tools?
Goal: collect random, high-velocity data
- from many different sources
- at high TPS (transactions per second)
Collecting random, high-velocity data is a challenging task:
- It is hard to store data durably at scale.
- It is hard to keep the system highly available.
- It is hard to scale.

12 Why Data Ingest Tools?
Data ingest tools such as Kafka or Kinesis convert random streams of data into a smaller set of sequential streams. Sequential streams are:
- easier to process
- easier to scale
- easier to persist
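Here is a minimal Python sketch of the idea. It assumes a fixed number of shards and hashes each record's partition key with MD5. Kinesis also hashes partition keys with MD5, but it assigns the hash to a shard by hash-key range, not by the modulo used here. `ShardedLog`, `shard_for`, and `NUM_SHARDS` are hypothetical names for illustration, not a Kafka or Kinesis API.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard/partition count


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Deterministically map a partition key to a shard index.

    Simplification: MD5 digest modulo the shard count (real services
    use their own key-to-partition schemes).
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


class ShardedLog:
    """Toy append-only log: each shard is one ordered, sequential stream."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.shards: list[list[tuple[int, str]]] = [[] for _ in range(num_shards)]

    def put(self, partition_key: str, record: str) -> tuple[int, int]:
        """Append a record; return (shard index, sequence number within that shard)."""
        shard = shard_for(partition_key, len(self.shards))
        seq = len(self.shards[shard])  # sequence numbers grow monotonically per shard
        self.shards[shard].append((seq, record))
        return shard, seq

    def read(self, shard: int, from_seq: int = 0) -> list[str]:
        """Read one shard sequentially, starting at a sequence number."""
        return [record for seq, record in self.shards[shard] if seq >= from_seq]


# Interleaved ("random") arrivals from many sources collapse into a few ordered streams.
log = ShardedLog()
for i in range(3):
    for source in ("web-1", "app-7", "sensor-42"):
        log.put(source, f"{source}:event{i}")
```

The key-to-shard mapping is deterministic. Every event from `sensor-42` therefore lands in the same shard in arrival order, so a consumer can process, checkpoint, and persist each shard sequentially.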

13 Data Ingest Tools
- Facebook Scribe: data collector
- Amazon Kinesis: data collector
- Apache Kafka: data collector
- Apache Flume: data movement and transformation

14 Partners: Data Load and Transformation
Flume, Sqoop, HParser, Big Data Edition

15 Storage
Data → Ingest → Store

16 Storage
- Structured, complex query
  - SQL: Amazon RDS (MySQL, Oracle, SQL Server, Postgres)
  - Data warehouse: Amazon Redshift
  - Search: Amazon CloudSearch
- Unstructured, custom query
  - Hadoop/HDFS: Amazon Elastic MapReduce (EMR)
- Structured, simple query
  - NoSQL: Amazon DynamoDB
  - Cache: Amazon ElastiCache (Memcached, Redis)
- Unstructured, no query
  - Cloud storage: Amazon S3, Amazon Glacier
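The taxonomy above can be written as a lookup keyed on data structure and query style. This is a hypothetical helper (`STORAGE_OPTIONS` and `storage_candidates` are illustrative names). It only restates the slide's mapping and does not reflect current AWS service limits or the full AWS storage catalog.

```python
# (data structure, query style) -> {store category: [AWS services]}, as listed on the slide.
STORAGE_OPTIONS: dict[tuple[str, str], dict[str, list[str]]] = {
    ("structured", "complex query"): {
        "SQL": ["Amazon RDS"],
        "Data warehouse": ["Amazon Redshift"],
        "Search": ["Amazon CloudSearch"],
    },
    ("unstructured", "custom query"): {
        "Hadoop/HDFS": ["Amazon EMR"],
    },
    ("structured", "simple query"): {
        "NoSQL": ["Amazon DynamoDB"],
        "Cache": ["Amazon ElastiCache"],
    },
    ("unstructured", "no query"): {
        "Cloud storage": ["Amazon S3", "Amazon Glacier"],
    },
}


def storage_candidates(structure: str, query_style: str) -> dict[str, list[str]]:
    """Return the store categories the slide lists for this combination."""
    key = (structure.strip().lower(), query_style.strip().lower())
    if key not in STORAGE_OPTIONS:
        raise ValueError(f"no storage option listed for {key}")
    return STORAGE_OPTIONS[key]
```

For example, `storage_candidates("Structured", "simple query")` returns the NoSQL (DynamoDB) and cache (ElastiCache) options.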

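17
(Figure: storage services placed along a spectrum. High structure: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon EMR, Amazon Redshift. Low structure: Amazon S3, Amazon Glacier. Along the same axis, request rate and cost/GB run from high to low, while latency and data volume run from low to high.)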

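18
- Amazon ElastiCache: average latency ms; data volume GB; item size B-KB; request rate very high; durability low-moderate
- Amazon DynamoDB: average latency ms; data volume GB-TBs (no limit); item size KB (64 KB max); request rate very high; durability very high
- Amazon RDS: average latency ms, sec; data volume GB-TB (3 TB max); item size KB (~row size); request rate high; durability high
- Amazon CloudSearch: average latency ms, sec; data volume GB-TB; item size KB (1 MB max); request rate high; durability high
- Amazon Redshift: average latency sec, min; data volume TB-PB (1.6 PB max); item size KB (64 K max); request rate low; durability high
- Amazon EMR (Hive): average latency sec, min, hrs; data volume GB-PB (~nodes); item size KB-MB; request rate low; durability high
- Amazon S3: average latency ms, sec, min (~size); data volume GB-PB (no limit); item size KB-GB (5 TB max); request rate low-very high (no limit); durability very high
- Amazon Glacier: average latency hrs; data volume GB-PB (no limit); item size GB (40 TB max); request rate very low (no limit); durability very high
Cost ($/GB/month): $$ $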
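The comparison above can drive a first-pass choice. Here is a minimal Python sketch that filters services by a latency requirement and a durability floor. It encodes only the best-case latency and the lower bound of durability from each entry; for example, ElastiCache's "low-moderate" is treated as "low". The ordinal scales and the `candidates` helper are assumptions for illustration, and the figures are the slide's, not current AWS limits.

```python
# Ordinal scales (assumed) so requirements can be compared numerically.
LATENCY = {"ms": 0, "sec": 1, "min": 2, "hrs": 3}
DURABILITY = {"low": 0, "moderate": 1, "high": 2, "very high": 3}

# service -> (best-case average latency, lower bound of durability), from the slide
SERVICES = {
    "Amazon ElastiCache": ("ms", "low"),
    "Amazon DynamoDB": ("ms", "very high"),
    "Amazon RDS": ("ms", "high"),
    "Amazon CloudSearch": ("ms", "high"),
    "Amazon Redshift": ("sec", "high"),
    "Amazon EMR (Hive)": ("sec", "high"),
    "Amazon S3": ("ms", "very high"),
    "Amazon Glacier": ("hrs", "very high"),
}


def candidates(max_latency: str, min_durability: str) -> list[str]:
    """Services whose best-case latency and durability meet both requirements."""
    return [
        name
        for name, (latency, durability) in SERVICES.items()
        if LATENCY[latency] <= LATENCY[max_latency]
        and DURABILITY[durability] >= DURABILITY[min_durability]
    ]
```

For example, `candidates("ms", "very high")` narrows the list to DynamoDB and S3. Relaxing the latency requirement to `"hrs"` adds Glacier.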

19 Process
Data → Ingest → Store → Process

20 Process
Processing means answering questions about data. Example questions:
- Analytics: think SQL/data warehouse
- Classification: think sentiment analysis
- Prediction: think page-view prediction
- Etc.

21 Processing Frameworks
Processing frameworks generally come in two major types:
- Batch processing
- Stream processing

22 Processing Frameworks: Batch Processing
- Take a large amount (>100 TB) of cold data and ask questions
- Takes hours to get answers back
- Example: generating monthly AWS billing reports

23 Processing Frameworks: Stream Processing (a.k.a. Real-time)
- Take a small amount of hot data and ask questions
- Takes a short amount of time to get your answer back
- Example: CloudWatch 1-minute metrics
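The difference can be sketched in a few lines of Python. The batch function reads the whole data set before answering once. The streaming function emits a total for each 1-minute tumbling window as soon as that window closes, in the spirit of the CloudWatch 1-minute example. This is an illustration, not how CloudWatch is implemented. It assumes events arrive in timestamp order and uses hypothetical `(epoch_seconds, value)` tuples.

```python
from typing import Iterable, Iterator

Event = tuple[float, float]  # (epoch_seconds, value)


def batch_total(events: Iterable[Event]) -> float:
    """Batch: read the whole (cold) data set, then answer once."""
    return sum(value for _, value in events)


def stream_minute_totals(events: Iterable[Event]) -> Iterator[tuple[int, float]]:
    """Stream: yield (window_start, total) as each 1-minute tumbling window closes.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    total = 0.0
    for ts, value in events:
        window = int(ts // 60) * 60  # start of the minute this event falls in
        if current_window is None:
            current_window = window
        elif window != current_window:
            yield current_window, total  # previous window is complete: emit it now
            current_window, total = window, 0.0
        total += value
    if current_window is not None:
        yield current_window, total  # flush the last open window


events = [(0, 1.0), (30, 2.0), (61, 5.0), (119, 1.0), (120, 4.0)]
```

Here `batch_total(events)` returns 13.0, but only after every event has been read. `stream_minute_totals(events)` yields `(0, 3.0)` as soon as the event at second 61 arrives, then `(60, 6.0)` and `(120, 4.0)`.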

24 Processing Frameworks
- Hadoop/EMR: batch processing
- Spark: batch processing
- Spark Streaming: stream processing
- Storm: stream processing
- Redshift: batch processing

25 Partners: Advanced Analytics
Impala

26 Visualize
Data → Ingest → Store → Process → Visualize

27 Activities of Data Visualization Users
- What is the oil consumption of the USA per day?
- What is the average oil consumption per day in Europe?
- Are there any outliers?
- Which country consumes the most oil?
- Which countries are oil exporters?
- Is there a trend of increasing oil consumption over time?
- What is the range of oil production?
- Order countries by oil consumption/production.
- What is the distribution of oil-producing countries?
- Is there a cluster of oil producers?
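Several of these questions (average, outliers, the top consumer, the range, the ordering) reduce to simple summary statistics before anything is charted. Here is a minimal Python sketch. It uses Tukey's 1.5 × IQR fences as one common outlier rule; this is an assumption, since the talk does not prescribe a rule. The input is hypothetical toy values, not real oil data.

```python
from statistics import mean, quantiles


def summarize(per_country: dict[str, float]) -> dict:
    """Answer a few typical visualization-user questions about per-country values."""
    values = list(per_country.values())
    # Quartiles (needs at least two values); "inclusive" suits data covering the whole population.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    ranking = sorted(per_country, key=per_country.get, reverse=True)
    return {
        "average": mean(values),
        "top": ranking[0],
        "ranking": ranking,
        "range": (min(values), max(values)),
        "outliers": [c for c, v in per_country.items() if v < low_fence or v > high_fence],
    }


# Hypothetical toy values, not real consumption figures.
toy = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 13.0, "E": 100.0}
summary = summarize(toy)
```

With these toy values, country `E` is both the top consumer and the only value outside the fences.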

29 Partners: BI & Data Visualization

30 Putting it all together (coupled architecture)
Ingest/store and processing are tightly coupled. Examples:
- S3 + EMR/Hadoop
- HDFS + EMR/Hadoop
- S3 + Redshift

31 Putting it all together (coupled architecture)
Coupled systems provide less flexibility when handling:
- cold vs. hot data
- high-latency vs. low-latency processing
Examples:
- EMR + HDFS/S3
  - Cold: can handle processing 100 records/sec
  - Hot: processing records/sec??
- Redshift + S3
  - High latency: generate reports once a day
  - Low latency: generate reports every minute

32 Putting it all together (de-coupled architecture)
- Multi-tier data processing architecture, similar to multi-tier web-application architectures
- Ingest and store are de-coupled from processing
- Concept of a databus: Data → Databus → Process → Answers

33 Putting it all together (de-coupled architecture)
- Ingest tools write to multiple data stores within the databus.
- Processing frameworks (Hadoop, Spark, etc.) consume from the databus.
- Consumers can decide which data store to read from, depending on their data processing requirements.
(Diagram: Data → databus [Ingest/Store: Kafka, S3, HDFS] → Process → Answers)
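Here is a minimal Python sketch of the databus idea. It assumes two stores: a short-retention "hot" log standing in for Kafka, and an append-only "cold" store standing in for S3 or HDFS. `DataBus`, `ingest`, and `store_for` are hypothetical names; a real databus would use those systems' own clients.

```python
from collections import deque


class DataBus:
    """Toy databus: ingest writes every record to all stores;
    consumers choose a store based on their latency requirement."""

    def __init__(self, hot_retention: int) -> None:
        self.hot: deque = deque(maxlen=hot_retention)  # recent data only, like a short-retention log
        self.cold: list = []  # full history, like durable object storage

    def ingest(self, record) -> None:
        # Fan out the same record to every store in the bus.
        self.hot.append(record)
        self.cold.append(record)

    def store_for(self, low_latency: bool):
        """Low-latency consumers read the hot store; batch consumers read the cold one."""
        return self.hot if low_latency else self.cold


bus = DataBus(hot_retention=3)
for event in range(5):
    bus.ingest(event)

recent = list(bus.store_for(low_latency=True))    # stream consumer sees [2, 3, 4]
history = list(bus.store_for(low_latency=False))  # batch consumer sees [0, 1, 2, 3, 4]
```

Adding a new consumer or a new store leaves the ingest side unchanged. That independence is the flexibility the de-coupled architecture aims for.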

34 Data temperature & processing latency

35 Pattern 1: Redshift (cold data, high latency)

36 Pattern 2: DynamoDB (warm data, low latency)

37 Pattern 3: Hadoop (cold data, high latency)

38 Pattern 4: Hadoop (warm data, low latency)

39 Pattern 5: Spark (cold data, low latency)

40 Pattern 6: Stream Processing (hot data, low latency)
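The six patterns above can be read as a lookup from (data temperature, processing latency) to a technology. This small Python sketch restates them. The table and helper names are illustrative, and only the six combinations on the slides are covered.

```python
# pattern number -> (technology, data temperature, processing latency), per the slides
PATTERNS: dict[int, tuple[str, str, str]] = {
    1: ("Redshift", "cold", "high"),
    2: ("DynamoDB", "warm", "low"),
    3: ("Hadoop", "cold", "high"),
    4: ("Hadoop", "warm", "low"),
    5: ("Spark", "cold", "low"),
    6: ("Stream processing", "hot", "low"),
}


def patterns_for(temperature: str, latency: str) -> list[tuple[int, str]]:
    """Return (pattern number, technology) pairs matching the requirement."""
    return [
        (number, tech)
        for number, (tech, temp, lat) in PATTERNS.items()
        if temp == temperature and lat == latency
    ]
```

For example, `patterns_for("cold", "high")` returns both Redshift (pattern 1) and Hadoop (pattern 3). A combination the slides do not cover, such as hot data with high latency, returns an empty list.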

41 Putting it All Together

42 What to get out of this talk (recap of slide 2)

43 Questions?

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

Data Lake Best Practices

Data Lake Best Practices Data Lake Best Practices Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways What is a Data Lake? What, why etc. What is a data

More information

2013 AWS Worldwide Public Sector Summit Washington, D.C.

2013 AWS Worldwide Public Sector Summit Washington, D.C. 2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively

More information

Real-time Streaming Applications on AWS Patterns and Use Cases

Real-time Streaming Applications on AWS Patterns and Use Cases Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

How can you implement this through a script that a scheduling daemon runs daily on the application servers? You ve been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Best Practices and Performance Tuning on Amazon Elastic MapReduce Best Practices and Performance Tuning on Amazon Elastic MapReduce Michael Hanisch Solutions Architect Amo Abeyaratne Big Data and Analytics Consultant ANZ 12.04.2016 2016, Amazon Web Services, Inc. or

More information

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing

More information

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS Dr Adnene Guabtni, Senior Research Scientist, NICTA/Data61, CSIRO Adnene.Guabtni@csiro.au EC2 S3 ELB RDS AMI

More information

Big Data Hadoop Course Content

Big Data Hadoop Course Content Big Data Hadoop Course Content Topics covered in the training Introduction to Linux and Big Data Virtual Machine ( VM) Introduction/ Installation of VirtualBox and the Big Data VM Introduction to Linux

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Store, Protect, Optimize Your Healthcare Data in AWS

Store, Protect, Optimize Your Healthcare Data in AWS Healthcare reform, increasing patient expectations, exponential data growth, and the threat of cyberattacks are forcing healthcare providers to re-evaluate their data management strategies. Healthcare

More information

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Dr. Steffen Hausmann, Solutions Architect Michael Hanisch, Manager Solutions Architecture November 18 th, 2016 Stream Processing Challenges

More information

Hadoop, Yarn and Beyond

Hadoop, Yarn and Beyond Hadoop, Yarn and Beyond 1 B. R A M A M U R T H Y Overview We learned about Hadoop1.x or the core. Just like Java evolved, Java core, Java 1.X, Java 2.. So on, software and systems evolve, naturally.. Lets

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its

More information

Streaming Data: The Opportunity & How to Work With It

Streaming Data: The Opportunity & How to Work With It Streaming Data: The Opportunity & How to Work With It Roger Barga, GM Amazon Kinesis April 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interest in and demand for stream

More information

AWS 101. Patrick Pierson, IonChannel

AWS 101. Patrick Pierson, IonChannel AWS 101 Patrick Pierson, IonChannel What is AWS? Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help

More information

TESTING BIG DATA WORLD RIGA. by Konstantin Pletenev OCTOBER, 2017, TAPOST GROW CONFIDENTLY

TESTING BIG DATA WORLD RIGA. by Konstantin Pletenev OCTOBER, 2017, TAPOST GROW CONFIDENTLY RIGA TESTING BIG DATA WORLD by Konstantin Pletenev OCTOBER, 2017, TAPOST GROW CONFIDENTLY BIG DATA IS NOT ABOUT THE DATA THE REVOLUTION IS NOT THAT THERE S MORE DATA AVAILABLE THE REVOLUTION IS THAT WE

More information

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI) The Certificate in Software Development Life Cycle in BIGDATA, Business Intelligence and Tableau program

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

A Deeper Look into Amazon Kinesis

A Deeper Look into Amazon Kinesis A Deeper Look into Amazon Kinesis Shawn Gandhi, Solutions Architect @shawnagram AWS Meetup NYC 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

AWS Storage Optimization. AWS Whitepaper

AWS Storage Optimization. AWS Whitepaper AWS Storage Optimization AWS Whitepaper AWS Storage Optimization: AWS Whitepaper Copyright 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress

More information

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture

Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Delving Deep into Hadoop Course Contents Introduction to Hadoop and Architecture Hadoop 1.0 Architecture Introduction to Hadoop & Big Data Hadoop Evolution Hadoop Architecture Networking Concepts Use cases

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger -

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger - Werden Sie ein Teil von Internet der Dinge auf AWS AWS Enterprise Summit 2015 Dr. Markus Schmidberger - schmidbe@amazon.de Internet of Things is the network of physical objects or "things" embedded with

More information

Integrating Splunk with AWS services:

Integrating Splunk with AWS services: Integrating Splunk with AWS services: Using Redshi+, Elas0c Map Reduce (EMR), Amazon Machine Learning & S3 to gain ac0onable insights via predic0ve analy0cs via Splunk Patrick Shumate Solutions Architect,

More information

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype?

Nowcasting. D B M G Data Base and Data Mining Group of Politecnico di Torino. Big Data: Hype or Hallelujah? Big data hype? Big data hype? Big Data: Hype or Hallelujah? Data Base and Data Mining Group of 2 Google Flu trends On the Internet February 2010 detected flu outbreak two weeks ahead of CDC data Nowcasting http://www.internetlivestats.com/

More information

Big Data Hadoop Stack

Big Data Hadoop Stack Big Data Hadoop Stack Lecture #1 Hadoop Beginnings What is Hadoop? Apache Hadoop is an open source software framework for storage and large scale processing of data-sets on clusters of commodity hardware

More information

Gabriel Villa. Architecting an Analytics Solution on AWS

Gabriel Villa. Architecting an Analytics Solution on AWS Gabriel Villa Architecting an Analytics Solution on AWS Cloud and Data Architect Skilled leader, solution architect, and technical expert focusing primarily on Microsoft technologies and AWS. Passionate

More information

Data Lake Based Systems that Work

Data Lake Based Systems that Work Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

<Insert Picture Here> Introduction to Big Data Technology

<Insert Picture Here> Introduction to Big Data Technology Introduction to Big Data Technology The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

Energy Management with AWS

Energy Management with AWS Energy Management with AWS Kyle Hart and Nandakumar Sreenivasan Amazon Web Services August [XX], 2017 Tampa Convention Center Tampa, Florida What is Cloud? The NIST Definition Broad Network Access On-Demand

More information

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a) Cloudera s Developer Training for Apache Spark and Hadoop delivers the key concepts and expertise need to develop high-performance

More information

The Technology of the Business Data Lake. Appendix

The Technology of the Business Data Lake. Appendix The Technology of the Business Data Lake Appendix Pivotal data products Term Greenplum Database GemFire Pivotal HD Spring XD Pivotal Data Dispatch Pivotal Analytics Description A massively parallel platform

More information

About Intellipaat. About the Course. Why Take This Course?

About Intellipaat. About the Course. Why Take This Course? About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over

More information

Hadoop. Introduction / Overview

Hadoop. Introduction / Overview Hadoop Introduction / Overview Preface We will use these PowerPoint slides to guide us through our topic. Expect 15 minute segments of lecture Expect 1-4 hour lab segments Expect minimal pretty pictures

More information

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect AWS IoT Overview July 2016 Thomas Jones, Partner Solutions Architect AWS customers are connecting physical things to the cloud in every industry imaginable. Healthcare and Life Sciences Municipal Infrastructure

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

MONITORING SERVERLESS ARCHITECTURES

MONITORING SERVERLESS ARCHITECTURES MONITORING SERVERLESS ARCHITECTURES CAN YOU HELP WITH SOME PRODUCTION PROBLEMS? Your Manager (CC) Rachel Gardner Rafal Gancarz Lead Consultant @ OpenCredo WHAT IS SERVERLESS? (CC) theaucitron Cloud-native

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

AWS Serverless Architecture Think Big

AWS Serverless Architecture Think Big MAKING BIG DATA COME ALIVE AWS Serverless Architecture Think Big Garrett Holbrook, Data Engineer Feb 1 st, 2017 Agenda What is Think Big? Example Project Walkthrough AWS Serverless 2 Think Big, a Teradata

More information

Amazon Linux: Operating System of the Cloud

Amazon Linux: Operating System of the Cloud Amazon Linux: Operating System of the Cloud Chris Schlaeger Director, Kernel and Operating Systems Managing Director, Amazon Development Center Germany GmbH How did Amazon get into Cloud Computing? We

More information

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,... Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

More information

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,... Data Ingestion ETL, Distcp, Kafka, OpenRefine, Query & Exploration SQL, Search, Cypher, Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

Microsoft Big Data and Hadoop

Microsoft Big Data and Hadoop Microsoft Big Data and Hadoop Lara Rubbelke @sqlgal Cindy Gross @sqlcindy 2 The world of data is changing The 4Vs of Big Data http://nosql.mypopescu.com/post/9621746531/a-definition-of-big-data 3 Common

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Big Data and Object Storage

Big Data and Object Storage Big Data and Object Storage or where to store the cold and small data? Sven Bauernfeind Computacenter AG & Co. ohg, Consultancy Germany 28.02.2018 Munich Volume, Variety & Velocity + Analytics Velocity

More information

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture Big Data Syllabus Hadoop YARN Setup Programming in YARN framework j Understanding big data and Hadoop Big Data Limitations and Solutions of existing Data Analytics Architecture Hadoop Features Hadoop Ecosystem

More information

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem Zohar Elkayam www.realdbamagic.com Twitter: @realmgic Who am I? Zohar Elkayam, CTO at Brillix Programmer, DBA, team leader, database trainer,

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Modern ETL Tools for Cloud and Big Data Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Agenda Landscape Cloud ETL Tools Big Data ETL Tools Best Practices

More information

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless

More information

AWS Agility + Splunk Visibility = Cloud Success. Splunk App for AWS Demo. Laura Ripans, AWS Alliance Manager

AWS Agility + Splunk Visibility = Cloud Success. Splunk App for AWS Demo. Laura Ripans, AWS Alliance Manager AWS Agility + Splunk Visibility = Cloud Success Splunk App for AWS Demo Laura Ripans, AWS Alliance Manager Disruptive innovation and business transformation starts with data I HAVE BEEN GIVEN AN AWS ACCOUNT!!!

More information
