Best Practices and Performance Tuning on Amazon Elastic MapReduce


1 Best Practices and Performance Tuning on Amazon Elastic MapReduce
Michael Hanisch, Solutions Architect
Amo Abeyaratne, Big Data and Analytics Consultant, ANZ
Amazon Web Services, Inc. or its Affiliates. All rights reserved.

2 Challenge: Data is Everywhere
- Sensors
- Devices
- Logs
- Operations

3 Size: Growing in Petabytes
If every byte were a word and you took one second to read each word, it would take you about 32 million years to read a whole petabyte.
Migrate & Collect, Store & Transform, Process & Analyze, Visualize & Predict
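The "32 million years" figure on slide 3 can be checked with a few lines of arithmetic. This is a minimal sketch that assumes a decimal petabyte (10^15 bytes) and a 365.25-day year; the slide states neither assumption.

```python
# Back-of-the-envelope check of the reading-time claim on slide 3.
PETABYTE_BYTES = 10**15                  # assumption: decimal petabyte
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumption: Julian year

def years_to_read(num_bytes, seconds_per_byte=1.0):
    """Years needed to read num_bytes at one 'word' (byte) per second."""
    return num_bytes * seconds_per_byte / SECONDS_PER_YEAR

years = years_to_read(PETABYTE_BYTES)
print(f"{years / 1e6:.1f} million years")  # prints "31.7 million years"
```

Rounded up, that is the roughly 32 million years quoted on the slide.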

4 How do we deal with it?

5 Strategy: Divide and Conquer
- Hadoop: Hadoop has to scale?
- Hadoop on EC2: Managing on your own can be a LOT of work
- Amazon EMR: A managed Hadoop framework in the cloud

6 Tool: Amazon EMR

7 Agenda
- Challenge: Data is Everywhere
- Size: Growing in PBs
- Strategy: Divide & Conquer
- Tool: Amazon EMR
- Why EMR?
- Well Architected EMR: Design for Production

8 Why EMR?

9 Why EMR?
- Automated
- Decoupled
- Elastic
- Integrated
- Current
- Low-cost

10 Why EMR?: Automation
- EC2 provisioning
- Cluster setup
- Hadoop configuration
- Monitoring and failure handling
- Job submission
- Installing applications

11 Why EMR?: Decoupled Architecture
- Separate compute and storage
- Resize and shut down with no data loss
- Point multiple clusters at the same data on Amazon S3
- Save with Spot and Reserved instances
- HDFS for iterative and disk I/O intensive workloads
- Easily evolve infrastructure as technology evolves

12 Why EMR?: Elastic
- Core nodes for scaling HDFS
- Task nodes for scaling processing power
- Intelligent resize: wait for work to finish before stopping instances
- Use instance groups to manage different instance types in the same cluster

13 Why EMR?: Current
Application | Open source release | EMR release
Spark 1.5 | September 9, 2015 | September 2015
Spark | November 9, 2015 | November 2015
Spark 1.6 | January 4, 2016 | January 2016
Spark | March 9, 2016 | April 2016

14 Why EMR?: Integration with other services
- Amazon KMS
- Amazon Kinesis
- Amazon Redshift
- Amazon S3
- Amazon DynamoDB
- IAM for authentication
- AWS Data Pipeline

15 Why EMR?: Low-cost
- Spot instances: bid for unused EC2 capacity at up to 90% below the On-Demand price
- Reserved instances: for persistent clusters, use EC2 Reserved instances to save up to 50%
- Transient clusters: terminate the cluster when it is not in use

16 Agenda
- Challenge: Data is Everywhere
- Size: Growing in PBs
- Strategy: Divide & Conquer
- Tool: Amazon EMR
- Why EMR?
  - Automated
  - Decoupled
  - Elastic
  - Integrated
  - Current
  - Cost-efficient
- Well Architected EMR: Design for Production

17 Well-Architected Amazon EMR

18 Well Architected EMR: Design for Production
- Performance
- Reliability
- Security
- Cost efficiency

19 Performance Efficiency
- Choice of instance type
- Choice of storage
- Choice of application or framework

20 Performance: Choice of instance type - Master
- Fewer than 50 nodes? Yes: M3.xlarge. No: M3.2xlarge or larger.
- Heavy network I/O? Yes: C3 family or R3 with Enhanced Networking.
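One reading of the decision flow on slide 20 can be written as a small function. This is only a sketch: the slide is a flattened flowchart, so the order of the two checks and the function and parameter names (`choose_master_instance`, `node_count`, `heavy_network_io`) are assumptions rather than anything the presenters specified.

```python
def choose_master_instance(node_count, heavy_network_io=False):
    """Suggest a master instance type following slide 20 (one interpretation)."""
    # Assumption: heavy network I/O takes precedence over cluster size.
    if heavy_network_io:
        return "C3 family or R3 with Enhanced Networking"
    if node_count < 50:
        return "M3.xlarge"
    return "M3.2xlarge or larger"

print(choose_master_instance(20))
print(choose_master_instance(120))
print(choose_master_instance(120, heavy_network_io=True))
```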

21 Performance: Choice of instance type - Workers
Category | Instance types | Typical workload
General | M3 family, M1 family | Batch processing
Compute | C1 family, C3 family, CC1.4xlarge, CC2.8xlarge | Machine learning
Memory | M2 family, R3 family, CR1.8xlarge | Interactive analysis
Storage | D2 family, I2 family | Large HDFS

22 How many nodes do I need?

23 Performance: Sizing
- How many tasks will you have to execute?
  - Based on data size, number of files/splits
  - Based on the job you are evaluating
- When does the job need to finish?
- How much parallelism do you need?
- How many tasks can each node process in parallel?

24 Performance: Sizing
- Which other jobs are you running?
- How much data do you need to store locally?
- Which files are reused 3x or more?

25 Performance: Sizing
Guidelines:
- Size based on HDFS storage first, if needed
- Add enough (task) nodes to handle processing
- Do not add more than 5 task nodes per core node
- Prefer smaller clusters of larger machines
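The sizing questions on slides 23 to 25 can be turned into a rough estimate. The sketch below is illustrative only: the formula (tasks = data size / split size, executed in waves that fit the deadline) and every input value are assumptions, and `estimate_cluster` is a hypothetical helper, not an EMR API. It does apply the slide's guideline of at most 5 task nodes per core node.

```python
import math

def estimate_cluster(data_gb, split_mb, task_minutes, deadline_minutes,
                     tasks_per_node, core_nodes):
    """Rough node-count estimate; every input is a planning assumption."""
    tasks = math.ceil(data_gb * 1024 / split_mb)           # one task per split
    waves = max(1, deadline_minutes // task_minutes)       # task rounds that fit the deadline
    parallel_tasks = math.ceil(tasks / waves)              # tasks that must run at once
    nodes = math.ceil(parallel_tasks / tasks_per_node)     # nodes needed for that parallelism
    task_nodes = max(0, nodes - core_nodes)                # core nodes are sized for HDFS first
    return {
        "tasks": tasks,
        "nodes": nodes,
        "task_nodes": task_nodes,
        "within_guideline": task_nodes <= 5 * core_nodes,  # slide 25: max 5 task nodes per core node
    }

# Hypothetical job: 1 TiB of input, 128 MB splits, 2-minute tasks,
# a 60-minute deadline, 16 parallel tasks per node, 4 core nodes.
plan = estimate_cluster(1024, 128, 2, 60, 16, 4)
print(plan)
```

With these made-up inputs the estimate is 18 nodes, 14 of them task nodes, which stays within the 5:1 guideline.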

26 Performance: Tune it, e.g. for Spark
- Enable dynamic allocation of executors
- Executor memory
- Executor cores
- YARN containers
- Driver memory
- Deployment mode on YARN (cluster or client?)
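The Spark settings listed on slide 26 map onto standard Spark properties, which can be passed to EMR as a `spark-defaults` configuration classification in the same JSON shape the deck uses for `hive-site`. The sketch below builds such a document; the memory and core values are placeholders chosen for illustration, not recommendations from the talk, and `spark_defaults` is a hypothetical helper.

```python
import json

def spark_defaults(executor_memory="4g", executor_cores=2,
                   driver_memory="4g", dynamic_allocation=True):
    """Build an EMR-style configuration list for Spark (values are placeholders)."""
    properties = {
        "spark.dynamicAllocation.enabled": str(dynamic_allocation).lower(),
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": str(executor_cores),
        "spark.driver.memory": driver_memory,
    }
    return [{"Classification": "spark-defaults", "Properties": properties}]

config = spark_defaults()
print(json.dumps(config, indent=2))
```

The deployment mode (cluster or client) is chosen at submit time rather than in this file, so it is left out of the sketch.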

27 Performance: Choice of storage
EMRFS (S3)
- Ability to decouple
- Reliable and durable
- Cost efficient
- Works well for jobs that read a dataset once per run
HDFS (instance storage, Amazon EBS)
- Needs a persistent cluster
- Reliability is configurable, but multiple nodes are needed to achieve the replication factor
- Great for jobs with iterative reads on the same dataset, like machine learning
Combine the two with S3DistCp: move data from S3 to HDFS once for iterative workloads.

28 Performance: S3 vs HDFS at Netflix

29 S3: Range GET
Data locality doesn't matter?
[Figure: EMR worker nodes each issue a ranged GET against one S3 object: GET Range 0-64 MB, ..., GET Range (n-64)-n MB. Use larger files.]
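The ranged reads on slide 29 can be illustrated by computing the HTTP `Range` header each worker would send for its slice of one large object. This is a sketch of the idea only, not EMRFS's actual implementation; the 64 MB chunk size follows the slide, and `byte_ranges` is a hypothetical helper.

```python
MB = 1024 * 1024

def byte_ranges(object_size, chunk_size=64 * MB):
    """Split an object of object_size bytes into inclusive HTTP Range values."""
    ranges = []
    start = 0
    while start < object_size:
        end = min(start + chunk_size, object_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges

# A 200 MB object yields three full 64 MB ranges and one 8 MB remainder.
for header in byte_ranges(200 * MB):
    print(header)
```

Because each range can be fetched by a different worker, one large object is read in parallel, which is why the slide recommends larger files.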

30 S3: Avoid sequential key names
[Figure: key prefixes (Amo/, var/, Server/) are mapped by UTF-8 binary ordering onto S3 partitions, each labelled 100/s.]

31

32 S3: Avoid sequential key names
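A common way to follow the advice on slides 30 and 32 is to put a short hash in front of each key so that sequential names spread across the key space. The sketch below shows one such scheme; the 4-character SHA-256 prefix and the function name `hashed_key` are assumptions for illustration, since the slides do not prescribe a specific scheme.

```python
import hashlib

def hashed_key(key, prefix_len=4):
    """Prepend a short hex hash so sequential keys no longer sort together."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"

# Hypothetical sequential log keys.
keys = [f"logs/2016-04-01/part-{i:05d}.gz" for i in range(5)]
for key in keys:
    print(hashed_key(key))
```

The mapping is deterministic, so a reader that knows the original key can recompute the prefixed key without a lookup table.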

33 S3: Real world heavy EMRFS users
- 1.2 TB/day logs
- 30 TB/day data
- 250 Hadoop jobs
- 75 billion transactions/day
- 5 petabytes of data
- 10+ PB data warehouse on Amazon S3
- > 1 PB read each day

34 Performance: Compression Always compress data on S3 to reduce bandwidth & cost.
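The effect of the compression advice on slide 34 is easy to see on repetitive data such as logs. The sketch below uses gzip only because it ships with Python; the slide does not name a codec, and the sample log line is made up.

```python
import gzip

# Hypothetical access-log line repeated to simulate a log file.
log_line = b"2016-04-01T12:00:00Z GET /index.html 200 1234\n"
raw = log_line * 10_000

compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

# Round-trip check: decompressing returns the original bytes.
assert gzip.decompress(compressed) == raw
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.4f}")
```

Fewer bytes stored and transferred is where the bandwidth and cost savings on the slide come from; real data will compress less dramatically than this artificial sample.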

35 Performance: Choice of Framework
- Embarrassingly parallel?
- Can be optimized with a DAG?
[Figure: the input "Count these words" flows through stages A to E; two parallel branches each emit Count = 1, These = 1, Words = 1.]
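The word-count figure on slide 35 is the classic embarrassingly parallel job: each chunk is counted independently and the partial counts are merged. The following is a single-process sketch of that map and reduce structure in plain Python, not Hadoop or Spark code; the function names are illustrative.

```python
from collections import Counter
from functools import reduce

def map_count(chunk):
    """Map step: count the words in one chunk independently of the others."""
    return Counter(chunk.lower().split())

def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right

# Two chunks, as in the slide's two parallel branches.
chunks = ["Count these words", "Count these words"]
total = reduce(reduce_counts, map(map_count, chunks), Counter())
print(dict(total))  # prints {'count': 2, 'these': 2, 'words': 2}
```

Because `map_count` never looks at another chunk, the map calls could run on different nodes, which is what makes the job embarrassingly parallel.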

36 Reliability
Externalize Hive Metastore
- Store your metadata outside the cluster on RDS
- A Multi-AZ RDS cluster will give you HA
Data and Applications on S3
- Source of truth for data
- Store config & applications on S3, too
- Consistent View
Automate
- Bootstraps
- Config options
- CloudFormation

37 Reliability: Hive Metastore on RDS MySQL
[
  {
    "Classification": "hive-site",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://emr-hive-metastore.cauttwbz9zri.us-east-1.rds.amazonaws.com:3306/hive?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionUserName": "admin",
      "javax.jdo.option.ConnectionPassword": "Passw0rd!"
    },
    "Configurations": []
  }
]
Config file lives on s3://<bucketname>/hive-meta-config.json

38 Security
Encryption
- Server side
- Client side
- HDFS Transparent
- RPC with SSL
- File system with LUKS
IAM Roles
- Secure integration with AWS services
VPC
- Private subnets
- S3 endpoints
- NAT

39 Security
Encryption
- Server side
- Client side
- HDFS Transparent
- RPC with SSL
- File system with LUKS
S3 server-side encryption (SSE):
- Using S3-managed keys, or
- Using KMS-managed keys (SSE-KMS)
- Configured at cluster start time

40 Security: Server-Side Encryption (SSE-KMS)
[Figure: AWS KMS; output writes via EMRFS with client-side encryption enabled; Amazon S3 bucket; EMRFS with client-side encryption; Amazon S3 bucket.]

41 Security
Encryption
- Server side
- Client side
- HDFS Transparent
- RPC with SSL
- File system with LUKS
S3 server-side encryption (SSE-KMS):
1. Define a Customer Master Key (CMK) in KMS
2. Create a key policy that allows EMR's instance profile / role to use this key
3. Configure the cluster with the CMK KeyID or alias

42 Security: Sample Key Policy
{
  "Sid": "Allow use of the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::xxxxxxxxxxxx:role/EMR_DefaultRole"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*",
  "Condition": {
    "ForAnyValue:StringLike": {
      "kms:EncryptionContext:aws:s3:arn": [
        "arn:aws:s3:::bucket1/*",
        "arn:aws:s3:::bucket2/*"
      ]
    }
  }
}
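The sample statement on slide 42 can also be generated per cluster instead of edited by hand. The sketch below assumes the intended structure of the flattened slide text and uses the standard KMS action names; `kms_key_statement` is a hypothetical helper, and the role ARN and bucket names are the placeholders from the slide.

```python
import json

def kms_key_statement(role_arn, bucket_names):
    """Build a key-policy statement limiting key use to objects in given buckets."""
    return {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {"AWS": [role_arn]},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        "Resource": "*",
        "Condition": {
            "ForAnyValue:StringLike": {
                # Restrict use to S3 objects under these buckets.
                "kms:EncryptionContext:aws:s3:arn": [
                    f"arn:aws:s3:::{name}/*" for name in bucket_names
                ]
            }
        },
    }

statement = kms_key_statement(
    "arn:aws:iam::xxxxxxxxxxxx:role/EMR_DefaultRole",  # placeholder account ID
    ["bucket1", "bucket2"],
)
print(json.dumps(statement, indent=2))
```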

43 Security: Enable SSE-KMS with selected CMK
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.enableServerSideEncryption": "true",
      "fs.s3.serverSideEncryption.kms.keyId": "a4567b ab a "
    }
  }
]

44 Security
Encryption
- Server side
- Client side
- HDFS Transparent
- RPC with SSL
- File system with LUKS
Client-side encryption:
- Using a custom key materials provider
- Configured at cluster start

45 Security: End to End Encryption
- Amazon S3 bucket: AWS S3 SDK AmazonS3EncryptionClient(), AWS KMS, encrypted object
- Output writes via EMRFS with client-side encryption enabled
- In transit: spark.ssl.enabled, hadoop.rpc.protection, hadoop.ssl.enabled, mapreduce.shuffle.ssl.enabled
- Amazon S3 bucket: EMRFS with client-side encryption
- HDFS transparent encryption with Hadoop KMS
- LUKS with bootstrap action for local file systems

46 Cost Efficiency
Using cost-effective resources
- S3 instead of HDFS for larger datasets?
- Taking advantage of Spot and Reserved instances?
Optimize over time
- Monitor and watch out for new instance types and features that may reduce cost
Matching supply and demand
- Is the cluster big enough? Can we make it transient?
- Monitor the usage with Ganglia and Amazon CloudWatch alarms
Spot instances: bid for unused EC2 capacity at up to 90% below the On-Demand price
Reserved instances: for persistent clusters, use EC2 Reserved instances to save up to 50%
Transient clusters: terminate the cluster when it is not in use
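The three levers on slide 46 can be compared with simple arithmetic. In the sketch below the hourly price, cluster size, and running hours are invented for illustration; only the discount ceilings (up to 90% for Spot, up to 50% for Reserved) come from the slide, and actual savings vary.

```python
def monthly_cost(on_demand_hourly, nodes, hours, discount=0.0):
    """Cost of running `nodes` instances for `hours` at a discounted hourly rate."""
    return on_demand_hourly * (1 - discount) * nodes * hours

PRICE = 0.50            # hypothetical On-Demand price per node-hour (USD)
NODES = 10              # hypothetical cluster size
ALWAYS_ON_HOURS = 730   # roughly one month
TRANSIENT_HOURS = 120   # hypothetical: 4 hours a day for 30 days

scenarios = {
    "persistent, On-Demand": monthly_cost(PRICE, NODES, ALWAYS_ON_HOURS),
    "persistent, Reserved (50% off)": monthly_cost(PRICE, NODES, ALWAYS_ON_HOURS, 0.50),
    "transient, On-Demand": monthly_cost(PRICE, NODES, TRANSIENT_HOURS),
    "transient, Spot (90% off)": monthly_cost(PRICE, NODES, TRANSIENT_HOURS, 0.90),
}
for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.2f}")
```

Under these made-up numbers, making the cluster transient saves more than the Reserved discount alone, which is why the slide asks "Can we make it transient?" first.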

47 Agenda
- Challenge: Data is Everywhere
- Size: Growing in PBs
- Strategy: Divide & Conquer
- Tool: Amazon EMR
- Why EMR?
  - Automated
  - Decoupled
  - Elastic
  - Integrated
  - Current
  - Cost-efficient
- Well Architected EMR: Design for Production
  - Performance tuning
  - Reliability measures
  - Security facts
  - Cost efficiency

48 Thank you!

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Dr. Steffen Hausmann, Solutions Architect Michael Hanisch, Manager Solutions Architecture November 18 th, 2016 Stream Processing Challenges

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

How can you implement this through a script that a scheduling daemon runs daily on the application servers? You ve been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups

More information

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS Dr Adnene Guabtni, Senior Research Scientist, NICTA/Data61, CSIRO Adnene.Guabtni@csiro.au EC2 S3 ELB RDS AMI

More information

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect

AWS Storage Gateway. Not your father s hybrid storage. University of Arizona IT Summit October 23, Jay Vagalatos, AWS Solutions Architect AWS Storage Gateway Not your father s hybrid storage University of Arizona IT Summit 2017 Jay Vagalatos, AWS Solutions Architect October 23, 2017 The AWS Storage Portfolio Amazon EBS (persistent) Block

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

2013 AWS Worldwide Public Sector Summit Washington, D.C.

2013 AWS Worldwide Public Sector Summit Washington, D.C. 2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect AWS IoT Overview July 2016 Thomas Jones, Partner Solutions Architect AWS customers are connecting physical things to the cloud in every industry imaginable. Healthcare and Life Sciences Municipal Infrastructure

More information

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing

More information

AWS Well Architected Framework

AWS Well Architected Framework AWS Well Architected Framework What We Will Cover The Well-Architected Framework Key Best Practices How to Get Started Resources Main Pillars Security Reliability Performance Efficiency Cost Optimization

More information

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions 1) A company is storing an access key (access key ID and secret access key) in a text file on a custom AMI. The company uses the access key to access DynamoDB tables from instances created from the AMI.

More information

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its

More information

CS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud

CS15-319: Cloud Computing. Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud CS15-319: Cloud Computing Lecture 3 Course Project and Amazon AWS Majd Sakr and Mohammad Hammoud Lecture Outline Discussion On Course Project Amazon Web Services 2 Course Project Course Project Phase I-A

More information

Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect. Amazon.com, Inc. and its affiliates. All rights reserved.

Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect. Amazon.com, Inc. and its affiliates. All rights reserved. Amazon Web Services 101 April 17 th, 2014 Joel Williams Solutions Architect Amazon.com, Inc. and its affiliates. All rights reserved. Learning about Cloud Computing with AWS What is Cloud Computing and

More information

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing

More information

About Intellipaat. About the Course. Why Take This Course?

About Intellipaat. About the Course. Why Take This Course? About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over

More information

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

QLIK INTEGRATION WITH AMAZON REDSHIFT

QLIK INTEGRATION WITH AMAZON REDSHIFT QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Integrating Splunk with AWS services:

Integrating Splunk with AWS services: Integrating Splunk with AWS services: Using Redshi+, Elas0c Map Reduce (EMR), Amazon Machine Learning & S3 to gain ac0onable insights via predic0ve analy0cs via Splunk Patrick Shumate Solutions Architect,

More information

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India (AWS) Overview: AWS is a cloud service from Amazon, which provides services in the form of building blocks, these building blocks can be used to create and deploy various types of application in the cloud.

More information

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger -

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger - Werden Sie ein Teil von Internet der Dinge auf AWS AWS Enterprise Summit 2015 Dr. Markus Schmidberger - schmidbe@amazon.de Internet of Things is the network of physical objects or "things" embedded with

More information

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

Quick Install for Amazon EMR

Quick Install for Amazon EMR Quick Install for Amazon EMR Version: 4.2 Doc Build Date: 11/15/2017 Copyright Trifacta Inc. 2017 - All Rights Reserved. CONFIDENTIAL These materials (the Documentation ) are the confidential and proprietary

More information

Cloud Computing. Amazon Web Services (AWS)

Cloud Computing. Amazon Web Services (AWS) Cloud Computing What is Cloud Computing? Benefit of cloud computing Overview of IAAS, PAAS, SAAS Types Of Cloud private, public & hybrid Amazon Web Services (AWS) Introduction to Cloud Computing. Introduction

More information

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content Introduction to Cloud Computing A Short history Client Server Computing Concepts Challenges with Distributed Computing Introduction

More information

High School Technology Services myhsts.org Certification Courses

High School Technology Services myhsts.org Certification Courses AWS Associate certification training Last updated on June 2017 a- AWS Certified Solutions Architect (40 hours) Amazon Web Services (AWS) Certification is fast becoming the must have certificates for any

More information

CLOUD AND AWS TECHNICAL ESSENTIALS PLUS

CLOUD AND AWS TECHNICAL ESSENTIALS PLUS 1 P a g e CLOUD AND AWS TECHNICAL ESSENTIALS PLUS Contents Description... 2 Course Objectives... 2 Cloud computing essentials:... 2 Pre-Cloud and Need for Cloud:... 2 Cloud Computing and in-depth discussion...

More information

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS

SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS SAP VORA 1.4 on AWS - MARKETPLACE EDITION FREQUENTLY ASKED QUESTIONS 1. What is SAP Vora? SAP Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights

More information

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

LINUX, WINDOWS(MCSE),

LINUX, WINDOWS(MCSE), Virtualization Foundation Evolution of Virtualization Virtualization Basics Virtualization Types (Type1 & Type2) Virtualization Demo (VMware ESXi, Citrix Xenserver, Hyper-V, KVM) Cloud Computing Foundation

More information

Informatica Big Data Management on the AWS Cloud

Informatica Big Data Management on the AWS Cloud Informatica Big Data Management on the AWS Cloud Quick Start Reference Deployment November 2016 Andrew McIntyre, Informatica Big Data Management Team Santiago Cardenas, AWS Quick Start Reference Team Contents

More information

AWS Solutions Architect Exam Tips

AWS Solutions Architect Exam Tips AWS Solutions Architect Exam Tips This is not a brain dump! Questions and Answers are not given here, rather guidelines for further research, reviewing the Architecting on AWS courseware and AWS documentation.

More information

Data Lake Best Practices

Data Lake Best Practices Data Lake Best Practices Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways What is a Data Lake? What, why etc. What is a data

More information

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions SAA-C01 AWS Solutions Architect Associate Exam Summary Syllabus Questions Table of Contents Introduction to SAA-C01 Exam on AWS Solutions Architect Associate... 2 AWS SAA-C01 Certification Details:...

More information

Amazon Web Services (AWS) Training Course Content

Amazon Web Services (AWS) Training Course Content Amazon Web Services (AWS) Training Course Content SECTION 1: CLOUD COMPUTING INTRODUCTION History of Cloud Computing Concept of Client Server Computing Distributed Computing and it s Challenges What is

More information

AWS 101. Patrick Pierson, IonChannel

AWS 101. Patrick Pierson, IonChannel AWS 101 Patrick Pierson, IonChannel What is AWS? Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help

More information

Informatica Data Lake Management on the AWS Cloud

Informatica Data Lake Management on the AWS Cloud Informatica Data Lake Management on the AWS Cloud Quick Start Reference Deployment January 2018 Informatica Big Data Team Vinod Shukla AWS Quick Start Reference Team Contents Overview... 2 Informatica

More information

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014

Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014 Cloudera s Enterprise Data Hub on the Amazon Web Services Cloud: Quick Start Reference Deployment October 2014 Karthik Krishnan Page 1 of 20 Table of Contents Table of Contents... 2 Abstract... 3 What

More information

Crypto-Options on AWS. Bertram Dorn Specialized Solutions Architect Security/Compliance Network/Databases Amazon Web Services Germany GmbH

Crypto-Options on AWS. Bertram Dorn Specialized Solutions Architect Security/Compliance Network/Databases Amazon Web Services Germany GmbH Crypto-Options on AWS Bertram Dorn Specialized Solutions Architect Security/Compliance Network/Databases Amazon Web Services Germany GmbH Amazon.com, Inc. and its affiliates. All rights reserved. Agenda

More information

8/3/17. Encryption and Decryption centralized Single point of contact First line of defense. Bishop

8/3/17. Encryption and Decryption centralized Single point of contact First line of defense. Bishop Bishop Encryption and Decryption centralized Single point of contact First line of defense If working with VPC Creation and management of security groups Provides additional networking and security options

More information

AWS Solution Architect Associate

AWS Solution Architect Associate AWS Solution Architect Associate 1. Introduction to Amazon Web Services Overview Introduction to Cloud Computing History of Amazon Web Services Why we should Care about Amazon Web Services Overview of

More information

Amazon Web Services Training. Training Topics:

Amazon Web Services Training. Training Topics: Amazon Web Services Training Training Topics: SECTION1: INTRODUCTION TO CLOUD COMPUTING A Short history Client Server Computing Concepts Challenges with Distributed Computing Introduction to Cloud Computing

More information

The Evolution of a Data Project

The Evolution of a Data Project The Evolution of a Data Project The Evolution of a Data Project Python script The Evolution of a Data Project Python script SQL on live DB The Evolution of a Data Project Python script SQL on live DB SQL

More information

Overview of AWS Security - Database Services

Overview of AWS Security - Database Services Overview of AWS Security - Database Services June 2016 (Please consult http://aws.amazon.com/security/ for the latest version of this paper) 2016, Amazon Web Services, Inc. or its affiliates. All rights

More information

The Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter

The Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter AWS Solutions Architect (Associate) Exam Course Manual Enter Linux Academy Keller, Texas United States of America March 31, 2017 To All Linux Academy Students: Welcome to Linux Academy's AWS Certified

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

Real-time Streaming Applications on AWS Patterns and Use Cases

Real-time Streaming Applications on AWS Patterns and Use Cases Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,

More information

Implementing Informatica Big Data Management in an Amazon Cloud Environment

Implementing Informatica Big Data Management in an Amazon Cloud Environment Implementing Informatica Big Data Management in an Amazon Cloud Environment Copyright Informatica LLC 2017. Informatica LLC. Informatica, the Informatica logo, Informatica Big Data Management, and Informatica

More information

Managing Deep Learning Workflows

Managing Deep Learning Workflows Managing Deep Learning Workflows Deep Learning on AWS Batch treske@amazon.de September 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Understanding Data Understanding

More information

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless

More information

AWS Storage Optimization. AWS Whitepaper
