Cloud Analytics and Business Intelligence on AWS

Enterprise Applications: Virtual Desktops, Sharing & Collaboration
Platform Services
  Analytics: Hadoop, Real-time Streaming Data, Machine Learning, Data Warehouse, Data Pipeline
  App Services: Queuing & Notifications, Workflow, App Streaming, Transcoding, Email, Search
  Deployment & Management: One-click web app deployment, DevOps resource management, Resource Templates, Code Deploy, Code Pipeline, Code Commit
  Mobile Services: Identity, Sync, Mobile Analytics, Push Notifications
Administration & Security: Identity Management, Access Control, Usage & Resource Tracking, Service Catalog, Key Storage & Management, Monitoring and Logs
Core Services: Compute (VMs, Auto-scaling and Load Balancing), Storage (Object, Block and Archival), CDN, Databases (Relational, NoSQL, Caching), Networking (VPC, DX, DNS)
Infrastructure: Regions, Availability Zones, Points of Presence

Simple Storage Service (S3)
  Highly scalable object storage for the internet
  Objects from 1 byte to 5 TB in size
  Availability: 99.99%
  Durability: 99.999999999%
  A distributed object store, not a file system
  No single points of failure
  Eventually consistent

  Paradigm: Object store
  Performance: Very fast
  Redundancy: Across Availability Zones
  Security: Public key / private key
  Pricing: $0.03/GB/month
  Typical use case: Write once, read many
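
To make the durability and availability percentages on this slide more concrete, here is a small worked calculation. It assumes both figures are annual targets (the slide does not say so explicitly), and the function names are illustrative only.

```python
from fractions import Fraction

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_failure_fraction(percent: str) -> Fraction:
    """Convert a percentage such as '99.99' into the complementary fraction.

    Fractions are used instead of floats so that eleven nines stay exact.
    """
    return 1 - Fraction(percent) / 100


def years_per_expected_loss(object_count: int, durability_percent: str) -> Fraction:
    """Average number of years between single-object losses (assumed annual figure)."""
    expected_losses_per_year = object_count * annual_failure_fraction(durability_percent)
    return 1 / expected_losses_per_year


def downtime_minutes_per_year(availability_percent: str) -> Fraction:
    """Minutes per year the service may be unavailable at the stated availability."""
    return annual_failure_fraction(availability_percent) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Figures taken from the slide: 99.999999999% durability, 99.99% availability.
    print(years_per_expected_loss(10_000, "99.999999999"))  # years between losses
    print(float(downtime_minutes_per_year("99.99")))        # minutes per year
```

Under these assumptions, storing 10,000 objects would lead to an expected loss of one object roughly every 10 million years, while 99.99% availability allows a little under an hour of downtime per year.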

S3 Performance & Scalability
  Amazon S3 provides near-linear scalability
  S3 streaming performance (chart: GB/second against reader connections)
    100 VMs: 9.6 GB/s, $26/hr
    350 VMs: 28.7 GB/s, $90/hr
  34 secs per terabyte
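
The two data points on this slide can be turned into time and cost per terabyte with simple arithmetic. The sketch below assumes 1 TB = 1000 GB and that the hourly price covers the whole fleet; both are assumptions, not statements from the slide.

```python
def seconds_per_terabyte(gb_per_second: float, gb_per_tb: float = 1000.0) -> float:
    """Time to stream one terabyte at the given aggregate throughput."""
    return gb_per_tb / gb_per_second


def cost_per_terabyte(gb_per_second: float, dollars_per_hour: float) -> float:
    """Fleet cost of streaming one terabyte, assuming per-second proration."""
    return dollars_per_hour * seconds_per_terabyte(gb_per_second) / 3600.0


# (VM count, aggregate GB/s, fleet $/hr) as quoted on the slide.
FLEETS = [(100, 9.6, 26.0), (350, 28.7, 90.0)]

if __name__ == "__main__":
    for vms, rate, price in FLEETS:
        print(
            f"{vms} VMs: {seconds_per_terabyte(rate):.1f} s/TB, "
            f"${cost_per_terabyte(rate, price):.2f}/TB, "
            f"{rate / vms * 1000:.0f} MB/s per VM"
        )
```

With the 350-VM fleet this comes to roughly 35 seconds per terabyte, which appears to be where the slide's "34 secs per terabyte" figure comes from. Per-VM throughput drops only from about 96 MB/s to about 82 MB/s as the fleet grows 3.5 times, which is consistent with the "near-linear" claim.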

Amazon Kinesis
  Managed service for real-time big data processing
  Create streams to produce and consume data
  Elastically add and remove shards for performance
  Use the Kinesis Worker Library to process data

Amazon Kinesis (architecture diagram)
  Data sources send records to the AWS endpoint
  Stream shards (Shard 1, Shard 2, ... Shard N) span three Availability Zones
  Consumer applications: App.1 [Aggregate & De-Duplicate], App.2 [Metric Extraction], App.3 [Sliding Window Analysis], App.4 [Machine Learning]
  Downstream stores: S3, DynamoDB, Redshift
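
The shards in this diagram each own a slice of a 128-bit hash key space, and Kinesis routes a record by taking the MD5 hash of its partition key. The following is a minimal local sketch of that routing, assuming the shards split the key space evenly (real streams can have uneven ranges after splits and merges). The function names are illustrative and not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space into contiguous, inclusive ranges (even split assumed)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key: str) -> int:
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside all shard ranges")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for sensor in ("sensor-1", "sensor-2", "sensor-3"):  # hypothetical partition keys
        print(sensor, "-> shard", shard_for(sensor, ranges))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which is what lets a consumer such as a de-duplication or sliding-window application see related records in order.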

AWS Security Services
  CloudHSM
    Dedicated tenancy
    SafeNet Luna SA HSM device
    Common Criteria EAL4+, NIST FIPS 140-2
  AWS Key Management Service
    Implemented on HSM
    Automated key rotation & auditing
    Integration with other AWS services
  AWS Server Side Encryption
    AWS-managed key infrastructure

Structured Data Management

Database
  Relational Database Service (RDS): managed Oracle, MySQL & SQL Server
  DynamoDB: managed NoSQL database
  ElastiCache: managed in-memory caching
  Amazon Redshift: massively parallel, petabyte-scale data warehouse

Relational Database Service (RDS)
  Database-as-a-Service
  No need to install or manage database instances
  Scalable and fault-tolerant configurations

DynamoDB
  Provisioned-throughput NoSQL database
  Fast, predictable, configurable performance
  Fully distributed, fault-tolerant HA architecture
  Integration with EMR & Hive

DynamoDB Consistency
  Writes
    Acknowledged (committed) once they exist in at least two physical data centers
    Persisted to SSD
  Reads
    Tunable for application requirements
    No reduction in durability or consistency in order to achieve throughput
    Eventually consistent read: stale values possible, highest throughput
    Strongly consistent read: no stale values, lower potential throughput
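
The throughput difference between the two read modes can be expressed as provisioned read capacity. The sketch below applies DynamoDB's published capacity rule as I understand it: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads. The function name and the example workload are illustrative.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB per read


def read_capacity_units(reads_per_second: int, item_size_bytes: int,
                        strongly_consistent: bool) -> int:
    """Estimate the read capacity units needed for a steady read workload."""
    # Item size is rounded up to the next 4 KB boundary.
    units_per_read = math.ceil(item_size_bytes / READ_UNIT_BYTES)
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads consume half the capacity.
        total = math.ceil(total / 2)
    return total


if __name__ == "__main__":
    # Hypothetical workload: 100 reads/s of 6,000-byte items.
    print(read_capacity_units(100, 6000, strongly_consistent=True))
    print(read_capacity_units(100, 6000, strongly_consistent=False))
```

For the same provisioned capacity, an application that can tolerate stale values can therefore serve about twice as many reads.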

Redshift
  Managed, massively parallel, petabyte-scale data warehouse
  Streaming backup/restore to S3
  Load data from S3, DynamoDB and EMR
  Extensive security features
  Scale from 160 GB to 1.6 PB online

Redshift Parallelizes Everything
  Query, Load, Backup, Restore, Resize
  Common BI tools connect to the leader node over JDBC/ODBC
  Leader node and compute nodes are linked by a 10 GigE mesh

Exploratory Analytics Data Cleansing Advanced Data Science

Managed Big Data: Elastic MapReduce
  Managed, elastic Hadoop (1.x & 2.x) cluster
  Integrates with S3, DynamoDB and Redshift
  Install end-user tools automatically (Spark, Impala)
  Support for EC2 Spot Instances

Vibrant Ecosystem: Pig, HDFS, EMR

Weather Insurance for Farms
  Challenge: Volatile weather is deadly to crops like grapes
  Solution: Built a predictive model based on freely available data:
    150B soil observations
    60 years of crop data
    200 TB of S3 data
    1M government Doppler radar points
    850K precision rainfall grids tracked
    3M daily weather measurements
  50 EMR clusters process new data as it comes into S3 each day, continuously updating the model

Choose your instance types
  General: m3 family
  CPU: c3 family, cc2.8xlarge
  Memory: m2 family, r3 family
  Disk/IO: d2 family, i2 family
  Workloads shown: ETL, Machine Learning, Spark, HDFS
  Try different configurations to find the optimal cost/performance balance

New EC2 Instances: C4
  Custom Intel Xeon processors for AWS
  C4 = highest-performing EC2 instances

The Financial Industry Regulatory Authority
  30 billion market events/day
  Objective: to react to changing market dynamics
  Amazon Elastic MapReduce & Amazon S3
  $10-20M savings by moving platform to AWS

Event Processing: AWS Lambda
  Fully managed event processor
  Node.js, integrated AWS SDK & ImageMagick
  Natively compile & install Node.js modules
  Specify runtime RAM & timeout
  Automatically scaled to support event volume
  Events from S3, DynamoDB, Kinesis & Lambda
  Integrated CloudWatch logging
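
As an illustration of the event-driven model on this slide, here is a minimal sketch of a function that reacts to S3 object-created events. The slide names Node.js as the runtime; Python is used here purely for illustration, and the bucket name, object key and return value are hypothetical. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the bucket and key of every S3 object mentioned in the event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record; ignore it in this sketch
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
        print(f"new object: s3://{bucket}/{key}")  # stdout goes to the function's log
    return {"processed": len(objects), "objects": objects}


# A trimmed, hypothetical event for local testing.
SAMPLE_EVENT = {
    "Records": [
        {
            "eventSource": "aws:s3",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "reports/daily+summary.csv"},
            },
        }
    ]
}

if __name__ == "__main__":
    print(handler(SAMPLE_EVENT, None))
```

The handler keeps no state between invocations, which is what allows the service to scale it automatically with event volume.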

Introducing Amazon Machine Learning
  Easily create machine learning models
  Visualize and optimize models
  Put models into production in seconds
  Battle-hardened technology
  (Diagram labels: SDE expertise, Machine Learning expertise)

Easy to Use, High Performance
  Train and optimize models on GBs of data
  Batch process predictions
  Real-time prediction API in one click
  No servers to provision or manage

Developing with Amazon Machine Learning
  1. Build model
  2. Validate & optimize
  3. Make predictions

Building a Predictive Model with Amazon Machine Learning
  Use existing data in S3, Redshift and RDS
  Automatic data visualization & exploration
  Descriptive and summary statistics
  Your data doesn't have to be perfect
    Missing data, malformed data records, type validation

Model Validation and Optimization Tools

Making Predictions with Amazon Machine Learning
  Batch predictions
    Asynchronous predictions with trained model
  Real-time predictions
    Synchronous, low latency, high throughput
    Mount API endpoint with a single click

Traditional Business Intelligence OLAP Data Sources for ML

Managed Data Warehouse: Redshift
  Managed, massively parallel, petabyte-scale data warehouse
  Streaming backup/restore to S3
  Load data from S3, DynamoDB and EMR
  Extensive security features
  Scale from 160 GB to 1.6 PB online

Redshift lets you start small and grow big
  Extra Large Node (dw1.xl & dw2.xl)
    3 spindles, 15 GiB RAM, 2 virtual cores, 10 GigE
    Single node: 160 GB SSD or 2 TB magnetic
    Cluster of 2-32 nodes: 320 GB SSD to 64 TB magnetic
  8 Extra Large Node (dw1.8xl & dw2.8xl)
    24 spindles, 120 GiB RAM, 16 virtual cores, 10 GigE
    Single node: 1.2 TB SSD or 16 TB magnetic
    Cluster of 2-100 nodes: 2.4 TB SSD to 1.6 PB magnetic
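
The cluster capacities on this slide follow directly from per-node storage multiplied by node count. The sketch below reproduces that arithmetic and adds a simple node-count estimate. The per-node figures are the ones quoted on the slide; the pairing of dw2 with SSD and dw1 with magnetic storage is my reading of it, and the helper names are illustrative.

```python
import math

# Per-node storage in GB and allowed cluster sizes, as quoted on the slide.
NODE_TYPES = {
    "dw2.xl": {"storage_gb": 160, "min_nodes": 2, "max_nodes": 32},       # SSD
    "dw1.xl": {"storage_gb": 2_000, "min_nodes": 2, "max_nodes": 32},     # magnetic
    "dw2.8xl": {"storage_gb": 1_200, "min_nodes": 2, "max_nodes": 100},   # SSD
    "dw1.8xl": {"storage_gb": 16_000, "min_nodes": 2, "max_nodes": 100},  # magnetic
}


def cluster_range_gb(node_type: str) -> tuple[int, int]:
    """Smallest and largest multi-node cluster capacity for a node type, in GB."""
    spec = NODE_TYPES[node_type]
    return (spec["min_nodes"] * spec["storage_gb"],
            spec["max_nodes"] * spec["storage_gb"])


def nodes_needed(node_type: str, data_gb: float) -> int:
    """Fewest nodes of the given type whose raw storage holds data_gb."""
    spec = NODE_TYPES[node_type]
    nodes = max(spec["min_nodes"], math.ceil(data_gb / spec["storage_gb"]))
    if nodes > spec["max_nodes"]:
        raise ValueError(f"{data_gb} GB does not fit in a {node_type} cluster")
    return nodes


if __name__ == "__main__":
    for name in NODE_TYPES:
        low, high = cluster_range_gb(name)
        print(f"{name}: {low:,} GB to {high:,} GB")
    print(nodes_needed("dw1.8xl", 200_000))  # hypothetical 200 TB data set
```

This is raw capacity only; a real sizing exercise would also account for compression, replication and working space.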

End User Reporting: EMR, Redshift, S3, DynamoDB

Ignite Your Ambition
  26 markets, 3 clearing houses, 5 central securities depositories
  Leading index provider with 41,000+ indexes across asset classes and geographies
  Over 10,000 corporate clients in 60 countries
  Lists more than 3,500 companies in 35 countries, representing more than $8.8 trillion in total market value
  100+ data product offerings supporting 2.5+ million investment professionals and users in 98 countries
  Our technology powers over 70 marketplaces, regulators, CSDs and clearinghouses in over 50 countries

NDW 1.0 Requirements
  Original scope was to replace the on-premises warehouse with Redshift, keeping equivalent schemas and data
    4-8 billion rows/day
    Legacy limited to 1 year retention
  Must be lower cost than the legacy system
    Legacy: $1.16M/year
  Must satisfy multiple security and regulatory requirements
  Must perform similarly to the legacy warehouse under concurrent query load

Migration Completed On Schedule
  Migrated off legacy warehouse to Redshift (start to finish) in 7 man-months
  Redshift costs were 43% of legacy budget for the same data set (~1100 tables)
  Tuned queries now running faster than on legacy system
  Data ingest
    5.5B rows/day average for 2014
    High-water mark: 14B rows in 1 day
    Best write rates: ~2.76M rows/second
    450 GB/day (after compression) into Redshift
    1,895 GB/day average uncompressed
  Currently resize clusters once a quarter (if necessary)
    NDW_Prod is currently growing +3 dw1.8xl nodes per quarter
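
A few derived figures help put these migration numbers in context. The sketch below assumes the "legacy budget" equals the $1.16M/year legacy cost quoted in the requirements, which the slides do not state outright. Everything else is plain arithmetic on the quoted values.

```python
LEGACY_COST_PER_YEAR = 1_160_000       # dollars, from the requirements slide
REDSHIFT_COST_PERCENT = 43             # percent of legacy budget
AVG_ROWS_PER_DAY = 5_500_000_000       # 2014 average
BEST_ROWS_PER_SECOND = 2_760_000       # best observed write rate
COMPRESSED_GB_PER_DAY = 450
UNCOMPRESSED_GB_PER_DAY = 1_895
SECONDS_PER_DAY = 24 * 60 * 60

# Integer arithmetic keeps the dollar figures exact.
redshift_cost = LEGACY_COST_PER_YEAR * REDSHIFT_COST_PERCENT // 100
annual_savings = LEGACY_COST_PER_YEAR - redshift_cost

compression_ratio = UNCOMPRESSED_GB_PER_DAY / COMPRESSED_GB_PER_DAY
avg_rows_per_second = AVG_ROWS_PER_DAY / SECONDS_PER_DAY
burst_headroom = BEST_ROWS_PER_SECOND / avg_rows_per_second

if __name__ == "__main__":
    print(f"Estimated Redshift cost: ${redshift_cost:,}/year")
    print(f"Estimated savings:       ${annual_savings:,}/year")
    print(f"Compression ratio:       {compression_ratio:.1f}:1")
    print(f"Average ingest rate:     {avg_rows_per_second:,.0f} rows/s")
    print(f"Best rate vs. average:   {burst_headroom:.0f}x")
```

Under that assumption the Redshift platform would cost about $0.5M per year, the data compresses by roughly 4.2 to 1, and the best observed write rate is about 43 times the average daily ingest rate.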

Integrated Analytics