Data Lake Best Practices

Size: px
Start display at page:

Download "Data Lake Best Practices"

Transcription

1 Data Lake Best Practices

2 Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways

3 What is a Data Lake?

4 What, why etc. What is a data lake? It is an architecture that allows you to collect, store, process, analyze and consume all data that flows into your organization. Why data lake? Leverage all data that flows into your organization Customer centricity Business agility Better predictions via Machine Learning Competitive advantage

5 Comparison of a Data Lake to an Enterprise Data Warehouse Complementary to EDW (not replacement) Schema on read (no predefined schemas) Structured/semi-structured/Unstructured data Fast ingestion of new data/content Data Science + Prediction/Advanced Analytics + BI use cases Data at low level of detail/granularity Loosely defined SLAs EMR Flexibility in tools (open source/tools for advanced analytics) S3 Enterprise DW Data lake can be source for EDW Schema on write (predefined schemas) Structured data only Time consuming to introduce new content BI use cases only (no prediction/advanced analytics) Data at summary/aggregated level of detail Tight SLAs (production schedules) Limited flexibility in tools (SQL only)

6 Key Concepts Associated with a Data Lake

7 COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE STORAGE

8 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Data Storage High durability Stores raw data from input sources Support for any type of data Low cost Streaming Streaming ingest of feed data Provides the ability to consume any dataset as a stream Facilitates low latency analytics

9 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Catalogue Metadata lake Used for summary statistics and data Classification management Search Simplified access model for data discovery

10 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Entitlements system Encryption Authentication Authorisation Chargeback Quotas Data masking Regional restrictions

11 Components of a Data Lake API & UI Entitlements Catalogue & Search API & User Interface Exposes the data lake to customers Programmatically query catalogue Expose search API Ensures that entitlements are respected Storage & Streams

12 The Modern Data Architecture

13

14

15

16 API & UI Entitlements Catalogue & Search Storage & Streams

17

18 API & UI Entitlements Catalogue & Search Storage & Streams

19

20 API & UI Entitlements Catalogue & Search Storage & Streams

21

22 Why Is Amazon S3 the Fabric of Data Lake? Natively supported by big data frameworks (Spark, Hive, Presto, etc.) Decouple storage and compute No need to run compute clusters for storage (unlike HDFS) Can run transient Hadoop clusters & Amazon EC2 Spot Instances Multiple & heterogeneous analysis clusters can use the same data Virtually unlimited number of objects and volume of data Very high bandwidth no aggregate throughput limit Designed for 99.99% availability can tolerate zone failure Designed for % durability No need to pay for data replication Native support for versioning Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies Use HDFS for very frequently accessed (hot) data Secure SSL, client/server-side encryption at rest Low cost

23 API & UI Entitlements Catalogue & Search Storage & Streams

24 Indexing and Searching using Metadata ObjectCreated ObjectDeleted PutItem AWS Lambda Update Stream Metadata Index (DynamoDB) Extract Search Fields Update Index AWS Lambda Search Index (Amazon Elasticsearch Service or Amazon CloudSearch)

25 API & UI Entitlements Catalogue & Search Storage & Streams

26

27

28 Identity & Access Management Manage users, groups, and roles Identity federation with Open ID Temporary credentials with Amazon Security Token Service (Amazon STS) Stored policy templates Powerful policy language Amazon S3 bucket policies

29 Data Encryption AWS CloudHSM Dedicated Tenancy SafeNetLuna SA HSM Device Common Criteria EAL4+, NIST FIPS AWS Key Management Service Automated key rotation & auditing Integration with other AWS services AWS server side encryption AWS managed key infrastructure

30 API & UI Entitlements Catalogue & Search Storage & Streams

31 Data Lake API & UI Exposes the Metadata API, search, and Amazon S3 storage services to customers Can be based on TVM/STS Temporary Access for many services, and a bespoke API for Metadata Drive all UI operations from API?

32 Introducing Amazon API Gateway Host multiple versions and stages of APIs Create and distribute API keys to developers Leverage AWS Sigv4 to authorize access to APIs Throttle and monitor requests to protect the backend Leverages AWS Lambda

33 API & UI Entitlements Catalogue & Search Storage & Streams

34 API & UI Entitlements Catalogue & Search Storage & Streams

35

36

37 API & UI Entitlements Catalogue & Search Storage & Streams

38

39

40

41

42 Data Integration Partners Reduce the effort to move, cleanse, synchronize, manage, and automatize data related processes.

43

44

45 Putting it all together

46 Building a Data Lake on AWS Kinesis Firehose 2 Athena Query Service Batch Glue

47 Processing Data for Analytics on your data lake

48

49 Processing & Analytics Real-time Batch Kinesis Streams & Firehose Elasticsearch Service Spark Streaming on EMR AWS Lambda Kinesis Analytics, Kinesis Streams Apache Flink on EMR Apache Storm on EMR EMR Hadoop, Spark, Presto Redshift Data Warehouse Athena Query Service Amazon Lex Speech recognition Amazon Rekognition AI & Predictive Amazon Polly Text to speech Machine Learning Predictive analytics DynamoDB NoSQL DB Transactional & RDBMS Aurora Relational Database BI & Data Visualization

50 Important considerations

51 Data Temperature Hot Warm Cold Volume MB GB GB TB PB EB Item size B KB KB MB KB TB Latency ms ms, sec min, hrs Durability Low high High Very high Request rate Very high High Low Cost/GB $$-$ $- Hot data Warm data Cold data

52 Which Stream/Message Storage Should I Use? Amazon DynamoDB Streams Amazon Kinesis Streams Amazon Kinesis Firehose Apache Kafka Amazon SQS (Standard) Amazon SQS (FIFO) New AWS managed Yes Yes Yes No Yes Yes Guaranteed ordering Yes Yes No Yes No Yes Delivery (deduping) Exactly-once At-least-once At-least-once At-least-once At-least-once Exactly-once Data retention period 24 hours 7 days N/A Configurable 14 days 14 days Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ 3 AZ Scale / throughput No limit / ~ table IOPS No limit / ~ shards No limit / automatic No limit / ~ nodes No limits / automatic Parallel consumption Yes Yes No Yes No No Stream MapReduce Yes Yes N/A Yes N/A N/A 300 TPS / queue Row/object size 400 KB 1 MB Destination row/object size Configurable 256 KB 256 KB Cost Higher (table cost) Hot Low Low Low (+admin) Low-medium Low-medium Warm

53 Analytics Types & Frameworks Batch Takes minutes to hours Example: Daily/weekly/monthly reports Amazon EMR (MapReduce, Hive, Pig, Spark) Interactive Takes seconds Example: Self-service dashboards Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark) Sub second: ElastiCache (Redis 3.2 TiB, MemCache), SAP Hana Message Takes milliseconds to seconds Example: Message processing Amazon SQS applications on Amazon EC2 Stream Takes milliseconds to seconds Example: Fraud alerts, 1 minute metrics Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda Artificial Intelligence Takes milliseconds to minutes Example: Fraud detection, forecast demand, text to speech Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK and Caffe) Stream Message Batch Interactive AI PROCESS / ANALYZE Amazon AI Amazon Redshift Amazon Athena Presto Amazon SQS apps Amazon EC2 Streaming Amazon Kinesis Analytics KCL apps AWS Lambda EMR Amazon EC2 Amazon EMR Fast Slow Fast

54 Slow Which Analysis Tool Should I Use? Use case Amazon Redshift Amazon Athena Amazon EMR Optimized for data warehousing Ad-hoc Interactive Queries Presto Spark Hive Interactive Query General purpose (iterative ML, RT,..) Scale/throughput ~Nodes Automatic / No limits ~ Nodes AWS Managed Service Yes Yes, Serverless Yes Storage Local storage AmazonS3 Amazon S3, HDFS Batch Optimization Columnar storage, data compression, and zone maps CSV, TSV, JSON, Parquet, ORC, Apache Web log Framework dependent Metadata Amazon Redshift managed Athena Catalog Manager Hive Meta-store BI tools supports Yes (JDBC/ODBC) Yes (JDBC) Yes (JDBC/ODBC & Custom) Access controls Users, groups, and access controls AWS IAM Integration with LDAP UDF support Yes (Scalar) No Yes

55 Case Study

56 Case Study: Re-architecting Compliance For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren t able to do before, and that is priceless. - Steve Randich, CIO What FINRA needed Infrastructure for its market surveillance platform Support of analysis and storage of approximately 75 billion market events every day Why they chose AWS Fulfillment of FINRA s security requirements Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3 Benefits realized Increased agility, speed, and cost savings Estimated savings of $10-20m annually by using AWS

57 Fraud Detection FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion trading events per day and securely store over 5 petabytes of data, attaining savings of $10-20mm per year.

58 Summary

59 AWS enables you to build sophisticated data lakes and related analytics applications Retrospective, Real-time, Predictive You can build incrementally, adding use cases and increasing scale as you go AWS provides a broad range of security and auditing features to enable you to meet your security requirements

60 Takeaways

61 Prescriptive guidance and rapidly deployable solutions to help you store, analyze, and process big data on the AWS Cloud Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight Deploying a Data Lake on AWS - March 2017 AWS Online Tech Talks Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly Webinar Series - YouTube

62 ?

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively

More information

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing

More information

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect

Big Data on AWS. Peter-Mark Verwoerd Solutions Architect Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing

More information

Energy Management with AWS

Energy Management with AWS Energy Management with AWS Kyle Hart and Nandakumar Sreenivasan Amazon Web Services August [XX], 2017 Tampa Convention Center Tampa, Florida What is Cloud? The NIST Definition Broad Network Access On-Demand

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS

ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS Dr Adnene Guabtni, Senior Research Scientist, NICTA/Data61, CSIRO Adnene.Guabtni@csiro.au EC2 S3 ELB RDS AMI

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services

Serverless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless

More information

Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility. AWS Whitepaper

Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility. AWS Whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility AWS Whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility: AWS Whitepaper Copyright 2018 Amazon Web

More information

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services

What s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its

More information

Store, Protect, Optimize Your Healthcare Data in AWS

Store, Protect, Optimize Your Healthcare Data in AWS Healthcare reform, increasing patient expectations, exponential data growth, and the threat of cyberattacks are forcing healthcare providers to re-evaluate their data management strategies. Healthcare

More information

Gabriel Villa. Architecting an Analytics Solution on AWS

Gabriel Villa. Architecting an Analytics Solution on AWS Gabriel Villa Architecting an Analytics Solution on AWS Cloud and Data Architect Skilled leader, solution architect, and technical expert focusing primarily on Microsoft technologies and AWS. Passionate

More information

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS

Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Dr. Steffen Hausmann, Solutions Architect Michael Hanisch, Manager Solutions Architecture November 18 th, 2016 Stream Processing Challenges

More information

Corriendo R sobre un ambiente Serverless: Amazon Athena

Corriendo R sobre un ambiente Serverless: Amazon Athena Corriendo R sobre un ambiente Serverless: Amazon Athena Mauricio Muñoz Solutions Architect, AWS Chile April, 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Web Services

More information

Real-time Streaming Applications on AWS Patterns and Use Cases

Real-time Streaming Applications on AWS Patterns and Use Cases Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,

More information

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services

What s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

Best Practices and Performance Tuning on Amazon Elastic MapReduce

Best Practices and Performance Tuning on Amazon Elastic MapReduce Best Practices and Performance Tuning on Amazon Elastic MapReduce Michael Hanisch Solutions Architect Amo Abeyaratne Big Data and Analytics Consultant ANZ 12.04.2016 2016, Amazon Web Services, Inc. or

More information

QLIK INTEGRATION WITH AMAZON REDSHIFT

QLIK INTEGRATION WITH AMAZON REDSHIFT QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

Capture Business Opportunities from Systems of Record and Systems of Innovation

Capture Business Opportunities from Systems of Record and Systems of Innovation Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information

More information

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

AWS 101. Patrick Pierson, IonChannel

AWS 101. Patrick Pierson, IonChannel AWS 101 Patrick Pierson, IonChannel What is AWS? Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help

More information

Integrating Splunk with AWS services:

Integrating Splunk with AWS services: Integrating Splunk with AWS services: Using Redshi+, Elas0c Map Reduce (EMR), Amazon Machine Learning & S3 to gain ac0onable insights via predic0ve analy0cs via Splunk Patrick Shumate Solutions Architect,

More information

Introduction to Database Services

Introduction to Database Services Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Extend NonStop Applications with Cloud-based Services. Phil Ly, TIC Software John Russell, Canam Software

Extend NonStop Applications with Cloud-based Services. Phil Ly, TIC Software John Russell, Canam Software Extend NonStop Applications with Cloud-based Services Phil Ly, TIC Software John Russell, Canam Software Agenda Cloud Computing and Microservices Amazon Web Services (AWS) Integrate NonStop with AWS Managed

More information

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Splunk & AWS Gain real-time insights from your data at scale Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Forward-Looking Statements During the course of this presentation, we may

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

WHITEPAPER. MemSQL Enterprise Feature List

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

More information

Amazon AWS-Solution-Architect-Associate Exam

Amazon AWS-Solution-Architect-Associate Exam Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?

More information

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.

Activator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without

More information

4) An organization needs a data store to handle the following data types and access patterns:

4) An organization needs a data store to handle the following data types and access patterns: 1) A company needs to deploy a data lake solution for their data scientists in which all company data is accessible and stored in a central S3 bucket. The company segregates the data by business unit,

More information

Experiences with Serverless Big Data

Experiences with Serverless Big Data Experiences with Serverless Big Data AWS Meetup Munich 2016 Markus Schmidberger, Head of Data Service Munich, 17.10.16 Key Components of our Data Service Real-Time Monitoring Enable our development teams

More information

Mid-Atlantic CIO Forum

Mid-Atlantic CIO Forum Mid-Atlantic CIO Forum Agenda Security of the Cloud Security In the Cloud Your Product and Services Roadmap (innovation) AWS and Cloud Services Growth and Expansion at AWS Questions & Discussion Shared

More information

Managing Deep Learning Workflows

Managing Deep Learning Workflows Managing Deep Learning Workflows Deep Learning on AWS Batch treske@amazon.de September 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Understanding Data Understanding

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Innovation at AWS. Eric Ferreira Principal Database Engineer Amazon Redshift

Innovation at AWS. Eric Ferreira Principal Database Engineer Amazon Redshift Innovation at AWS Eric Ferreira ericfe@amazon.com Principal Database Engineer Amazon Redshift The Amazon Flywheel Focus on things that stay the same Price Selection Delivery Applying this at AWS Focus

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Streaming Data: The Opportunity & How to Work With It

Streaming Data: The Opportunity & How to Work With It Streaming Data: The Opportunity & How to Work With It Roger Barga, GM Amazon Kinesis April 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interest in and demand for stream

More information

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India

Amazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India (AWS) Overview: AWS is a cloud service from Amazon, which provides services in the form of building blocks, these building blocks can be used to create and deploy various types of application in the cloud.

More information

The Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter

The Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter AWS Solutions Architect (Associate) Exam Course Manual Enter Linux Academy Keller, Texas United States of America March 31, 2017 To All Linux Academy Students: Welcome to Linux Academy's AWS Certified

More information

AWS Storage Optimization. AWS Whitepaper

AWS Storage Optimization. AWS Whitepaper AWS Storage Optimization AWS Whitepaper AWS Storage Optimization: AWS Whitepaper Copyright 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress

More information

Agenda. Introduction Storage Primer Block Storage Shared File Systems Object Store On-Premises Storage Integration

Agenda. Introduction Storage Primer Block Storage Shared File Systems Object Store On-Premises Storage Integration Storage on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

Practical Applications of Machine Learning for Image and Video in the Cloud

Practical Applications of Machine Learning for Image and Video in the Cloud Practical Applications of Machine Learning for Image and Video in the Cloud Shawn Przybilla, AWS Solutions Architect M&E @shawnprzybilla 2/27/18 There were 3.7 Billion internet users in 2017 1.2 Trillion

More information

Overview of AWS Security - Database Services

Overview of AWS Security - Database Services Overview of AWS Security - Database Services June 2016 (Please consult http://aws.amazon.com/security/ for the latest version of this paper) 2016, Amazon Web Services, Inc. or its affiliates. All rights

More information

AWS Storage Gateway. Amazon S3. Amazon EFS. Amazon Glacier. Amazon EBS. Amazon EC2 Instance. storage. File Block Object. Hybrid integrated.

AWS Storage Gateway. Amazon S3. Amazon EFS. Amazon Glacier. Amazon EBS. Amazon EC2 Instance. storage. File Block Object. Hybrid integrated. AWS Storage Amazon EFS Amazon EBS Amazon EC2 Instance storage Amazon S3 Amazon Glacier AWS Storage Gateway File Block Object Hybrid integrated storage Amazon S3 Amazon Glacier Amazon EBS Amazon EFS Durable

More information

Reactive Microservices Architecture on AWS

Reactive Microservices Architecture on AWS Reactive Microservices Architecture on AWS Sascha Möllering Solutions Architect, @sascha242, Amazon Web Services Germany GmbH Why are we here today? https://secure.flickr.com/photos/mgifford/4525333972

More information

5 Fundamental Strategies for Building a Data-centered Data Center

5 Fundamental Strategies for Building a Data-centered Data Center 5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse

More information

Data Architectures in Azure for Analytics & Big Data

Data Architectures in Azure for Analytics & Big Data Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect AWS IoT Overview July 2016 Thomas Jones, Partner Solutions Architect AWS customers are connecting physical things to the cloud in every industry imaginable. Healthcare and Life Sciences Municipal Infrastructure

More information

CloudExpo November 2017 Tomer Levi

CloudExpo November 2017 Tomer Levi CloudExpo November 2017 Tomer Levi About me Full Stack Engineer @ Intel s Advanced Analytics group. Artificial Intelligence unit at Intel. Responsible for (1) Radical improvement of critical processes

More information

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019

From Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways

More information

AWS Database Migration Service

AWS Database Migration Service AWS Database Migration Service Database Modernisation with Minimal Downtime John Winford Sr. Technical Program Manager May 18, 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

More information

Containers or Serverless? Mike Gillespie Solutions Architect, AWS Solutions Architecture

Containers or Serverless? Mike Gillespie Solutions Architect, AWS Solutions Architecture Containers or Serverless? Mike Gillespie Solutions Architect, AWS Solutions Architecture A Typical Application with Microservices Client Webapp Webapp Webapp Greeting Greeting Greeting Name Name Name Microservice

More information

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud

Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud WHITE PAPER / AUGUST 8, 2018 DISCLAIMER The following is intended to outline our general product direction. It is intended for

More information

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Modern ETL Tools for Cloud and Big Data Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Agenda Landscape Cloud ETL Tools Big Data ETL Tools Best Practices

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

How can you implement this through a script that a scheduling daemon runs daily on the application servers? You ve been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups

More information

Microservices on AWS. Matthias Jung, Solutions Architect AWS

Microservices on AWS. Matthias Jung, Solutions Architect AWS Microservices on AWS Matthias Jung, Solutions Architect AWS Agenda What are Microservices? Why Microservices? Challenges of Microservices Microservices on AWS What are Microservices? What are Microservices?

More information

The age of Big Data Big Data for Oracle Database Professionals

The age of Big Data Big Data for Oracle Database Professionals The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

What is Gluent? The Gluent Data Platform

What is Gluent? The Gluent Data Platform What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the

More information

Modernizing Business Intelligence and Analytics

Modernizing Business Intelligence and Analytics Modernizing Business Intelligence and Analytics Justin Erickson Senior Director, Product Management 1 Agenda What benefits can I achieve from modernizing my analytic DB? When and how do I migrate from

More information

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor

Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

AWS Serverless Architecture Think Big

AWS Serverless Architecture Think Big MAKING BIG DATA COME ALIVE AWS Serverless Architecture Think Big Garrett Holbrook, Data Engineer Feb 1 st, 2017 Agenda What is Think Big? Example Project Walkthrough AWS Serverless 2 Think Big, a Teradata

More information

Stages of Data Processing

Stages of Data Processing Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

2013 AWS Worldwide Public Sector Summit Washington, D.C.

2013 AWS Worldwide Public Sector Summit Washington, D.C. 2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic

More information

The Technology of the Business Data Lake. Appendix

The Technology of the Business Data Lake. Appendix The Technology of the Business Data Lake Appendix Pivotal data products Term Greenplum Database GemFire Pivotal HD Spring XD Pivotal Data Dispatch Pivotal Analytics Description A massively parallel platform

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger -

Werden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger - Werden Sie ein Teil von Internet der Dinge auf AWS AWS Enterprise Summit 2015 Dr. Markus Schmidberger - schmidbe@amazon.de Internet of Things is the network of physical objects or "things" embedded with

More information

Oracle GoldenGate for Big Data

Oracle GoldenGate for Big Data Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools

Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect

Amazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect Amazon Web Services Foundational Services for Research Computing Mike Kuentz, WWPS Solutions Architect April 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure

More information

Protecting Your Data in AWS. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Protecting Your Data in AWS. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Protecting Your Data in AWS 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Encrypting Data in AWS AWS Key Management Service, CloudHSM and other options What to expect from this

More information

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

PrepAwayExam.   High-efficient Exam Materials are the best high pass-rate Exam Dumps PrepAwayExam http://www.prepawayexam.com/ High-efficient Exam Materials are the best high pass-rate Exam Dumps Exam : SAA-C01 Title : AWS Certified Solutions Architect - Associate (Released February 2018)

More information

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions 1) A company is storing an access key (access key ID and secret access key) in a text file on a custom AMI. The company uses the access key to access DynamoDB tables from instances created from the AMI.

More information

Performance Efficiency Pillar

Performance Efficiency Pillar Performance Efficiency Pillar AWS Well-Architected Framework July 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes

More information

Big Data with Hadoop Ecosystem

Big Data with Hadoop Ecosystem Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process

More information

Amazon Search Services. Christoph Schmitter

Amazon Search Services. Christoph Schmitter Amazon Search Services Christoph Schmitter csc@amazon.de What we'll cover Overview of Amazon Search Services Understand the difference between Cloudsearch and Amazon ElasticSearch Service Q&A Amazon Search

More information

Security Aspekts on Services for Serverless Architectures. Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance

Security Aspekts on Services for Serverless Architectures. Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance Security Aspekts on Services for Serverless Architectures Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance Agenda: Security in General Services in Scope Aspects of Services for

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide

FAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome

More information

Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB

Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB Pagely is the market leader in managed WordPress hosting, and an AWS Advanced Technology, SaaS, and Public

More information

About Intellipaat. About the Course. Why Take This Course?

About Intellipaat. About the Course. Why Take This Course? About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over

More information

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content

Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content Introduction to Cloud Computing A Short history Client Server Computing Concepts Challenges with Distributed Computing Introduction

More information

ADABAS & NATURAL 2050+

ADABAS & NATURAL 2050+ ADABAS & NATURAL 2050+ Guido Falkenberg SVP Global Customer Innovation DIGITAL TRANSFORMATION #WITHOUTCOMPROMISE 2017 Software AG. All rights reserved. ADABAS & NATURAL 2050+ GLOBAL INITIATIVE INNOVATION

More information

exam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0

exam.   Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0 70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to

More information

Cloud Computing & Visualization

Cloud Computing & Visualization Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International

More information

Informatica Data Lake Management on the AWS Cloud

Informatica Data Lake Management on the AWS Cloud Informatica Data Lake Management on the AWS Cloud Quick Start Reference Deployment January 2018 Informatica Big Data Team Vinod Shukla AWS Quick Start Reference Team Contents Overview... 2 Informatica

More information

Modern Data Warehouse The New Approach to Azure BI

Modern Data Warehouse The New Approach to Azure BI Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics

More information

What to expect from the session Technical recap VMware Cloud on AWS {Sample} Integration use case Services introduction & solution designs Solution su

What to expect from the session Technical recap VMware Cloud on AWS {Sample} Integration use case Services introduction & solution designs Solution su LHC3376BES AWS Native Services Integration with VMware Cloud on AWS Technical Deep Dive Ian Massingham, Worldwide Lead, AWS Technical Evangelism Paul Bockelman, AWS Principal Solutions Architect (WWPS)

More information