Data Lake Best Practices
|
|
- Beatrice Fleming
- 6 years ago
- Views:
Transcription
1 Data Lake Best Practices
2 Agenda Why Data Lake Key Components of a Data Lake Modern Data Architecture Some Best Practices Case Study Summary Takeaways
3 What is a Data Lake?
4 What, why etc. What is a data lake? It is an architecture that allows you to collect, store, process, analyze and consume all data that flows into your organization. Why data lake? Leverage all data that flows into your organization Customer centricity Business agility Better predictions via Machine Learning Competitive advantage
5 Comparison of a Data Lake to an Enterprise Data Warehouse Complementary to EDW (not replacement) Schema on read (no predefined schemas) Structured/semi-structured/Unstructured data Fast ingestion of new data/content Data Science + Prediction/Advanced Analytics + BI use cases Data at low level of detail/granularity Loosely defined SLAs EMR Flexibility in tools (open source/tools for advanced analytics) S3 Enterprise DW Data lake can be source for EDW Schema on write (predefined schemas) Structured data only Time consuming to introduce new content BI use cases only (no prediction/advanced analytics) Data at summary/aggregated level of detail Tight SLAs (production schedules) Limited flexibility in tools (SQL only)
6 Key Concepts Associated with a Data Lake
7 COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE COMPUTE STORAGE
8 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Data Storage High durability Stores raw data from input sources Support for any type of data Low cost Streaming Streaming ingest of feed data Provides the ability to consume any dataset as a stream Facilitates low latency analytics
9 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Catalogue Metadata lake Used for summary statistics and data Classification management Search Simplified access model for data discovery
10 Components of a Data Lake API & UI Entitlements Catalogue & Search Storage & Streams Entitlements system Encryption Authentication Authorisation Chargeback Quotas Data masking Regional restrictions
11 Components of a Data Lake API & UI Entitlements Catalogue & Search API & User Interface Exposes the data lake to customers Programmatically query catalogue Expose search API Ensures that entitlements are respected Storage & Streams
12 The Modern Data Architecture
13
14
15
16 API & UI Entitlements Catalogue & Search Storage & Streams
17
18 API & UI Entitlements Catalogue & Search Storage & Streams
19
20 API & UI Entitlements Catalogue & Search Storage & Streams
21
22 Why Is Amazon S3 the Fabric of Data Lake? Natively supported by big data frameworks (Spark, Hive, Presto, etc.) Decouple storage and compute No need to run compute clusters for storage (unlike HDFS) Can run transient Hadoop clusters & Amazon EC2 Spot Instances Multiple & heterogeneous analysis clusters can use the same data Virtually unlimited number of objects and volume of data Very high bandwidth no aggregate throughput limit Designed for 99.99% availability can tolerate zone failure Designed for % durability No need to pay for data replication Native support for versioning Tiered-storage (Standard, IA, Amazon Glacier) via life-cycle policies Use HDFS for very frequently accessed (hot) data Secure SSL, client/server-side encryption at rest Low cost
23 API & UI Entitlements Catalogue & Search Storage & Streams
24 Indexing and Searching using Metadata ObjectCreated ObjectDeleted PutItem AWS Lambda Update Stream Metadata Index (DynamoDB) Extract Search Fields Update Index AWS Lambda Search Index (Amazon Elasticsearch Service or Amazon CloudSearch)
25 API & UI Entitlements Catalogue & Search Storage & Streams
26
27
28 Identity & Access Management Manage users, groups, and roles Identity federation with Open ID Temporary credentials with Amazon Security Token Service (Amazon STS) Stored policy templates Powerful policy language Amazon S3 bucket policies
29 Data Encryption AWS CloudHSM Dedicated Tenancy SafeNetLuna SA HSM Device Common Criteria EAL4+, NIST FIPS AWS Key Management Service Automated key rotation & auditing Integration with other AWS services AWS server side encryption AWS managed key infrastructure
30 API & UI Entitlements Catalogue & Search Storage & Streams
31 Data Lake API & UI Exposes the Metadata API, search, and Amazon S3 storage services to customers Can be based on TVM/STS Temporary Access for many services, and a bespoke API for Metadata Drive all UI operations from API?
32 Introducing Amazon API Gateway Host multiple versions and stages of APIs Create and distribute API keys to developers Leverage AWS Sigv4 to authorize access to APIs Throttle and monitor requests to protect the backend Leverages AWS Lambda
33 API & UI Entitlements Catalogue & Search Storage & Streams
34 API & UI Entitlements Catalogue & Search Storage & Streams
35
36
37 API & UI Entitlements Catalogue & Search Storage & Streams
38
39
40
41
42 Data Integration Partners Reduce the effort to move, cleanse, synchronize, manage, and automatize data related processes.
43
44
45 Putting it all together
46 Building a Data Lake on AWS Kinesis Firehose 2 Athena Query Service Batch Glue
47 Processing Data for Analytics on your data lake
48
49 Processing & Analytics Real-time Batch Kinesis Streams & Firehose Elasticsearch Service Spark Streaming on EMR AWS Lambda Kinesis Analytics, Kinesis Streams Apache Flink on EMR Apache Storm on EMR EMR Hadoop, Spark, Presto Redshift Data Warehouse Athena Query Service Amazon Lex Speech recognition Amazon Rekognition AI & Predictive Amazon Polly Text to speech Machine Learning Predictive analytics DynamoDB NoSQL DB Transactional & RDBMS Aurora Relational Database BI & Data Visualization
50 Important considerations
51 Data Temperature Hot Warm Cold Volume MB GB GB TB PB EB Item size B KB KB MB KB TB Latency ms ms, sec min, hrs Durability Low high High Very high Request rate Very high High Low Cost/GB $$-$ $- Hot data Warm data Cold data
52 Which Stream/Message Storage Should I Use? Amazon DynamoDB Streams Amazon Kinesis Streams Amazon Kinesis Firehose Apache Kafka Amazon SQS (Standard) Amazon SQS (FIFO) New AWS managed Yes Yes Yes No Yes Yes Guaranteed ordering Yes Yes No Yes No Yes Delivery (deduping) Exactly-once At-least-once At-least-once At-least-once At-least-once Exactly-once Data retention period 24 hours 7 days N/A Configurable 14 days 14 days Availability 3 AZ 3 AZ 3 AZ Configurable 3 AZ 3 AZ Scale / throughput No limit / ~ table IOPS No limit / ~ shards No limit / automatic No limit / ~ nodes No limits / automatic Parallel consumption Yes Yes No Yes No No Stream MapReduce Yes Yes N/A Yes N/A N/A 300 TPS / queue Row/object size 400 KB 1 MB Destination row/object size Configurable 256 KB 256 KB Cost Higher (table cost) Hot Low Low Low (+admin) Low-medium Low-medium Warm
53 Analytics Types & Frameworks Batch Takes minutes to hours Example: Daily/weekly/monthly reports Amazon EMR (MapReduce, Hive, Pig, Spark) Interactive Takes seconds Example: Self-service dashboards Amazon Redshift, Amazon Athena, Amazon EMR (Presto, Spark) Sub second: ElastiCache (Redis 3.2 TiB, MemCache), SAP Hana Message Takes milliseconds to seconds Example: Message processing Amazon SQS applications on Amazon EC2 Stream Takes milliseconds to seconds Example: Fraud alerts, 1 minute metrics Amazon EMR (Spark Streaming), Amazon Kinesis Analytics, KCL, Storm, AWS Lambda Artificial Intelligence Takes milliseconds to minutes Example: Fraud detection, forecast demand, text to speech Amazon AI (Lex, Polly, ML, Rekognition), Amazon EMR (Spark ML), Deep Learning AMI (MXNet, TensorFlow, Theano, Torch, CNTK and Caffe) Stream Message Batch Interactive AI PROCESS / ANALYZE Amazon AI Amazon Redshift Amazon Athena Presto Amazon SQS apps Amazon EC2 Streaming Amazon Kinesis Analytics KCL apps AWS Lambda EMR Amazon EC2 Amazon EMR Fast Slow Fast
54 Slow Which Analysis Tool Should I Use? Use case Amazon Redshift Amazon Athena Amazon EMR Optimized for data warehousing Ad-hoc Interactive Queries Presto Spark Hive Interactive Query General purpose (iterative ML, RT,..) Scale/throughput ~Nodes Automatic / No limits ~ Nodes AWS Managed Service Yes Yes, Serverless Yes Storage Local storage AmazonS3 Amazon S3, HDFS Batch Optimization Columnar storage, data compression, and zone maps CSV, TSV, JSON, Parquet, ORC, Apache Web log Framework dependent Metadata Amazon Redshift managed Athena Catalog Manager Hive Meta-store BI tools supports Yes (JDBC/ODBC) Yes (JDBC) Yes (JDBC/ODBC & Custom) Access controls Users, groups, and access controls AWS IAM Integration with LDAP UDF support Yes (Scalar) No Yes
55 Case Study
56 Case Study: Re-architecting Compliance For our market surveillance systems, we are looking at about 40% [savings with AWS], but the real benefits are the business benefits: We can do things that we physically weren t able to do before, and that is priceless. - Steve Randich, CIO What FINRA needed Infrastructure for its market surveillance platform Support of analysis and storage of approximately 75 billion market events every day Why they chose AWS Fulfillment of FINRA s security requirements Ability to create a flexible platform using dynamic clusters (Hadoop, Hive, and HBase), Amazon EMR, and Amazon S3 Benefits realized Increased agility, speed, and cost savings Estimated savings of $10-20m annually by using AWS
57 Fraud Detection FINRA uses Amazon EMR and Amazon S3 to process up to 75 billion trading events per day and securely store over 5 petabytes of data, attaining savings of $10-20mm per year.
58 Summary
59 AWS enables you to build sophisticated data lakes and related analytics applications Retrospective, Real-time, Predictive You can build incrementally, adding use cases and increasing scale as you go AWS provides a broad range of security and auditing features to enable you to meet your security requirements
60 Takeaways
61 Prescriptive guidance and rapidly deployable solutions to help you store, analyze, and process big data on the AWS Cloud Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight Deploying a Data Lake on AWS - March 2017 AWS Online Tech Talks Harmonize, Search, and Analyze Loosely Coupled Datasets on AWS Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly Webinar Series - YouTube
62 ?
Big Data on AWS. Big Data Agility and Performance Delivered in the Cloud. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Big Data on AWS Big Data Agility and Performance Delivered in the Cloud 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Big Data Technologies and techniques for working productively
More informationIntro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect
Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect Igor Roiter Big Data Cloud Solution Architect Working as a Data Specialist for the last 11 years 9 of them as a Consultant specializing
More informationBig Data on AWS. Peter-Mark Verwoerd Solutions Architect
Big Data on AWS Peter-Mark Verwoerd Solutions Architect What to get out of this talk Non-technical: Big Data processing stages: ingest, store, process, visualize Hot vs. Cold data Low latency processing
More informationEnergy Management with AWS
Energy Management with AWS Kyle Hart and Nandakumar Sreenivasan Amazon Web Services August [XX], 2017 Tampa Convention Center Tampa, Florida What is Cloud? The NIST Definition Broad Network Access On-Demand
More informationBig Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara
Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case
More informationAgenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache
Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationLambda Architecture for Batch and Stream Processing. October 2018
Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.
More informationARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS
ARCHITECTING WEB APPLICATIONS FOR THE CLOUD: DESIGN PRINCIPLES AND PRACTICAL GUIDANCE FOR AWS Dr Adnene Guabtni, Senior Research Scientist, NICTA/Data61, CSIRO Adnene.Guabtni@csiro.au EC2 S3 ELB RDS AMI
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationManaging IoT and Time Series Data with Amazon ElastiCache for Redis
Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All
More informationServerless Computing. Redefining the Cloud. Roger S. Barga, Ph.D. General Manager Amazon Web Services
Serverless Computing Redefining the Cloud Roger S. Barga, Ph.D. General Manager Amazon Web Services Technology Triggers Highly Recommended http://a16z.com/2016/12/16/the-end-of-cloud-computing/ Serverless
More informationBuilding Big Data Storage Solutions (Data Lakes) for Maximum Flexibility. AWS Whitepaper
Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility AWS Whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility: AWS Whitepaper Copyright 2018 Amazon Web
More informationWhat s New at AWS? A selection of some new stuff. Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services
What s New at AWS? A selection of some new stuff Constantin Gonzalez, Principal Solutions Architect, Amazon Web Services Speed of Innovation AWS Pace of Innovation AWS has been continually expanding its
More informationStore, Protect, Optimize Your Healthcare Data in AWS
Healthcare reform, increasing patient expectations, exponential data growth, and the threat of cyberattacks are forcing healthcare providers to re-evaluate their data management strategies. Healthcare
More informationGabriel Villa. Architecting an Analytics Solution on AWS
Gabriel Villa Architecting an Analytics Solution on AWS Cloud and Data Architect Skilled leader, solution architect, and technical expert focusing primarily on Microsoft technologies and AWS. Passionate
More informationTowards a Real- time Processing Pipeline: Running Apache Flink on AWS
Towards a Real- time Processing Pipeline: Running Apache Flink on AWS Dr. Steffen Hausmann, Solutions Architect Michael Hanisch, Manager Solutions Architecture November 18 th, 2016 Stream Processing Challenges
More informationCorriendo R sobre un ambiente Serverless: Amazon Athena
Corriendo R sobre un ambiente Serverless: Amazon Athena Mauricio Muñoz Solutions Architect, AWS Chile April, 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Web Services
More informationReal-time Streaming Applications on AWS Patterns and Use Cases
Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,
More informationWhat s New at AWS? looking at just a few new things for Enterprise. Philipp Behre, Enterprise Solutions Architect, Amazon Web Services
What s New at AWS? looking at just a few new things for Enterprise Philipp Behre, Enterprise Solutions Architect, Amazon Web Services 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
More informationBest Practices and Performance Tuning on Amazon Elastic MapReduce
Best Practices and Performance Tuning on Amazon Elastic MapReduce Michael Hanisch Solutions Architect Amo Abeyaratne Big Data and Analytics Consultant ANZ 12.04.2016 2016, Amazon Web Services, Inc. or
More informationQLIK INTEGRATION WITH AMAZON REDSHIFT
QLIK INTEGRATION WITH AMAZON REDSHIFT Qlik Partner Engineering Created August 2016, last updated March 2017 Contents Introduction... 2 About Amazon Web Services (AWS)... 2 About Amazon Redshift... 2 Qlik
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationCapture Business Opportunities from Systems of Record and Systems of Innovation
Capture Business Opportunities from Systems of Record and Systems of Innovation Amit Satoor, SAP March Hartz, SAP PUBLIC Big Data transformation powers digital innovation system Relevant nuggets of information
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationAWS 101. Patrick Pierson, IonChannel
AWS 101 Patrick Pierson, IonChannel What is AWS? Amazon Web Services (AWS) is a secure cloud services platform, offering compute power, database storage, content delivery and other functionality to help
More informationIntegrating Splunk with AWS services:
Integrating Splunk with AWS services: Using Redshi+, Elas0c Map Reduce (EMR), Amazon Machine Learning & S3 to gain ac0onable insights via predic0ve analy0cs via Splunk Patrick Shumate Solutions Architect,
More informationIntroduction to Database Services
Introduction to Database Services Shaun Pearce AWS Solutions Architect 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Today s agenda Why managed database services? A non-relational
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationExtend NonStop Applications with Cloud-based Services. Phil Ly, TIC Software John Russell, Canam Software
Extend NonStop Applications with Cloud-based Services Phil Ly, TIC Software John Russell, Canam Software Agenda Cloud Computing and Microservices Amazon Web Services (AWS) Integrate NonStop with AWS Managed
More informationSplunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk
Splunk & AWS Gain real-time insights from your data at scale Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk Forward-Looking Statements During the course of this presentation, we may
More informationBIG DATA COURSE CONTENT
BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data
More informationWHITEPAPER. MemSQL Enterprise Feature List
WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure
More informationAmazon AWS-Solution-Architect-Associate Exam
Volume: 858 Questions Question: 1 You are trying to launch an EC2 instance, however the instance seems to go into a terminated status immediately. What would probably not be a reason that this is happening?
More informationActivator Library. Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success.
Focus on maximizing the value of your data, gain business insights, increase your team s productivity, and achieve success. ACTIVATORS Designed to give your team assistance when you need it most without
More information4) An organization needs a data store to handle the following data types and access patterns:
1) A company needs to deploy a data lake solution for their data scientists in which all company data is accessible and stored in a central S3 bucket. The company segregates the data by business unit,
More informationExperiences with Serverless Big Data
Experiences with Serverless Big Data AWS Meetup Munich 2016 Markus Schmidberger, Head of Data Service Munich, 17.10.16 Key Components of our Data Service Real-Time Monitoring Enable our development teams
More informationMid-Atlantic CIO Forum
Mid-Atlantic CIO Forum Agenda Security of the Cloud Security In the Cloud Your Product and Services Roadmap (innovation) AWS and Cloud Services Growth and Expansion at AWS Questions & Discussion Shared
More informationManaging Deep Learning Workflows
Managing Deep Learning Workflows Deep Learning on AWS Batch treske@amazon.de September 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Business Understanding Data Understanding
More informationIncrease Value from Big Data with Real-Time Data Integration and Streaming Analytics
Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time
More informationMODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS
MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale
More informationInnovation at AWS. Eric Ferreira Principal Database Engineer Amazon Redshift
Innovation at AWS Eric Ferreira ericfe@amazon.com Principal Database Engineer Amazon Redshift The Amazon Flywheel Focus on things that stay the same Price Selection Delivery Applying this at AWS Focus
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationStreaming Data: The Opportunity & How to Work With It
Streaming Data: The Opportunity & How to Work With It Roger Barga, GM Amazon Kinesis April 2016 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Interest in and demand for stream
More informationAmazon Web Services. Block 402, 4 th Floor, Saptagiri Towers, Above Pantaloons, Begumpet Main Road, Hyderabad Telangana India
(AWS) Overview: AWS is a cloud service from Amazon, which provides services in the form of building blocks, these building blocks can be used to create and deploy various types of application in the cloud.
More informationThe Orion Papers. AWS Solutions Architect (Associate) Exam Course Manual. Enter
AWS Solutions Architect (Associate) Exam Course Manual Enter Linux Academy Keller, Texas United States of America March 31, 2017 To All Linux Academy Students: Welcome to Linux Academy's AWS Certified
More informationAWS Storage Optimization. AWS Whitepaper
AWS Storage Optimization AWS Whitepaper AWS Storage Optimization: AWS Whitepaper Copyright 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress
More informationAgenda. Introduction Storage Primer Block Storage Shared File Systems Object Store On-Premises Storage Integration
Storage on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationPractical Applications of Machine Learning for Image and Video in the Cloud
Practical Applications of Machine Learning for Image and Video in the Cloud Shawn Przybilla, AWS Solutions Architect M&E @shawnprzybilla 2/27/18 There were 3.7 Billion internet users in 2017 1.2 Trillion
More informationOverview of AWS Security - Database Services
Overview of AWS Security - Database Services June 2016 (Please consult http://aws.amazon.com/security/ for the latest version of this paper) 2016, Amazon Web Services, Inc. or its affiliates. All rights
More informationAWS Storage Gateway. Amazon S3. Amazon EFS. Amazon Glacier. Amazon EBS. Amazon EC2 Instance. storage. File Block Object. Hybrid integrated.
AWS Storage Amazon EFS Amazon EBS Amazon EC2 Instance storage Amazon S3 Amazon Glacier AWS Storage Gateway File Block Object Hybrid integrated storage Amazon S3 Amazon Glacier Amazon EBS Amazon EFS Durable
More informationReactive Microservices Architecture on AWS
Reactive Microservices Architecture on AWS Sascha Möllering Solutions Architect, @sascha242, Amazon Web Services Germany GmbH Why are we here today? https://secure.flickr.com/photos/mgifford/4525333972
More information5 Fundamental Strategies for Building a Data-centered Data Center
5 Fundamental Strategies for Building a Data-centered Data Center June 3, 2014 Ken Krupa, Chief Field Architect Gary Vidal, Solutions Specialist Last generation Reference Data Unstructured OLTP Warehouse
More informationData Architectures in Azure for Analytics & Big Data
Data Architectures in for Analytics & Big Data October 20, 2018 Melissa Coates Solution Architect, BlueGranite Microsoft Data Platform MVP Blog: www.sqlchick.com Twitter: @sqlchick Data Architecture A
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationAWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect
AWS IoT Overview July 2016 Thomas Jones, Partner Solutions Architect AWS customers are connecting physical things to the cloud in every industry imaginable. Healthcare and Life Sciences Municipal Infrastructure
More informationCloudExpo November 2017 Tomer Levi
CloudExpo November 2017 Tomer Levi About me Full Stack Engineer @ Intel s Advanced Analytics group. Artificial Intelligence unit at Intel. Responsible for (1) Radical improvement of critical processes
More informationFrom Single Purpose to Multi Purpose Data Lakes. Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019
From Single Purpose to Multi Purpose Data Lakes Thomas Niewel Technical Sales Director DACH Denodo Technologies March, 2019 Agenda Data Lakes Multiple Purpose Data Lakes Customer Example Demo Takeaways
More informationAWS Database Migration Service
AWS Database Migration Service Database Modernisation with Minimal Downtime John Winford Sr. Technical Program Manager May 18, 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
More informationContainers or Serverless? Mike Gillespie Solutions Architect, AWS Solutions Architecture
Containers or Serverless? Mike Gillespie Solutions Architect, AWS Solutions Architecture A Typical Application with Microservices Client Webapp Webapp Webapp Greeting Greeting Greeting Name Name Name Microservice
More informationLeverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud
Leverage the Oracle Data Integration Platform Inside Azure and Amazon Cloud WHITE PAPER / AUGUST 8, 2018 DISCLAIMER The following is intended to outline our general product direction. It is intended for
More informationModern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.
Modern ETL Tools for Cloud and Big Data Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc. Agenda Landscape Cloud ETL Tools Big Data ETL Tools Best Practices
More informationHadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved
Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop
More informationHow can you implement this through a script that a scheduling daemon runs daily on the application servers?
You ve been tasked with implementing an automated data backup solution for your application servers that run on Amazon EC2 with Amazon EBS volumes. You want to use a distributed data store for your backups
More informationMicroservices on AWS. Matthias Jung, Solutions Architect AWS
Microservices on AWS Matthias Jung, Solutions Architect AWS Agenda What are Microservices? Why Microservices? Challenges of Microservices Microservices on AWS What are Microservices? What are Microservices?
More informationThe age of Big Data Big Data for Oracle Database Professionals
The age of Big Data Big Data for Oracle Database Professionals Oracle OpenWorld 2017 #OOW17 SessionID: SUN5698 Tom S. Reddy tom.reddy@datareddy.com About the Speaker COLLABORATE & OpenWorld Speaker IOUG
More informationMapR Enterprise Hadoop
2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS
More informationWhat is Gluent? The Gluent Data Platform
What is Gluent? The Gluent Data Platform The Gluent Data Platform provides a transparent data virtualization layer between traditional databases and modern data storage platforms, such as Hadoop, in the
More informationModernizing Business Intelligence and Analytics
Modernizing Business Intelligence and Analytics Justin Erickson Senior Director, Product Management 1 Agenda What benefits can I achieve from modernizing my analytic DB? When and how do I migrate from
More informationMaking the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack. Chief Architect RainStor
Making the Most of Hadoop with Optimized Data Compression (and Boost Performance) Mark Cusack Chief Architect RainStor Agenda Importance of Hadoop + data compression Data compression techniques Compression,
More informationMAPR DATA GOVERNANCE WITHOUT COMPROMISE
MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance
More informationAWS Serverless Architecture Think Big
MAKING BIG DATA COME ALIVE AWS Serverless Architecture Think Big Garrett Holbrook, Data Engineer Feb 1 st, 2017 Agenda What is Think Big? Example Project Walkthrough AWS Serverless 2 Think Big, a Teradata
More informationStages of Data Processing
Data processing can be understood as the conversion of raw data into a meaningful and desired form. Basically, producing information that can be understood by the end user. So then, the question arises,
More informationOverview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::
Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized
More information2013 AWS Worldwide Public Sector Summit Washington, D.C.
2013 AWS Worldwide Public Sector Summit Washington, D.C. EMR for Fun and for Profit Ben Butler Sr. Manager, Big Data butlerb@amazon.com @bensbutler Overview 1. What is big data? 2. What is AWS Elastic
More informationThe Technology of the Business Data Lake. Appendix
The Technology of the Business Data Lake Appendix Pivotal data products Term Greenplum Database GemFire Pivotal HD Spring XD Pivotal Data Dispatch Pivotal Analytics Description A massively parallel platform
More informationData Analytics at Logitech Snowflake + Tableau = #Winning
Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief
More informationWerden Sie ein Teil von Internet der Dinge auf AWS. AWS Enterprise Summit 2015 Dr. Markus Schmidberger -
Werden Sie ein Teil von Internet der Dinge auf AWS AWS Enterprise Summit 2015 Dr. Markus Schmidberger - schmidbe@amazon.de Internet of Things is the network of physical objects or "things" embedded with
More informationOracle GoldenGate for Big Data
Oracle GoldenGate for Big Data The Oracle GoldenGate for Big Data 12c product streams transactional data into big data systems in real time, without impacting the performance of source systems. It streamlines
More informationHDInsight > Hadoop. October 12, 2017
HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond
More informationCombine Native SQL Flexibility with SAP HANA Platform Performance and Tools
SAP Technical Brief Data Warehousing SAP HANA Data Warehousing Combine Native SQL Flexibility with SAP HANA Platform Performance and Tools A data warehouse for the modern age Data warehouses have been
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data
More informationAmazon Web Services. Foundational Services for Research Computing. April Mike Kuentz, WWPS Solutions Architect
Amazon Web Services Foundational Services for Research Computing Mike Kuentz, WWPS Solutions Architect April 2017 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. AWS Global Infrastructure
More informationProtecting Your Data in AWS. 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Protecting Your Data in AWS 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Encrypting Data in AWS AWS Key Management Service, CloudHSM and other options What to expect from this
More informationPrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps
PrepAwayExam http://www.prepawayexam.com/ High-efficient Exam Materials are the best high pass-rate Exam Dumps Exam : SAA-C01 Title : AWS Certified Solutions Architect - Associate (Released February 2018)
More informationAWS Solutions Architect Associate (SAA-C01) Sample Exam Questions
1) A company is storing an access key (access key ID and secret access key) in a text file on a custom AMI. The company uses the access key to access DynamoDB tables from instances created from the AMI.
More informationPerformance Efficiency Pillar
Performance Efficiency Pillar AWS Well-Architected Framework July 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes
More informationBig Data with Hadoop Ecosystem
Diógenes Pires Big Data with Hadoop Ecosystem Hands-on (HBase, MySql and Hive + Power BI) Internet Live http://www.internetlivestats.com/ Introduction Business Intelligence Business Intelligence Process
More informationAmazon Search Services. Christoph Schmitter
Amazon Search Services Christoph Schmitter csc@amazon.de What we'll cover Overview of Amazon Search Services Understand the difference between Cloudsearch and Amazon ElasticSearch Service Q&A Amazon Search
More informationSecurity Aspekts on Services for Serverless Architectures. Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance
Security Aspekts on Services for Serverless Architectures Bertram Dorn EMEA Specialized Solutions Architect Security and Compliance Agenda: Security in General Services in Scope Aspects of Services for
More informationCONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM
CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications
More informationFAQs. Business (CIP 2.2) AWS Market Place Troubleshooting and FAQ Guide
FAQs 1. What is the browser compatibility for logging into the TCS Connected Intelligence Data Lake for Business Portal? Please check whether you are using Mozilla Firefox 18 or above and Google Chrome
More informationPagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB
Pagely.com implements log analytics with AWS Glue and Amazon Athena using Beyondsoft s ConvergDB Pagely is the market leader in managed WordPress hosting, and an AWS Advanced Technology, SaaS, and Public
More informationAbout Intellipaat. About the Course. Why Take This Course?
About Intellipaat Intellipaat is a fast growing professional training provider that is offering training in over 150 most sought-after tools and technologies. We have a learner base of 600,000 in over
More informationAmazon Web Services (AWS) Solutions Architect Intermediate Level Course Content
Amazon Web Services (AWS) Solutions Architect Intermediate Level Course Content Introduction to Cloud Computing A Short history Client Server Computing Concepts Challenges with Distributed Computing Introduction
More informationADABAS & NATURAL 2050+
ADABAS & NATURAL 2050+ Guido Falkenberg SVP Global Customer Innovation DIGITAL TRANSFORMATION #WITHOUTCOMPROMISE 2017 Software AG. All rights reserved. ADABAS & NATURAL 2050+ GLOBAL INITIATIVE INNOVATION
More informationexam. Microsoft Perform Data Engineering on Microsoft Azure HDInsight. Version 1.0
70-775.exam Number: 70-775 Passing Score: 800 Time Limit: 120 min File Version: 1.0 Microsoft 70-775 Perform Data Engineering on Microsoft Azure HDInsight Version 1.0 Exam A QUESTION 1 You use YARN to
More informationCloud Computing & Visualization
Cloud Computing & Visualization Workflows Distributed Computation with Spark Data Warehousing with Redshift Visualization with Tableau #FIUSCIS School of Computing & Information Sciences, Florida International
More informationInformatica Data Lake Management on the AWS Cloud
Informatica Data Lake Management on the AWS Cloud Quick Start Reference Deployment January 2018 Informatica Big Data Team Vinod Shukla AWS Quick Start Reference Team Contents Overview... 2 Informatica
More informationModern Data Warehouse The New Approach to Azure BI
Modern Data Warehouse The New Approach to Azure BI History On-Premise SQL Server Big Data Solutions Technical Barriers Modern Analytics Platform On-Premise SQL Server Big Data Solutions Modern Analytics
More informationWhat to expect from the session Technical recap VMware Cloud on AWS {Sample} Integration use case Services introduction & solution designs Solution su
LHC3376BES AWS Native Services Integration with VMware Cloud on AWS Technical Deep Dive Ian Massingham, Worldwide Lead, AWS Technical Evangelism Paul Bockelman, AWS Principal Solutions Architect (WWPS)
More information