Cloud Architecture Patterns. Running PostgreSQL at Scale (when RDS will not do what you need) Corey Huinker Corlogic Consulting December 2018
1 Cloud Architecture Patterns Running PostgreSQL at Scale (when RDS will not do what you need) Corey Huinker Corlogic Consulting December 2018
2 First, we need a problem to solve.
3 This is You
4 You Get An Idea For a Product
5 You make a product!...now you have to sell it.
6 To advertise the product, you need an ad......so you talk to an ad agency.
7 But placing ads has challenges Need to find websites with visitors who: Would want to buy your product Are able to buy your product Would like the style of your advertisement
8 A Website's Claims about Its Visitors...
9 ...are not always accurate.
10 Buying ad-space on websites directly is usually not possible. You must use an auction service.
11 How Modern Ad Tracking Is Done Each advertisement is wrapped in a JavaScript program The program starts when the web page loads The program sends a message every ~5 seconds until the page closes The program also sends messages when important events happen Is the advertisement in a space that fits the size of the image? Is the advertisement in a part of the screen that is visible to the user? Did the mouse pass over the ad? Did the video begin to play? Is the audio muted? Did the video finish? These messages are collected by a "pixel server" and combined to construct a timeline of the life of the advertisement on that web page
12 Now your monitored ad reports events.
13 Focal points of ad monitoring We want to know: How many times did the ad land on a page? ("Impressions") How many times did the ad land in a favorable spot on the page? How many times did the ad fit into the space allotted? How many times was the ad visible for 10 seconds? 20? 30? Did the viewer interact with the ad in any way?
14 The Real Purpose of ad monitoring We want to know: How many of our ads were seen by actual humans? ("Engagement") How many of our ads were seen by NHT - Non-Human Traffic? ("bots") How do these numbers compare with the claims of the website? How do these numbers compare with the claims of the auction service? Ultimately, we want to know how much of our money was wasted, so we can change where we spend money in the future.
15 Ad Tracking creates a lot of data Not all impressions report their events Default rate is about 3% Full reporting would require 30x the infrastructure for only a small gain in accuracy Customers can pay for a higher sampling rate Approximate sampling events recorded per day: 50,000,000,000 Full reporting would be 1.5T events per day. Sampling events are chained together to tell the story of that impression. Impression data is then aggregated by date, ad campaign, browser type After aggregation, we have about 500M rows per day. Each row has > 125 measures of viewability metrics
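The volume figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted above; the variable names are illustrative, and the per-second rate assumes events arrive evenly across the day, which the slides do not claim.

```python
# Figures taken from the slide.
SAMPLED_EVENTS_PER_DAY = 50_000_000_000   # "50,000,000,000" sampled events per day
INFRA_MULTIPLIER = 30                     # "Full reporting would require 30x the infrastructure"

# Full reporting volume implied by the 30x multiplier.
full_events_per_day = SAMPLED_EVENTS_PER_DAY * INFRA_MULTIPLIER

# Sampling rate implied by the same multiplier (the slide rounds this to "about 3%").
implied_sampling_rate = SAMPLED_EVENTS_PER_DAY / full_events_per_day

# Assumption: events are spread evenly over 86,400 seconds.
sampled_events_per_second = SAMPLED_EVENTS_PER_DAY / 86_400

print(f"{full_events_per_day:,}")        # 1,500,000,000,000
print(f"{implied_sampling_rate:.1%}")    # 3.3%
print(f"{sampled_events_per_second:,.0f}")
```

The 1.5T figure and the "about 3%" rate are therefore consistent with each other once the rate is read as roughly 1 in 30.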
16 The Original Architecture (2013) [Diagram labels: Tagged Ads On Web Browsers; Viewable Events (Billions/day); Pixel Servers; Log shipping; Stats Cache Aggregators; MySQL OLTP; MySQL DW DBs; Redshift; Vertica; Daily ETLs; Daily Summary Files: 100s of files (~100 GB total size), csv.gz files on S3, one per customer per day]
17 User Query (2013) [Diagram labels: Website Request; Stats-Cache API; Partial Queries; Vertica; Stats Cache Aggregators; Searches; MySQL OLTP DB; MySQL DBs - Shard By Date; Redshift] CSV.gz files not accessed directly by queries Partial Query Results combined at the application level 3 dialects of SQL 1 custom API
18 Capturing Ad Activity Events "Pixel" server: Website that only serves up one 1x1 pixel image Captures data about visiting browsers in web logs Needs to be fast to not delay user experience or risk losing event data Data must be read quickly to give customers real time results. Number of Servers: ~500 Probably too many servers Over-provisioned for reliability EC2 Type: t3.xlarge or similar Low CPU workload Low disk I/O workload High network bandwidth, low latency [Diagram labels: Tagged Ads On Web Browsers; Viewable Events (Billions/day); Pixel Servers]
19 Real-time Event Accumulation and Aggregation Stats Cache machines consume syslogs from Pixel Servers Log Events from the same browser are combined to form the ad outcome. Outcomes are aggregated by ad campaign, product brand, etc. (10 different aggregations) Each Stats Cache is an incomplete shard of today's data At end of day, all shard data is combined into one CSV per customer Pixel Servers Log shipping Number of Servers: ~450 Custom in-memory DB No disk storage of data CPU load: nearly 100% Cannot use swap EC2 type: r5.2xlarge or similar Stats Cache Daily Summary Files 100s of files (~100 GB total size) S3 - CSVs
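The two steps described on this slide (combine the log events of one browser into an ad outcome, then aggregate outcomes by campaign) can be sketched as follows. The event tuple layout, field names, and the two metrics are assumptions made for illustration; the real Stats Cache is a custom in-memory database whose schema the slides do not show.

```python
from collections import defaultdict

# Hypothetical event layout: (impression_id, campaign, seconds_since_load, event_name)
events = [
    ("imp-1", "campaign-a", 0, "loaded"),
    ("imp-1", "campaign-a", 5, "visible"),
    ("imp-1", "campaign-a", 10, "visible"),
    ("imp-2", "campaign-a", 0, "loaded"),
    ("imp-3", "campaign-b", 0, "loaded"),
    ("imp-3", "campaign-b", 5, "mouseover"),
]

def build_outcomes(events):
    """Chain the events of each impression into one outcome record."""
    outcomes = {}
    for imp_id, campaign, seconds, name in sorted(events, key=lambda e: (e[0], e[2])):
        outcome = outcomes.setdefault(
            imp_id, {"campaign": campaign, "visible_seconds": 0, "interacted": False}
        )
        if name == "visible":
            # Keep the latest timestamp at which the ad was still reported visible.
            outcome["visible_seconds"] = max(outcome["visible_seconds"], seconds)
        elif name == "mouseover":
            outcome["interacted"] = True
    return outcomes

def aggregate_by_campaign(outcomes):
    """Roll impression outcomes up to one row of counters per campaign."""
    totals = defaultdict(lambda: {"impressions": 0, "visible_10s": 0, "interactions": 0})
    for outcome in outcomes.values():
        row = totals[outcome["campaign"]]
        row["impressions"] += 1
        row["visible_10s"] += int(outcome["visible_seconds"] >= 10)
        row["interactions"] += int(outcome["interacted"])
    return dict(totals)

summary = aggregate_by_campaign(build_outcomes(events))
```

With the sample events, `summary["campaign-a"]` counts two impressions of which one was visible for 10 seconds, and `summary["campaign-b"]` counts one impression with one interaction.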
20 Real-time Stats Reporting (2013) API Request Stats Cache Shards Aggregation in Application Code Aggregating in application code high memory usage high network usage high potential for error API Response
21 Non-Real-time Data Reporting (2013) API Request MySQL Shards (by date) Aggregation in Application Code Redshift (for very large clients) API Response MySQL OLTP DB
22 First Steps to Fix MySQL OLTP DB Converted to PostgreSQL Logical Replication not yet available Conversion took 2-4 weeks using 2 programmers Added Triggers on MySQL tables to identify modified rows Used mysql_fdw to create migration tables on PostgreSQL Created each new PostgreSQL table as SELECT * FROM foreign table Scheduled tasks update PostgreSQL by reading new records in trigger tables Moved read-only workloads to postgres instance Migrated read-write apps in stages Only downtime was in final cut-over Final system: Single 32 core EC2 master with 1-2 physical read replicas
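The trigger-table approach above can be illustrated with a small self-contained simulation. Here two in-memory SQLite databases stand in for MySQL and PostgreSQL, and a plain Python function stands in for the scheduled task; the table and trigger names are hypothetical, and the real migration read the MySQL side through mysql_fdw rather than through a second connection.

```python
import sqlite3

src = sqlite3.connect(":memory:")  # stand-in for the MySQL OLTP database
dst = sqlite3.connect(":memory:")  # stand-in for the new PostgreSQL database

# Source side: the table plus a "trigger table" that records modified row ids.
src.executescript("""
CREATE TABLE campaigns (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE campaigns_changes (id INTEGER);
CREATE TRIGGER campaigns_ins AFTER INSERT ON campaigns
  BEGIN INSERT INTO campaigns_changes VALUES (NEW.id); END;
CREATE TRIGGER campaigns_upd AFTER UPDATE ON campaigns
  BEGIN INSERT INTO campaigns_changes VALUES (NEW.id); END;
""")
dst.execute("CREATE TABLE campaigns (id INTEGER PRIMARY KEY, name TEXT)")

def sync_changes(src, dst):
    """One run of the scheduled task: copy every modified row, then clear the log."""
    ids = [r[0] for r in src.execute("SELECT DISTINCT id FROM campaigns_changes")]
    for row_id in ids:
        row = src.execute(
            "SELECT id, name FROM campaigns WHERE id = ?", (row_id,)
        ).fetchone()
        if row is not None:
            dst.execute("INSERT OR REPLACE INTO campaigns (id, name) VALUES (?, ?)", row)
    # Simplification: a real task must not discard changes logged during the copy.
    src.execute("DELETE FROM campaigns_changes")
    src.commit()
    dst.commit()
    return len(ids)
```

Deletes are not tracked in this sketch; a real migration would need a third trigger and a way to replay removals.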
23 Next To Fix: MySQL Data Warehouse shards Performed adequately when daily volume was < 1% of current volume Impossible to add new columns to tables Easier to create a new shard than to modify an existing one. New metrics being added every few weeks or days (over 100 metrics) Dozens of shards, some cover a month of data, others only a few days Each new shard adds workload to application level aggregator
24 Understanding User Interest In Data Yesterday User Interest In Data Today's Real-time Data Age Of Data 8-30 Days Ago Older 2-7 Days Ago 85% of API requests are for data <= 7 days old This follows Zipf's Law: Conclusions: put newest data on fastest servers move older data onto fewer, slower servers
25 Postgres For The Most Needed Data Vanilla PostgreSQL 9.4 instance i3.8xlarge or similar: 32 cores, 240GB RAM, 5TB disk Data partitioned by day Drop any partitions > 10 days old. All data is copy of data in S3. No need for backups. Focus on loading the data as quickly as possible (< 2 hours) Smaller customers' data available earlier. Adjust application logic to make this data visible earlier. Codename: L7 CSV files on S3 One Per Customer+Date Daily ETL L7 DB
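The retention rule on this slide (daily partitions, drop anything more than 10 days old) reduces to a date comparison. The sketch below only computes which partitions would be dropped; the `stats_YYYY_MM_DD` naming scheme is an assumption, not the naming used on the L7 database.

```python
from datetime import date, timedelta

RETENTION_DAYS = 10  # from the slide: "Drop any partitions > 10 days old"

def partition_name(day):
    """Hypothetical naming scheme for one daily partition."""
    return f"stats_{day:%Y_%m_%d}"

def partitions_to_drop(existing_days, today, retention_days=RETENTION_DAYS):
    """Return the names of partitions whose day is more than retention_days old."""
    cutoff = today - timedelta(days=retention_days)
    return [partition_name(d) for d in sorted(existing_days) if d < cutoff]

today = date(2018, 12, 15)
existing = [today - timedelta(days=n) for n in range(13)]  # ages 0 through 12 days
stale = partitions_to_drop(existing, today)
```

Because every row is a copy of data already on S3, dropping a partition loses nothing that cannot be reloaded, which is why the slide can skip backups.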
26 What Didn't Work: Redshift Intended to complement MySQL Performed adequately when daily volume was < 1% of current volume Needed sub-second response, was getting 30s+ response Was the only machine that had a copy of data across all time HDD was slow, tried SSD instances, but had limited space Eventually grew to a 26 node cluster with 32 cores per node. Could not distinguish a large query from a small one Had no insight into how the data was partitioned Reorganizing data according to AWS suggestions would have resulted in vacuums taking several days.
27 What Didn't Work: Vertica Intended to complement MySQL Good response times over larger data volumes Needed local disk to perform adequately, which limited disk size Each cluster could only hold a few months of data 5 node clusters, 32 cores each. Could only have K-safety of 1, or else load took too long (2 hrs vs 10) Nodes failed daily, until glibc bug was fixed Expensive
28 Storing More History with PostgreSQL Goal: Increase storage of L7 to replace Vertica and/or Redshift Combining 30 small EBS drives via RAID-0 to make 1 30-TB drive This method had more IOPS than a single provisioned EBS drive of the same size Same hardware as an L7 could now store ~40 days of data As number of customers increased, 40 days would shrink to 25 Same strategy as L7, just keep the data longer Codename: Elmo - It stores "mo" (more) data CSV files on S3 One Per Customer+Date Daily ETL Elmo Clusters
29 Typeahead Search What we need: "Type-ahead" search queries, like Google search autocomplete query must finish in < 100ms queries can be across any time range, so all customer data must be covered Not all statistics are needed Only show best 10 matches
30 Typeahead Search What we did: Re-structure data to only store each searchable text string once Combine All data for a Customer's Day into one row using arrays PostgreSQL will compress those arrays via TOAST When compressed, all data can fit in 40TB Use btree_gin indexes for full text search All search ETL handled by 1 32 core machine (i3.8xlarge) All search requests handled by 2 replicas (i3.8xlarge) CSV files on S3 One Per Customer+Date Daily ETL Search DB
31 Applying TOAST to Regular Data Combined All of a customer's data for one day into one row with arrays TOAST Compression shifts workload from scarce IOPS to abundant CPU Some customers' data too large for a single row Split the customer's data into several "chunk" rows Used same hardware as other instances (i3.8xlarge) Same RAID-0 as used in Elmo instance could now hold all customer data
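The idea of folding a customer's day into a few array-valued "chunk" rows and letting compression trade CPU for I/O can be sketched outside the database. In this illustration `zlib` and JSON stand in for PostgreSQL's own TOAST compression and array types, and the chunk size and field names are hypothetical.

```python
import json
import zlib

def pack_customer_day(rows, max_rows_per_chunk=1000):
    """Turn many per-row records into a few compressed chunks of column arrays."""
    chunks = []
    for start in range(0, len(rows), max_rows_per_chunk):
        part = rows[start:start + max_rows_per_chunk]
        # One array per column, so similar values sit next to each other.
        columns = {key: [r[key] for r in part] for key in part[0]}
        chunks.append(zlib.compress(json.dumps(columns).encode("utf-8")))
    return chunks

def unpack_chunk(blob):
    """Reverse of pack_customer_day for a single chunk."""
    columns = json.loads(zlib.decompress(blob).decode("utf-8"))
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

# Hypothetical sample: five rows for one customer and day.
sample_rows = [{"campaign": "campaign-a", "impressions": n} for n in range(5)]
sample_chunks = pack_customer_day(sample_rows, max_rows_per_chunk=2)
restored = [row for chunk in sample_chunks for row in unpack_chunk(chunk)]
```

Reading one customer's day then costs a handful of large reads plus decompression, instead of one read per original row.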
32 Applying TOAST to Regular Data ETL too slow to be handled by just one machine (compression takes time) 5 32-core machines with an ETL-load sharing feature such that each one processes a client/day then shares it with other nodes Replaced all Redshift (1 5-node cluster) and Vertica instances (9 5-node clusters)! Big cost savings Codename: Marjory (Elmo and Marjory are Muppets from Jim Henson TV shows) Marjory DB CSV files on S3 One Per Customer+Date Daily ETL Compressed Data Sharing Marjory DB Marjory DB
33 APIs to Foreign Data Wrappers The Stats-Cache API data must be added to any data which is fetched from PostgreSQL Existing in-memory database written in Python The re-aggregation of this data was handled in regular code, not SQL This is slow and error-prone We created a Foreign Data Wrapper using the Multicorn Python API The FDW takes the SQL query and makes an API call, then puts results in a result set The API now looks like a set of PostgreSQL tables Aggregation in SQL much faster Code much simpler
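A Multicorn wrapper is a Python class that subclasses `ForeignDataWrapper` and yields rows from `execute(quals, columns)`. The sketch below shows that shape only; `StatsCacheFdw`, `fetch_live_stats`, and the sample data are hypothetical stand-ins for the real Stats-Cache API, and the fallback base class exists solely so the example runs where Multicorn is not installed.

```python
try:
    from multicorn import ForeignDataWrapper  # present when loaded by PostgreSQL + Multicorn
except ImportError:
    class ForeignDataWrapper:  # minimal stand-in so the sketch runs standalone
        def __init__(self, options, columns):
            pass

# Hypothetical live data that the Stats-Cache API would return.
_LIVE_STATS = [
    {"campaign": "campaign-a", "impressions": 120},
    {"campaign": "campaign-b", "impressions": 45},
]

def fetch_live_stats(campaign=None):
    """Hypothetical stand-in for one Stats-Cache API call."""
    return [r for r in _LIVE_STATS if campaign is None or r["campaign"] == campaign]

class StatsCacheFdw(ForeignDataWrapper):
    """Presents the API as a foreign table with 'campaign' and 'impressions' columns."""

    def __init__(self, options, columns):
        super(StatsCacheFdw, self).__init__(options, columns)
        self.columns = columns

    def execute(self, quals, columns):
        # Translate a simple "campaign = ..." filter into an API argument.
        wanted = [q.value for q in quals
                  if q.field_name == "campaign" and q.operator == "="]
        for row in fetch_live_stats(wanted[0] if wanted else None):
            # Return only the columns PostgreSQL asked for.
            yield {name: row.get(name) for name in columns}
```

Once a foreign table is declared over such a class, the live data can be joined and aggregated in SQL alongside ordinary tables, which is the simplification the slide describes.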
34 Complex Foreign Data Wrappers Codename: Frackles Store csv.gz in compressed SQLite files on S3 Each query starts a web server Start one AWS Lambda per customer/day Each Lambda fetches and queries one SQLite file Report results back to web server Web server aggregates results and returns as result set Queries are slow, but data is available sooner Very Short ETL, but slower than dedicated servers Very good for queries across long date ranges AWS now offers Athena, a similar (but costly) service SQL Query Frackles FDW AWS Lambda AWS Lambda AWS Lambda AWS Lambda SQLite S3 SQLite File S3 SQLite File S3 SQLite File S3 File
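The Frackles flow (one worker per customer/day file, each running the same query, with results merged centrally) can be imitated locally. In this sketch a thread pool stands in for AWS Lambda, a temporary directory stands in for S3, and the `stats` table layout is an assumption.

```python
import os
import sqlite3
import tempfile
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def write_day_file(directory, customer, day, rows):
    """Create one SQLite file for a customer/day (stand-in for the ETL output on S3)."""
    path = os.path.join(directory, f"{customer}_{day}.sqlite")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE stats (campaign TEXT, impressions INTEGER)")
    con.executemany("INSERT INTO stats VALUES (?, ?)", rows)
    con.commit()
    con.close()
    return path

def query_one_file(path):
    """Work done by one 'Lambda': open a single file and run the partial query."""
    con = sqlite3.connect(path)
    try:
        return con.execute(
            "SELECT campaign, SUM(impressions) FROM stats GROUP BY campaign"
        ).fetchall()
    finally:
        con.close()

def fan_out(paths, max_workers=8):
    """Work done by the 'web server': dispatch one worker per file, merge the results."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for partial in pool.map(query_one_file, paths):
            for campaign, impressions in partial:
                totals[campaign] += impressions
    return dict(totals)

with tempfile.TemporaryDirectory() as workdir:
    files = [
        write_day_file(workdir, "cust1", "2018-12-01", [("campaign-a", 10), ("campaign-b", 3)]),
        write_day_file(workdir, "cust1", "2018-12-02", [("campaign-a", 7)]),
    ]
    merged = fan_out(files)
```

Since each file is queried independently, a long date range simply means more workers in flight, which matches the slide's note that this design suits queries across long date ranges.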
35 Other Tools: PMPP Poor Man's Parallel Processing Written by me First written in PL/pgSQL, but re-coded in C for performance reasons Set returning function that takes db names + queries as input. Allows an application to send multiple queries in parallel to multiple servers, provided all the queries have the same shape (columns, types) User can re-aggregate data returned from the set returning function. Any machine that talks libpq could be queried (PgSQL, Vertica, Redshift) Allows for partial aggregation on DW boxes Secondary aggregation can occur on local machine
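The partial-then-secondary aggregation pattern that PMPP enables can be shown in miniature. This is not PMPP's interface: the server names, the in-memory data, and the three functions are hypothetical, and threads stand in for parallel libpq connections. The point is that every server returns rows of the same shape (campaign, sum, count), so the caller can re-aggregate them correctly.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data held by two remote servers: (campaign, seconds_visible)
SERVERS = {
    "elmo": [("campaign-a", 10.0), ("campaign-a", 20.0)],
    "marjory": [("campaign-a", 30.0), ("campaign-b", 5.0)],
}

def partial_aggregate(server_name):
    """Runs 'on the server': returns (campaign, sum, count) rows, not averages."""
    acc = {}
    for campaign, value in SERVERS[server_name]:
        total, count = acc.get(campaign, (0.0, 0))
        acc[campaign] = (total + value, count + 1)
    return [(campaign, total, count) for campaign, (total, count) in acc.items()]

def distribute(server_names):
    """Send the same query to every server in parallel and stream back all rows."""
    with ThreadPoolExecutor(max_workers=len(server_names)) as pool:
        for rows in pool.map(partial_aggregate, server_names):
            yield from rows

def final_average(rows):
    """Secondary aggregation on the local machine."""
    acc = {}
    for campaign, total, count in rows:
        run_total, run_count = acc.get(campaign, (0.0, 0))
        acc[campaign] = (run_total + total, run_count + count)
    return {campaign: total / count for campaign, (total, count) in acc.items()}

averages = final_average(distribute(["elmo", "marjory"]))
```

Returning sums and counts rather than per-server averages matters: averaging the two per-server averages for campaign-a (15 and 30) would give 22.5, while the true average over all three values is 20.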
36 Other Tools: Decanters Large queries can exhaust memory on an application machine A decanter lets wine "breathe" These machines let the data "breathe" Abundant CPUs, abundant memory per CPU, minimal disk Some very small lookup tables replicated for performance reasons All other local tables are FDWs to OLTP database (postgres_fdw) Common use: Big PMPP query to Stats-Cache, Elmo, Marjory, Frackles, each one doing a local aggregation Final aggregation happens on decanter Can occasionally experience OOM (rather than on an important machine) New decanter can spin up and enter load balancer in 5 minutes No engineering time to be spent rescuing failed decanters
37 ETL Process (2017) Tagged Ads Elmo Clusters Viewable Events Pixel Servers Log shipping Stats Aggregators Marjory Clusters Daily ETLs S3 - CSVs Daily Summaries Search Clusters S3 - SQLite
38 User Queries (2017) User Search Clusters Searches Stats Requests OLTP DB Third Party DW Pg FDW Pg FDW Decanters Frackles FDW S3 - SQLite Stats-Cache FDW PMPP Requests Live Stats Aggregators Elmo Clusters Marjory Clusters
39 Why Not RDS? No ability to install custom extensions (PMPP, pg_partman, etc) No place to do local \copy operations Reduced insight into the server load (this is better now with RDS Performance Insights) Reduced ability to tune pg server No ability to try beta versions Expense
40 Why Not Aurora? Had early adopter access AWS Devs said that it was not geared for DW workloads I/O sometimes good, sometimes bad Wasn't ready yet Data volumes necessitate advanced partitioning Advanced partitioning was not available until v10 Expense
41 Why Not Athena? Athena had no concept of constraint exclusion to avoid reading irrelevant files Costs $5/TB of data read Most queries would cost > $100 each Running thousands of queries per hour
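The cost argument on this slide follows from simple arithmetic. The price and the per-query cost come from the slide; the query rate of 1,000 per hour is an assumption chosen as the low end of "thousands of queries per hour".

```python
PRICE_PER_TB_USD = 5.0        # from the slide: "$5/TB of data read"
COST_PER_QUERY_USD = 100.0    # from the slide, as a lower bound: "> $100 each"
QUERIES_PER_HOUR = 1_000      # assumption: low end of "thousands of queries per hour"

# Data each query would have to scan to cost $100 at $5/TB.
tb_read_per_query = COST_PER_QUERY_USD / PRICE_PER_TB_USD

# Lower bounds on spend at the assumed query rate.
cost_per_hour_usd = COST_PER_QUERY_USD * QUERIES_PER_HOUR
cost_per_day_usd = cost_per_hour_usd * 24

print(f"{tb_read_per_query:.0f} TB per query")   # 20 TB per query
print(f"${cost_per_hour_usd:,.0f} per hour")     # $100,000 per hour
print(f"${cost_per_day_usd:,.0f} per day")       # $2,400,000 per day
```

A $100 query implies about 20 TB scanned, which is why the lack of file pruning noted on the slide made the service impractical for this workload.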
42 Questions?
More informationAccelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card
Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card The Rise of MongoDB Summary One of today s growing database
More informationCluster-Level Google How we use Colossus to improve storage efficiency
Cluster-Level Storage @ Google How we use Colossus to improve storage efficiency Denis Serenyi Senior Staff Software Engineer dserenyi@google.com November 13, 2017 Keynote at the 2nd Joint International
More informationPractical MySQL Performance Optimization. Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars
Practical MySQL Performance Optimization Peter Zaitsev, CEO, Percona July 02, 2015 Percona Technical Webinars In This Presentation We ll Look at how to approach Performance Optimization Discuss Practical
More informationCloud Analytics and Business Intelligence on AWS
Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse
More informationETL Best Practices and Techniques. Marc Beacom, Managing Partner, Datalere
ETL Best Practices and Techniques Marc Beacom, Managing Partner, Datalere Thank you Sponsors Experience 10 years DW/BI Consultant 20 Years overall experience Marc Beacom Managing Partner, Datalere Current
More informationVOLTDB + HP VERTICA. page
VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics
More informationArchitectural challenges for building a low latency, scalable multi-tenant data warehouse
Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics
More informationBig Data solution benchmark
Big Data solution benchmark Introduction In the last few years, Big Data Analytics have gained a very fair amount of success. The trend is expected to grow rapidly with further advancement in the coming
More informationAccelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016
Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016 Nikita Ivanov CTO and Co-Founder GridGain Systems Peter Zaitsev CEO and Co-Founder Percona About the Presentation
More informationMySQL Cluster Web Scalability, % Availability. Andrew
MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended
More informationDesign Patterns for Large- Scale Data Management. Robert Hodges OSCON 2013
Design Patterns for Large- Scale Data Management Robert Hodges OSCON 2013 The Start-Up Dilemma 1. You are releasing Online Storefront V 1.0 2. It could be a complete bust 3. But it could be *really* big
More informationLessons learned while automating MySQL in the AWS cloud. Stephane Combaudon DB Engineer - Slice
Lessons learned while automating MySQL in the AWS cloud Stephane Combaudon DB Engineer - Slice Our environment 5 DB stacks Data volume ranging from 30GB to 2TB+. Master + N slaves for each stack. Master
More information10 BEST PRACTICES FOR REDUCING SPEND IN AWS
10 BEST PRACTICES FOR REDUCING SPEND IN AWS INTRODUCTION Amazon Web Services (AWS) forever changed the world of IT when it entered the market in 2006 offering services for pennies on the dollar. While
More informationWorkshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud
Workshop Report: ElaStraS - An Elastic Transactional Datastore in the Cloud Sudipto Das, Divyakant Agrawal, Amr El Abbadi Report by: Basil Kohler January 4, 2013 Prerequisites This report elaborates and
More informationIn-Memory Data Management Jens Krueger
In-Memory Data Management Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute OLTP vs. OLAP 2 Online Transaction Processing (OLTP) Organized in rows Online Analytical Processing
More informationIBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data. IBM Db2 Event Store
IBM Db2 Event Store Simplifying and Accelerating Storage and Analysis of Fast Data IBM Db2 Event Store Disclaimer The information contained in this presentation is provided for informational purposes only.
More informationTHE ZADARA CLOUD. An overview of the Zadara Storage Cloud and VPSA Storage Array technology WHITE PAPER
WHITE PAPER THE ZADARA CLOUD An overview of the Zadara Storage Cloud and VPSA Storage Array technology Zadara 6 Venture, Suite 140, Irvine, CA 92618, USA www.zadarastorage.com EXECUTIVE SUMMARY The IT
More informationAmazon Aurora Deep Dive
Amazon Aurora Deep Dive Kevin Jernigan, Sr. Product Manager Amazon Aurora PostgreSQL Amazon RDS for PostgreSQL May 18, 2017 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda
More information/ Cloud Computing. Recitation 6 October 2 nd, 2018
15-319 / 15-619 Cloud Computing Recitation 6 October 2 nd, 2018 1 Overview Announcements for administrative issues Last week s reflection OLI unit 3 module 7, 8 and 9 Quiz 4 Project 2.3 This week s schedule
More informationHigh Noon at AWS. ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2
High Noon at AWS ~ Amazon MySQL RDS versus Tungsten Clustering running MySQL on AWS EC2 Introduction Amazon Web Services (AWS) are gaining popularity, and for good reasons. The Amazon Relational Database
More information10/29/2013. Program Agenda. The Database Trifecta: Simplified Management, Less Capacity, Better Performance
Program Agenda The Database Trifecta: Simplified Management, Less Capacity, Better Performance Data Growth and Complexity Hybrid Columnar Compression Case Study & Real-World Experiences
More informationConceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.
Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion
More informationScaling with mongodb
Scaling with mongodb Ross Lawley Python Engineer @ 10gen Web developer since 1999 Passionate about open source Agile methodology email: ross@10gen.com twitter: RossC0 Today's Talk Scaling Understanding
More informationLecture 16. Today: Start looking into memory hierarchy Cache$! Yay!
Lecture 16 Today: Start looking into memory hierarchy Cache$! Yay! Note: There are no slides labeled Lecture 15. Nothing omitted, just that the numbering got out of sequence somewhere along the way. 1
More informationAccelerate Big Data Insights
Accelerate Big Data Insights Executive Summary An abundance of information isn t always helpful when time is of the essence. In the world of big data, the ability to accelerate time-to-insight can not
More informationAn Introduction to Big Data Formats
Introduction to Big Data Formats 1 An Introduction to Big Data Formats Understanding Avro, Parquet, and ORC WHITE PAPER Introduction to Big Data Formats 2 TABLE OF TABLE OF CONTENTS CONTENTS INTRODUCTION
More informationAWS Storage Optimization. AWS Whitepaper
AWS Storage Optimization AWS Whitepaper AWS Storage Optimization: AWS Whitepaper Copyright 2018 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress
More informationWrite On Aws. Aws Tools For Windows Powershell User Guide using the aws tools for windows powershell (p. 19) this section includes information about
We have made it easy for you to find a PDF Ebooks without any digging. And by having access to our ebooks online or by storing it on your computer, you have convenient answers with write on aws. To get
More informationPostgres in Amazon RDS. Denish Patel Lead Database Architect
Postgres in Amazon RDS / Denish Patel Lead Database Architect Who am I? Database Architect with OmniTI for last 7+ years Expertise in PostgreSQL, Oracle, MySQL, NoSQL Contact : denish@omniti.com, Twitter:
More information