Sizing Cloud Data Warehouses

Recommended Guidelines to Sizing a Cloud Data Warehouse

January 2019

Notices

Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents AWS's current product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers, or licensors. AWS's products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied. AWS's responsibilities and liabilities to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Contents

Introduction
Sizing Guidelines
Redshift Cluster Resize
Conclusion
Contributors
Document Revisions

Abstract

This whitepaper describes a process for determining an appropriate configuration for your migration to a cloud data warehouse. The process is suitable for typical data migrations to a cloud data warehouse such as Amazon Redshift. The intended audience includes database administrators, data engineers, data architects, and other data warehouse stakeholders. Whether you are performing a proof of concept (PoC) evaluation or a production migration, and whether you are migrating from an on-premises appliance or from another cloud data warehouse, you can follow the simple guidelines in this whitepaper to increase the chances of delivering a data warehouse cluster with the desired storage, performance, and cost profile.

Introduction

One of the first tasks in migrating to any data warehouse is sizing it appropriately by determining the appropriate number of cluster nodes and their compute and storage profiles. Fortunately, with cloud data warehouses such as Amazon Redshift, it is relatively straightforward to make immediate course corrections by resizing your cluster up or down. However, sizing a cloud data warehouse based on the wrong type of information can lead to your PoC evaluations and production environments being executed on suboptimal cluster configurations. Resizing a cluster might be easy, but repeating PoCs and dealing with change control procedures for production environments can be time consuming, risky, and costly, which puts your project milestones at risk. Migrations of several petabytes to exabytes of uncompressed data typically benefit from a more holistic sizing approach that factors in existing data warehouse properties, data profiles, and workload profiles. Holistic sizing approaches are more involved and typically fall under the scope of a professional services engagement. For more information, contact AWS.

Sizing Guidelines

For migrations of approximately one petabyte or less of uncompressed data, you can use a simple, storage-centric sizing approach to identify an appropriate data warehouse cluster configuration. With the simple-sizing approach, your organization's uncompressed data size is the key input for sizing your Redshift cluster. However, you must refine that size slightly. Redshift typically achieves 3x-4x data compression, which means that the data persisted in Redshift is typically three to four times smaller than the uncompressed source data. In addition, it is a best practice to maintain 20% free capacity in a Redshift cluster, so you should increase your compressed data size by a factor of 1.25 to ensure a healthy amount (20%) of free space. The simple-sizing approach can be represented by this equation:

Required cluster storage = (Uncompressed data size / Compression ratio) × 1.25

This equation is appropriate for typical data migrations, but note that suboptimal data modeling practices could artificially lead to insufficient storage capacity. Amazon Redshift has four basic node types (instance types) with different storage capacities. For more information on Redshift instance types, see the Amazon Redshift Clusters documentation.
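The calculation above is simple enough to capture in a short helper. The following Python sketch is not part of the whitepaper; the function names are illustrative, and the default compression ratio and free-space factor are assumptions taken from the guidance above.

```python
import math


def required_storage_tb(uncompressed_tb, compression_ratio=3.0, free_space_factor=1.25):
    """Estimate the Redshift storage (TB) needed for a given amount of
    uncompressed data: compress by the expected ratio, then add ~20% headroom."""
    compressed_tb = uncompressed_tb / compression_ratio
    return compressed_tb * free_space_factor


def nodes_needed(required_tb, storage_per_node_tb):
    """Round up to the smallest node count whose combined storage meets the target."""
    return math.ceil(required_tb / storage_per_node_tb)
```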

Basic Redshift cluster information

| Instance Family | Instance Name | vCPUs | Memory | Storage | Slices per Node |
|---|---|---|---|---|---|
| Dense storage | ds2.xlarge | 4 | 31 GiB | 2 TB HDD | 2 |
| Dense storage | ds2.8xlarge | 36 | 244 GiB | 16 TB HDD | 16 |
| Dense compute | dc2.large | 2 | 15.25 GiB | 160 GB SSD | 2 |
| Dense compute | dc2.8xlarge | 32 | 244 GiB | 2.56 TB SSD | 16 |

In an example scenario, the fictitious company Example.com would like to migrate 100 TB of uncompressed data from its on-premises data warehouse to Amazon Redshift. Using a conservative compression ratio of 3x, you can expect the compressed data profile in Redshift to decrease from 100 TB to approximately 33 TB. You then increase the compressed size by a factor of 1.25 to ensure a healthy amount (20%) of free space, which gives you a target of approximately 42 TB of storage capacity for your Redshift cluster.

You now have your target storage capacity of 42 TB, and there are multiple Redshift cluster configurations that can satisfy that requirement. The Example.com VP of Data Analytics wants to start small, select the least expensive option for their cloud data warehouse, and then scale up as necessary. With that extra requirement, you can configure your Redshift cluster using the dense-storage ds2.xlarge node type, which has 2 TB of storage capacity per node. With this information, your simple-sizing calculation is:

Required storage = (100 TB / 3) × 1.25 ≈ 42 TB
Number of nodes = 42 TB / 2 TB per ds2.xlarge node = 21 nodes
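As a quick check, plugging the Example.com numbers into the hypothetical helper from the previous section reproduces the 42 TB target and the 21-node ds2.xlarge configuration:

```python
target_tb = required_storage_tb(100, compression_ratio=3.0)     # (100 / 3) * 1.25
print(f"Target storage: {target_tb:.1f} TB")                    # Target storage: 41.7 TB
print(f"ds2.xlarge nodes: {nodes_needed(target_tb, 2)}")        # ds2.xlarge nodes: 21
```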

You should also consider the following information about these example Redshift cluster configurations:

| Cluster Type | Instance Type | Nodes | Memory | Compute | Storage | Cost ($/month) |
|---|---|---|---|---|---|---|
| Dense storage | ds2.xlarge | 21 | 651 GiB | 84 cores | 42 TB | $x |
| Dense storage | ds2.8xlarge | 3 | 732 GiB | 108 cores | 48 TB | $1.2x |
| Dense compute | dc2.8xlarge | 17 | 4,148 GiB | 544 cores | 44 TB | $4.52x |

If initial testing shows that the Redshift cluster you selected is under- or over-powered, you can use the straightforward resizing capabilities available in Redshift to scale the cluster configuration up or down to the necessary price and performance.

Redshift Cluster Resize

After your data migration from your on-premises data warehouse to the cloud is complete, it is normal over time to make incremental node additions to, or removals from, your cloud data warehouse. These changes help you maintain the cost, storage, and performance profiles you need for your data warehouse. With Amazon Redshift, there are two main methods to resize your cluster (see the sketch after this list):

- Elastic resize – Your existing Redshift cluster is modified to add or remove nodes, either manually or with an API call. The resize typically completes in approximately 15 minutes. Some tasks might continue to run in the background afterward, but your Redshift cluster remains fully available during that time.
- Classic resize – The cluster is reconfigured with a different node count and/or instance type. The original cluster enters read-only mode during the resize, which can take multiple hours.

In addition, Amazon Redshift supports concurrency scaling, a feature that adds transient capacity to your cluster during concurrency spikes. Redshift automatically attaches transient clusters to your cluster to handle bursts of concurrent requests with consistently fast performance; in effect, your cluster is temporarily scaled out with additional compute nodes to provide increased concurrency at consistent performance.
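For illustration only, the snippet below sketches how an elastic resize might be requested programmatically with the AWS SDK for Python (boto3). The cluster identifier and target node count are placeholders, and the call shapes are assumptions based on the Redshift ResizeCluster API; consult the documentation linked below for the authoritative parameters.

```python
import boto3

redshift = boto3.client("redshift")

# Request an elastic resize: grow the hypothetical 21-node ds2.xlarge cluster to 24 nodes.
redshift.resize_cluster(
    ClusterIdentifier="example-dw-cluster",  # placeholder identifier
    NodeType="ds2.xlarge",
    NumberOfNodes=24,
    Classic=False,  # False requests an elastic resize; True forces a classic resize
)

# Block until the cluster reports "available" again (background tasks may continue).
redshift.get_waiter("cluster_available").wait(ClusterIdentifier="example-dw-cluster")
```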

For more information about resizing a Redshift cluster, see:

- Resizing a Cluster (Redshift documentation): https://docs.aws.amazon.com/redshift/latest/mgmt/working-with-clusters.html#cluster-resize-intro
- Elastic Resize (What's New announcement): https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-redshift-elastic-resize/
- Elastic Resize (blog post): https://aws.amazon.com/blogs/big-data/scale-your-amazon-redshift-clusters-up-and-down-in-minutes-to-get-the-performance-you-need-when-you-need-it/
- Concurrency Scaling (blog post): https://www.allthingsdistributed.com/2018/11/amazon-redshift-performance-optimization.html

Conclusion

It is important to size your cloud data warehouse using the right information and approach. Although it is easy to resize a cloud data warehouse such as Amazon Redshift up or down to achieve a different cost or performance profile, the change control procedures for modifying a production environment or the effort of repeating a PoC evaluation can pose significant challenges to project milestones. You can follow the simple sizing approach outlined in this whitepaper to help you identify an appropriate cluster configuration for your data migration.

Contributors

Contributors to this document include:

- Asser Moustafa, Solutions Architect Specialist, Data Warehousing
- Thiyagarajan Arumugam, Solutions Architect Specialist, Data Warehousing

Document Revisions

| Date | Description |
|---|---|
| January 2019 | First publication |