Understanding Interleaved Sort Keys in Amazon Redshift


Introduction

Recently, Amazon announced interleaved sort keys for Amazon Redshift. Until now, compound sort keys were the only option and, while they deliver incredible performance for workloads that include a common filter on a single dimension known ahead of time, they don't do much to facilitate ad hoc multidimensional analysis. In this paper, we will use some dummy data and a set of Postgres queries to discuss the role of sort keys and compare how both types of keys work in theory. We won't be performing any work in Redshift directly until Part 2, where we will examine how interleaved sort keys are implemented in practice, discuss a common tactic that can benefit from using compound and interleaved sort keys together, and run some benchmark queries against a data set with billions of rows. In both parts we will link to code that can be used to recreate our results.

Life Without the B-tree

Redshift bills itself as a fast, fully managed, petabyte-scale data warehouse, and it uses techniques that you may not find in a relational database built for transactional (OLTP) workloads. Most folks are familiar with the concept of using multiple B-tree indexes on the same table in order to optimize performance across a variety of queries with varying 'where' clauses. However, B-tree indexes have a couple of drawbacks when applied to the large-scale analytical (OLAP) workloads that are common in data warehousing.

First, they are secondary data structures. Every index you create makes a copy of the columns on which you've indexed and stores this copy separately from the table, as a doubly-linked list sorted within the leaf nodes of a B-tree. The additional space required to store multiple indexes in addition to the table can be prohibitively expensive when dealing with large volumes of data.

Second, B-tree indexes are most useful for highly selective queries that look up a specific data point. Analytical queries that aggregate a range of data points will typically be better served by a simple full table scan, because traversing the B-tree, scanning the leaf nodes and fetching non-indexed columns from the table suffers from a lot of random I/O, even for small ranges.

Compound Sort Keys and Zone Maps

For the above reasons, Redshift eschews the B-tree and instead employs a lighter form of indexing that lends itself well to table scans. Each table in Redshift can optionally define a sort key, which is simply a subset of columns that will be used to sort the table on disk.
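In Redshift DDL this is a single clause on the table definition. Here is a minimal sketch using hypothetical column names that mirror the four dimensions of our dummy data:

    -- Sort the table on disk by the four listed columns; COMPOUND is the
    -- default sort key style in Redshift.
    CREATE TABLE dummy_data (
        year     int,
        region   int,
        customer int,
        product  int
    ) COMPOUND SORTKEY (year, region, customer, product);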

[Figure: access methods chosen by Postgres for a range of lookup values. The first column is the value being looked up in the 'where' clause of a Postgres query, the second column is the number of records that contain the lookup value, and the third column is the access method employed. Postgres' default configuration abandons index scans pretty quickly; bitmap scans don't make it much further.]

A compound sort key produces a sort order similar to that of the 'order by' clause: the first column is sorted in its entirety, then within each first-column grouping the second column is sorted in its entirety, and so on until the entire key has been sorted. This is very similar to the leaf nodes of a B-tree index with a compound key, but instead of using a secondary data structure, Redshift stores the data in the table sorted in this manner from the jump.

[Figure: the sort order produced by a compound key on our dummy data (abbreviated screenshot).]
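The 'order by' analogy can be taken literally. Against the hypothetical dummy_data table sketched earlier, this query returns rows in exactly the order the compound key stores them on disk:

    SELECT year, region, customer, product
    FROM dummy_data
    ORDER BY year, region, customer, product;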

Once the sort order of the table has been computed, Redshift, being a columnar database, breaks out each column, optionally compresses it, and stores blocks of the column values contiguously on disk. Additionally, it maintains a secondary data structure called a zone map that lists the minimum and maximum column values for each block. Since the zone map is maintained at the block level (as opposed to the row level with a B-tree), its size as a secondary data structure is not prohibitively expensive and could conceivably be held in memory. Redshift blocks are 1 MB in size, so assuming each zone map entry has a width of 100 bytes, the zone map for a 1 TB compressed table could be stored in 100 MB.

The zone map allows for pruning of blocks that are irrelevant for any given query. If a query against our dummy data requests records 'where region = 2', then we can see from the zone map below that only blocks 2, 6, 10 and 14 need to be accessed. However, the zone map's ability to facilitate pruning for a column depends on the sort order of the table on disk. As we can see in the zone map, sorting on our compound key favors the leading columns in the key. An equality filter on year will scan only 4 contiguous blocks. Likewise, an equality filter on region also scans only 4 blocks, albeit non-contiguous ones. An equality filter on customer will scan the entire table (16 blocks) because the zone map provides nothing valuable in the way of pruning, and an equality filter on product must scan the entire table as well.

[Figure: the zone map for our dummy data when using a compound sort key. Filtering on year or region scans 4 blocks; filtering on customer or product scans all 16 blocks.]

Another way of saying this is that the compound sort key produces great locality for the leading columns of the key and terrible locality for the trailing columns. The values of the trailing columns are spread across many of the blocks in the table, and any particular block includes a broad range of column values. Until recently, we either had to live with this fact or make multiple copies of the table with different compound sort keys for each.
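The pruning mechanics are easy to model in plain Postgres. Everything below (table, columns, values) is invented purely for illustration:

    -- A toy zone map: one row per 1 MB block, recording the min/max of
    -- 'region' within that block, laid out as in the figure above.
    CREATE TEMP TABLE zone_map (
        blocknum   int,
        min_region int,
        max_region int
    );

    INSERT INTO zone_map VALUES
        (0, 0, 0),  (1, 1, 1),  (2, 2, 2),  (3, 3, 3),
        (4, 0, 0),  (5, 1, 1),  (6, 2, 2),  (7, 3, 3),
        (8, 0, 0),  (9, 1, 1),  (10, 2, 2), (11, 3, 3),
        (12, 0, 0), (13, 1, 1), (14, 2, 2), (15, 3, 3);

    -- Pruning: only blocks whose [min, max] range can contain the filter
    -- value need to be scanned.
    SELECT blocknum
    FROM zone_map
    WHERE 2 BETWEEN min_region AND max_region;  -- returns 2, 6, 10, 14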

Now there's a better way!

The Z-order Curve and Interleaved Sort Keys

To sort a table on disk we need to produce an ordered list of sort keys. A list of sort keys is a one-dimensional structure, so we must linearize any multidimensional data set before mapping it to sort keys. If we think of our data as being spread amongst the coordinates of an N-dimensional space, with N being the number of columns in our sort key, linearization is the process of finding a path or curve through every point in this space. That path represents our ordered list of sort keys, and we want it to weight each dimension equally, preserving locality across dimensions and maximizing the utility of a zone map.

[Figure: a 2-dimensional Z-order curve formed by interleaving the binary representations of the coordinates (x, y) and sorting the resulting interleaved binary numbers.]

The sort order produced by a compound sort key represents one possible path, called a row-order curve, but as mentioned earlier it does a poor job of preserving locality. Luckily, the Z-order curve was introduced almost 50 years ago; it not only preserves locality across dimensions but can easily be calculated by interleaving the bits of the binary coordinates and sorting the resulting interleaved binary numbers.
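That calculation can be sketched in plain Postgres for two 2-bit coordinates, with the bit positions spelled out by hand:

    -- Interleave the bits of x (x1 x0) and y (y1 y0) into the 4-bit key
    -- y1 x1 y0 x0, then sort by it to walk the Z-order curve.
    SELECT x, y,
           ((y & 2) << 2) | ((x & 2) << 1) | ((y & 1) << 1) | (x & 1) AS z_key
    FROM generate_series(0, 3) AS x
    CROSS JOIN generate_series(0, 3) AS y
    ORDER BY z_key;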

If we apply this technique to our dummy data, the data is sorted in equal measure for both the leading and trailing columns of the key, and no block stores a range broader than 50% of the column values. This means that a filter on any single column in the sort key will scan 8 out of 16 blocks.

[Figure: the zone map for our demo data set when using an interleaved sort key. Filtering on any single column scans 8 of 16 blocks.]

Ideally, the number of blocks scanned for a particular 'where' clause with an interleaved sort key can be calculated as N^((S-P)/S), where N is the total number of blocks required to store the table, S is the number of columns in the sort key, and P is the number of sort key columns specified in the 'where' clause. So for the 16 blocks that make up our dummy data, a four-column interleaved key results in the following block scan counts:

Filter on 1 of 4 sort key columns: 8 blocks
Filter on 2 of 4 sort key columns: 4 blocks
Filter on 3 of 4 sort key columns: 2 blocks
Filter on 4 of 4 sort key columns: 1 block

Although this suggests worse performance when filtering on the leading column compared to a compound sort key (remember we only scanned 4 blocks when filtering on the leading column of the compound key for our dummy data), the average performance across all possible sort key filter combinations will be much better with interleaved keys. This is what makes interleaved sort keys a great match for ad hoc multidimensional analysis.

[Figure: the average number of block scans on our dummy data for compound vs. interleaved sort keys.]

We can also infer two additional points from this formula. First, the benefits of interleaved sort keys increase as the size of the data set increases. For example, if we filter on 2 out of 4 sort key columns, then block scans scale with the square root of N, so it would take two orders of magnitude of table growth to produce one order of magnitude of block scan growth. Second, the benefits of interleaved sort keys decrease as the number of columns in the sort key increases.
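The formula is easy to sanity-check in plain Postgres for the 16-block, four-column case:

    -- Blocks scanned = N^((S-P)/S) with N = 16 blocks and S = 4 key columns.
    SELECT p AS filtered_columns,
           round(power(16::numeric, (4 - p)::numeric / 4)) AS blocks_scanned
    FROM generate_series(1, 4) AS p;
    -- filtered_columns 1 -> 8, 2 -> 4, 3 -> 2, 4 -> 1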

If instead our key was an eight-column key (which is the maximum), the block scan counts would be roughly as follows:

Filter on 1 of 8 sort key columns: 12 blocks
Filter on 2 of 8 sort key columns: 8 blocks
Filter on 3 of 8 sort key columns: 6 blocks
Filter on 4 of 8 sort key columns: 4 blocks
Filter on 5 of 8 sort key columns: 3 blocks
Filter on 6 of 8 sort key columns: 2 blocks
Filter on 7 of 8 sort key columns: 2 blocks
Filter on 8 of 8 sort key columns: 1 block

In this case, our 'where' clause would need to be twice as robust to match the performance of the four-column key.

[Chart: block scan growth as the table grows, O(n) vs. O(sqrt(n)).]

The Zone Map: stv_blocklist

The zone maps for each table, as well as additional metadata, are stored in the stv_blocklist system table. If we take a look at the definition of the stv_blocklist table, we can see it has a width of about 100 bytes. This corresponds with our estimates from Part 1 regarding the small total size of this secondary data structure relative to the tables it describes.
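As a rough check of that estimate, a query along these lines tallies the zone map footprint of a single table; svv_table_info resolves the table name to the id stored in stv_blocklist's tbl column, and the 100 bytes per row is the assumption from above:

    SELECT COUNT(*) AS zone_map_entries,
           COUNT(*) * 100.0 / (1024 * 1024) AS approx_zone_map_mb
    FROM stv_blocklist
    WHERE tbl = (SELECT table_id FROM svv_table_info
                 WHERE "table" = 'orders_compound');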

Taking a look at stv_blocklist for a table with a compound key confirms what we've learned about zone maps for compound keys. Below we can see that the zone map for c_custkey, the leading column of the sort key for the orders_compound table, would prove helpful in pruning irrelevant blocks. However, the zone map for any of the other columns in the key would be useless.

[Figure: the zone map for a compound sort key.]

If we take a look at the zone map for orders_interleaved_4, which has an interleaved sort key, we can see that the zone maps for all of the sort key columns look pretty useful.

[Figure: the zone map for an interleaved sort key.]

However, it's not the minimum and maximum column values that we are seeing here for each block but instead the minimum and maximum sort key values. Presumably, Redshift de-interleaves the zone map values on the fly whenever a query is issued in order to compare the column values in the 'where' clause of the query to the zone map. But what's interesting is that these sort keys don't seem to map directly to our column values.
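A query in this spirit lists the per-block minimum and maximum recorded for each sort key column; it assumes, as in our examples, that the sort key columns are the table's first four (stv_blocklist numbers them from 0):

    SELECT col, blocknum, minvalue, maxvalue
    FROM stv_blocklist
    WHERE tbl = (SELECT table_id FROM svv_table_info
                 WHERE "table" = 'orders_interleaved_4')
      AND col < 4
    ORDER BY col, blocknum;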

If we were to interleave the binary representation of the sort key column values for the first record in the table, then we would expect the resulting binary value to be equal to the minimum sort key (which has a decimal value of 1,711,302,144 as shown above). However, this is not the case. Behind the scenes, Redshift is taking care of a few important details for us, which we can see by examining the stv_interleaved_counts table.

The Multidimensional Space: stv_interleaved_counts

With our dummy data set in Part 1, we were able to interleave the binary representation of the column values directly to produce our sort keys because we were ignoring a few details about working with real data. Sort keys need to be represented by a data type within the database. Redshift uses a bigint to represent the sort key, which is a signed 64-bit integer. However, the values of our sort key columns could be of type bigint as well. If we had four bigint sort key columns, then interleaving their bits would produce a 256-bit sort key, which could not be stored as a bigint. To account for this, Redshift uses a proprietary compression algorithm to compress the values of each sort key column (this is separate from the compression encoding type of each column). These compressed values are what actually get interleaved, and we can take a look at them in the stv_interleaved_counts system table.

[Figure: an abbreviated view of the stv_interleaved_counts system table for orders_interleaved_4. Notice the count of records assigned to each compressed_val is somewhat balanced.]
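The view in the screenshot can be reproduced with a query of this shape; compressed_val and count are the columns visible there, though the exact column set of stv_interleaved_counts may vary by Redshift version:

    SELECT col, compressed_val, SUM(count) AS records
    FROM stv_interleaved_counts
    WHERE tbl = (SELECT table_id FROM svv_table_info
                 WHERE "table" = 'orders_interleaved_4')
    GROUP BY col, compressed_val
    ORDER BY col, compressed_val;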

We can check that it is in fact these compressed values that are being interleaved to produce the sort key by interleaving the binary representation of the minimum compressed value for each sort key column to produce the minimum sort key. Likewise, we can interleave the binary representation of the maximum compressed value for each sort key column to produce the maximum sort key.

[Figure: on the left, the minimum and maximum compressed values for each sort key column of orders_interleaved_4; on the right, the minimum and maximum sort key values for the entire table. Interleaving the binary representations of the values on the left yields the binary representations of the values on the right.]

What these compressed values ultimately represent are the dimension coordinates of the multidimensional space to which Redshift will fit a Z-order curve. The stv_interleaved_counts table gives us an idea of how this space is laid out. The coordinate range of each dimension is constrained by the size of the sort key. Since the sort key can be a maximum of 64 bits in size, the size in bits of the maximum coordinate of each dimension can be no more than 64/N, where N is the number of columns in the sort key. For example, the range of coordinates for orders_interleaved_8, which has an eight-column interleaved sort key, is 0 to 255. This makes sense, since 2^(64/8) equals 256. If the sort key had only a single column then Redshift could potentially use the entire 64 bits for the maximum coordinate, resulting in a range of 0 to 9,223,372,036,854,775,807. In this case, the sort should theoretically be no different than a compound key, since we only have a single column. However, zone maps for compound keys only take into account the first 8 bytes of the sort key column values.

[Figure: the coordinate range for each column of an eight-column sort key, as seen in orders_interleaved_8, is limited to 256 values.]
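The arithmetic behind those ranges checks out directly:

    -- Coordinates available per dimension when the 64-bit sort key is
    -- split evenly across N columns: 2^(64/N).
    SELECT n AS sortkey_columns,
           power(2::numeric, (64 / n)::numeric) AS coordinates_per_dimension
    FROM (VALUES (1), (2), (4), (8)) AS t(n);
    -- 8 columns -> 256 coordinates; 4 columns -> 65,536 at most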

If the column values have a long common prefix, a zone map on the first 8 bytes won't be of much help. Redshift's proprietary compression algorithm for interleaved sort keys does a better job of taking the entire column value into account, so that different values with common prefixes get different compressed values, and thus different sort keys, yielding a more useful zone map.

Interestingly, as we've seen before, the range of coordinates for orders_interleaved_4, which has a four-column interleaved sort key, is 0 to 1,023. In this case, it looks like the coordinates are actually 10 bits in size (2^10 equals 1,024), but conceivably they could be as large as 16 bits, for a range of 0 to 65,535 (2^(64/4) equals 65,536). The reason for the smaller size is likely the size and distribution of the data and Redshift's attempts to prevent skew.

Skew: svv_interleaved_columns

Interleaved sort keys aren't free. The magic of the Z-order curve works best when data is evenly distributed among the coordinates of the multidimensional space. Upon ingestion, Redshift will analyze the size and distribution of the data to determine the appropriate size of the space and distribute the data within the space to the best of its ability. This logic is part of the proprietary compression algorithm that maps column values to the compressed values in stv_interleaved_counts, and it may need to be recalibrated over time to prevent skew (this is separate from distribution skew across slices). We can see a measure of this skew by looking at the svv_interleaved_columns system table.

[Figure: the upper table shows the skew for each column of the sort key for orders_interleaved_4. Notice c_region (column 1) has a really high skew. This is because it has a cardinality of 5, as shown in the table on the bottom left. Because of this, Redshift can only assign records to 5 coordinates on the c_region dimension of the multidimensional space, as shown in the table on the bottom right.]
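The skew measurements in that screenshot come from a query like the following; interleaved_skew and last_reindex are the view's documented columns:

    SELECT col, interleaved_skew, last_reindex
    FROM svv_interleaved_columns
    WHERE tbl = (SELECT table_id FROM svv_table_info
                 WHERE "table" = 'orders_interleaved_4')
    ORDER BY col;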

Skew can be caused by a few different factors, some of which cannot be accounted for. Columns with low cardinality will show a degree of skew because the records will be assigned to a small number of coordinates in the dimension. Redshift will try to space these few coordinates out as evenly as possible, but there will still be a lot of coordinates that are left empty. This is demonstrated by the c_region column of the orders_interleaved_4 table.

Note: AWS has released a patch that significantly improves the skew measurements in svv_interleaved_columns to better account for duplicate values. The concepts discussed here will still mostly apply, though the numbers may vary.

Data that is inherently skewed cannot be accounted for either. In the orders_interleaved_skew_custkey table we've reassigned half of the data a c_custkey of 1. The c_custkey column still has a high cardinality, in that there are still a lot of customers, but one customer dominates the data set. We can see this skews the data towards a single coordinate in the dimension.

[Figure: the upper table shows the skew for each column of the sort key for orders_interleaved_skew_custkey. Notice c_custkey (column 0) has a really high skew. This is because the data set is dominated by a c_custkey of 1, so a large number of records are being assigned to coordinate 512 on the c_custkey dimension of the multidimensional space, as shown in the bottom table. Also note that only half of the c_custkey dimension is being used, as 512 is the lowest coordinate with records assigned to it.]

For both of these cases there's no way to recalibrate the compression algorithm and resort the data to eliminate the skew. However, if the skew is artificial, meaning a large number of records are being assigned to the same coordinate when they could in fact be redistributed, then we can use the vacuum reindex command to fix this. This commonly occurs for data that increases monotonically over time, such as dates, where all values inserted after the first data load will be assigned to the last coordinate in the dimension, causing artificial skew. For example, the orders_interleaved_skew_date table was loaded with data up until 01/01/1998 on the first load. After this initial load, only the c_region and c_mktsegment columns are skewed, due to low cardinality; the d_date column has a relatively low skew.

[Figure: the skew for each column of the sort key for orders_interleaved_skew_date after the initial load. Notice d_date (column 3) has a low skew; the bottom table shows that records are somewhat evenly spread amongst the coordinates of the d_date dimension.]

[Figure: the skew for each column of the sort key for orders_interleaved_skew_date after loading additional data. Notice d_date (column 3) has a higher skew because all of the new records have been assigned to the last coordinate in the d_date dimension, as seen in the lower table.]

After inserting the remaining data from later than 01/01/1998, we can see the skew for the d_date column has skyrocketed because the majority of records have been assigned to the last coordinate in the dimension. Running vacuum reindex eliminates this skew, at the cost of extending vacuum times by 10% to 50%, because resorting the data in an interleaved fashion potentially involves touching every storage block.
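The command itself is a single statement; this is Redshift's documented syntax:

    -- Re-analyze the distribution of the sort key columns, then re-sort.
    VACUUM REINDEX orders_interleaved_skew_date;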

Current and Archive Tables

We can offset some of the maintenance cost associated with vacuum reindex, as well as optimize for common data warehousing workloads, by using compound and interleaved sort keys together in a common data warehousing design pattern: current and archive tables.

For many business processes we can assume that after a certain period of time a record is highly unlikely to be updated. For example, an e-commerce company with a return policy of 60 days can safely assume that order records over 60 days old aren't very likely to be updated. This allows them to create an orders_current table that includes orders placed within roughly the last two months plus the current month, and an orders_archive table that includes everything earlier than this window. To make this transparent to the end user, they can add a view, orders_view, that uses union all to combine both underlying tables.

The goal of the orders_current table is to facilitate quick lookups of individual orders, either for pure reads or read/write operations such as update. This makes it well suited for a compound key with the order id as the leading column of the key. The goal of the orders_archive table is to facilitate historical analysis, which could very well include ad hoc multidimensional analysis. This makes it well suited for an interleaved sort key on the dimensions that are most likely to be analyzed.

[Figure: the skew for each column of the sort key for orders_interleaved_skew_date after running vacuum reindex. Notice that the skew for d_date (column 3) has dramatically decreased and the records are once again somewhat evenly spread amongst the coordinates of the d_date dimension, as shown in the bottom table.]

Using this design pattern, when the end of the current month is reached, the oldest month in the orders_current table is copied to the orders_archive table and then deleted from the orders_current table. That way the orders_archive table is likely to be written to in an append-only fashion once a month, and the cost of running vacuum reindex on orders_archive is amortized over an entire month's worth of data.
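A minimal sketch of the pattern in Redshift DDL; the column list is an illustrative assumption, not the paper's exact schema:

    CREATE TABLE orders_current (
        order_id     bigint,
        c_custkey    bigint,
        c_region     int,
        c_mktsegment int,
        d_date       date
    ) COMPOUND SORTKEY (order_id);  -- fast lookups of individual orders

    CREATE TABLE orders_archive (
        order_id     bigint,
        c_custkey    bigint,
        c_region     int,
        c_mktsegment int,
        d_date       date
    ) INTERLEAVED SORTKEY (c_custkey, c_region, c_mktsegment, d_date);

    -- One transparent relation for end users.
    CREATE VIEW orders_view AS
        SELECT * FROM orders_current
        UNION ALL
        SELECT * FROM orders_archive;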

Benchmark

Now that we've seen the inner workings of both compound and interleaved keys, let's benchmark their performance against each other with a set of 9 queries that are representative of their pros and cons. We'll run each query 3 times on a data set of 6 billion rows (600 GB) using a 4-node dc1.8xlarge cluster and measure the average execution time. We're not concerned with the absolute execution time here, just the relative execution time between keys for different operations.

[Figure: benchmark results on 6 billion rows (600 GB) using a 4-node dc1.8xlarge cluster. The most dramatic improvements can be seen in the chart titled "WHERE on 3 non-leading columns": orders_compound runs in about 18 seconds, while orders_interleaved_4 runs over 35x faster, in about 0.5 seconds.]

Plotting the results shows that, generally speaking, the operations tested (where, group by, order by and join) perform better against a compound key when operating on the leading column of the key. Once the leading column is excluded, the interleaved sort key performs better. Also, as the number of interleaved sort key columns referenced in a query increases, so does performance, as seen by the decrease in execution time between "WHERE on trailing column" and "WHERE on 3 non-leading columns" for orders_interleaved_4. But as the number of columns in the interleaved sort key definition increases, performance decreases, as seen by the increase in execution time between orders_interleaved_4 and orders_interleaved_8 for all queries.
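For reference, a query in the spirit of the "WHERE on 3 non-leading columns" test would look something like this; the predicate values are placeholders:

    SELECT COUNT(*)
    FROM orders_interleaved_4
    WHERE c_region = 2
      AND c_mktsegment = 3
      AND d_date = '1997-06-15';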

Conclusion

Hopefully you now have a much better understanding of Redshift's interleaved sort keys and why they are such a great fit for ad hoc multidimensional analysis. Please let us know if we missed anything, as well as if you come across other interesting use cases for interleaved sort keys!

About Chartio

Chartio's vision is to make business intelligence as accessible and widely used in the enterprise as the common spreadsheet. Chartio accomplishes this by making business intelligence tools available to organizations that have been poorly served by legacy BI vendors, simplifying setup and maintenance, streamlining storage decisions, and enabling business users to perform their own analyses of complex data. Finally, Chartio enables Agile Business Intelligence: rather than requiring a monolithic waterfall implementation of Planning, ETL, Governance, Data Warehousing, and Deployment, Chartio makes it possible to start small and roll out business intelligence as your organization's needs increase. Learn how to quickly understand your business data at chartio.com.

222 Kearny Street, Suite 525
San Francisco, California, United States

Copyright. All Rights Reserved.
