Distributing Queries the Citus Way Fast and Lazy. Marco Slot

Size: px
Start display at page:

Download "Distributing Queries the Citus Way Fast and Lazy. Marco Slot"

Transcription

1 Distributing Queries the Citus Way Fast and Lazy Marco Slot

2 What is Citus? Citus is an open source extension to Postgres (9.6, 10, 11) for transparently distributing tables across many Postgres servers. Coordinator data create_distributed_table('data', 'tenant_id'); data_1 data_4 data_2 data_5 data_3 data_6 2 Marco Slot Citus Data PostgresConf US 2018

3 How does Citus work? Citus uses hooks and internal functions to change Postgres behaviour and leverage its internal logic. SELECT - planner - custom scan Citus Postgres standard_planner 3 Marco Slot Citus Data PostgresConf US 2018

4 Different use cases for scaling out There are different use cases that can take advantage of distributed databases, in different ways. Examples: Multi-tenant SaaS app needs to scale beyond a single server Real-time analytics dashboards with high data volumes Advanced search across large, dynamic data sets Business intelligence 4 Marco Slot Citus Data PostgresConf US 2018

5 Citus planner(s) Layered planner accommodates different workloads. Router planner Multi-tenant OLTP Pushdown planner Recursive (Subquery/CTE) planning Logical planner Real-time analytics, search Real-time analytics, data warehouse Data warehouse 5 Marco Slot Citus Data PostgresConf US 2018

6 Citus planner(s) Layered planner accommodates different workloads. Query Rate Complex App Users Multi-tenant OLTP Real-time analytics, search Data Size Query Time Complex Query Real-time analytics, data warehouse Data warehouse 6 Marco Slot Citus Data PostgresConf US 2018

7 Co-located distributed tables Tables are automatically assigned to co-location groups, which ensure that rows with the same distribution column value are on the same node. orders_1 Foreign keys orders_2 Joins orders_3 Rollups products_1 products_2 products_3 This enables foreign keys, direct joins, and rollups (INSERT...SELECT) that include the distribution column. 7 Marco Slot Citus Data PostgresConf US 2018

8 Reference tables Reference tables are replicated to all nodes such that they can be joined with distributed tables on any column. orders_1 products_1 orders_2 products_2 orders_3 products_3 Joins Joins Joins category_1 category_1 category_1 8 Marco Slot Citus Data PostgresConf US 2018

9 Router planner How to be a drop-in distributed database 9

10 Routable queries Technical observation: If a query has <distribution column> = <value> filters that (transitively) apply to all tables, it can be routed to a particular node. Efficiently provides full SQL support, since full query can be handled by Postgres. SELECT SELECT 10

11 Routable queries Technical observation: If a query has <distribution column> = <value> filters that (transitively) apply to all tables, it can be routed to a particular node. Efficiently provides full SQL support, since full query can be handled by Postgres. Return 11

12 Scaling Multi-tenant Applications Use case observation: In a SaaS (B2B) application, most queries are specific to a particular tenant. Can add tenant ID column to all tables and distribute by tenant ID. Most queries are router plannable: Low overhead, low latency, full SQL capabilities of Postgres, scales out 12 Marco Slot Citus Data PostgresConf US 2018

13 Router planner with explicit filters Can explicitly provide filters on all tables: SELECT app_id, event_time FROM ( SELECT tenant_id, app_id, item_name FROM items WHERE tenant_id = 1783 ) LEFT JOIN ( SELECT tenant_id, app_id, max(event_time) AS event_time FROM events WHERE tenant_id = 1783 GROUP BY tenant_id, app_id ) USING (tenant_id, app_id) ORDER BY 2 DESC LIMIT 10; All distributed tables have filters by the same value 13

14 Router planner with inferred filters Citus can infer distribution column filters from joins: SELECT app_id, event_time FROM ( SELECT tenant_id, app_id, item_name FROM items WHERE tenant_id = 1783 ) LEFT JOIN ( SELECT tenant_id, app_id, max(event_time) AS event_time FROM events GROUP BY tenant_id, app_id ) USING (tenant_id, app_id) ORDER BY 2 DESC LIMIT 10; Filter on orders can be inferred from joins 14 Marco Slot Citus Data PostgresConf US 2018

15 Extracting relation filters What does Citus need to do to infer filters? Be lazy and call the Postgres planner: planner() -> citus_planner() -> standard_planner() Obtain filters on all relation from Postgres planning logic 15 Marco Slot Citus Data PostgresConf US 2018

16 Pushdown planning Make your workers work 16

17 Distributed queries Technical observation: Most common SQL features (aggregates, GROUP BY, ORDER BY, LIMIT) can be distributed in a single round. SELECT SELECT SELECT 17

18 Distributed queries Technical observation: Most common SQL features (aggregates, GROUP BY, ORDER BY, LIMIT) can be distributed in a single round. Merge 18

19 Merging query results Get the top 10 pages with the highest response times: SELECT page_id, avg(response_time) FROM page_views GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018

20 Queries on shards Queries on shards when page_id is the distribution column: SELECT page_id, avg(response_time) FROM page_views_ GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018

21 Merging query results When page_id is the distribution column: get top 10 of top 10s. SELECT page_id, avg FROM Concatenated results of queries on shards ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018

22 Queries on shards Queries on shards when page_id is not the distribution column: SELECT page_id, sum(response_time), count(response_time) FROM page_views_ GROUP BY page_id 22 Marco Slot Citus Data PostgresConf US 2018

23 Merging query results When page_id is not the distribution column: merge the averages SELECT page_id, sum(sum) / sum(count) FROM Concatenated results of queries on shards GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018

24 What about subqueries? Instead of a table, we can have joins or subqueries: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 24

25 Distributed queries Technical observation: A query that joins all distributed tables by distribution column with subqueries that do not aggregate across distribution column values can be distributed in a single round. 25 Marco Slot Citus Data PostgresConf US 2018

26 Pushdown planner Determine whether distribution columns are equal using Postgres planner: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p Distribution column equality JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 26 Marco Slot Citus Data PostgresConf US 2018

27 Pushdown planner Subquery results need to be partitionable by distribution column: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 27 Marco Slot Citus Data PostgresConf US 2018 No aggregation across distribution column values.

28 Pushdown planner Subqueries can be executed across co-located shards in parallel: 28 SELECT page_id, response_time FROM ( SELECT page_id FROM pages_ WHERE site = ' ) JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views_ WHERE view_time > date ' ' GROUP BY page_id ) USING (page_id) ORDER BY 2 DESC LIMIT 10;

29 Merging query results Merge the results on the coordinator: SELECT page_id, response_time FROM Concatenated results of queries on shards ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018

30 Scaling Real-time Analytics Applications Use case observation: Real-time analytics dashboards need sub-second response time, regardless of data size. Single-round distributed queries are powerful, fast and scalable. In practice: Maintain aggregation tables using parallel INSERT...SELECT Dashboard selects from the aggregation table 30 Marco Slot Citus Data PostgresConf US 2018

31 Complex subqueries What about subqueries with merge steps? SELECT product_name, count FROM products JOIN ( SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 ) top10_products USING (product_id) ORDER BY count; 31 Marco Slot Citus Data PostgresConf US 2018

32 Recursive planning Have a query you can t solve? Call the Postgres planner! 32

33 Recursive planning Technical observation: Subqueries and CTEs that cannot be pushed down can often be executed as distributed queries. Pull-push execution: - Recursively call planner() on the subquery - During execution, stream results back into worker nodes - Replace the subquery with a function call that acts as a reference table 33 Marco Slot Citus Data PostgresConf US 2018

34 Recursive planning Separately plan CTEs and subqueries that violate pushdown rules: SELECT product_name, count FROM products JOIN ( SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 ) top10_products USING (product_id) ORDER BY count; 34 Marco Slot Citus Data PostgresConf US 2018

35 Recursive planning In the outer query, replace subquery with intermediate result, treated as reference table: SELECT product_name, count FROM products JOIN ( SELECT * FROM read_intermediate_result(...) AS r(product_id text, count int) ) top10_products USING (product_id) ORDER BY count; 35 Marco Slot Citus Data PostgresConf US 2018

36 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 SELECT SELECT SELECT 36 Marco Slot Citus Data PostgresConf US 2018

37 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 Merge 37 Marco Slot Citus Data PostgresConf US 2018

38 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 Results 38 Marco Slot Citus Data PostgresConf US 2018

39 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_name, count FROM products JOIN (SELECT * FROM read_intermediate_result(...) ) ; SELECT SELECT 39 Marco Slot Citus Data PostgresConf US 2018

40 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_name, count FROM products JOIN (SELECT * FROM read_intermediate_result(...) ) ; Merge 40 Marco Slot Citus Data PostgresConf US 2018

41 Recursive planning Different parts of a query can be handled by different planners. SELECT... SELECT... Router Pushdownable SELECT... SELECT... Pushdownable Local 41

42 Joins between tables and intermediate results Technical observation: Intermediate results of CTEs and subqueries are treated as reference tables: can use any join column. WITH distributed_query AS (...) SELECT distributed_query JOIN distributed_table USING (any_column) 42 Marco Slot Citus Data PostgresConf US 2018

43 Joins between intermediate results Technical observation: Queries with only intermediate results (CTEs or subqueries) are router plannable: full SQL in a single round-trip. WITH distributed_query_1 AS (...), distributed_query_2 AS (...) Can use any SQL feature without SELECT further merge steps distributed_query_1 distributed_query_2 43 Marco Slot Citus Data PostgresConf US 2018

44 Scaling Real-time Analytics Applications Use case observation: Real-time analytics applications want versatile distributed SQL support Recursive planning provides nearly full, distributed SQL support in a small number of network round trips. 44 Marco Slot Citus Data PostgresConf US 2018

45 Logical planner Handling non-co-located joins through relational algebra 45

46 Non-co-located joins Business intelligence queries may join on non-distribution columns. SELECT product_id, count(*) Distributed by product_id FROM shopping_carts JOIN products USING (product_id) WHERE shopping_carts.country = 'US' AND products.category = 'Books' GROUP BY product_id; Distributed by customer_id for fast lookup of shopping cart 46 Marco Slot Citus Data PostgresConf US 2018

47 Distributed query optimisation Apply operations that reduce data size before re-partitioning. GROUP BY: product_id Join by product_id Project product_id Filter: country = 'US' Join by product_id GROUP BY: product_id Filter: category = 'Books' Table: products Re-partition by product_id GROUP BY: product_id Project product_id Filter: category = 'Books' Table: products Re-partition by product_id Table: shopping_carts Filter: country = 'US' Table: shopping_carts 47

48 Re-partitioning Split query results into buckets based on product_id SELECT partition_query_result($$ SELECT product_id, count(*) FROM shopping_carts_1028 WHERE country = 'US' GROUP BY product_id $$, 'product_id'); SELECT SELECT 48

49 Re-partitioning Fetch product_id buckets to the matching products shards. SELECT fetch_file(...); SELECT SELECT 49

50 Re-partitioning Join merged buckets with products table SELECT product_id, count FROM fragment_2138 JOIN products_ USING (product_id) WHERE products.category = 'Books'; SELECT x x x x SELECT x x 50

51 Join order planning Joins across multiple tables should avoid re-partitioning when unnecessary: orders JOIN shopping_carts JOIN customers JOIN products Bad join order: orders x shopping_carts join result x customers join result x products re-partition by customer_id re-partition by product_id query result Good join order: shopping_carts x customer re-partition by product_id join result x orders x products query result 51 Marco Slot Citus Data PostgresConf US 2018

52 Evolution of distributed SQL CitusDB: Joins, aggregates, grouping, ordering, etc. Citus 5.0: Outer joins, HAVING (2016) Citus 5.1: COPY, EXPLAIN Citus 5.2: Full SQL for router queries Citus 6.0: Co-location, INSERT...SELECT Citus 6.1: Reference tables (2017) Citus 6.2: Subquery pushdown Citus 7.0: Multi-row INSERT Citus 7.1: Window functions, DISTINCT Citus 7.2: CTEs, Subquery pull-push (2018) Citus 7.3: Arbitrary subqueries Citus 7.4: UPDATE/DELETE with subquery pushdown 52 Marco Slot Citus Data PostgresConf US 2018

53 Thanks! 53

SQL, Scaling, and What s Unique About PostgreSQL

SQL, Scaling, and What s Unique About PostgreSQL SQL, Scaling, and What s Unique About PostgreSQL Ozgun Erdogan Citus Data XLDB May 2018 Punch Line 1. What is unique about PostgreSQL? The extension APIs 2. PostgreSQL extensions are a game changer for

More information

ClickHouse Deep Dive. Aleksei Milovidov

ClickHouse Deep Dive. Aleksei Milovidov ClickHouse Deep Dive Aleksei Milovidov ClickHouse use cases A stream of events Actions of website visitors Ad impressions DNS queries E-commerce transactions We want to save info about these events and

More information

The Future of Postgres Sharding

The Future of Postgres Sharding The Future of Postgres Sharding BRUCE MOMJIAN This presentation will cover the advantages of sharding and future Postgres sharding implementation requirements. Creative Commons Attribution License http://momjian.us/presentations

More information

Data Analysis. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 12/13/12

Data Analysis. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 12/13/12 Data Analysis CPS352: Database Systems Simon Miner Gordon College Last Revised: 12/13/12 Agenda Check-in NoSQL Database Presentations Online Analytical Processing Data Mining Course Review Exam II Course

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman

cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman What is CitusDB? CitusDB is a scalable analytics database that extends PostgreSQL Citus shards your data and automa/cally parallelizes

More information

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led

Course Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led About this course This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL 20761B; 5 Days; Instructor-led Course Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can

More information

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Intermediate SQL Server

Duration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Intermediate SQL Server NE-20761C Querying with Transact-SQL Summary Duration Level Technology Delivery Method Training Credits Classroom ILT 5 Days Intermediate SQL Virtual ILT On Demand SATV Introduction This course is designed

More information

PostgreSQL Built-in Sharding:

PostgreSQL Built-in Sharding: Copyright(c)2017 NTT Corp. All Rights Reserved. PostgreSQL Built-in Sharding: Enabling Big Data Management with the Blue Elephant E. Fujita, K. Horiguchi, M. Sawada, and A. Langote NTT Open Source Software

More information

20761B: QUERYING DATA WITH TRANSACT-SQL

20761B: QUERYING DATA WITH TRANSACT-SQL ABOUT THIS COURSE This 5 day course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge

More information

DB2 SQL Class Outline

DB2 SQL Class Outline DB2 SQL Class Outline The Basics of SQL Introduction Finding Your Current Schema Setting Your Default SCHEMA SELECT * (All Columns) in a Table SELECT Specific Columns in a Table Commas in the Front or

More information

Part 1: Indexes for Big Data

Part 1: Indexes for Big Data JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL Course 20761C 5 Days Instructor-led, Hands on Course Information The main purpose of the course is to give students a good understanding of the Transact- SQL language which

More information

Assignment No: Create a College database and apply different queries on it. 2. Implement GUI for SQL queries and display result of the query

Assignment No: Create a College database and apply different queries on it. 2. Implement GUI for SQL queries and display result of the query Assignment No: 1 GUI Implementation for SQL queries Learning Outcomes: At the end of this assignment students will be able to To create a simple table Write queries for the manipulation of the table Design

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL Duration: 5 Days Course Code: M20761 Overview: This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Course Code: M20761 Vendor: Microsoft Course Overview Duration: 5 RRP: 2,177 Querying Data with Transact-SQL Overview This course is designed to introduce students to Transact-SQL. It is designed in such

More information

Major Features: Postgres 9.5

Major Features: Postgres 9.5 Major Features: Postgres 9.5 BRUCE MOMJIAN POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the Postgres 9.5 release. Creative Commons Attribution

More information

Designing dashboards for performance. Reference deck

Designing dashboards for performance. Reference deck Designing dashboards for performance Reference deck Basic principles 1. Everything in moderation 2. If it isn t fast in database, it won t be fast in Tableau 3. If it isn t fast in desktop, it won t be

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL General Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students

More information

SQL Server 2012 Development Course

SQL Server 2012 Development Course SQL Server 2012 Development Course Exam: 1 Lecturer: Amirreza Keshavarz May 2015 1- You are a database developer and you have many years experience in database development. Now you are employed in a company

More information

Lessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016

Lessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016 Lessons in Building a Distributed Query Planner Ozgun Erdogan PGCon 2016 Talk Outline 1. IntroducCon 2. Key insight in distributed planning 3. Distributed logical plans 4. Distributed physical plans 5.

More information

Advanced Databases: Parallel Databases A.Poulovassilis

Advanced Databases: Parallel Databases A.Poulovassilis 1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

20761C: Querying Data with Transact-SQL

20761C: Querying Data with Transact-SQL 20761C: Querying Data with Transact-SQL Course Details Course Code: Duration: Notes: 20761C 5 days This course syllabus should be used to determine whether the course is appropriate for the students, based

More information

CSE Midterm - Spring 2017 Solutions

CSE Midterm - Spring 2017 Solutions CSE Midterm - Spring 2017 Solutions March 28, 2017 Question Points Possible Points Earned A.1 10 A.2 10 A.3 10 A 30 B.1 10 B.2 25 B.3 10 B.4 5 B 50 C 20 Total 100 Extended Relational Algebra Operator Reference

More information

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona

Beyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona Beyond Relational Databases: MongoDB, Redis & ClickHouse Marcos Albe - Principal Support Engineer @ Percona Introduction MySQL everyone? Introduction Redis? OLAP -vs- OLTP Image credits: 451 Research (https://451research.com/state-of-the-database-landscape)

More information

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15 Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating

More information

What s New in MariaDB Server Max Mether VP Server

What s New in MariaDB Server Max Mether VP Server What s New in MariaDB Server 10.3 Max Mether VP Server Recap MariaDB 10.2 New in MariaDB 10.2 - GA since May 2017 What s New in 10.2 Analytics SQL Window Functions Common Table Expressions (CTE) JSON JSON

More information

20761 Querying Data with Transact SQL

20761 Querying Data with Transact SQL Course Overview The main purpose of this course is to give students a good understanding of the Transact-SQL language which is used by all SQL Server-related disciplines; namely, Database Administration,

More information

Query Processing & Optimization

Query Processing & Optimization Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction

More information

Shark: Hive (SQL) on Spark

Shark: Hive (SQL) on Spark Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce

More information

Shen PingCAP 2017

Shen PingCAP 2017 Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL

More information

Major Features: Postgres 10

Major Features: Postgres 10 Major Features: Postgres 10 BRUCE MOMJIAN POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the Postgres 10 release. Creative Commons Attribution License

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in

More information

Writing Analytical Queries for Business Intelligence

Writing Analytical Queries for Business Intelligence MOC-55232 Writing Analytical Queries for Business Intelligence 3 Days Overview About this Microsoft SQL Server 2016 Training Course This three-day instructor led Microsoft SQL Server 2016 Training Course

More information

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes?

Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? White Paper Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? How to Accelerate BI on Hadoop: Cubes or Indexes? Why not both? 1 +1(844)384-3844 INFO@JETHRO.IO Overview Organizations are storing more

More information

MySQL Cluster Web Scalability, % Availability. Andrew

MySQL Cluster Web Scalability, % Availability. Andrew MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended

More information

T-SQL Training: T-SQL for SQL Server for Developers

T-SQL Training: T-SQL for SQL Server for Developers Duration: 3 days T-SQL Training Overview T-SQL for SQL Server for Developers training teaches developers all the Transact-SQL skills they need to develop queries and views, and manipulate data in a SQL

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.

Data 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp. Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020

More information

Apache Phoenix We put the SQL back in NoSQL

Apache Phoenix We put the SQL back in NoSQL Apache Phoenix We put the SQL back in NoSQL http://phoenix.incubator.apache.org James Taylor @JamesPlusPlus Maryann Xue @MaryannXue Eli Levine @teleturn About James o o Engineer at Salesforce.com in BigData

More information

"Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary

Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary Course Summary Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge

More information

Microsoft Querying Data with Transact-SQL - Performance Course

Microsoft Querying Data with Transact-SQL - Performance Course 1800 ULEARN (853 276) www.ddls.com.au Microsoft 20761 - Querying Data with Transact-SQL - Performance Course Length 4 days Price $4290.00 (inc GST) Version C Overview This course is designed to introduce

More information

Web Applications. Software Engineering 2017 Alessio Gambi - Saarland University

Web Applications. Software Engineering 2017 Alessio Gambi - Saarland University Web Applications Software Engineering 2017 Alessio Gambi - Saarland University Based on the work of Cesare Pautasso, Christoph Dorn, Andrea Arcuri, and others ReCap Software Architecture A software system

More information

Greenplum SQL Class Outline

Greenplum SQL Class Outline Greenplum SQL Class Outline The Basics of Greenplum SQL Introduction SELECT * (All Columns) in a Table Fully Qualifying a Database, Schema and Table SELECT Specific Columns in a Table Commas in the Front

More information

Huge market -- essentially all high performance databases work this way

Huge market -- essentially all high performance databases work this way 11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Advanced Data Management Technologies Written Exam

Advanced Data Management Technologies Written Exam Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This

More information

Chapter 3. Introduction to relational databases and MySQL. 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C3

Chapter 3. Introduction to relational databases and MySQL. 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C3 1 Chapter 3 Introduction to relational databases and MySQL Slide 2 Objectives Applied 1. Use phpmyadmin to review the data and structure of the tables in a database, to import and run SQL scripts that

More information

Oracle Database: Introduction to SQL Ed 2

Oracle Database: Introduction to SQL Ed 2 Oracle University Contact Us: +40 21 3678820 Oracle Database: Introduction to SQL Ed 2 Duration: 5 Days What you will learn This Oracle Database 12c: Introduction to SQL training helps you write subqueries,

More information

Querying Data with Transact-SQL

Querying Data with Transact-SQL Querying Data with Transact-SQL Código del curso: 20761 Duración: 5 días Acerca de este curso This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first

More information

LazyBase: Trading freshness and performance in a scalable database

LazyBase: Trading freshness and performance in a scalable database LazyBase: Trading freshness and performance in a scalable database (EuroSys 2012) Jim Cipar, Greg Ganger, *Kimberly Keeton, *Craig A. N. Soules, *Brad Morrey, *Alistair Veitch PARALLEL DATA LABORATORY

More information

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda

1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:

More information

Impala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam

Impala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam Impala A Modern, Open Source SQL Engine for Hadoop Yogesh Chockalingam Agenda Introduction Architecture Front End Back End Evaluation Comparison with Spark SQL Introduction Why not use Hive or HBase?

More information

Oracle Streams. An Oracle White Paper October 2002

Oracle Streams. An Oracle White Paper October 2002 Oracle Streams An Oracle White Paper October 2002 Oracle Streams Executive Overview... 3 Introduction... 3 Oracle Streams Overview... 4... 5 Staging... 5 Propagation... 6 Transformations... 6 Consumption...

More information

Aster Data Basics Class Outline

Aster Data Basics Class Outline Aster Data Basics Class Outline CoffingDW education has been customized for every customer for the past 20 years. Our classes can be taught either on site or remotely via the internet. Education Contact:

More information

S-Store: Streaming Meets Transaction Processing

S-Store: Streaming Meets Transaction Processing S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Performance Optimization for Informatica Data Services ( Hotfix 3)

Performance Optimization for Informatica Data Services ( Hotfix 3) Performance Optimization for Informatica Data Services (9.5.0-9.6.1 Hotfix 3) 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Querying Data with Transact SQL

Querying Data with Transact SQL Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including

More information

Exam #1 Review. Zuyin (Alvin) Zheng

Exam #1 Review. Zuyin (Alvin) Zheng Exam #1 Review Zuyin (Alvin) Zheng Data/Information/Database Data vs. Information Data Information Discrete, unorganized, raw facts The transformation of those facts into meaning Transactional Data vs.

More information

High-Performance Distributed DBMS for Analytics

High-Performance Distributed DBMS for Analytics 1 High-Performance Distributed DBMS for Analytics 2 About me Developer, hardware engineering background Head of Analytic Products Department in Yandex jkee@yandex-team.ru 3 About Yandex One of the largest

More information

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata

COGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata COGNOS (R) 8 FRAMEWORK MANAGER GUIDELINES FOR MODELING METADATA Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata GUIDELINES FOR MODELING METADATA THE NEXT LEVEL OF PERFORMANCE

More information

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L,

Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L, Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L, 2 0 1 8 Table of Contents Introduction 1 Parent Table Child Table Joins 2 Comparison to RDBMS LEFT OUTER

More information

Advanced Oracle Performance Troubleshooting. Query Transformations Randolf Geist

Advanced Oracle Performance Troubleshooting. Query Transformations Randolf Geist Advanced Oracle Performance Troubleshooting Query Transformations Randolf Geist http://oracle-randolf.blogspot.com/ http://www.sqltools-plusplus.org:7676/ info@sqltools-plusplus.org Independent Consultant

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

Turbo charging SAS Data Integration using SAS In-Database technologies Paul Jones

Turbo charging SAS Data Integration using SAS In-Database technologies Paul Jones Turbo charging SAS Data Integration using SAS In-Database technologies Paul Jones Agenda What is SAS In-Database Why do we do Who are we working with.. When? How.does it work What is SAS In-Database? Integration

More information

Citus Documentation. Release Citus Data

Citus Documentation. Release Citus Data Citus Documentation Release 6.0.1 Citus Data May 17, 2018 About Citus 1 What is Citus? 3 1.1 When to Use Citus............................................ 3 1.2 Considerations for Use..........................................

More information

Data pipelines with PostgreSQL & Kafka

Data pipelines with PostgreSQL & Kafka Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL

More information

Memory-Based Cloud Architectures

Memory-Based Cloud Architectures Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)

More information

MS_20761 Querying Data with Transact-SQL

MS_20761 Querying Data with Transact-SQL Querying Data with Transact-SQL www.ked.com.mx Av. Revolución No. 374 Col. San Pedro de los Pinos, C.P. 03800, México, CDMX. Tel/Fax: 52785560 Por favor no imprimas este documento si no es necesario. About

More information

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,

More information

Introduction to relational databases and MySQL

Introduction to relational databases and MySQL Chapter 3 Introduction to relational databases and MySQL A products table Columns 2017, Mike Murach & Associates, Inc. C3, Slide 1 2017, Mike Murach & Associates, Inc. C3, Slide 4 Objectives Applied 1.

More information

[Some of] New Query Optimizer features in MariaDB Sergei Petrunia MariaDB Shenzhen Meetup November 2017

[Some of] New Query Optimizer features in MariaDB Sergei Petrunia MariaDB Shenzhen Meetup November 2017 [Some of] New Query Optimizer features in MariaDB 10.3 Sergei Petrunia MariaDB Shenzhen Meetup November 2017 2 Plan MariaDB 10.2: Condition pushdown MariaDB 10.3: Condition pushdown

More information

Leveraging Query Parallelism In PostgreSQL

Leveraging Query Parallelism In PostgreSQL Leveraging Query Parallelism In PostgreSQL Get the most from intra-query parallelism! Dilip Kumar (Principle Software Engineer) Rafia Sabih (Software Engineer) 2013 EDB All rights reserved. 1 Overview

More information

MIS2502: Review for Exam 2. JaeHwuen Jung

MIS2502: Review for Exam 2. JaeHwuen Jung MIS2502: Review for Exam 2 JaeHwuen Jung jaejung@temple.edu http://community.mis.temple.edu/jaejung Overview Date/Time: Wednesday, Mar 28, in class (50 minutes) Place: Regular classroom Please arrive 5

More information

Oracle Database 10g: Introduction to SQL

Oracle Database 10g: Introduction to SQL ORACLE UNIVERSITY CONTACT US: 00 9714 390 9000 Oracle Database 10g: Introduction to SQL Duration: 5 Days What you will learn This course offers students an introduction to Oracle Database 10g database

More information

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs

More information

SQL Recursion, Window Queries

SQL Recursion, Window Queries SQL Recursion, Window Queries FCDB 10.2 Dr. Chris Mayfield Department of Computer Science James Madison University Apr 02, 2018 Tips on GP5 Java Create object(s) to represent query results Use ArrayList

More information

Querying Microsoft SQL Server 2012/2014

Querying Microsoft SQL Server 2012/2014 Page 1 of 14 Overview This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL queries for Microsoft SQL Server 2014. This course is the foundation

More information

The Next Generation of Extreme OLTP Processing with Oracle TimesTen

The Next Generation of Extreme OLTP Processing with Oracle TimesTen The Next Generation of Extreme OLTP Processing with TimesTen Tirthankar Lahiri Redwood Shores, California, USA Keywords: TimesTen, velocity scaleout elastic inmemory relational database cloud Introduction

More information

PostgreSQL Query Optimization. Step by step techniques. Ilya Kosmodemiansky

PostgreSQL Query Optimization. Step by step techniques. Ilya Kosmodemiansky PostgreSQL Query Optimization Step by step techniques Ilya Kosmodemiansky (ik@) Agenda 2 1. What is a slow query? 2. How to chose queries to optimize? 3. What is a query plan? 4. Optimization tools 5.

More information

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22

More information

PRIME Deep Dive. Introduction... 1 Data Model and Load... 2 OLAP Cubes... 2 MTDI Cubes... 5

PRIME Deep Dive. Introduction... 1 Data Model and Load... 2 OLAP Cubes... 2 MTDI Cubes... 5 PRIME Deep Dive Table of Contents Introduction... 1 Data Model and Load... 2 OLAP Cubes... 2 MTDI Cubes... 5 Data Processing and Storage In-Memory... 7 Cube Publication Stages... 7 Cube Publication: Parallelization

More information

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here

Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera

More information

Redshift Queries Playbook

Redshift Queries Playbook scalable analytics built for growth Redshift Queries Playbook SQL queries for understanding user behavior Updated June 23, 2015 This playbook shows you how you can use Amplitude's Amazon Redshift database

More information

App Engine: Datastore Introduction

App Engine: Datastore Introduction App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and

More information

OVERVIEW OF RELATIONAL DATABASES: KEYS

OVERVIEW OF RELATIONAL DATABASES: KEYS OVERVIEW OF RELATIONAL DATABASES: KEYS Keys (typically called ID s in the Sierra Database) come in two varieties, and they define the relationship between tables. Primary Key Foreign Key OVERVIEW OF DATABASE

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

After completing this course, participants will be able to:

After completing this course, participants will be able to: Querying SQL Server T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s p a r t i c i p a n t s w i t h t h e t e c h n i c a l s k i l l s r e q u i r e d t o w r i t e b a

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615

Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#14(b): Implementation of Relational Operations Administrivia HW4 is due today. HW5 is out. Faloutsos/Pavlo

More information

$99.95 per user. Writing Queries for SQL Server (2005/2008 Edition) CourseId: 160 Skill level: Run Time: 42+ hours (209 videos)

$99.95 per user. Writing Queries for SQL Server (2005/2008 Edition) CourseId: 160 Skill level: Run Time: 42+ hours (209 videos) Course Description This course is a comprehensive query writing course for Microsoft SQL Server versions 2005, 2008, and 2008 R2. If you struggle with knowing the difference between an INNER and an OUTER

More information