Distributing Queries the Citus Way Fast and Lazy. Marco Slot
|
|
- Clementine Daisy Blair
- 6 years ago
- Views:
Transcription
1 Distributing Queries the Citus Way Fast and Lazy Marco Slot
2 What is Citus? Citus is an open source extension to Postgres (9.6, 10, 11) for transparently distributing tables across many Postgres servers. Coordinator data create_distributed_table('data', 'tenant_id'); data_1 data_4 data_2 data_5 data_3 data_6 2 Marco Slot Citus Data PostgresConf US 2018
3 How does Citus work? Citus uses hooks and internal functions to change Postgres behaviour and leverage its internal logic. SELECT - planner - custom scan Citus Postgres standard_planner 3 Marco Slot Citus Data PostgresConf US 2018
4 Different use cases for scaling out There are different use cases that can take advantage of distributed databases, in different ways. Examples: Multi-tenant SaaS app needs to scale beyond a single server Real-time analytics dashboards with high data volumes Advanced search across large, dynamic data sets Business intelligence 4 Marco Slot Citus Data PostgresConf US 2018
5 Citus planner(s) Layered planner accommodates different workloads. Router planner Multi-tenant OLTP Pushdown planner Recursive (Subquery/CTE) planning Logical planner Real-time analytics, search Real-time analytics, data warehouse Data warehouse 5 Marco Slot Citus Data PostgresConf US 2018
6 Citus planner(s) Layered planner accommodates different workloads. Query Rate Complex App Users Multi-tenant OLTP Real-time analytics, search Data Size Query Time Complex Query Real-time analytics, data warehouse Data warehouse 6 Marco Slot Citus Data PostgresConf US 2018
7 Co-located distributed tables Tables are automatically assigned to co-location groups, which ensure that rows with the same distribution column value are on the same node. orders_1 Foreign keys orders_2 Joins orders_3 Rollups products_1 products_2 products_3 This enables foreign keys, direct joins, and rollups (INSERT...SELECT) that include the distribution column. 7 Marco Slot Citus Data PostgresConf US 2018
8 Reference tables Reference tables are replicated to all nodes such that they can be joined with distributed tables on any column. orders_1 products_1 orders_2 products_2 orders_3 products_3 Joins Joins Joins category_1 category_1 category_1 8 Marco Slot Citus Data PostgresConf US 2018
9 Router planner How to be a drop-in distributed database 9
10 Routable queries Technical observation: If a query has <distribution column> = <value> filters that (transitively) apply to all tables, it can be routed to a particular node. Efficiently provides full SQL support, since full query can be handled by Postgres. SELECT SELECT 10
11 Routable queries Technical observation: If a query has <distribution column> = <value> filters that (transitively) apply to all tables, it can be routed to a particular node. Efficiently provides full SQL support, since full query can be handled by Postgres. Return 11
12 Scaling Multi-tenant Applications Use case observation: In a SaaS (B2B) application, most queries are specific to a particular tenant. Can add tenant ID column to all tables and distribute by tenant ID. Most queries are router plannable: Low overhead, low latency, full SQL capabilities of Postgres, scales out 12 Marco Slot Citus Data PostgresConf US 2018
13 Router planner with explicit filters Can explicitly provide filters on all tables: SELECT app_id, event_time FROM ( SELECT tenant_id, app_id, item_name FROM items WHERE tenant_id = 1783 ) LEFT JOIN ( SELECT tenant_id, app_id, max(event_time) AS event_time FROM events WHERE tenant_id = 1783 GROUP BY tenant_id, app_id ) USING (tenant_id, app_id) ORDER BY 2 DESC LIMIT 10; All distributed tables have filters by the same value 13
14 Router planner with inferred filters Citus can infer distribution column filters from joins: SELECT app_id, event_time FROM ( SELECT tenant_id, app_id, item_name FROM items WHERE tenant_id = 1783 ) LEFT JOIN ( SELECT tenant_id, app_id, max(event_time) AS event_time FROM events GROUP BY tenant_id, app_id ) USING (tenant_id, app_id) ORDER BY 2 DESC LIMIT 10; Filter on orders can be inferred from joins 14 Marco Slot Citus Data PostgresConf US 2018
15 Extracting relation filters What does Citus need to do to infer filters? Be lazy and call the Postgres planner: planner() -> citus_planner() -> standard_planner() Obtain filters on all relation from Postgres planning logic 15 Marco Slot Citus Data PostgresConf US 2018
16 Pushdown planning Make your workers work 16
17 Distributed queries Technical observation: Most common SQL features (aggregates, GROUP BY, ORDER BY, LIMIT) can be distributed in a single round. SELECT SELECT SELECT 17
18 Distributed queries Technical observation: Most common SQL features (aggregates, GROUP BY, ORDER BY, LIMIT) can be distributed in a single round. Merge 18
19 Merging query results Get the top 10 pages with the highest response times: SELECT page_id, avg(response_time) FROM page_views GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018
20 Queries on shards Queries on shards when page_id is the distribution column: SELECT page_id, avg(response_time) FROM page_views_ GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018
21 Merging query results When page_id is the distribution column: get top 10 of top 10s. SELECT page_id, avg FROM Concatenated results of queries on shards ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018
22 Queries on shards Queries on shards when page_id is not the distribution column: SELECT page_id, sum(response_time), count(response_time) FROM page_views_ GROUP BY page_id 22 Marco Slot Citus Data PostgresConf US 2018
23 Merging query results When page_id is not the distribution column: merge the averages SELECT page_id, sum(sum) / sum(count) FROM Concatenated results of queries on shards GROUP BY page_id ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018
24 What about subqueries? Instead of a table, we can have joins or subqueries: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 24
25 Distributed queries Technical observation: A query that joins all distributed tables by distribution column with subqueries that do not aggregate across distribution column values can be distributed in a single round. 25 Marco Slot Citus Data PostgresConf US 2018
26 Pushdown planner Determine whether distribution columns are equal using Postgres planner: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p Distribution column equality JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 26 Marco Slot Citus Data PostgresConf US 2018
27 Pushdown planner Subquery results need to be partitionable by distribution column: SELECT page_id, response_time FROM ( SELECT page_id FROM pages WHERE site = ' ) p JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views WHERE view_time > date ' ' GROUP BY page_id ) v USING (page_id) ORDER BY 2 DESC LIMIT 10; 27 Marco Slot Citus Data PostgresConf US 2018 No aggregation across distribution column values.
28 Pushdown planner Subqueries can be executed across co-located shards in parallel: 28 SELECT page_id, response_time FROM ( SELECT page_id FROM pages_ WHERE site = ' ) JOIN ( SELECT page_id, avg(response_time) AS response_time FROM page_views_ WHERE view_time > date ' ' GROUP BY page_id ) USING (page_id) ORDER BY 2 DESC LIMIT 10;
29 Merging query results Merge the results on the coordinator: SELECT page_id, response_time FROM Concatenated results of queries on shards ORDER BY 2 DESC LIMIT Marco Slot Citus Data PostgresConf US 2018
30 Scaling Real-time Analytics Applications Use case observation: Real-time analytics dashboards need sub-second response time, regardless of data size. Single-round distributed queries are powerful, fast and scalable. In practice: Maintain aggregation tables using parallel INSERT...SELECT Dashboard selects from the aggregation table 30 Marco Slot Citus Data PostgresConf US 2018
31 Complex subqueries What about subqueries with merge steps? SELECT product_name, count FROM products JOIN ( SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 ) top10_products USING (product_id) ORDER BY count; 31 Marco Slot Citus Data PostgresConf US 2018
32 Recursive planning Have a query you can t solve? Call the Postgres planner! 32
33 Recursive planning Technical observation: Subqueries and CTEs that cannot be pushed down can often be executed as distributed queries. Pull-push execution: - Recursively call planner() on the subquery - During execution, stream results back into worker nodes - Replace the subquery with a function call that acts as a reference table 33 Marco Slot Citus Data PostgresConf US 2018
34 Recursive planning Separately plan CTEs and subqueries that violate pushdown rules: SELECT product_name, count FROM products JOIN ( SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 ) top10_products USING (product_id) ORDER BY count; 34 Marco Slot Citus Data PostgresConf US 2018
35 Recursive planning In the outer query, replace subquery with intermediate result, treated as reference table: SELECT product_name, count FROM products JOIN ( SELECT * FROM read_intermediate_result(...) AS r(product_id text, count int) ) top10_products USING (product_id) ORDER BY count; 35 Marco Slot Citus Data PostgresConf US 2018
36 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 SELECT SELECT SELECT 36 Marco Slot Citus Data PostgresConf US 2018
37 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 Merge 37 Marco Slot Citus Data PostgresConf US 2018
38 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_id, count(*) FROM orders GROUP BY product_id ORDER BY 2 DESC LIMIT 10 Results 38 Marco Slot Citus Data PostgresConf US 2018
39 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_name, count FROM products JOIN (SELECT * FROM read_intermediate_result(...) ) ; SELECT SELECT 39 Marco Slot Citus Data PostgresConf US 2018
40 Pull-push execution Execute non-pushdownable subqueries separately: SELECT product_name, count FROM products JOIN (SELECT * FROM read_intermediate_result(...) ) ; Merge 40 Marco Slot Citus Data PostgresConf US 2018
41 Recursive planning Different parts of a query can be handled by different planners. SELECT... SELECT... Router Pushdownable SELECT... SELECT... Pushdownable Local 41
42 Joins between tables and intermediate results Technical observation: Intermediate results of CTEs and subqueries are treated as reference tables: can use any join column. WITH distributed_query AS (...) SELECT distributed_query JOIN distributed_table USING (any_column) 42 Marco Slot Citus Data PostgresConf US 2018
43 Joins between intermediate results Technical observation: Queries with only intermediate results (CTEs or subqueries) are router plannable: full SQL in a single round-trip. WITH distributed_query_1 AS (...), distributed_query_2 AS (...) Can use any SQL feature without SELECT further merge steps distributed_query_1 distributed_query_2 43 Marco Slot Citus Data PostgresConf US 2018
44 Scaling Real-time Analytics Applications Use case observation: Real-time analytics applications want versatile distributed SQL support Recursive planning provides nearly full, distributed SQL support in a small number of network round trips. 44 Marco Slot Citus Data PostgresConf US 2018
45 Logical planner Handling non-co-located joins through relational algebra 45
46 Non-co-located joins Business intelligence queries may join on non-distribution columns. SELECT product_id, count(*) Distributed by product_id FROM shopping_carts JOIN products USING (product_id) WHERE shopping_carts.country = 'US' AND products.category = 'Books' GROUP BY product_id; Distributed by customer_id for fast lookup of shopping cart 46 Marco Slot Citus Data PostgresConf US 2018
47 Distributed query optimisation Apply operations that reduce data size before re-partitioning. GROUP BY: product_id Join by product_id Project product_id Filter: country = 'US' Join by product_id GROUP BY: product_id Filter: category = 'Books' Table: products Re-partition by product_id GROUP BY: product_id Project product_id Filter: category = 'Books' Table: products Re-partition by product_id Table: shopping_carts Filter: country = 'US' Table: shopping_carts 47
48 Re-partitioning Split query results into buckets based on product_id SELECT partition_query_result($$ SELECT product_id, count(*) FROM shopping_carts_1028 WHERE country = 'US' GROUP BY product_id $$, 'product_id'); SELECT SELECT 48
49 Re-partitioning Fetch product_id buckets to the matching products shards. SELECT fetch_file(...); SELECT SELECT 49
50 Re-partitioning Join merged buckets with products table SELECT product_id, count FROM fragment_2138 JOIN products_ USING (product_id) WHERE products.category = 'Books'; SELECT x x x x SELECT x x 50
51 Join order planning Joins across multiple tables should avoid re-partitioning when unnecessary: orders JOIN shopping_carts JOIN customers JOIN products Bad join order: orders x shopping_carts join result x customers join result x products re-partition by customer_id re-partition by product_id query result Good join order: shopping_carts x customer re-partition by product_id join result x orders x products query result 51 Marco Slot Citus Data PostgresConf US 2018
52 Evolution of distributed SQL CitusDB: Joins, aggregates, grouping, ordering, etc. Citus 5.0: Outer joins, HAVING (2016) Citus 5.1: COPY, EXPLAIN Citus 5.2: Full SQL for router queries Citus 6.0: Co-location, INSERT...SELECT Citus 6.1: Reference tables (2017) Citus 6.2: Subquery pushdown Citus 7.0: Multi-row INSERT Citus 7.1: Window functions, DISTINCT Citus 7.2: CTEs, Subquery pull-push (2018) Citus 7.3: Arbitrary subqueries Citus 7.4: UPDATE/DELETE with subquery pushdown 52 Marco Slot Citus Data PostgresConf US 2018
53 Thanks! 53
SQL, Scaling, and What s Unique About PostgreSQL
SQL, Scaling, and What s Unique About PostgreSQL Ozgun Erdogan Citus Data XLDB May 2018 Punch Line 1. What is unique about PostgreSQL? The extension APIs 2. PostgreSQL extensions are a game changer for
More informationClickHouse Deep Dive. Aleksei Milovidov
ClickHouse Deep Dive Aleksei Milovidov ClickHouse use cases A stream of events Actions of website visitors Ad impressions DNS queries E-commerce transactions We want to save info about these events and
More informationThe Future of Postgres Sharding
The Future of Postgres Sharding BRUCE MOMJIAN This presentation will cover the advantages of sharding and future Postgres sharding implementation requirements. Creative Commons Attribution License http://momjian.us/presentations
More informationData Analysis. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 12/13/12
Data Analysis CPS352: Database Systems Simon Miner Gordon College Last Revised: 12/13/12 Agenda Check-in NoSQL Database Presentations Online Analytical Processing Data Mining Course Review Exam II Course
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationcstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman
cstore_fdw Columnar store for analytic workloads Hadi Moshayedi & Ben Redman What is CitusDB? CitusDB is a scalable analytics database that extends PostgreSQL Citus shards your data and automa/cally parallelizes
More informationCourse Outline. Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led
Querying Data with Transact-SQL Course 20761B: 5 days Instructor Led About this course This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days
More informationQuerying Data with Transact-SQL
Querying Data with Transact-SQL 20761B; 5 Days; Instructor-led Course Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can
More informationDuration Level Technology Delivery Method Training Credits. Classroom ILT 5 Days Intermediate SQL Server
NE-20761C Querying with Transact-SQL Summary Duration Level Technology Delivery Method Training Credits Classroom ILT 5 Days Intermediate SQL Virtual ILT On Demand SATV Introduction This course is designed
More informationPostgreSQL Built-in Sharding:
Copyright(c)2017 NTT Corp. All Rights Reserved. PostgreSQL Built-in Sharding: Enabling Big Data Management with the Blue Elephant E. Fujita, K. Horiguchi, M. Sawada, and A. Langote NTT Open Source Software
More information20761B: QUERYING DATA WITH TRANSACT-SQL
ABOUT THIS COURSE This 5 day course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge
More informationDB2 SQL Class Outline
DB2 SQL Class Outline The Basics of SQL Introduction Finding Your Current Schema Setting Your Default SCHEMA SELECT * (All Columns) in a Table SELECT Specific Columns in a Table Commas in the Front or
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationQuerying Data with Transact-SQL
Querying Data with Transact-SQL Course 20761C 5 Days Instructor-led, Hands on Course Information The main purpose of the course is to give students a good understanding of the Transact- SQL language which
More informationAssignment No: Create a College database and apply different queries on it. 2. Implement GUI for SQL queries and display result of the query
Assignment No: 1 GUI Implementation for SQL queries Learning Outcomes: At the end of this assignment students will be able to To create a simple table Write queries for the manipulation of the table Design
More informationQuerying Data with Transact-SQL
Querying Data with Transact-SQL Duration: 5 Days Course Code: M20761 Overview: This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can
More informationNoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious
More informationQuerying Data with Transact-SQL
Course Code: M20761 Vendor: Microsoft Course Overview Duration: 5 RRP: 2,177 Querying Data with Transact-SQL Overview This course is designed to introduce students to Transact-SQL. It is designed in such
More informationMajor Features: Postgres 9.5
Major Features: Postgres 9.5 BRUCE MOMJIAN POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the Postgres 9.5 release. Creative Commons Attribution
More informationDesigning dashboards for performance. Reference deck
Designing dashboards for performance Reference deck Basic principles 1. Everything in moderation 2. If it isn t fast in database, it won t be fast in Tableau 3. If it isn t fast in desktop, it won t be
More informationQuerying Data with Transact-SQL
Querying Data with Transact-SQL General Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students
More informationSQL Server 2012 Development Course
SQL Server 2012 Development Course Exam: 1 Lecturer: Amirreza Keshavarz May 2015 1- You are a database developer and you have many years experience in database development. Now you are employed in a company
More informationLessons in Building a Distributed Query Planner. Ozgun Erdogan PGCon 2016
Lessons in Building a Distributed Query Planner Ozgun Erdogan PGCon 2016 Talk Outline 1. IntroducCon 2. Key insight in distributed planning 3. Distributed logical plans 4. Distributed physical plans 5.
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More information20761C: Querying Data with Transact-SQL
20761C: Querying Data with Transact-SQL Course Details Course Code: Duration: Notes: 20761C 5 days This course syllabus should be used to determine whether the course is appropriate for the students, based
More informationCSE Midterm - Spring 2017 Solutions
CSE Midterm - Spring 2017 Solutions March 28, 2017 Question Points Possible Points Earned A.1 10 A.2 10 A.3 10 A 30 B.1 10 B.2 25 B.3 10 B.4 5 B 50 C 20 Total 100 Extended Relational Algebra Operator Reference
More informationBeyond Relational Databases: MongoDB, Redis & ClickHouse. Marcos Albe - Principal Support Percona
Beyond Relational Databases: MongoDB, Redis & ClickHouse Marcos Albe - Principal Support Engineer @ Percona Introduction MySQL everyone? Introduction Redis? OLAP -vs- OLTP Image credits: 451 Research (https://451research.com/state-of-the-database-landscape)
More informationExamples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15
Examples of Physical Query Plan Alternatives Selected Material from Chapters 12, 14 and 15 1 Query Optimization NOTE: SQL provides many ways to express a query. HENCE: System has many options for evaluating
More informationWhat s New in MariaDB Server Max Mether VP Server
What s New in MariaDB Server 10.3 Max Mether VP Server Recap MariaDB 10.2 New in MariaDB 10.2 - GA since May 2017 What s New in 10.2 Analytics SQL Window Functions Common Table Expressions (CTE) JSON JSON
More information20761 Querying Data with Transact SQL
Course Overview The main purpose of this course is to give students a good understanding of the Transact-SQL language which is used by all SQL Server-related disciplines; namely, Database Administration,
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationShark: Hive (SQL) on Spark
Shark: Hive (SQL) on Spark Reynold Xin UC Berkeley AMP Camp Aug 21, 2012 UC BERKELEY SELECT page_name, SUM(page_views) views FROM wikistats GROUP BY page_name ORDER BY views DESC LIMIT 10; Stage 0: Map-Shuffle-Reduce
More informationShen PingCAP 2017
Shen Li @ PingCAP About me Shen Li ( 申砾 ) Tech Lead of TiDB, VP of Engineering Netease / 360 / PingCAP Infrastructure software engineer WHY DO WE NEED A NEW DATABASE? Brief History Standalone RDBMS NoSQL
More informationMajor Features: Postgres 10
Major Features: Postgres 10 BRUCE MOMJIAN POSTGRESQL is an open-source, full-featured relational database. This presentation gives an overview of the Postgres 10 release. Creative Commons Attribution License
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationWriting Analytical Queries for Business Intelligence
MOC-55232 Writing Analytical Queries for Business Intelligence 3 Days Overview About this Microsoft SQL Server 2016 Training Course This three-day instructor led Microsoft SQL Server 2016 Training Course
More informationAccelerating BI on Hadoop: Full-Scan, Cubes or Indexes?
White Paper Accelerating BI on Hadoop: Full-Scan, Cubes or Indexes? How to Accelerate BI on Hadoop: Cubes or Indexes? Why not both? 1 +1(844)384-3844 INFO@JETHRO.IO Overview Organizations are storing more
More informationMySQL Cluster Web Scalability, % Availability. Andrew
MySQL Cluster Web Scalability, 99.999% Availability Andrew Morgan @andrewmorgan www.clusterdb.com Safe Harbour Statement The following is intended to outline our general product direction. It is intended
More informationT-SQL Training: T-SQL for SQL Server for Developers
Duration: 3 days T-SQL Training Overview T-SQL for SQL Server for Developers training teaches developers all the Transact-SQL skills they need to develop queries and views, and manipulate data in a SQL
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationData 101 Which DB, When. Joe Yong Azure SQL Data Warehouse, Program Management Microsoft Corp.
Data 101 Which DB, When Joe Yong (joeyong@microsoft.com) Azure SQL Data Warehouse, Program Management Microsoft Corp. The world is changing AI increased by 300% in 2017 Data will grow to 44 ZB in 2020
More informationApache Phoenix We put the SQL back in NoSQL
Apache Phoenix We put the SQL back in NoSQL http://phoenix.incubator.apache.org James Taylor @JamesPlusPlus Maryann Xue @MaryannXue Eli Levine @teleturn About James o o Engineer at Salesforce.com in BigData
More information"Charting the Course... MOC C: Querying Data with Transact-SQL. Course Summary
Course Summary Description This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first three days can be taught as a course to students requiring the knowledge
More informationMicrosoft Querying Data with Transact-SQL - Performance Course
1800 ULEARN (853 276) www.ddls.com.au Microsoft 20761 - Querying Data with Transact-SQL - Performance Course Length 4 days Price $4290.00 (inc GST) Version C Overview This course is designed to introduce
More informationWeb Applications. Software Engineering 2017 Alessio Gambi - Saarland University
Web Applications Software Engineering 2017 Alessio Gambi - Saarland University Based on the work of Cesare Pautasso, Christoph Dorn, Andrea Arcuri, and others ReCap Software Architecture A software system
More informationGreenplum SQL Class Outline
Greenplum SQL Class Outline The Basics of Greenplum SQL Introduction SELECT * (All Columns) in a Table Fully Qualifying a Database, Schema and Table SELECT Specific Columns in a Table Commas in the Front
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationAdvanced Data Management Technologies Written Exam
Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This
More informationChapter 3. Introduction to relational databases and MySQL. 2010, Mike Murach & Associates, Inc. Murach's PHP and MySQL, C3
1 Chapter 3 Introduction to relational databases and MySQL Slide 2 Objectives Applied 1. Use phpmyadmin to review the data and structure of the tables in a database, to import and run SQL scripts that
More informationOracle Database: Introduction to SQL Ed 2
Oracle University Contact Us: +40 21 3678820 Oracle Database: Introduction to SQL Ed 2 Duration: 5 Days What you will learn This Oracle Database 12c: Introduction to SQL training helps you write subqueries,
More informationQuerying Data with Transact-SQL
Querying Data with Transact-SQL Código del curso: 20761 Duración: 5 días Acerca de este curso This course is designed to introduce students to Transact-SQL. It is designed in such a way that the first
More informationLazyBase: Trading freshness and performance in a scalable database
LazyBase: Trading freshness and performance in a scalable database (EuroSys 2012) Jim Cipar, Greg Ganger, *Kimberly Keeton, *Craig A. N. Soules, *Brad Morrey, *Alistair Veitch PARALLEL DATA LABORATORY
More information1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda
Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:
More informationImpala. A Modern, Open Source SQL Engine for Hadoop. Yogesh Chockalingam
Impala A Modern, Open Source SQL Engine for Hadoop Yogesh Chockalingam Agenda Introduction Architecture Front End Back End Evaluation Comparison with Spark SQL Introduction Why not use Hive or HBase?
More informationOracle Streams. An Oracle White Paper October 2002
Oracle Streams An Oracle White Paper October 2002 Oracle Streams Executive Overview... 3 Introduction... 3 Oracle Streams Overview... 4... 5 Staging... 5 Propagation... 6 Transformations... 6 Consumption...
More informationAster Data Basics Class Outline
Aster Data Basics Class Outline CoffingDW education has been customized for every customer for the past 20 years. Our classes can be taught either on site or remotely via the internet. Education Contact:
More informationS-Store: Streaming Meets Transaction Processing
S-Store: Streaming Meets Transaction Processing H-Store is an experimental database management system (DBMS) designed for online transaction processing applications Manasa Vallamkondu Motivation Reducing
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationPerformance Optimization for Informatica Data Services ( Hotfix 3)
Performance Optimization for Informatica Data Services (9.5.0-9.6.1 Hotfix 3) 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,
More informationQuerying Data with Transact SQL
Course 20761A: Querying Data with Transact SQL Course details Course Outline Module 1: Introduction to Microsoft SQL Server 2016 This module introduces SQL Server, the versions of SQL Server, including
More informationExam #1 Review. Zuyin (Alvin) Zheng
Exam #1 Review Zuyin (Alvin) Zheng Data/Information/Database Data vs. Information Data Information Discrete, unorganized, raw facts The transformation of those facts into meaning Transactional Data vs.
More informationHigh-Performance Distributed DBMS for Analytics
1 High-Performance Distributed DBMS for Analytics 2 About me Developer, hardware engineering background Head of Analytic Products Department in Yandex jkee@yandex-team.ru 3 About Yandex One of the largest
More informationCOGNOS (R) 8 GUIDELINES FOR MODELING METADATA FRAMEWORK MANAGER. Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata
COGNOS (R) 8 FRAMEWORK MANAGER GUIDELINES FOR MODELING METADATA Cognos(R) 8 Business Intelligence Readme Guidelines for Modeling Metadata GUIDELINES FOR MODELING METADATA THE NEXT LEVEL OF PERFORMANCE
More informationOracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L,
Oracle NoSQL Database Parent-Child Joins and Aggregation O R A C L E W H I T E P A P E R A P R I L, 2 0 1 8 Table of Contents Introduction 1 Parent Table Child Table Joins 2 Comparison to RDBMS LEFT OUTER
More informationAdvanced Oracle Performance Troubleshooting. Query Transformations Randolf Geist
Advanced Oracle Performance Troubleshooting Query Transformations Randolf Geist http://oracle-randolf.blogspot.com/ http://www.sqltools-plusplus.org:7676/ info@sqltools-plusplus.org Independent Consultant
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationTurbo charging SAS Data Integration using SAS In-Database technologies Paul Jones
Turbo charging SAS Data Integration using SAS In-Database technologies Paul Jones Agenda What is SAS In-Database Why do we do Who are we working with.. When? How.does it work What is SAS In-Database? Integration
More informationCitus Documentation. Release Citus Data
Citus Documentation Release 6.0.1 Citus Data May 17, 2018 About Citus 1 What is Citus? 3 1.1 When to Use Citus............................................ 3 1.2 Considerations for Use..........................................
More informationData pipelines with PostgreSQL & Kafka
Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL
More informationMemory-Based Cloud Architectures
Memory-Based Cloud Architectures ( Or: Technical Challenges for OnDemand Business Software) Jan Schaffner Enterprise Platform and Integration Concepts Group Example: Enterprise Benchmarking -) *%'+,#$)
More informationMS_20761 Querying Data with Transact-SQL
Querying Data with Transact-SQL www.ked.com.mx Av. Revolución No. 374 Col. San Pedro de los Pinos, C.P. 03800, México, CDMX. Tel/Fax: 52785560 Por favor no imprimas este documento si no es necesario. About
More informationAgenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache
Databases on AWS 2017 Amazon Web Services, Inc. and its affiliates. All rights served. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon Web Services,
More informationIntroduction to relational databases and MySQL
Chapter 3 Introduction to relational databases and MySQL A products table Columns 2017, Mike Murach & Associates, Inc. C3, Slide 1 2017, Mike Murach & Associates, Inc. C3, Slide 4 Objectives Applied 1.
More information[Some of] New Query Optimizer features in MariaDB Sergei Petrunia MariaDB Shenzhen Meetup November 2017
[Some of] New Query Optimizer features in MariaDB 10.3 Sergei Petrunia MariaDB Shenzhen Meetup November 2017 2 Plan MariaDB 10.2: Condition pushdown MariaDB 10.3: Condition pushdown
More informationLeveraging Query Parallelism In PostgreSQL
Leveraging Query Parallelism In PostgreSQL Get the most from intra-query parallelism! Dilip Kumar (Principle Software Engineer) Rafia Sabih (Software Engineer) 2013 EDB All rights reserved. 1 Overview
More informationMIS2502: Review for Exam 2. JaeHwuen Jung
MIS2502: Review for Exam 2 JaeHwuen Jung jaejung@temple.edu http://community.mis.temple.edu/jaejung Overview Date/Time: Wednesday, Mar 28, in class (50 minutes) Place: Regular classroom Please arrive 5
More informationOracle Database 10g: Introduction to SQL
ORACLE UNIVERSITY CONTACT US: 00 9714 390 9000 Oracle Database 10g: Introduction to SQL Duration: 5 Days What you will learn This course offers students an introduction to Oracle Database 10g database
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationSQL Recursion, Window Queries
SQL Recursion, Window Queries FCDB 10.2 Dr. Chris Mayfield Department of Computer Science James Madison University Apr 02, 2018 Tips on GP5 Java Create object(s) to represent query results Use ArrayList
More informationQuerying Microsoft SQL Server 2012/2014
Page 1 of 14 Overview This 5-day instructor led course provides students with the technical skills required to write basic Transact-SQL queries for Microsoft SQL Server 2014. This course is the foundation
More informationThe Next Generation of Extreme OLTP Processing with Oracle TimesTen
The Next Generation of Extreme OLTP Processing with TimesTen Tirthankar Lahiri Redwood Shores, California, USA Keywords: TimesTen, velocity scaleout elastic inmemory relational database cloud Introduction
More informationPostgreSQL Query Optimization. Step by step techniques. Ilya Kosmodemiansky
PostgreSQL Query Optimization Step by step techniques Ilya Kosmodemiansky (ik@) Agenda 2 1. What is a slow query? 2. How to chose queries to optimize? 3. What is a query plan? 4. Optimization tools 5.
More informationCMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22
More informationPRIME Deep Dive. Introduction... 1 Data Model and Load... 2 OLAP Cubes... 2 MTDI Cubes... 5
PRIME Deep Dive Table of Contents Introduction... 1 Data Model and Load... 2 OLAP Cubes... 2 MTDI Cubes... 5 Data Processing and Storage In-Memory... 7 Cube Publication Stages... 7 Cube Publication: Parallelization
More informationHadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here
Hadoop Beyond Batch: Real-time Workloads, SQL-on- Hadoop, and thevirtual EDW Headline Goes Here Marcel Kornacker marcel@cloudera.com Speaker Name or Subhead Goes Here 2013-11-12 Copyright 2013 Cloudera
More informationRedshift Queries Playbook
scalable analytics built for growth Redshift Queries Playbook SQL queries for understanding user behavior Updated June 23, 2015 This playbook shows you how you can use Amplitude's Amazon Redshift database
More informationApp Engine: Datastore Introduction
App Engine: Datastore Introduction Part 1 Another very useful course: https://www.udacity.com/course/developing-scalableapps-in-java--ud859 1 Topics cover in this lesson What is Datastore? Datastore and
More informationOVERVIEW OF RELATIONAL DATABASES: KEYS
OVERVIEW OF RELATIONAL DATABASES: KEYS Keys (typically called ID s in the Sierra Database) come in two varieties, and they define the relationship between tables. Primary Key Foreign Key OVERVIEW OF DATABASE
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationAfter completing this course, participants will be able to:
Querying SQL Server T h i s f i v e - d a y i n s t r u c t o r - l e d c o u r s e p r o v i d e s p a r t i c i p a n t s w i t h t h e t e c h n i c a l s k i l l s r e q u i r e d t o w r i t e b a
More informationCIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )
Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#14(b): Implementation of Relational Operations Administrivia HW4 is due today. HW5 is out. Faloutsos/Pavlo
More information$99.95 per user. Writing Queries for SQL Server (2005/2008 Edition) CourseId: 160 Skill level: Run Time: 42+ hours (209 videos)
Course Description This course is a comprehensive query writing course for Microsoft SQL Server versions 2005, 2008, and 2008 R2. If you struggle with knowing the difference between an INNER and an OUTER
More information