SQL, Scaling, and What's Unique About PostgreSQL


Ozgun Erdogan, Citus Data
XLDB, May 2018

Punch Line
1. What is unique about PostgreSQL? The extension APIs.
2. PostgreSQL extensions are a game changer for relational databases.

Talk Outline
1. What is an extension?
2. Why are extensions a game changer for databases?
3. "Postgres can't do this"
   Semi-structured or unstructured data
   Geospatial database
   S3 or columnar storage
   Scale out
4. Conclusion

What is an Extension?
An extension is a piece of software that adds functionality to Postgres. Each extension bundles related objects together.
Postgres 9.1 started providing official APIs to override or extend any database module's behavior.
CREATE EXTENSION citus; dynamically loads these objects into the Postgres address space.
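As a minimal sketch of the extension lifecycle described on this slide, the statements below list, load, and inspect an extension. The choice of hstore is an assumption made for illustration (it ships with the standard contrib packages); the slide itself uses citus, which must also be installed on the server before it can be loaded.

```sql
-- Extensions installed on the server and available to load into this database.
SELECT name, default_version, installed_version
FROM pg_available_extensions
ORDER BY name;

-- Load one extension's objects (types, functions, operators) into the
-- current database. hstore is only an example choice.
CREATE EXTENSION IF NOT EXISTS hstore;

-- Extensions currently loaded in this database, with their versions.
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;
```

The same CREATE EXTENSION statement works for any packaged extension, which is what lets third-party code plug into a stock Postgres server.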

What can you Extend in Postgres?
You can override, cooperate with, or extend any combination of the following database modules:
   Type system and operators
   User-defined functions and aggregates
   Storage system and indexes
   Write-ahead logging and replication
   Transaction engine
   Background worker processes
   Query planner and query executor
   Configuration and database metadata

Why are Extensions a game changer?
Every decade brings new workloads for databases. The last decade was about capturing more data, in more shapes and forms.
Postgres has been forked by dozens of commercial databases for new workloads. When you fork, your database diverges from the community.
What if you could leverage the database ecosystem and grow with it?

Extending a relational database: Really?
Extending a relational database is a relatively new idea. Over the years, we have received questions about it.
1. Forking vs extensions: Can you really extend any database module?
2. Building from scratch vs extensions: Postgres is a relational database from an old era. "It can't do this."

Relational databases can't do this
"Postgres isn't designed for this":
1. Process semi-structured data
2. Run geospatial workloads
3. Non-relational data storage
4. Scale out for large datasets

Postgres can't do semi-structured data
NoSQL popularized the use of semi-structured data as an alternative to the data models used in relational databases. In practice, each model has benefits.
Postgres has an extensible type system. It already supports semi-structured data types:
1. XML
2. Full-text search
3. Hstore: precursor to JSONB
4. JSON / JSONB

JSONB data type: store and query (from compose.com)
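The slide's original example did not survive in this transcript. As a hedged stand-in, the sketch below stores and queries JSONB documents; the events table and its payload fields are hypothetical names chosen for illustration, not the compose.com example.

```sql
-- Hypothetical table: one JSONB document per row.
CREATE TABLE IF NOT EXISTS events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO events (payload) VALUES
    ('{"type": "click",    "user": {"id": 7, "country": "TR"}}'),
    ('{"type": "purchase", "user": {"id": 9, "country": "US"}, "amount": 42.5}');

-- -> returns a jsonb value, ->> returns text,
-- and @> tests whether the left document contains the right one.
SELECT id,
       payload -> 'user' ->> 'country' AS country
FROM events
WHERE payload @> '{"type": "click"}';
```

Documents in the same column do not need the same keys, which is what makes the type a fit for semi-structured data.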

JSONB data type: aggregate and index
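The slide's original example is missing here as well. The following sketch shows one way to aggregate over and index a JSONB column; the events table and its fields are assumptions made for illustration.

```sql
-- Hypothetical table holding JSONB documents.
CREATE TABLE IF NOT EXISTS events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- A GIN index speeds up containment (@>) queries on the whole document.
CREATE INDEX IF NOT EXISTS events_payload_gin
    ON events USING gin (payload jsonb_path_ops);

-- An expression index targets a single frequently filtered key.
CREATE INDEX IF NOT EXISTS events_type_idx
    ON events ((payload ->> 'type'));

-- Regular SQL aggregates work on values extracted from the documents.
SELECT payload ->> 'type'                AS event_type,
       count(*)                          AS events,
       sum((payload ->> 'amount')::numeric) AS total_amount
FROM events
GROUP BY payload ->> 'type'
ORDER BY events DESC;
```

The jsonb_path_ops operator class supports only the containment operator but usually yields a smaller index than the default jsonb_ops class.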

Postgres can do semi-structured data
PostgreSQL stores and processes semi-structured data just as efficiently as NoSQL databases. You also get the rich features that come with a relational database.
Mongo vs Postgres jsonb benchmarks: http://goo.gl/nuolgp
If your semi-structured or unstructured data can't be served by existing data types, you can always create your own type. You can even add operators, aggregate functions, or indexes.
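To make "create your own type" concrete, here is a small sketch that defines a composite type, a function over it, and an operator backed by that function. The names reading, reading_hotter, and the |> operator are all hypothetical and chosen only for illustration.

```sql
-- A user-defined composite type (hypothetical).
CREATE TYPE reading AS (
    sensor  text,
    celsius numeric
);

-- A function over the new type: is the first reading hotter than the second?
CREATE FUNCTION reading_hotter(reading, reading)
RETURNS boolean
LANGUAGE sql
IMMUTABLE
AS $$ SELECT $1.celsius > $2.celsius $$;

-- An operator that exposes the function with infix syntax.
CREATE OPERATOR |> (
    LEFTARG   = reading,
    RIGHTARG  = reading,
    PROCEDURE = reading_hotter
);

-- Usage: compare two readings with the new operator.
SELECT ROW('roof', 21.5)::reading |> ROW('cellar', 12.0)::reading AS roof_is_hotter;
```

Aggregates and index support follow the same pattern through CREATE AGGREGATE and CREATE OPERATOR CLASS.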

Postgres can't be a spatial database (from boundlessgeo.com)
A spatial database stores and queries data that represents objects defined in a geometric space.
Spatial databases represent geometric objects such as lines and polygons. Some databases handle complex structures such as 3D objects and topological coverages.

PostGIS: Geographic objects
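The slide's example is not preserved in the transcript. As a sketch of what geographic objects look like in PostGIS, the statements below store two points and measure the distance between them. The landmarks table is hypothetical and the coordinates are approximate, illustrative values.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical table; geography(Point, 4326) stores longitude/latitude on WGS 84.
CREATE TABLE IF NOT EXISTS landmarks (
    name     text PRIMARY KEY,
    location geography(Point, 4326) NOT NULL
);

-- Coordinates are approximate and for illustration only (longitude first).
INSERT INTO landmarks (name, location) VALUES
    ('Galata Tower', 'SRID=4326;POINT(28.9741 41.0256)'),
    ('Hagia Sophia', 'SRID=4326;POINT(28.9802 41.0086)');

-- For geography values, ST_Distance returns meters.
SELECT a.name, b.name,
       ST_Distance(a.location, b.location) AS meters
FROM landmarks a
JOIN landmarks b ON a.name < b.name;
```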

PostGIS: Geospatial joins
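A geospatial join matches rows by a spatial relationship instead of key equality. The sketch below, with hypothetical districts and shops tables and made-up coordinates, counts the shops that fall inside each district polygon.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical tables for illustration.
CREATE TABLE IF NOT EXISTS districts (
    name     text PRIMARY KEY,
    boundary geometry(Polygon, 4326) NOT NULL
);
CREATE TABLE IF NOT EXISTS shops (
    name     text PRIMARY KEY,
    location geometry(Point, 4326) NOT NULL
);

-- A GiST index lets the planner prune candidate pairs by bounding box.
CREATE INDEX IF NOT EXISTS districts_boundary_gist
    ON districts USING gist (boundary);

INSERT INTO districts VALUES
    ('north', ST_GeomFromText('POLYGON((0 5, 0 10, 10 10, 10 5, 0 5))', 4326)),
    ('south', ST_GeomFromText('POLYGON((0 0, 0 5, 10 5, 10 0, 0 0))', 4326));
INSERT INTO shops VALUES
    ('bakery',    ST_GeomFromText('POINT(2 7)', 4326)),
    ('bookstore', ST_GeomFromText('POINT(8 8)', 4326)),
    ('cafe',      ST_GeomFromText('POINT(4 1)', 4326));

-- The join condition is a spatial predicate: the shop lies inside the district.
SELECT d.name, count(s.name) AS shops
FROM districts d
LEFT JOIN shops s ON ST_Contains(d.boundary, s.location)
GROUP BY d.name
ORDER BY d.name;
```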

Postgres can become a spatial database
The PostGIS extension turns PostgreSQL into one of the most popular geospatial databases in the world.
Thousands of companies use PostGIS for spatial workloads, from projects such as OpenStreetMap to start-ups like Hotel Tonight.
If you need more from your spatial database, you can easily extend Postgres. In fact, PostGIS comes with six other extensions for specific use cases.

Postgres can only do row storage
Postgres 9.1+ comes with foreign data wrapper APIs. With these APIs, you can read from or write to any data source.
Postgres already has 106 wrappers. With these, you can run SQL commands on diverse data sources:
1. S3 (read-only)
2. MongoDB
3. Oracle
4. cstore_fdw
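To illustrate the foreign data wrapper pattern, here is a sketch using file_fdw, a wrapper that ships with the Postgres contrib packages and is not one of the four listed on the slide. The server name, table definition, and file path are hypothetical.

```sql
-- file_fdw exposes server-side files as read-only foreign tables.
CREATE EXTENSION IF NOT EXISTS file_fdw;

-- A foreign server is the handle for one external data source.
CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Hypothetical CSV file; the path must be readable by the Postgres server process.
CREATE FOREIGN TABLE page_views (
    viewed_at timestamptz,
    url       text,
    user_id   bigint
)
SERVER csv_files
OPTIONS (filename '/tmp/page_views.csv', format 'csv', header 'true');

-- The foreign table can be queried with ordinary SQL.
SELECT url, count(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```

Other wrappers follow the same three steps (extension, server, foreign table) with wrapper-specific options.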

CStore: Columnar storage
CStore is under development. For example, cstore doesn't yet support UPDATE / DELETE commands.
CStore's primary benefit today is compression. People use it to reduce in-memory and storage footprint.

[Figure: CStore storage layout, based on the ORC file format. Rows are grouped into sets of 150K rows (configurable). Within each set, column values are stored in blocks (Block 1 through Block 7 in the diagram) of 10K column values (configurable) per block.]

CStore: Data load and query
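The slide's own example is missing from the transcript. The sketch below follows the usual cstore_fdw workflow under stated assumptions: the table definition and CSV path are hypothetical, and the stripe and block sizes shown are the cstore_fdw defaults. The server must also have cstore_fdw listed in shared_preload_libraries.

```sql
CREATE EXTENSION IF NOT EXISTS cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Hypothetical table. The options enable compression and set the
-- default stripe and block sizes explicitly.
CREATE FOREIGN TABLE customer_reviews (
    customer_id   text,
    review_date   date,
    review_rating integer,
    product_id    text
)
SERVER cstore_server
OPTIONS (compression 'pglz',
         stripe_row_count '150000',
         block_row_count '10000');

-- Load data with COPY (hypothetical server-side file path).
COPY customer_reviews FROM '/tmp/customer_reviews.csv' WITH CSV;

-- Collect statistics so the planner can estimate row counts.
ANALYZE customer_reviews;

-- Only the columns referenced by the query need to be read from disk.
SELECT review_rating, count(*) AS reviews
FROM customer_reviews
WHERE review_date >= DATE '1998-01-01'
GROUP BY review_rating
ORDER BY review_rating;
```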

Postgres can do more than row stores
The default storage engine for relational databases is row-oriented. But Postgres can do far more than row stores.
You can extend Postgres to store data in a columnar format or to interact with other databases such as DynamoDB or Oracle.
Postgres provides extension APIs to (1) scan foreign tables, (2) scan foreign joins, (3) update foreign tables, (4) lock rows, (5) sample data, (6) override planner and executor, and more.

Postgres doesn't scale
"SQL doesn't scale" answers a complex problem by making a simple statement.
SQL means different things to different people. Depending on the context, it could mean multi-tenant (B2B) databases, short reads/writes, real-time analytics, or data warehousing.
Scaling each one of these workloads requires extending the relational database in a different way.

Citus: Distributed database
1. Citus scales out PostgreSQL
   Uses sharding and replication
   Query engine parallelizes SQL queries across machines
2. Citus extends PostgreSQL
   Uses Postgres extension APIs to cooperate with or extend all database modules
3. Available in 3 ways
   Open source, enterprise software, and managed database as a service on AWS

Citus: Scaling out PostgreSQL
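As a minimal sketch of scaling out a table with Citus, the statements below shard a table by one column and then query it as if it were a single table. The teams columns are hypothetical, and the sketch assumes a coordinator with citus in shared_preload_libraries and worker nodes already registered.

```sql
-- Run on the coordinator node.
CREATE EXTENSION IF NOT EXISTS citus;

-- Hypothetical table; team_id is chosen as the distribution column.
CREATE TABLE teams (
    team_id bigint NOT NULL,
    name    text,
    score   integer
);

-- Split the table into shards, each stored as a regular Postgres table
-- on a worker node, with rows routed by a hash of team_id.
SELECT create_distributed_table('teams', 'team_id');

-- Reads and writes go through the coordinator as ordinary SQL.
INSERT INTO teams (team_id, name, score) VALUES
    (1, 'red', 10),
    (2, 'blue', 30);

SELECT avg(score) FROM teams;
```

Picking the distribution column is the main design decision: queries that filter or join on it can be routed to a single shard, while others fan out to all shards.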

Citus Architecture diagram (simplified)
[Figure: The coordinator holds table metadata and receives SELECT avg(..) FROM teams;. It sends per-shard queries such as SELECT sum(..), count(..) FROM teams_1001 to the worker nodes. Worker node 1 holds Table_1001 and Table_1003 and receives the queries on teams_1001 and teams_1003. Worker node 2 holds Table_1002, Table_1004, ... and receives the queries on teams_1002 and teams_1004. The pattern continues through Worker node N. Each node is Postgres with Citus installed. 1 shard = 1 Postgres table.]
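The diagram rewrites avg into per-shard sum and count queries because an average cannot be merged from other averages. The sketch below imitates the coordinator's merge step in plain SQL; the partial results are made-up numbers standing in for what each shard would return.

```sql
-- Made-up partial results, one row per shard, as if each worker had run
-- SELECT sum(score), count(score) on its own shard table.
WITH partials (shard, partial_sum, partial_count) AS (
    VALUES ('teams_1001', 120.0, 4),
           ('teams_1002',  90.0, 3),
           ('teams_1003', 150.0, 5),
           ('teams_1004',  40.0, 4)
)
-- Merge step: total sum (400) divided by total count (16) gives the
-- global average, which is 25.
SELECT sum(partial_sum) / sum(partial_count) AS avg_score
FROM partials;
```

Averaging the four per-shard averages instead would weight every shard equally regardless of row count, which is why the sum and count are shipped separately.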

Postgres can scale
"SQL doesn't scale" is a simple answer to a complex problem.
It's easy to dismiss a complex problem with a statement that trivializes it.
SQL is hard, not impossible, to scale.

Summary
Postgres extension APIs provide a unique way to build new databases.
Postgres can be extended to many different workloads:
1. jsonb: Semi-structured data
2. PostGIS: Geospatial database
3. cstore_fdw: Columnar storage (in the works)
4. Citus: Scale out your database

Conclusion
Postgres 10 enables you to extend any database module's behavior.
This way, you can use functionality built into Postgres over decades. You can also grow with the rich ecosystem of tools and libraries.
Extensions are a game changer for databases. The monolithic relational database could be dying. If so, long live Postgres!

Questions?
Ozgun Erdogan
ozgun@citusdata.com
@citusdata
www.citusdata.com
© 2017 Citus Data. All rights reserved.