PostgreSQL/Jsonb. A First Look


About Me
- Started programming in 1981
- Owner of Enoki Solutions Inc. (consulting and software development)
- Running VanDev since Oct 2010

Why PostgreSQL?
- Open source
- Feature rich
- Mature
- So much better than MySQL
- MongoDB has issues
  - Only atomic at the document level
  - https://jira.mongodb.org/browse/server-14766

Why Json?
- Blame JavaScript
- But, in a DB context:
  - Data locality
  - Data atomicity without transaction overhead?
  - Fancy blob?

Jsonb?
- A binary format
- PostgreSQL specific
- In theory faster to modify
- Generally smaller to store
- Indexable!
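To make the "binary format" point concrete, here is a minimal sketch that casts the same literal to both types. It assumes PostgreSQL 9.4 or later, where jsonb is available; the literal itself is made up for illustration.

```sql
-- json stores the input text verbatim; jsonb parses it into a binary form,
-- dropping insignificant whitespace and keeping only the last duplicate key.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1,   "a": 2, "a": 3}'::jsonb AS as_jsonb;
-- as_json  : {"b": 1,   "a": 2, "a": 3}
-- as_jsonb : {"a": 3, "b": 1}
```

Because jsonb is already parsed, operators such as -> and ? do not have to re-parse the text on every call.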

1st Observation
- Always use Jsonb if you're going to have the db do anything with it
- Json can't be indexed directly (the json type has no btree or GIN operator classes)

Make everything jsonb?

    CREATE TABLE tst (
      id   UUID  NOT NULL,
      data JSONB DEFAULT '{}'::jsonb NOT NULL
    );

A quick aside on UUIDs
Structure your UUIDs (128 bits) as follows:
- Time (ms since epoch, 44 bits, >557 years)
  - If generating more than 2^12 per ms, allow this to drift forward
  - If it ends up being a problem, it won't be your problem
- Sequence (12 bits = 4096/ms)
- Node (12 bits = 4096 nodes)
  - Expect up to ~1s of time drift when using ntpd
- Random (60 bits, 1% collision/107.644 million)
  - Set per ms
Why?
- Ids generated at the same time share locality
- Faster inserts
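The layout above maps neatly onto hex digits (44 + 12 + 12 + 60 bits = 11 + 3 + 3 + 15 = 32 digits), so it can be sketched in plain SQL. This is only an illustration under stated assumptions: the function name structured_uuid is hypothetical, the caller supplies the sequence and node numbers, md5(random()) stands in for a proper random source, and the drift-forward and per-millisecond rules from the slide are not implemented.

```sql
-- Hypothetical helper: builds a UUID as time | sequence | node | random.
CREATE OR REPLACE FUNCTION structured_uuid(seq INT, node INT)
RETURNS UUID AS $$
  SELECT (
       -- 44 bits: milliseconds since the epoch, as 11 hex digits
       lpad(to_hex(floor(extract(epoch FROM clock_timestamp()) * 1000)::BIGINT), 11, '0')
       -- 12 bits: per-millisecond sequence, as 3 hex digits
    || lpad(to_hex(seq  & 4095), 3, '0')
       -- 12 bits: node number, as 3 hex digits
    || lpad(to_hex(node & 4095), 3, '0')
       -- 60 bits: 15 hex digits of (non-cryptographic) randomness
    || substr(md5(random()::TEXT), 1, 15)
  )::UUID;
$$ LANGUAGE sql VOLATILE;

-- Usage: sequence 0 on node 1.
SELECT structured_uuid(0, 1);
```

Since the timestamp occupies the leading digits, ids generated close together in time sort next to each other in a btree, which is the locality the slide refers to.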

Make everything jsonb?

    CREATE TABLE tst (
      id   UUID  NOT NULL,
      data JSONB DEFAULT '{}'::jsonb NOT NULL
    );

What happens when you modify data?
- The whole field is updated
- If data is large that can be very slow

2nd Observation
Consider partitioning into sections

    CREATE TABLE tst (
      id           UUID NOT NULL,
      section_name VARCHAR(128),
      data         JSONB DEFAULT '{}'::jsonb NOT NULL
    );

- Updates to data are smaller now
- Updates by id are no longer atomic across sections unless you use transactions!
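A minimal sketch of the transaction caveat, assuming PostgreSQL 9.5 or later (for jsonb_set and ||). The 'body' section and its "title" key are hypothetical, and the example assumes one row per (id, section_name) pair, so uniqueness would have to be on that pair rather than on id alone.

```sql
-- Both section rows change together or not at all.
BEGIN;

UPDATE tst
   SET data = jsonb_set(data, '{count}', '11'::jsonb)
 WHERE id = '00000000-0000-0000-0000-000000000011'
   AND section_name = 'meta';

UPDATE tst
   SET data = data || '{"title": "hello"}'::jsonb   -- hypothetical section and key
 WHERE id = '00000000-0000-0000-0000-000000000011'
   AND section_name = 'body';

COMMIT;
```

Without the BEGIN/COMMIT pair, a concurrent reader could see the 'meta' row updated while the 'body' row still holds the old document.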

Indexing

    CREATE UNIQUE INDEX idx_tst_id
      ON tst USING btree (id);
    CREATE UNIQUE INDEX idx_tst_id_section_name
      ON tst USING btree (id, section_name);
    CREATE INDEX idx_tst_id_section_name_data
      ON tst USING btree (id, section_name, data);
    CREATE INDEX idx_tst_section_name_data_tags
      ON tst USING btree (section_name, ((data->>'tags') :: TEXT));
    CREATE INDEX idx_tst_section_name_data_count
      ON tst USING btree (section_name, ((data->>'count') :: INT8));

Looks funny, doesn't it?

Some test data

    WITH A AS (
      INSERT INTO "tst" VALUES
        ('00000000000000000000000000000011', 'meta', '{"tags":["a","b","c"], "count":10}'),
        ('00000000000000000000000000000012', 'meta', '{"tags":["a","d","c"], "count":1}')
      ON CONFLICT DO NOTHING
      RETURNING *
    )
    SELECT * FROM A;

BTW, WITH is awesome

Did it work?

    EXPLAIN SELECT * FROM tst WHERE section_name='meta' ORDER BY (data->'count');

    Sort  (cost=8.17..8.18 rows=1 width=354)
      Sort Key: ((data -> 'count'::text))
      ->  Index Scan using idx_tst_section_name_data_count on tst  (cost=0.14..8.16 rows=1 width=354)
            Index Cond: ((section_name)::text = 'meta'::text)

    EXPLAIN SELECT * FROM tst WHERE section_name='meta' ORDER BY ((data->>'count')::int8);

    Index Scan using idx_tst_section_name_data_count on tst  (cost=0.13..8.15 rows=1 width=330)
      Index Cond: ((section_name)::text = 'meta'::text)

    SELECT * FROM tst WHERE section_name='meta' ORDER BY ((data->>'count')::int8);

    00000000-0000-0000-0000-000000000012  meta  {"tags": ["a", "d", "c"], "count": 1}
    00000000-0000-0000-0000-000000000011  meta  {"tags": ["a", "b", "c"], "count": 10}

Updating Count

    WITH X AS (
      UPDATE tst
         SET data = jsonb_set(data, '{count}',
                              to_jsonb(((data ->> 'count') :: INT8) + 1 :: INT8),
                              FALSE)
       WHERE section_name='meta'
         AND data ? 'count'
         AND data -> 'tags' ? 'd'
      RETURNING *
    )
    SELECT * FROM X;

    00000000-0000-0000-0000-000000000012  meta  {"tags": ["a", "d", "c"], "count": 2}
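The trailing FALSE in the update above is the create_missing argument of jsonb_set. A small sketch on made-up literals (PostgreSQL 9.5 or later) shows what it changes:

```sql
-- Existing key: replaced regardless of create_missing.
SELECT jsonb_set('{"count": 1}'::jsonb, '{count}'::text[], '2'::jsonb, FALSE);
-- {"count": 2}

-- Missing key with create_missing = FALSE: the document is returned unchanged.
SELECT jsonb_set('{}'::jsonb, '{count}'::text[], '1'::jsonb, FALSE);
-- {}

-- Missing key with create_missing = TRUE: the key is added.
SELECT jsonb_set('{}'::jsonb, '{count}'::text[], '1'::jsonb, TRUE);
-- {"count": 1}
```

Passing FALSE together with the `data ? 'count'` guard means rows without a count are never given one by accident.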

What about tags?

    EXPLAIN SELECT * FROM "tst" WHERE section_name='meta' AND "data" -> 'tags' ? 'b';

    Index Scan using idx_tst_section_name_data_count on tst  (cost=0.14..8.17 rows=1 width=32)
      Index Cond: ((section_name)::text = 'meta'::text)
      Filter: ((data -> 'tags'::text) ? 'b'::text)

    SELECT * FROM "tst" WHERE section_name='meta' AND "data" -> 'tags' ? 'b';

    00000000-0000-0000-0000-000000000011  meta  {"tags": ["a", "b", "c"], "count": 10}

Search within an array is linear?

Gin anyone?

    DROP TABLE tst;
    CREATE TABLE tst (
      data JSONB DEFAULT '{}'::jsonb NOT NULL
    );
    CREATE INDEX idx_tst_data ON tst USING GIN ((data->'tags'));

    EXPLAIN SELECT * FROM "tst" WHERE "data" -> 'tags' ? 'b';

    Bitmap Heap Scan on tst  (cost=8.01..12.03 rows=1 width=32)
      Recheck Cond: ((data -> 'tags'::text) ? 'b'::text)
      ->  Bitmap Index Scan on idx_tst_data  (cost=0.00..8.01 rows=1 width=0)
            Index Cond: ((data -> 'tags'::text) ? 'b'::text)
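An alternative worth knowing about, sketched here rather than taken from the talk: index the whole column with the jsonb_path_ops operator class and query with the containment operator @>. The index name idx_tst_data_path is made up, and which plan the planner picks depends on table size and statistics, so no plan output is shown.

```sql
-- GIN index over the whole document; jsonb_path_ops supports @> but not ?.
CREATE INDEX idx_tst_data_path ON tst USING GIN (data jsonb_path_ops);

-- "Documents whose tags array contains 'b'", written as containment.
EXPLAIN SELECT * FROM tst WHERE data @> '{"tags": ["b"]}';
```

The expression index from the slide stays smaller because it covers only the tags array; the whole-column index trades size for the ability to answer containment queries on any key.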

Add back section_name

    CREATE TABLE tst (
      section_name VARCHAR(128),
      data         JSONB DEFAULT '{}'::jsonb NOT NULL
    );
    CREATE INDEX idx_tst_data ON tst USING GIN (section_name, (data->'tags'));

    sql> CREATE INDEX idx_tst_data ON tst USING GIN (section_name, (data->'tags'))
    [2016-07-24 11:19:21] [42704] ERROR: data type character varying has no default operator class for access method "gin"
    Hint: You must specify an operator class for the index or define a default operator class for the data type.

D'oh

btree_gin?

    CREATE EXTENSION btree_gin;
    CREATE TABLE tst (
      section_name VARCHAR(128),
      data         JSONB DEFAULT '{}'::jsonb NOT NULL
    );
    CREATE INDEX idx_tst_data_1 ON tst USING gin (section_name, (data->'tags'));

    EXPLAIN SELECT * FROM "tst" WHERE section_name = 'meta' AND "data" -> 'tags' ? 'b';

    Seq Scan on tst  (cost=0.00..14.20 rows=1 width=306)
      Filter: (((section_name)::text = 'meta'::text) AND ((data -> 'tags'::text) ? 'b'::text))

Worse?!

No, we need more data

    CREATE OR REPLACE FUNCTION tmpf() RETURNS void AS $$
    DECLARE
      i INTEGER;
    BEGIN
      i = 0;
      -- 100,000 iterations x 4 inserts = 400,000 rows
      WHILE i < 100000 LOOP
        i = i + 1;
        INSERT INTO tst VALUES ('meta', '{"tags":["a"]}');
        INSERT INTO tst VALUES ('meta', '{"tags":["b"]}');
        INSERT INTO tst VALUES ('meta', '{"tags":["b","c"]}');
        INSERT INTO tst VALUES ('meta', '{"tags":["c"]}');
      END LOOP;
    END
    $$ LANGUAGE plpgsql;

    -- Run it to load the rows
    SELECT tmpf();
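The same 400,000 rows can also be loaded with a single set-based statement instead of a PL/pgSQL loop. This is an alternative sketch, not what the talk used; it assumes the two-column tst table (section_name, data) defined just before.

```sql
-- 100,000 series values x 4 documents = 400,000 rows in one INSERT.
INSERT INTO tst (section_name, data)
SELECT 'meta', t.doc::jsonb
FROM generate_series(1, 100000) AS g(i)
CROSS JOIN (VALUES ('{"tags":["a"]}'),
                   ('{"tags":["b"]}'),
                   ('{"tags":["b","c"]}'),
                   ('{"tags":["c"]}')) AS t(doc);
```

A single statement avoids the per-row executor overhead of the loop, which tends to matter once row counts reach the millions.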

Before and After index

    EXPLAIN ANALYSE SELECT * FROM "tst" WHERE section_name = 'meta' AND "data" -> 'tags' ? 'a';

    Seq Scan on tst  (cost=0.00..4336.68 rows=1 width=306) (actual time=0.030..184.442 rows=100000 loops=1)
      Filter: (((section_name)::text = 'meta'::text) AND ((data -> 'tags'::text) ? 'a'::text))
      Rows Removed by Filter: 300000
    Planning time: 0.100 ms
    Execution time: 186.668 ms

    CREATE INDEX idx_tst_section_name_data ON tst USING gin (section_name, (data->'tags'));

    EXPLAIN ANALYSE SELECT * FROM "tst" WHERE section_name = 'meta' AND "data" -> 'tags' ? 'a';

    Bitmap Heap Scan on tst  (cost=264.10..1379.31 rows=400 width=32) (actual time=31.616..77.926 rows=100000 loops=1)
      Recheck Cond: (((section_name)::text = 'meta'::text) AND ((data -> 'tags'::text) ? 'a'::text))
      Heap Blocks: exact=3054
      ->  Bitmap Index Scan on idx_tst_section_name_data  (cost=0.00..264.00 rows=400 width=0) (actual time=31.139..31.139 rows=100000 loops=1)
            Index Cond: (((section_name)::text = 'meta'::text) AND ((data -> 'tags'::text) ? 'a'::text))
    Planning time: 0.503 ms
    Execution time: 80.146 ms

Any better without section?

    CREATE INDEX idx_tst_data_tags ON tst USING GIN ((data -> 'tags'));

    EXPLAIN ANALYSE SELECT * FROM "tst" WHERE "data" -> 'tags' ? 'a';

    Bitmap Heap Scan on tst  (cost=103.10..1217.31 rows=400 width=32) (actual time=16.161..59.970 rows=100000 loops=1)
      Recheck Cond: ((data -> 'tags'::text) ? 'a'::text)
      Heap Blocks: exact=3054
      ->  Bitmap Index Scan on idx_tst_data_tags  (cost=0.00..103.00 rows=400 width=0) (actual time=15.455..15.455 rows=100000 loops=1)
            Index Cond: ((data -> 'tags'::text) ? 'a'::text)
    Planning time: 3.744 ms
    Execution time: 62.397 ms

Go big
- Add 4 million rows
- 1 row with tag e

    EXPLAIN ANALYSE SELECT * FROM "tst" WHERE section_name = 'meta' AND "data" -> 'tags' ? 'e';

    Planning time: 3.351 ms
    Execution time: 759.093 ms

    CREATE INDEX idx_tst_section_name_data ON tst USING gin (section_name, (data->'tags'));

    EXPLAIN ANALYSE SELECT * FROM "tst" WHERE section_name = 'meta' AND "data" -> 'tags' ? 'e';

    Planning time: 4.428 ms
    Execution time: 0.199 ms

Looks like it works.

3rd Observation
- Writes (updates) get slow
- GIN index updates are not that fast
- 4 million inserts took ~3 minutes on my machine
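One common way to soften this for bulk loads, offered as a sketch and not as something measured in the talk: drop the GIN index, load the data, then rebuild the index once. It assumes the btree_gin extension and the index definition used earlier; how much time this saves depends on the data and the machine.

```sql
-- Drop the GIN index so the load does not maintain it row by row.
DROP INDEX IF EXISTS idx_tst_section_name_data;

-- ... bulk load here (for example the tmpf() loop or a set-based INSERT) ...

-- Rebuild once over the finished table, then refresh planner statistics.
CREATE INDEX idx_tst_section_name_data
  ON tst USING gin (section_name, (data->'tags'));
ANALYZE tst;
```

This only helps for batch loads; steady single-row writes still pay the index maintenance cost on each insert or update.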

Summary
- It's weird, but it works
- Need to specify type a lot
- Need to learn about indexes
- Need to watch out for document size
- Need to watch out for index update time
- WITH is awesome

Q&A