GIN generalization. Alexander Korotkov, Oleg Bartunov

Similar documents
GIN in 9.4 and further

GIN Oleg Bartunov, Teodor Sigaev PostgreSQL Conference, Ottawa, May 20-23, 2008

SP-GiST a new indexing framework for PostgreSQL

CREATE INDEX... USING VODKA An efcient indexing of nested structures. Oleg Bartunov (MSU), Teodor Sigaev (MSU), Alexander Korotkov (MEPhI)

Access method extendability in PostgreSQL or back to origin

CREATE INDEX USING VODKA. VODKA CONNECTING INDEXES! Олег Бартунов, ГАИШ МГУ Александр Коротков, «Интаро-Софт»

Rethinking JSONB June, 2015, Ottawa, Canada. Alexander Korotkov, Oleg Bartunov, Teodor Sigaev Postgres Professional

Find your neighbours

PostgreSQL 9.6 New advances in Full Text Search. Oleg Bartunov Postgres Professional

Pluggable Storage in PostgreSQL

JSON in PostgreSQL. Toronto Postgres Users Group Sept Steve Singer

Text Search and Similarity Search

int fnvgetconfig(handle h, UINT32 id, const void *cfg, size_t sz);... 4

PostgreSQL/Jsonb. A First Look

Indexing. UCSB 290N. Mainly based on slides from the text books of Croft/Metzler/Strohman and Manning/Raghavan/Schutze

Fuzzy Matching In PostgreSQL

What s in a Plan? And how did it get there, anyway?

Reminders - IMPORTANT:

PostgreSQL Query Optimization. Step by step techniques. Ilya Kosmodemiansky

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Schema-less PostgreSQL Current and Future

CSE 530A. Query Planning. Washington University Fall 2013

Flexible Full Text Search

JsQuery the jsonb query language with GIN indexing support

PostgreSQL. JSON Roadmap. Oleg Bartunov Postgres Professional. March 17, 2017, Moscow

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Database Systems: Fall 2017 Quiz I

Spatial indexes in PostgreSQL for astronomy

Oleg Bartunov, Teodor Sigaev

Week 03 Lecture. Buffer Pool. DBMS Parameters. Buffer Pool. Our view of relations in DBMSs:

Key/Value Pair versus hstore - Benchmarking Entity-Attribute-Value Structures in PostgreSQL.

Key/Value Pair versus hstore - Benchmarking Entity-Attribute-Value Structures in PostgreSQL.

On-Disk Bitmap Index Performance in Bizgres 0.9

DB Wide Table Storage. Summer Torsten Grust Universität Tübingen, Germany

PDF Document structure, that need for managing of PDF file. It uses in all functions from EMF2PDF SDK.

Next-Generation Parallel Query

PostgreSQL internals. Pavel Stěhule PostgreSQL Meetup

Lecture Data layout on disk. How to store relations (tables) in disk blocks. By Marina Barsky Winter 2016, University of Toronto

TriMedia Motion JPEG Decoder (VdecMjpeg) API 1

PostgreSQL 9.4 and JSON

Type Checking. Prof. James L. Frankel Harvard University

Artemis SDK. Copyright Artemis CCD Limited October 2011 Version

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

BRIN indexes on geospatial databases

Jsonb roadmap. Oleg Bartunov Postgres Professional

Versatile access to HEALPix based sky region objects within PostgreSQL data bases with PgSphere

On-Disk Bitmap Index In Bizgres

Parallel Query In PostgreSQL

Column Stores vs. Row Stores How Different Are They Really?

Improving generalized inverted index lock wait times

CS 222/122C Fall 2016, Midterm Exam

NoSQL Postgres. Oleg Bartunov Postgres Professional Moscow University. Stachka 2017, Ulyanovsk, April 14, 2017

IMPLEMENTING THE PROFIBUS-DP MESSAGE MODE

NOSQL FOR POSTGRESQL

CREATE INDEX USING RUM RUM index and its application to FTS

OUZO for indexing sets

CS6200 Information Retrieval. David Smith College of Computer and Information Science Northeastern University

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Lecture 15. Lecture 15: Bitmap Indexes

Call DLL from Limnor Applications

COP 3223 Introduction to Programming with C - Study Union - Fall 2017

SBIG ASTRONOMICAL INSTRUMENTS

Jordan University of Science & Technology Department of Computer Science CS 211 Exam #1 (23/10/2010) -- Form A

1 o Semestre 2007/2008

Requêtes LATERALes Vik Fearing

CSCI 2212: Intermediate Programming / C Review, Chapters 10 and 11

Partition and Conquer Large Data in PostgreSQL 10

The Anatomy of a Large-Scale Hypertextual Web Search Engine

Exposing PostgreSQL Internals with User-Defined Functions

Range Types: Your Life Will Never Be The Same. Jonathan S. Katz CTO, VenueBook October 24, 2012

Exam Principles of Imperative Computation, Summer 2011 William Lovas. June 24, 2011

PostgreSQL Performance The basics

SIMATIC Industrial software Readme SIMATIC S7-PLCSIM Advanced V2.0 SP1 Readme

WL#2474. refinements to Multi Range Read interface. Sergey Petrunia The World s Most Popular Open Source Database

TriMedia Motion JPEG Encoder (VencMjpeg) API 1

Go Faster With Native Compilation PGCon th June 2015

Pointers, Pointers, Pointers!

Variable Length Integers for Search

STORING DATA: DISK AND FILES

Hidden gems of PostgreSQL. Magnus Hagander. Redpill Linpro & PostgreSQL Global Development Group

In the CERTAINTY project, an application is defined as a network of independent processes with the following features:

ECE 2035 Programming HW/SW Systems Fall problems, 5 pages Exam Three 28 November 2012

PostgreSQL upgrade project. Zdeněk Kotala Revenue Product Engineer Sun Microsystems

OptimiData. JPEG2000 Software Development Kit for C/C++ Reference Manual. Version 1.6. from

Albis: High-Performance File Format for Big Data Systems

Terabytes of Business Intelligence: Design and Administration of Very Large Data Warehouses on PostgreSQL

AET60 BioCARDKey. Application Programming Interface. Subject to change without prior notice

Performance Enhancements In PostgreSQL 8.4

Query Processing: The Basics. External Sorting

CS201- Introduction to Programming Latest Solved Mcqs from Midterm Papers May 07,2011. MIDTERM EXAMINATION Spring 2010

Nubotics Device Interface DLL

Caching and Buffering in HDF5

Query Evaluation! References:! q [RG-3ed] Chapter 12, 13, 14, 15! q [SKS-6ed] Chapter 12, 13!

How to declare an array in C?

PusleIR Multitouch Screen Software SDK Specification. Revision 4.0

Query Optimizer MySQL vs. PostgreSQL

The MPI Message-passing Standard Practical use and implementation (VI) SPD Course 08/03/2017 Massimo Coppola

A few examples are below. Checking your results in decimal is the easiest way to verify you ve got it correct.

CSE 530A. B+ Trees. Washington University Fall 2013

P!"#r$%

Informatik Vertiefung Database Systems

Transcription:

GIN generalization Alexander Korotkov, Oleg Bartunov Work supported by Federal Unitary Enterprise Scientific- Research Institute of Economics, Informatics and Control Systems PostgreSQL developer meeting: GIN generalization 1

Presentation plan New storage (additional infromation + compression) Fast scan (skip parts of posting trees) Ordering in index Planner optimization Applications PostgreSQL developer meeting: GIN generalization 2

Additional information storage Posting list PostgreSQL developer meeting: GIN generalization 3

Additional information interface void config ( GinConfig *config ) Datum *extractvalue ( Datum itemvalue, int32 *nkeys, bool **nullflags, Datum *addinfo, bool *addinfoisnull ) bool consistent ( bool check[], StrategyNumber n, Datum query, int32 nkeys, Pointer extra_data[], bool *recheck, Datum querykeys[], bool nullflags[], Datum addinfo[], bool addinfoisnull[] ) PostgreSQL developer meeting: GIN generalization 4

ItemPointer typedef struct BlockIdData { uint16 bi_hi; uint16 bi_lo; } BlockIdData; typedef struct ItemPointerData { BlockIdData ip_blkid; OffsetNumber ip_posid; } 6 bytes! PostgreSQL developer meeting: GIN generalization 5

BlockIdData compression BlockNumber delta is a small number: use varbyte encoding PostgreSQL developer meeting: GIN generalization 6

OffsetNumber compression Offset is typically low: use varbyte encoding O0-O15 OffsetNumber bits N Additional information NULL bit PostgreSQL developer meeting: GIN generalization 7

New storage example PostgreSQL developer meeting: GIN generalization 8

GIN partial match Without additional information partial match beaves so: Save ItemPointers of all matching entries into bitmap Use bitmap instead of partial match entry PostgreSQL developer meeting: GIN generalization 9

GIN partial match with additional information Join additional information for all entries of each ItemPointer together? Datum joinaddinfo ( Datum addinfos[] ) Where to store it? It might be much more huge than bitmap. PostgreSQL developer meeting: GIN generalization 10

Fast scan entry1 && entry2 PostgreSQL developer meeting: GIN generalization 11

Fast scan: interface preconsistent: consistent which allows false positives bool precconsistent ( bool check[], StrategyNumber n, Datum query, int32 nkeys, Pointer extra_data[] ) PostgreSQL developer meeting: GIN generalization 12

Sorting in index Uses KNN infrastructure. Woks so: 1. Scan + calc rank 2. Sort 3. Return using gingettuple one by one It is right design at all? Should we do sorting in a separate node? PostgreSQL developer meeting: GIN generalization 13

Sorting in index: interface float8 calcrank ( bool check[], StrategyNumber n, Datum query, int32 nkeys, Pointer extra_data[], bool *recheck, Datum querykeys[], bool nullflags[], Datum addinfo[], bool addinfoisnull[] ) PostgreSQL developer meeting: GIN generalization 14

Planner optimization: before test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y) LIMIT 10; PLAN QUERY ----------------------------------------------------------------------- -------------------------------------------- Limit (cost=0.00..3.09 rows=10 width=16) (actual time=11.344..103.443 rows=10 loops=1) Output: x, y, (slow_func(x, y)) -> Index Scan using test_idx on public.test (cost=0.00..309.25 rows=1000 width=16) (actual time=11.341..103.422 rows=10 loops=1) Output: x, y, slow_func(x, y) Total runtime: 103.524 ms (5 rows) PostgreSQL developer meeting: GIN generalization 15

Planner optimization: after test=# EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM test ORDER BY slow_func(x,y) LIMIT 10; QUERY PLAN ----------------------------------------------------------------------- -------------------------------------------- Limit (cost=0.00..3.09 rows=10 width=16) (actual time=0.062..0.093 rows=10 loops=1) Output: x, y -> Index Scan using test_idx on public.test (cost=0.00..309.25 rows=1000 width=16) (actual time=0.058..0.085 rows=10 loops=1) Output: x, y Total runtime: 0.164 ms (5 rows) PostgreSQL developer meeting: GIN generalization 16

Applications Better FTS Array similarity search (better smlr) Positioned n-grams (better pg_trgm) Index on regexes (inverted task) PostgreSQL developer meeting: GIN generalization 17

Better FTS tsvector >< tsquery = 1/ts_rank(tsvector, tsquery) gin_tsvector_config set bytea additional information type gin_extract_tsquery extract compressed word positions as additional information gin_tsquery_pre_consistent evaluate tsquery ignoring NOTs gin_tsquery_relevance calculate >< PostgreSQL developer meeting: GIN generalization 18

Better FTS: query SELECT itemid, title FROM items WHERE fts @@ plainto_tsquery('russian', 'квартира арбат') ORDER BY fts >< plainto_tsquery('russian', 'квартира арбат') LIMIT 10; PostgreSQL developer meeting: GIN generalization 19

Better FTS: plan Limit (cost=40.00..80.22 rows=10 width=400) (actual time=1.559..1.5 Buffers: shared hit=1236 -> Index Scan using fts_idx on items (cost=40.00..7425.31 rows=1 Index Cond: (fts @@ '''квартир'' & ''арбат'''::tsquery) Order By: (fts >< '''квартир'' & ''арбат'''::tsquery) Buffers: shared hit=1236 Total runtime: 1.579 ms PostgreSQL developer meeting: GIN generalization 20

Avito.ru: testing Without patch With patch With patch without tsvector Sphinx Table size 6.0 GB 6.0 GB 2.87 GB - Index size 1.29 GB 1.27 GB 1.27 GB 1.12 GB Index build time Queries in 8 hours 216 sec 303 sec 718sec 180 sec* 3,0 M 42.7 M 42.7 M 32.0 M PostgreSQL developer meeting: GIN generalization 21

It flies! PostgreSQL developer meeting: GIN generalization 22

Thank you for attention! Questions? PostgreSQL developer meeting: GIN generalization 23