Oracle Database 18c: A Gentle Introduction to Polymorphic Table Functions, with Common Patterns and Sample Use Cases
About me
Keith Laker, Product Manager for Analytic SQL and Autonomous DW, Oracle
Blog: oracle-big-data.blogspot.com
Twitter: @ASQLBarista, @AutonomousDW
Email: keith.laker@oracle.com
https://tinyurl.com/yb8wwqsz
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle's products may change and remains at the sole discretion of Oracle Corporation.
Session Agenda
1. Overview
2. What is a Polymorphic Table Function?
3. Server-side and Client-side interfaces
4. Patterns and Use Cases for PTFs
5. Taking a PTF apart line by line
6. Restrictions and Other Considerations
7. Summary
Session Agenda: 1. Overview
Database 18c - All you need is.. #ThinkPolymorphic
18c Doc - PL/SQL Packages and Types Reference
Refresher - Oracle 9i User Defined Aggregate API
Session Agenda: 2. What is a Polymorphic Table Function?
On-Going Evolution of Table Functions
Table Function (TF): an application function that produces a set of rows and can be used in the FROM clause of a query.
- Row types are determined and fixed at design time
- Complicated to design, especially for parallel execution
Polymorphic Table Function (PTF): a TF whose row type is determined by the values of the actual parameters.
- Useful when the application wants to provide generic extensions that work for arbitrary input tables or queries
3 Key Design Objectives for Polymorphic Table Functions
1. Simple algorithms should be easily expressible; PTF-specific code should be minimal.
2. The APIs should be few, powerful, regular, and incrementally learnable.
3. Query and PTF authors should not have to understand how the database parallelizes the execution of the PTF query; the PTF developer can simply assume serial execution.
Polymorphic Table Functions are simply an evolution of Table Functions.
What is a Self-Describing/Polymorphic Table Function?
ANSI SQL 2016 definition: Polymorphic Table Functions (PTFs) are user-defined functions that can be invoked in the FROM clause.
- Capable of processing any table: the input row type is not declared at definition time
- Produces a result table whose row type may or may not be declared at definition time
- Allows application developers to leverage the long-defined dynamic SQL capabilities
- Simple SQL access to powerful and complex custom functions
(Figure: black-box credit risk model)
Polymorphic Table Function
- The data source input must be a single table
- Can be row-semantic or table-semantic
- Use row semantics when the new columns can be determined by looking at any single row (see the livesql ECHO example)
- Use table semantics when the new columns are determined by looking at the current row plus some state that summarizes previously processed rows (see the livesql ROWNUM example)
Key Interfaces for PTFs
- Client interface: the DESCRIBE, OPEN, FETCH_ROWS, and CLOSE methods of the PTF implementation package
- Server-side interface: the DBMS_TF package
- PTFs need various services from the database to implement their functionality.
- PTFs need a mechanism to get rows from the database and send back new rows.
- The DBMS_TF package provides the server and client interface utilities; it contains types, constants, and subprograms.
- Use DBMS_TF subprograms to consume and produce data, and to get information about the execution environment.
Session Agenda: 3. Server-side and Client-side interfaces
Client Side Interfaces
- DESCRIBE (required; compilation): returns the definition and structure of the new row source.
- OPEN (optional; execution): called at the start of execution.
- FETCH_ROWS (optional; execution): for a given subset of rows, produces the associated new values (rows and/or columns).
- CLOSE (optional; execution): called at the end of execution.
Purpose of DESCRIBE Function
- Determines the type of rows produced
- Returns a DBMS_TF.DESCRIBE_T record (server-side interface)
- Invoked during SQL cursor compilation
- All argument values from the calling query are passed to the DESCRIBE function
- Like any PL/SQL function, the DESCRIBE function can be overloaded and can have default values
- Indicates how columns are processed:
  - passed unchanged to the output (pass-through columns)
  - used by the PTF during computation (read columns)
- Includes any instrumentation code
DESCRIBE Function - Example Code
    FUNCTION describe(tab  IN OUT dbms_tf.table_t,
                      cols IN     dbms_tf.columns_t)
      RETURN dbms_tf.describe_t
    AS
    BEGIN
      RETURN dbms_tf.describe_t(new_columns => new_cols);
    END;
DESCRIBE Function - Example Code
    FUNCTION describe(tab  IN OUT dbms_tf.table_t,
                      cols IN     dbms_tf.columns_t)
      RETURN dbms_tf.describe_t
    AS
      new_cols dbms_tf.columns_new_t;
      col_id   PLS_INTEGER := 1;
    BEGIN
What is columns_new_t? A collection of new columns:
    TYPE COLUMNS_NEW_T IS TABLE OF COLUMN_METADATA_T INDEX BY PLS_INTEGER;
DESCRIBE Function - Example Code
    -- Loop through the columns of the input table
    FOR i IN 1 .. tab.column.count LOOP
      -- Is the column datatype supported? If not, skip it
      CONTINUE WHEN NOT dbms_tf.supported_type(tab.column(i).description.type);
      -- Yes: loop through the columns passed as arguments
      FOR j IN 1 .. cols.count LOOP
        ...   -- do some processing
      END LOOP;
    END LOOP;
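The three fragments above can be assembled into one compilable unit. The following is a minimal sketch, not the presenter's exact code: the names echo_sketch_p and echo_sketch are hypothetical (chosen so they do not clash with echo_package used later in this deck), the new column names are built by stripping the double quotes from the quoted identifiers in cols, and FETCH_ROWS simply copies the read columns into the new columns without changing the values. It assumes Oracle Database 18c or later and a DEPT table like the one in the SCOTT sample schema.

```sql
-- Hypothetical names: echo_sketch_p, echo_sketch. A sketch under the assumptions above.
CREATE OR REPLACE PACKAGE echo_sketch_p AS
  FUNCTION describe(tab  IN OUT DBMS_TF.table_t,
                    cols IN     DBMS_TF.columns_t)
    RETURN DBMS_TF.describe_t;
  PROCEDURE fetch_rows;
END echo_sketch_p;
/

CREATE OR REPLACE PACKAGE BODY echo_sketch_p AS
  FUNCTION describe(tab  IN OUT DBMS_TF.table_t,
                    cols IN     DBMS_TF.columns_t)
    RETURN DBMS_TF.describe_t
  AS
    new_cols DBMS_TF.columns_new_t;
    col_id   PLS_INTEGER := 1;
  BEGIN
    FOR i IN 1 .. tab.column.COUNT LOOP
      -- Skip columns whose type DBMS_TF cannot handle.
      CONTINUE WHEN NOT DBMS_TF.supported_type(tab.column(i).description.type);
      FOR j IN 1 .. cols.COUNT LOOP
        -- Both sides are quoted identifiers, for example "DNAME".
        IF tab.column(i).description.name = cols(j) THEN
          tab.column(i).for_read := TRUE;                 -- read column
          new_cols(col_id)       := tab.column(i).description;
          new_cols(col_id).name  := 'ECHO_' || REPLACE(cols(j), '"');
          col_id := col_id + 1;
          EXIT;
        END IF;
      END LOOP;
    END LOOP;
    RETURN DBMS_TF.describe_t(new_columns => new_cols);
  END describe;

  PROCEDURE fetch_rows AS
    rowset DBMS_TF.row_set_t;
  BEGIN
    DBMS_TF.get_row_set(rowset);   -- read columns of the active rowset
    DBMS_TF.put_row_set(rowset);   -- returned unchanged as the new columns
  END fetch_rows;
END echo_sketch_p;
/

CREATE OR REPLACE FUNCTION echo_sketch(tab TABLE, cols COLUMNS)
  RETURN TABLE PIPELINED ROW POLYMORPHIC USING echo_sketch_p;
/

SELECT * FROM echo_sketch(dept, COLUMNS(dname, loc));
```

Because the new columns are created in input-table order and the read columns arrive in the same order, passing the rowset straight through lines each ECHO_ column up with its source column.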
Purpose of OPEN Procedure (Optional)
- Generally invoked before the FETCH_ROWS procedure is called
- Initializes/allocates any execution-specific variables
- Typically calls the GET_XID function to get a unique ID for managing the execution state
- Most useful when implementing a table-semantics PTF
- Includes any instrumentation code
Purpose of FETCH_ROWS Procedure (Optional)
- The input to a (non-leaf) PTF is a single stream of rows, divided into arbitrarily sized chunks; each chunk is called a rowset
- The PTF consumes the input stream one rowset at a time; the rowset being processed is the active rowset, and only one rowset is active at any time
- For the active rowset, the PTF produces the corresponding new columns (and new rows, if any)
- Each call to FETCH_ROWS must act upon the active rowset
- FETCH_ROWS can then either return, or remain inside the procedure and request and process another rowset
- Processing additional rowsets is not mandatory: FETCH_ROWS can simply return after processing the current rowset
- The database might invoke FETCH_ROWS multiple times
Purpose of CLOSE Procedure (Optional)
- Called at the end of the PTF execution
- Releases resources associated with the execution state
- Includes any instrumentation code
Creating a single package for multiple PTFs
- Multiple PTF implementations can live in the same package
- Override the default runtime method names (OPEN, FETCH_ROWS, and CLOSE) with your own specific names
- Specify the new names in a DBMS_TF.METHODS_T collection and return it in the METHOD_NAMES field of DESCRIBE_T
    methods DBMS_TF.methods_t := DBMS_TF.methods_t(DBMS_TF.fetch_rows => 'Noop_Fetch');
    ...
    RETURN DBMS_TF.describe_t(method_names => methods);
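To show where those two statements sit, here is a minimal sketch of a complete package that renames its fetch method. The names noop_p, noop, and Noop_Fetch are illustrative assumptions; the PTF adds no columns and its fetch method does nothing, so the query returns the input rows unchanged. It assumes Oracle Database 18c or later and a DEPT table like the one in the SCOTT sample schema.

```sql
-- Hypothetical names: noop_p, noop, Noop_Fetch. A sketch, not a definitive implementation.
CREATE OR REPLACE PACKAGE noop_p AS
  FUNCTION describe(tab IN OUT DBMS_TF.table_t) RETURN DBMS_TF.describe_t;
  PROCEDURE Noop_Fetch;
END noop_p;
/

CREATE OR REPLACE PACKAGE BODY noop_p AS
  FUNCTION describe(tab IN OUT DBMS_TF.table_t) RETURN DBMS_TF.describe_t AS
    -- Map the default FETCH_ROWS method to a package-specific name.
    methods DBMS_TF.methods_t :=
      DBMS_TF.methods_t(DBMS_TF.fetch_rows => 'Noop_Fetch');
  BEGIN
    RETURN DBMS_TF.describe_t(method_names => methods);
  END describe;

  PROCEDURE Noop_Fetch AS
  BEGIN
    RETURN;   -- nothing to compute: every column is pass-through
  END Noop_Fetch;
END noop_p;
/

CREATE OR REPLACE FUNCTION noop(t TABLE)
  RETURN TABLE PIPELINED ROW POLYMORPHIC USING noop_p;
/

SELECT * FROM noop(dept);
```

A second PTF in the same package would supply its own DESCRIBE overload or its own method names in the same way, so the runtime methods of the two functions do not collide.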
Server Side Interfaces PART 1
- COLUMN_METADATA_T: column metadata record
- COLUMN_T: column descriptor record
- TABLE_T: table descriptor record
- COLUMNS_T: collection containing column names
- COLUMNS_NEW_T: collection for new columns
- TAB_<typ>_T: collection for each supported type, where <typ> is described in Supported Types Collections
- ROW_SET_T: data for a rowset record
Server Side Interfaces PART 2
- GET_COL Procedure: fetches data for a specified (input) column
- PUT_COL Procedure: returns data for a specified (new) column
- GET_ROW_SET Procedure: fetches the input rowset of column values
- PUT_ROW_SET Procedure: returns data for ALL (new) columns
- SUPPORTED_TYPE Function: verifies whether a type is supported by DBMS_TF subprograms
- GET_XID Function: returns a unique execution ID to index PTF state in a session
Server Side Interfaces PART 3
- ROW_TO_CHAR Function: returns the string representation of a row in a rowset
- SUPPORTED_TYPE Function: returns TRUE if a specified type is supported by the PTF infrastructure
- TRACE Procedure: prints data structures to help development and problem diagnosis
- XSTORE_XXXX Procedures: state management functions
Session Agenda: 4. Patterns and Use Cases for PTFs
Basic Patterns for Polymorphic Tables
Taking an existing rowset and applying:
- Column-based EXPANSION: calculating/deriving a new column value
- Row-based EXPANSION: data pivot operation
- Column-based REDUCTION: data unpivot operation
- Row-based REDUCTION: data aggregation/reduction operation
No existing rowset to process:
- Rowset GENERATOR: creates new rows and columns (for example, importing a CSV file)
Business Uses for Wrapping Bespoke Code Inside PTFs
Path Analysis: discover patterns in rows of sequential data
- Example use case: npath-type sequential processing for time series and pattern analysis
- Example use case: identify sessions from time series data
Statistical Analysis: high-performance processing of common statistical calculations
- Example use case: model to test the strength of the relation between different columns
- Example use case: perform linear or logistic regression between an output variable and a set of input variables
Relational Analysis: discover important relationships among data
- Example use case: build configurable groupings of related items from transaction records
- Example use case: find the shortest path from a distinct node to all other nodes in a graph
Business Uses for Wrapping Bespoke Code Inside PTFs
Text Analytics: derive patterns in textual data
- Example use case: bespoke text processing such as word counting (find occurrences of words, identify roots, track relative positions of words and/or multi-word phrases)
- Example use case: multi-row textual analysis of data sets
Cluster Analytics: discover natural groupings of data points
- Example use case: custom rules to cluster data into a specified number of groupings
- Example use case: bucket highly-dimensional items for cluster analysis
Data Transformation: transform data for more advanced analysis
- Example use case: pivoting to extract/contract nested data for further analysis
- Example use case: multi-case processing to support row matching for multiple cases
Session Agenda: 5. Taking a PTF apart line by line
For more code samples, go to livesql.oracle.com
Basic PTF Examples on livesql.oracle.com
- ECHO any column: row semantic
- Return rowset with ROWNUM: table semantic
- Dynamic CSV Convertor: adding rows, reducing columns; adding columns, reducing rows
Session Agenda: 5. Taking a PTF apart line by line
Simple example: echoing/repeating columns
Column EXPANSION pattern: simple example
- Returns all columns in the input table tab
- Adds the new columns listed in the cols argument
- New column names are prefixed with "ECHO_"
- Data for the new columns is taken from the corresponding input columns, prefixed with 'ECHO-'
    SELECT * FROM ECHO(dept, COLUMNS(dname, loc));

        DEPTNO DNAME          LOC           ECHO_DNAME           ECHO_LOC
    ---------- -------------- ------------- -------------------- ---------------
            10 ACCOUNTING     NEW YORK      ECHO-ACCOUNTING      ECHO-NEW YORK
            20 RESEARCH       DALLAS        ECHO-RESEARCH        ECHO-DALLAS
            30 SALES          CHICAGO       ECHO-SALES           ECHO-CHICAGO
            40 OPERATIONS     BOSTON        ECHO-OPERATIONS      ECHO-BOSTON
Example: Multiple Tables and Query Arguments
- The TABLE() operator is no longer required
- Table names are passed in like regular (scalar) arguments
- Query arguments are passed to the PTF using a WITH clause
    WITH t AS (
      SELECT deptno, SUM(sal) budget
      FROM emp NATURAL JOIN dept
      WHERE dname IN ('SALES', 'RESEARCH')
      GROUP BY deptno)
    SELECT * FROM ECHO(t, COLUMNS(deptno));
Simple Example: Repeat Columns In Table
- Returns all columns in the input table
- Adds the new columns listed in the cols argument
- New column names are prefixed with "ECHO_"
- Data for the new columns is taken from the corresponding input columns, prefixed with 'ECHO-'
    SELECT * FROM ECHO(emp, COLUMNS(ename, job)) WHERE deptno = 20;

         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO ECHO_ENAME      ECHO_JOB
    ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- --------------- ---------------
          7369 SMITH      CLERK           7902 17-DEC-80        800                    20 ECHO-SMITH      ECHO-CLER
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20 ECHO-JONES      ECHO-MANA
          7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20 ECHO-SCOTT      ECHO-ANAL
          7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20 ECHO-ADAMS      ECHO-CLER
          7902 FORD       ANALYST         7566 03-DEC-81       3000                    20 ECHO-FORD       ECHO-ANAL
Defining the Implementation Package
    CREATE OR REPLACE PACKAGE echo_package AS
      -- @Required: DESCRIBE is a function that returns DBMS_TF.describe_t
      FUNCTION Describe(tab  IN OUT DBMS_TF.table_t,
                        cols IN     DBMS_TF.columns_t)
        RETURN DBMS_TF.describe_t;
      -- @Optional
      PROCEDURE Open;
      -- @Optional in general; used here to fill the new columns
      PROCEDURE Fetch_Rows;
      -- @Optional
      PROCEDURE Close;
    END echo_package;
Defining The Polymorphic Table Function: COLUMN ECHO
    CREATE OR REPLACE FUNCTION echo(tab TABLE, cols COLUMNS)
      RETURN TABLE PIPELINED ROW POLYMORPHIC USING echo_package;
Defining The Polymorphic Table Function: ROW_NUM
    CREATE FUNCTION row_num(tab TABLE,
                            ini NUMBER DEFAULT 1,
                            inc NUMBER DEFAULT 1)
      RETURN TABLE PIPELINED TABLE POLYMORPHIC USING row_num_p;
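The row_num definition refers to an implementation package, row_num_p, that the deck does not show. The following is a minimal sketch of what such a package could look like, not the presenter's code: the new column name ROW_ID and the XSTORE key 'rid' are assumptions. It uses the XSTORE state procedures to carry the running counter from one rowset to the next, which is the "state that summarizes previously processed rows" that makes this a table-semantic PTF. The package has to exist before the function is created, so the CREATE FUNCTION statement is repeated at the end with OR REPLACE.

```sql
-- Sketch of an implementation package for row_num (Oracle Database 18c or later).
-- Assumed names: new column ROW_ID, XSTORE key 'rid'.
CREATE OR REPLACE PACKAGE row_num_p AS
  FUNCTION describe(tab IN OUT DBMS_TF.table_t,
                    ini NUMBER DEFAULT 1,
                    inc NUMBER DEFAULT 1)
    RETURN DBMS_TF.describe_t;
  PROCEDURE fetch_rows(ini NUMBER DEFAULT 1, inc NUMBER DEFAULT 1);
END row_num_p;
/

CREATE OR REPLACE PACKAGE BODY row_num_p AS
  FUNCTION describe(tab IN OUT DBMS_TF.table_t,
                    ini NUMBER DEFAULT 1,
                    inc NUMBER DEFAULT 1)
    RETURN DBMS_TF.describe_t AS
  BEGIN
    -- Declare one new NUMBER column; all input columns stay pass-through.
    RETURN DBMS_TF.describe_t(
      new_columns => DBMS_TF.columns_new_t(
        1 => DBMS_TF.column_metadata_t(name => 'ROW_ID',
                                       type => DBMS_TF.type_number)));
  END describe;

  PROCEDURE fetch_rows(ini NUMBER DEFAULT 1, inc NUMBER DEFAULT 1) AS
    row_cnt CONSTANT PLS_INTEGER := DBMS_TF.get_env().row_count;
    rid     NUMBER := ini;            -- start value for the first rowset
    col     DBMS_TF.tab_number_t;
  BEGIN
    -- Pick up the counter stored by the previous rowset, if there was one.
    DBMS_TF.xstore_get('rid', rid);
    FOR i IN 1 .. row_cnt LOOP
      col(i) := rid + inc * (i - 1);
    END LOOP;
    DBMS_TF.put_col(1, col);
    -- Remember where the next rowset should continue.
    DBMS_TF.xstore_set('rid', rid + inc * row_cnt);
  END fetch_rows;
END row_num_p;
/

CREATE OR REPLACE FUNCTION row_num(tab TABLE,
                                   ini NUMBER DEFAULT 1,
                                   inc NUMBER DEFAULT 1)
  RETURN TABLE PIPELINED TABLE POLYMORPHIC USING row_num_p;
/

SELECT * FROM row_num(dept);
SELECT * FROM row_num(dept, ini => 100, inc => 10);
SELECT empno, deptno, row_id FROM row_num(emp PARTITION BY deptno ORDER BY empno);
```

The last query shows the PARTITION BY and ORDER BY clauses that are available only to table-semantic PTFs; the DEPT and EMP tables are assumed to be the SCOTT sample tables used elsewhere in this deck.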
Details Of DESCRIBE Function for ECHO PTF
    CREATE OR REPLACE PACKAGE BODY echo_package AS
      FUNCTION Describe(tab  IN OUT DBMS_TF.table_t,
                        cols IN     DBMS_TF.columns_t)
        RETURN DBMS_TF.describe_t
      AS
        newcols    DBMS_TF.columns_new_t;
        read_count PLS_INTEGER := 0;
      BEGIN
        ...
      END;
Details Of DESCRIBE Function for ECHO PTF
    /* Mark specified columns FOR_READ and create corresponding new column */
    FOR i IN 1 .. cols.count LOOP
      FOR j IN 1 .. tab.column.count LOOP
        IF cols(i) = tab.column(j).description.name THEN
          tab.column(j).for_read := TRUE;
          newcols(i)      := tab.column(j).description;
          -- names are quoted identifiers, so drop the quotes before prefixing
          newcols(i).name := 'ECHO_' || REPLACE(cols(i), '"');
        END IF;
      END LOOP;
    END LOOP;
    RETURN DBMS_TF.describe_t(new_columns => newcols);
Output From Execution of DESCRIBE
    Describe()
    ...Read_Column[1] = ENAME
    ...Read_Column[2] = JOB
Details Of OPEN Procedure: Tracing Information
    PROCEDURE Open AS
      env DBMS_TF.env_t := DBMS_TF.Get_Env();
    BEGIN
      DBMS_TF.Trace('Open()');
      DBMS_TF.Trace('Get_Col.Count = ' || env.get_columns.count, prefix => '...');
      DBMS_TF.Trace('Put_Col.Count = ' || env.put_columns.count, prefix => '...');
    END;
OPEN will include any SESSION STATE code required by the PTF!
Output From Execution of OPEN
    Open()
    ...Get_Col.Count = 2
    ...Put_Col.Count = 2
Details Of FETCH_ROWS Procedure for ECHO PTF
    PROCEDURE Fetch_Rows AS
      col       DBMS_TF.tab_varchar2_t;
      col_count PLS_INTEGER := DBMS_TF.Get_Env().get_columns.count;
    BEGIN
      ...
    END;
Details Of FETCH_ROWS Procedure for ECHO PTF
    PROCEDURE Fetch_Rows AS
      col       DBMS_TF.tab_varchar2_t;
      col_count PLS_INTEGER := DBMS_TF.Get_Env().get_columns.count;
    BEGIN
      /* Get each input column, update its values in place, and use it as the new column */
      FOR c IN 1 .. col_count LOOP
        DBMS_TF.Get_Col(c, col);        -- Get column 'c'
        /* Modify the fetched column values */
        FOR i IN 1 .. col.count LOOP
          col(i) := 'ECHO-' || col(i);
        END LOOP;
        DBMS_TF.Put_Col(c, col);        -- Set column 'c'
      END LOOP;
    END;
Output From Execution of FETCH_ROWS
    Fetch_Rows()
    ...Col1[1] = SMITH
    ...Col1[2] = JONES
    ...Col1[3] = SCOTT
    ...Col1[4] = ADAMS
    ...Col1[5] = FORD
    ...Col2[1] = CLERK
    ...Col2[2] = MANAGER
    ...Col2[3] = ANALYST
    ...Col2[4] = CLERK
    ...Col2[5] = ANALYST
    Close()
Using A Polymorphic Table: Echoing Varchar2 Columns
    SELECT * FROM ECHO(emp, COLUMNS(ename, job)) WHERE deptno = 20;

         EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO ECHO_ENAME      ECHO_JOB
    ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- --------------- ---------------
          7369 SMITH      CLERK           7902 17-DEC-80        800                    20 ECHO-SMITH      ECHO-CLER
          7566 JONES      MANAGER         7839 02-APR-81       2975                    20 ECHO-JONES      ECHO-MANA
          7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20 ECHO-SCOTT      ECHO-ANAL
          7876 ADAMS      CLERK           7788 23-MAY-87       1100                    20 ECHO-ADAMS      ECHO-CLER
          7902 FORD       ANALYST         7566 03-DEC-81       3000                    20 ECHO-FORD       ECHO-ANAL
Session Agenda: 5. Taking a PTF apart line by line
Explain plans and database feature usage
Explain Plan for Polymorphic Table ECHO
    EXPLAIN PLAN FOR
    SELECT * FROM ECHO(emp, COLUMNS(ename, job)) WHERE deptno = 20;

    --------------------------------------------------------------------------------------
    | Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU) | Time     |
    --------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |      |     5 |   500 |       2 (0) | 00:00:01 |
    |   1 |  VIEW                        |      |     5 |   500 |       2 (0) | 00:00:01 |
    |   2 |   POLYMORPHIC TABLE FUNCTION | ECHO |       |       |             |          |
    |   3 |    VIEW                      |      |     5 |   435 |       2 (0) | 00:00:01 |
    |*  4 |     TABLE ACCESS FULL        | EMP  |     5 |   435 |       2 (0) | 00:00:01 |
    --------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       4 - filter("emp"."deptno"=20)

    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
Explain Plan for Parallel Execution of PTF - ECHO
    ALTER TABLE emp PARALLEL 2;
    EXPLAIN PLAN FOR
    SELECT * FROM ECHO(emp, COLUMNS(ename, job)) WHERE deptno = 20;

    --------------------------------------------------------------------------------------------
    | Id  | Operation                      | Name     | Rows  | Bytes | Cost (%CPU) | Time     |
    --------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT               |          |     5 |   500 |       2 (0) | 00:00:01 |
    |   1 |  PX COORDINATOR                |          |       |       |             |          |
    |   2 |   PX SEND QC (RANDOM)          | :TQ10000 |     5 |   500 |       2 (0) | 00:00:01 |
    |   3 |    VIEW                        |          |     5 |   500 |       2 (0) | 00:00:01 |
    |   4 |     POLYMORPHIC TABLE FUNCTION | ECHO     |       |       |             |          |
    |   5 |      VIEW                      |          |     5 |   435 |       2 (0) | 00:00:01 |
    |   6 |       PX BLOCK ITERATOR        |          |     5 |   435 |       2 (0) | 00:00:01 |
    |*  7 |        TABLE ACCESS FULL       | EMP      |     5 |   435 |       2 (0) | 00:00:01 |
    --------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       7 - filter("emp"."deptno"=20)

    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
Explain Plan Showing IMCDTs for PTF ECHO
    EXPLAIN PLAN FOR
    WITH e AS (SELECT /*+ MATERIALIZE */ * FROM emp)
    SELECT * FROM ECHO(e, COLUMNS(ename, job)) WHERE deptno = 20;

    ------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                                 | Name                      | Rows  | Bytes | Cost (%CPU) | Time     |
    ------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                          |                           |    14 |  1400 |       4 (0) | 00:00:01 |
    |   1 |  TEMP TABLE TRANSFORMATION                |                           |       |       |             |          |
    |   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY) | SYS_TEMP_0FD9D6612_276EFC |       |       |             |          |
    |   3 |    TABLE ACCESS FULL                      | EMP                       |    14 |  1218 |       2 (0) | 00:00:01 |
    |   4 |   VIEW                                    |                           |    14 |  1400 |       2 (0) | 00:00:01 |
    |   5 |    POLYMORPHIC TABLE FUNCTION             | ECHO                      |       |       |             |          |
    |   6 |     VIEW                                  |                           |    14 |  1218 |       2 (0) | 00:00:01 |
    |*  7 |      VIEW                                 |                           |    14 |  1218 |       2 (0) | 00:00:01 |
    |   8 |       TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6612_276EFC |    14 |  1218 |       2 (0) | 00:00:01 |
    ------------------------------------------------------------------------------------------------------------------------
Explain Plan Showing Use Of Results Cache For PTF ECHO
    EXPLAIN PLAN FOR
    WITH e AS (SELECT /*+ result_cache */ * FROM echo(emp, COLUMNS(ename, job)))
    SELECT * FROM e WHERE deptno = 20;

    --------------------------------------------------------------------------------------------------------------
    | Id  | Operation                      | Name                       | Rows  | Bytes | Cost (%CPU) | Time     |
    --------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT               |                            |    14 |  1400 |       2 (0) | 00:00:01 |
    |*  1 |  VIEW                          |                            |    14 |  1400 |       2 (0) | 00:00:01 |
    |   2 |   RESULT CACHE                 | df9wucm9ak4br4mdpt7t2z1xv8 |       |       |             |          |
    |   3 |    VIEW                        |                            |    14 |  1400 |       2 (0) | 00:00:01 |
    |   4 |     POLYMORPHIC TABLE FUNCTION | ECHO                       |       |       |             |          |
    |   5 |      VIEW                      |                            |    14 |  1218 |       2 (0) | 00:00:01 |
    |   6 |       TABLE ACCESS FULL        | EMP                        |    14 |  1218 |       2 (0) | 00:00:01 |
    --------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter("deptno"=20)

    Result Cache Information (identified by operation id):
    ------------------------------------------------------
       2 - column-count=10; dependencies=(scott.emp, SCOTT.ECHO_PACKAGE, SCOTT.ECHO_PACKAGE, SCOTT.ECHO); attributes=(dynamic); name="select /*+ result_cache */ * from ECHO(emp, columns(ename, job))"
Nesting calls to PTFs: NOT ALLOWED
    SELECT * FROM CHANGE_CASE(GET_COL(scott.emp, 'varchar2'), 'initcap');
                              *
    ERROR at line 1:
    ORA-62569: nested polymorphic table function is disallowed
Solution: use a WITH clause:
    WITH T AS (SELECT * FROM GET_COL(scott.emp, 'varchar2'))
    SELECT * FROM CHANGE_CASE(T, 'initcap');
When is a PTF not a PTF? When it's a DESCRIBE-only PTF
A "describe-only" PTF does not require the run-time procedures (OPEN/FETCH_ROWS/CLOSE).
    create or replace package GET_COL_P as
      function Describe(tab       IN OUT DBMS_TF.Table_t,
                        type_name varchar2,
                        flip      varchar2 DEFAULT 'True')
        return DBMS_TF.describe_t;

      function GET_COL(tab       table,
                       type_name varchar2,
                       flip      varchar2 DEFAULT 'True')
        return table pipelined ROW POLYMORPHIC using GET_COL_P;
    end GET_COL_P;
    /
When is a PTF not a PTF? When it's a DESCRIBE-only PTF
    create or replace package body GET_COL_P as
      function Describe(tab       IN OUT DBMS_TF.table_t,
                        type_name varchar2,
                        flip      varchar2 DEFAULT 'True')
        return DBMS_TF.describe_t
      as
        typ constant varchar2(1024) := upper(ltrim(rtrim(type_name)));
      begin
        -- Pass a column through only if its type matches (or, when flipped, does not match) typ
        for i in 1 .. tab.column.count() loop
          tab.column(i).pass_through :=
            case upper(substr(flip, 1, 1))
              when 'F' then DBMS_TF.Column_Type_Name(tab.column(i).description) != typ
              else          DBMS_TF.Column_Type_Name(tab.column(i).description) =  typ
            end /* case */;
        end loop;
        return null;
      end;
    end GET_COL_P;
When is a PTF not a PTF? When it's a DESCRIBE-only PTF
Use the GET_COL PTF to report, for employees whose JOB is either ANALYST or PRESIDENT, only the columns whose type is VARCHAR2 (flip defaults to 'True', so matching columns are passed through; pass a value starting with 'F' to keep the non-VARCHAR2 columns instead).
    SELECT * FROM GET_COL(scott.emp, 'varchar2') WHERE job IN ('ANALYST','PRESIDENT')

    ----------------------------------------------------------------------------
    | Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU) | Time     |
    ----------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |      |       |       |     3 (100) |          |
    |*  1 |  TABLE ACCESS FULL | EMP  |       |       |             |          |
    ----------------------------------------------------------------------------
A describe-only PTF doesn't have any runtime procedures, so there is no need to allocate a PTF row source.
Session Agenda: 6. Restrictions and Other Considerations
PTF Restrictions
- A PTF cannot be nested in the FROM clause of a query; nesting is only allowed through a WITH clause.
- A PTF cannot be specified as an argument of a table function (no nesting).
- You cannot select a rowid from a Polymorphic Table Function (PTF).
- PARTITION BY and ORDER BY clauses can only be applied to a table-semantics PTF.
- The execution methods OPEN, FETCH_ROWS, and CLOSE must be invoked in an execution context only.
- You cannot invoke the DESCRIBE method directly.
Can a PTF execute in parallel?
Both row-semantic and table-semantic PTFs are parallelized, but in different ways.
For ROW semantic PTFs:
- The query executes with the same DOP as it would if the PTF were not present, i.e. the DOP is driven by the child row source.
For TABLE semantic PTFs:
- Input table rows must be redistributed using the PARTITION BY key.
- The DOP is determined by the PARTITION BY clause.
Session Agenda: 7. Summary
Key Benefits of Polymorphic Table Functions
Automatic Parallelization
- Parallelization is a must-have for bulk-data processing
- Automatically parallelizes data processing; no special code required
- Simpler to design, build, and deploy
100% Processing In-database
- In-database means processing is co-located with the data
- Eliminates the need to move data to a separate processing engine
- Simpler integration with existing and future performance optimizations
Extend In-Database Functions
- Enhance built-in analytics by incorporating bespoke business rules
- Extend analytic features by adding new functionality
Simplifies SQL for non-technical users
- Simplifies sophisticated, complex SQL
- Brings the power of complex analytics to anyone with SQL skills