FUN WITH ANALYTIC FUNCTIONS UTOUG TRAINING DAYS 2017
ABOUT ME
Born and raised here in UT
In IT for 10 years, DBA for the last 6
Databases and data are my hobbies; I'm rather boring
This isn't why you're here, though
ANALYTIC FUNCTIONS SAY WHAT?
Analytic functions compute a value based upon a subset of the rows in a query result
The subset is referred to as the partition
This is unrelated to table partitioning
The best way to understand these functions is to compare them to standard aggregate functions (SUM, MIN, MAX, etc.)
AGGREGATE VS. ANALYTIC
[Slide figure: three result grids labeled "The Data", "Aggregate AVG", and "Analytic Function AVG"]
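To make the comparison concrete, here is a minimal sketch of the two flavors of AVG side by side. It assumes the SCOTT sample schema that the rest of the samples use; the column aliases are only illustrative.

```sql
-- Assumes the SCOTT sample schema (scott.emp).

-- Aggregate AVG: one row per department; the employee rows collapse.
select deptno,
       avg(sal) avg_sal_by_deptno
from scott.emp
group by deptno
order by deptno;

-- Analytic AVG: every employee row is kept, and each row carries
-- the average for its own department (its partition).
select ename,
       deptno,
       sal,
       avg(sal) over (partition by deptno) avg_sal_by_deptno
from scott.emp
order by deptno, ename;
```

The first query returns one row per department; the second returns one row per employee, with the department average repeated on each row.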
41 FLAVORS
41 different analytic functions
Positional (FIRST, LAST, ROW_NUMBER, LEAD, LAG, RANK, etc.)
Statistical (CORR, the REGR_ functions, NTILE, STDDEV, etc.)
Aggregate (SUM, AVG, MIN, MAX, etc.)
Pattern matching (find patterns, like V-shaped dips in stock ticker data)
LISTAGG
SAMPLES!
Samples based on SCOTT schema
View -> Snippets
THE SYNTAX
It's not as complicated as it looks
QUICK EXAMPLES
FUNCTION(<field a>) OVER (PARTITION BY <field b>)
[Slide figure: result grids labeled "The Data" and "Analytic Function AVG"]
    select ename,
           job,
           deptno,
           avg(sal) over (partition by deptno) avg_sal_by_deptno,
           sal,
           sal/(avg(sal) over (partition by deptno)) pct_of_average
    from scott.emp
    order by deptno desc;
MIX N MATCH
    select ename,
           job,
           deptno,
           avg(sal) over (partition by deptno) avg_sal_by_deptno,
           sal,
           sal/(avg(sal) over (partition by deptno)) pct_of_average
    from scott.emp
    order by deptno desc;

    select ename,
           job,
           deptno,
           min(sal) over (partition by deptno) min_sal_by_deptno,
           sal,
           sal/(min(sal) over (partition by deptno)) pct_of_min
    from scott.emp
    order by deptno desc;
REAL LIFE
C-LEVEL ASKS EASY QUESTION
Can you tell me the order that accounts were opened in?
Can you give me an ordinal number (1st, 2nd, 3rd)?
    row_number() over (partition by acct order by acct_open_date)
WHAT ABOUT WHEN TWO SUB ACCOUNTS ARE OPENED ON THE SAME DAY, CAN YOU MAKE THOSE BE THE SAME?
Original query:
    row_number() over (partition by acct order by acct_open_date)
Alternatives that give tied rows the same number:
    dense_rank() over (partition by acct order by acct_open_date)
    rank() over (partition by acct order by acct_open_date)
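A small sketch of how the three numbering functions differ on ties. The inline rows are made up for illustration (the real account table is not shown in the deck); only the column names acct, acct_description, and acct_open_date come from the slides.

```sql
-- Hypothetical sample rows: two sub accounts share an open date.
with accounts as (
  select 100 acct, 'SAVINGS' acct_description, date '2016-01-05' acct_open_date from dual union all
  select 100, 'CHECKING', date '2016-03-10' from dual union all
  select 100, 'CD',       date '2016-03-10' from dual union all
  select 100, 'LOAN',     date '2016-07-01' from dual
)
select acct,
       acct_description,
       acct_open_date,
       -- always unique; the order among tied rows is arbitrary
       row_number() over (partition by acct order by acct_open_date) rn,
       -- ties share a number, and the following number is skipped
       rank()       over (partition by acct order by acct_open_date) rnk,
       -- ties share a number, with no gap afterwards
       dense_rank() over (partition by acct order by acct_open_date) dense_rnk
from accounts
order by acct, acct_open_date, acct_description;
```

For these rows RANK yields 1, 2, 2, 4 and DENSE_RANK yields 1, 2, 2, 3, while ROW_NUMBER hands out 1 through 4 and breaks the tie however it likes.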
CAN YOU TELL ME HOW LONG IT TAKES BETWEEN ONE ACCOUNT AND ANOTHER?
LEAD and LAG
    lag(acct_open_date) over (partition by acct order by acct_open_date)
    acct_open_date - lag(acct_open_date) over (partition by acct order by acct_open_date)
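As a rough illustration, the LAG expressions above can be dropped into a full query like this one. The sample rows are invented; subtracting one Oracle DATE from another gives the difference in days.

```sql
-- Hypothetical sample rows for a single account.
with accounts as (
  select 100 acct, 'SAVINGS' acct_description, date '2016-01-01' acct_open_date from dual union all
  select 100, 'CHECKING', date '2016-01-11' from dual union all
  select 100, 'LOAN',     date '2016-01-31' from dual
)
select acct,
       acct_description,
       acct_open_date,
       -- open date of the previous sub account (NULL on the first row)
       lag(acct_open_date) over (partition by acct order by acct_open_date) prev_open_date,
       -- days elapsed since the previous sub account was opened
       acct_open_date
         - lag(acct_open_date) over (partition by acct order by acct_open_date) days_since_prev
from accounts
order by acct, acct_open_date;
```

With these rows days_since_prev comes back as NULL, 10, and 20. LEAD works the same way but looks at the next row instead of the previous one.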
WHAT SHE REALLY WANTED
"I just need the sequence patterns, in general"
This uses LISTAGG
LISTAGG
    LISTAGG(<string to concatenate>, <concatenator>) within group (order by <field>)
    LISTAGG(job, ' -> ') within group (order by hiredate)
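Here is a minimal sketch of LISTAGG building one "sequence pattern" per account. The rows are hypothetical; the column names follow the account examples elsewhere in the deck.

```sql
-- Hypothetical sample rows: two accounts, two sub accounts each.
with accounts as (
  select 100 acct, 'SAVINGS' acct_description, date '2016-01-01' acct_open_date from dual union all
  select 100, 'CHECKING', date '2016-01-11' from dual union all
  select 200, 'CHECKING', date '2016-02-01' from dual union all
  select 200, 'LOAN',     date '2016-02-15' from dual
)
select acct,
       -- concatenate the descriptions in the order the sub accounts were opened
       listagg(acct_description, ' -> ')
         within group (order by acct_open_date) acct_pattern
from accounts
group by acct
order by acct;
```

Account 100 comes back as "SAVINGS -> CHECKING" and account 200 as "CHECKING -> LOAN".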
NOT GOOD ENOUGH
Can you order those by how common each pattern is? Sure?
Analytic functions can't go in a GROUP BY clause
First attempt, which Oracle rejects:
    SELECT DISTINCT
           listagg(acct_description, ' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE),
           count(distinct listagg(acct_description, ' -> ') WITHIN GROUP (order by ACCT_OPEN_DATE)) pattern_observance_count
DON'T PUT YOUR AF'S WHERE THEY DON'T BELONG
Use a subquery to get around this
Not allowed (analytic function in the WHERE clause):
    select deptno,
           avg(sal) over (partition by deptno) avg_sal_by_deptno,
           sal,
           sal/(avg(sal) over (partition by deptno)) pct_of_average
    from scott.emp
    where sal/(avg(sal) over (partition by deptno)) > 1
    order by deptno desc;
Allowed (analytic function computed in a subquery, filtered outside it):
    select deptno, avg_sal_by_deptno, sal, pct_of_average
    from (
      select deptno,
             avg(sal) over (partition by deptno) avg_sal_by_deptno,
             sal,
             sal/(avg(sal) over (partition by deptno)) pct_of_average
      from scott.emp
      order by deptno desc
    )
    where pct_of_average >= 1
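The same subquery trick is one way to answer the "how common is each pattern" question. This is a sketch only, with invented rows: build the pattern per account in an inline view, then count and sort in the outer query.

```sql
-- Hypothetical sample rows: accounts 100 and 300 follow the same pattern.
with accounts as (
  select 100 acct, 'SAVINGS' acct_description, date '2016-01-01' acct_open_date from dual union all
  select 100, 'CHECKING', date '2016-01-11' from dual union all
  select 200, 'CHECKING', date '2016-02-01' from dual union all
  select 200, 'LOAN',     date '2016-02-15' from dual union all
  select 300, 'SAVINGS',  date '2016-03-01' from dual union all
  select 300, 'CHECKING', date '2016-03-20' from dual
)
select acct_pattern,
       count(*) pattern_observance_count
from (
  -- inner query: one pattern string per account
  select acct,
         listagg(acct_description, ' -> ')
           within group (order by acct_open_date) acct_pattern
  from accounts
  group by acct
)
group by acct_pattern
order by pattern_observance_count desc, acct_pattern;
```

With these rows "SAVINGS -> CHECKING" is counted twice and "CHECKING -> LOAN" once.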
GETTING ROLLED
Can you tell me the transactions an account has done?
Can you sum the amounts?
NO, COULD YOU SUM UP THE AMOUNTS FOR EACH MONTH, BUT DON'T HIDE THE TRANSACTION DETAILS?
[Slide figure: result grids labeled "Original Data" and "sum(amount)"]
    sum(amount) over (partition by trunc(business_date,'mm'), acct_num) monthly_total
COULD YOU BREAK IT OUT BY THE TYPE OF TRANSACTION IT WAS? DEBIT VS. CREDIT?
Before:
    sum(amount) over (partition by trunc(business_date,'mm'), acct_num) monthly_total
After:
    sum(amount) over (partition by trunc(business_date,'mm'), acct_num, tran_type) monthly_total
Different partition => different total
Same partition => same total
NULLs in a partition column are treated together as one group
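To see both partitions next to each other, here is a small sketch over made-up transactions. The table name and rows are assumptions; the column names come from the slides, and the second alias is renamed so both totals fit in one query.

```sql
-- Hypothetical sample rows: one account, November and December 2016.
with transactions as (
  select 1 acct_seq_num, 1001 acct_num, 'CREDIT' tran_type,
         date '2016-11-01' business_date, 500 amount from dual union all
  select 2, 1001, 'DEBIT',  date '2016-11-03', -120 from dual union all
  select 3, 1001, 'DEBIT',  date '2016-11-17',  -80 from dual union all
  select 4, 1001, 'CREDIT', date '2016-12-01',  500 from dual
)
select acct_num,
       business_date,
       tran_type,
       amount,
       -- one total per month and account
       sum(amount) over (partition by trunc(business_date,'mm'), acct_num) monthly_total,
       -- one total per month, account, and transaction type
       sum(amount) over (partition by trunc(business_date,'mm'), acct_num, tran_type) monthly_total_by_type
from transactions
order by acct_seq_num;
```

Every November row shows a monthly_total of 300, while monthly_total_by_type is 500 on the credit row and -200 on both debit rows. All four detail rows stay visible.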
COULD YOU MAKE A ROLLING SUM TOO, BROKEN OUT THE SAME WAY?
    sum(amount) over (partition by trunc(business_date,'mm'), acct_num, tran_type) monthly_total,
    sum(amount) over (partition by trunc(business_date,'mm'), acct, suffix, tran_type
                      order by acct_seq_num) rolling_monthly_total
PERFECT, BUT COULD YOU EXCLUDE THE CURRENT TRANSACTION FROM THE ROLLING MONTHLY TOTAL?
    sum(amount) over (partition by trunc(business_date,'mm'), acct_num, tran_type) monthly_total,
    sum(amount) over (partition by trunc(business_date,'mm'), acct, suffix, tran_type
                      order by acct_seq_num) rolling_monthly_total,
    sum(amount) over (partition by trunc(business_date,'mm'), acct, suffix, tran_type
                      order by acct_seq_num
                      ROWS BETWEEN UNBOUNDED PRECEDING and 1 PRECEDING) roll_mnthly_tot_excl_cur_tran
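A minimal sketch of the rolling totals over invented rows. It partitions on acct_num only to keep the sample small, which is a simplification of the slide's partition list. Note that a windowing clause (ROWS or RANGE) needs an ORDER BY inside the OVER clause.

```sql
-- Hypothetical sample rows: one account, one month.
with transactions as (
  select 1 acct_seq_num, 1001 acct_num, date '2016-11-01' business_date, 500 amount from dual union all
  select 2, 1001, date '2016-11-03', -120 from dual union all
  select 3, 1001, date '2016-11-17',  -80 from dual
)
select acct_seq_num,
       amount,
       -- ORDER BY with no explicit window: running total up to and including this row
       sum(amount) over (partition by trunc(business_date,'mm'), acct_num
                         order by acct_seq_num) rolling_monthly_total,
       -- explicit window that stops one row before the current row
       sum(amount) over (partition by trunc(business_date,'mm'), acct_num
                         order by acct_seq_num
                         rows between unbounded preceding and 1 preceding) roll_mnthly_tot_excl_cur_tran
from transactions
order by acct_seq_num;
```

Here rolling_monthly_total is 500, 380, 300, and the version excluding the current transaction is NULL, 500, 380. The first row is NULL because there are no preceding rows to sum.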
ROWS AND RANGE SUB PARTITIONS
    ROWS BETWEEN UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING
    ROWS BETWEEN UNBOUNDED PRECEDING and X PRECEDING
ROWS is a number of rows
RANGE is a numeric or date range
PRECEDING is before the current row
FOLLOWING is after the current row
SIMPLE EXAMPLE
    lead(row_number) over (partition by 'X' order by row_number) next_number,
    first_value(row_number) over (partition by 'X' order by row_number
                                  rows between 2 FOLLOWING and 3 FOLLOWING) number_after_the_next_number,
    sum(row_number) over (partition by 'X' order by row_number
                          rows between 1 FOLLOWING and 2 FOLLOWING) sum_of_next_2_nums,
    sum(row_number) over (partition by 'X' order by row_number
                          rows between 1 FOLLOWING and UNBOUNDED FOLLOWING) sum_nums_from_this_to_the_end,
    sum(row_number) over (partition by 'X' order by row_number
                          rows between 1 PRECEDING and 1 FOLLOWING) sum_nums_1_before_to_1_after
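The difference between ROWS and RANGE only shows up when the ORDER BY value repeats. A small sketch with made-up numbers (the nums table and column n are just for illustration):

```sql
-- Hypothetical data: the value 2 appears twice.
with nums as (
  select 1 n from dual union all
  select 2   from dual union all
  select 2   from dual union all
  select 3   from dual
)
select n,
       -- ROWS counts physical rows, so the two 2s get different running sums
       sum(n) over (order by n rows  between unbounded preceding and current row) sum_rows,
       -- RANGE works on the ORDER BY value, so rows with the same value share a sum
       sum(n) over (order by n range between unbounded preceding and current row) sum_range
from nums
order by n;
```

With RANGE both rows where n = 2 report 5, because the window covers every row whose value is less than or equal to 2. With ROWS one of them reports 3 and the other 5.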
FILLING HOLES
Can you tell me what a drawer's end-of-day totals are each day?
Lots of missing days
How can we fill in those gaps?
LET'S GET THE NEXT USED DATE ON EACH ROW
    lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date) next_used_date
Let's fix this null
AF'S CAN BE USED ALMOST ANYWHERE
    case
      when lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date) is null
        then branch_date
      else lead(branch_date) over (partition by branch_code, cashbox_id order by branch_date)
    end next_used_date,
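As an aside, LEAD and LAG accept optional offset and default arguments, so the CASE expression above could also be written more compactly. This is an alternative sketch, not the query from the slides; the cashbox_totals name, the end_of_day_total column, and the rows are invented.

```sql
-- Hypothetical sample rows for one cash drawer.
with cashbox_totals as (
  select 'B01' branch_code, 7 cashbox_id,
         date '2016-11-15' branch_date, 1200 end_of_day_total from dual union all
  select 'B01', 7, date '2016-11-16',  900 from dual union all
  select 'B01', 7, date '2016-11-19', 1500 from dual
)
select branch_code,
       cashbox_id,
       branch_date,
       -- third argument is the value to return when there is no next row
       lead(branch_date, 1, branch_date)
         over (partition by branch_code, cashbox_id order by branch_date) next_used_date,
       -- same result using NVL around a plain LEAD
       nvl(lead(branch_date)
             over (partition by branch_code, cashbox_id order by branch_date),
           branch_date) next_used_date_nvl
from cashbox_totals
order by branch_code, cashbox_id, branch_date;
```

Both columns fall back to the row's own branch_date on the last row of each drawer, matching what the CASE expression does.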
NULLS FIXED!
[Slide figure: result grids labeled "Before" and "After"]
But we still have gaps
JOIN THIS TO A CALENDAR
    SELECT to_date('20161101','yyyymmdd') + ROWNUM - 1 calendar_date
    FROM (SELECT 1 just_a_column FROM dual CONNECT BY LEVEL <= 10000)
Begin date: to_date('20161101','yyyymmdd')
10000: some big number, larger than how far you want to go. This determines the end date
JOINING TO A CALENDAR
    WHERE calendar_date BETWEEN branch_date and next_used_date - 1
20161115 is between 20161115 and (20161116 - 1)
The 20th is missing, but 20161120 is between 20161119 and (20161121 - 1)
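Putting the pieces together, an end-to-end sketch of the gap fill might look like this. The table name, end_of_day_total column, and rows are invented. One deliberate deviation: the last row per drawer defaults next_used_date to branch_date + 1 rather than branch_date, because BETWEEN branch_date AND branch_date - 1 matches nothing and would drop that last day.

```sql
-- Hypothetical sample rows: the drawer was not used on the 17th and 18th.
with cashbox_totals as (
  select 'B01' branch_code, 7 cashbox_id,
         date '2016-11-15' branch_date, 1200 end_of_day_total from dual union all
  select 'B01', 7, date '2016-11-16',  900 from dual union all
  select 'B01', 7, date '2016-11-19', 1500 from dual
),
with_next as (
  -- next used date per drawer; default keeps the final row alive in the join
  select t.*,
         lead(branch_date, 1, branch_date + 1)
           over (partition by branch_code, cashbox_id order by branch_date) next_used_date
  from cashbox_totals t
),
calendar as (
  -- one row per day for November 2016
  select date '2016-11-01' + level - 1 calendar_date
  from dual
  connect by level <= 30
)
select c.calendar_date,
       w.branch_code,
       w.cashbox_id,
       w.end_of_day_total
from with_next w
join calendar c
  on c.calendar_date between w.branch_date and w.next_used_date - 1
order by w.branch_code, w.cashbox_id, c.calendar_date;
```

The result has a row for every day from the 15th through the 19th; the 17th and 18th carry forward the total of 900 from the 16th.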
FILLED GAPS THANKS TO AN AF
[Slide figure: result grids labeled "Before" and "After"]
HOW BIG IS THAT CANYON?
Department wanted to know details of accounts going negative
They wanted to know how deep and how wide the canyon was when looking at a daily history of account balances
How wide? Start time? End time?
How deep?
[Slide figure: line chart of daily account balance, y-axis running from 1500 down to -2000]
USE PATTERN MATCHING (12C)
[Slide figure: panels labeled "The Data" and "The Result", with the account balance line chart]
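The query behind the slide is not reproduced in the text, so the following is only a sketch of how MATCH_RECOGNIZE could measure a "canyon". The balances table, its columns, the measure names, and the rows are all assumptions.

```sql
-- Hypothetical daily balance history: the account is negative for three days.
with balances as (
  select 1001 acct_num, date '2016-11-01' balance_date, 500 balance from dual union all
  select 1001, date '2016-11-02', -200 from dual union all
  select 1001, date '2016-11-03', -900 from dual union all
  select 1001, date '2016-11-04', -300 from dual union all
  select 1001, date '2016-11-05',  250 from dual
)
select *
from balances
match_recognize (
  partition by acct_num
  order by balance_date
  measures
    first(neg.balance_date) as start_date,      -- where the canyon begins
    last(neg.balance_date)  as end_date,        -- where it ends
    count(*)                as days_negative,   -- how wide
    min(neg.balance)        as deepest_balance  -- how deep
  one row per match
  pattern (neg+)                -- one or more consecutive negative days
  define neg as balance < 0
);
```

For this sample the single match runs from 2016-11-02 to 2016-11-04, three days wide, with a deepest balance of -900.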
THINGS YOU CAN DO WITH IT:
Find V, W and other patterns in stock prices
Find timeframes of high database use
Group clicks in web logs into sessions
Detect traversal patterns of finite state machines
We won't go much deeper, but look into these, they're neat!
NOT COMPLICATED, JUST INVOLVED
Useful wherever you can put the data into a line graph, i.e. the data is a log of events
Lots of great resources:
Ask Tom - http://www.oracle.com/technetwork/issue-archive/2013/13-nov/o63asktom-2034271.html
GitHub - https://github.com/oracle/analytical-sql-examples/tree/master/pattern-matching
Burleson - http://www.dba-oracle.com/t_sql_match_recognize.htm
YouTube has some good demos too
AF PERFORMANCE?
Keep an eye on performance: these do lots of sorts
Try to use indexes, and filter your data before applying analytic functions
Sometimes AFs improve performance, other times they reduce it
Tom Kyte says: In general, analytics are great for answering "really big" questions or questions against "small sets"
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::p11_question_id:1137250200346660664
QUESTIONS?