A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins

1 A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins
Panagiotis Bouros (1) and Nikos Mamoulis (2)
(1) Aarhus University, Denmark
(2) University of Ioannina, Greece

5 Interval Joins
Two input tables, each with columns (employee, start, end)
  First table: John, Mary
  Second table: Jane, Bob, Hugo, Helen, Tom
[Figure: the employees' work periods drawn as intervals on a year axis; the start and end values are not recoverable]
Find all pairs of employees whose periods of work on D1 and D2 intersect
Applications
  Temporal databases
  Multidimensional data management
  Uncertain data management
43rd International Conference on Very Large Data Bases
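
The join predicate stated on this slide can be sketched in Python as follows. This is a minimal sketch that assumes closed intervals [start, end]; the tuple layout, the names `overlaps` and `naive_interval_join`, and the sample years are illustrative assumptions (the years shown on the slide did not survive extraction).

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (name, start, end); closed interval (assumption)


def overlaps(r: Interval, s: Interval) -> bool:
    """Two closed intervals intersect iff each starts no later than the other ends."""
    return r[1] <= s[2] and s[1] <= r[2]


def naive_interval_join(R: List[Interval], S: List[Interval]) -> List[Tuple[str, str]]:
    """Nested-loops baseline: tests every pair, so it costs |R| * |S| comparisons."""
    return [(r[0], s[0]) for r in R for s in S if overlaps(r, s)]


# Made-up employment periods, not the values from the slide.
D1 = [("John", 2001, 2005), ("Mary", 2004, 2009)]
D2 = [("Jane", 1999, 2002), ("Bob", 2006, 2007), ("Tom", 2010, 2012)]
pairs = naive_interval_join(D1, D2)
```

The nested-loops baseline is only a reference point for correctness; the plane sweep methods discussed in the rest of the deck avoid testing every pair.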

6 Our focus
Efficient evaluation of interval joins
Single-threaded processing
  Simple plane sweep based method
  Competitive with the state of the art
Parallel processing
  Partitioning-based join
  Share nothing

7 SINGLE-THREADED PROCESSING

8 Related work
Nested loops & sort-merge join: [Segev and Gunadhi, VLDB 89], [Gunadhi and Segev, ICDE 91]
Index-based: [Zhang et al., ICDE 02], [Enderle et al., SIGMOD 04]
Partitioning-based: [Soo et al., ICDE 94], [Dignös et al., SIGMOD 14], [Cafagna and Böhlen, VLDBJ 17]
Plane-sweep based: [Brinkhoff et al., SIGMOD 93], [Arge et al., VLDB 98], [Piatov et al., ICDE 16]

12 Plane sweep methods
Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]
  Sweep line stops on both start and end endpoints
  Backward scan; a gapless hash map buffers the open intervals
  Pros: no endpoint comparisons; tailored to modern hardware; main-memory, cache-aware; fast
  Cons: special structure needed
Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]
  Sweep line stops only on start endpoints
  Pros: simple; no special structure needed
  Cons: |R| + |S| + |R ⋈ S| comparisons in total
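
The forward scan idea can be illustrated with a short Python sketch. It assumes closed intervals given as (start, end) tuples; the function name `forward_scan_join` is hypothetical and the code is a plain reading of the technique, not the authors' implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval (assumption)


def forward_scan_join(R: List[Interval], S: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Plane sweep that stops only at start endpoints and scans forward in the other input."""
    R = sorted(R)  # tuples sort by start first
    S = sorted(S)
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # Sweep line stops at R[i].start: scan S forward while S starts before R[i] ends.
            r = R[i]
            k = j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # Sweep line stops at S[j].start: symmetric forward scan over R.
            s = S[j]
            k = i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out


fs_pairs = forward_scan_join([(1, 5), (4, 9)], [(0, 2), (6, 7), (10, 12)])
```

Every inner-loop test that succeeds yields a result pair, and each outer step makes at most one failing test, which is where the |R| + |S| + |R ⋈ S| comparison count comes from.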

18 Optimizing FS
FS
  [Figure: the area scanned for each interval and the result pairs produced; interval labels not recoverable]
FS + grouping
  Sort the group G_R by end
  [Figure: the area scanned for the group G_R and the result pairs produced]
FS + grouping + bucketing
  Sort the group G_R by end
  Bucket index BI_S
  [Figure: scanned area, result pairs and bucket contents]
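
Since the figures on this slide are lost, the following Python sketch shows one plausible reading of the grouping step only: consecutive intervals of one input that precede the current interval of the other input form a group, the group is sorted by end, and a single forward scan serves the whole group. The names `gfs_join` and `_scan_with_group` are hypothetical, intervals are assumed to be closed (start, end) tuples, and bucketing is not covered.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval (assumption)


def _scan_with_group(group: List[Interval], other: List[Interval], lo: int,
                     emit: Callable[[Interval, Interval], None]) -> None:
    """One forward scan over other[lo:] shared by every interval in the group."""
    group = sorted(group, key=lambda iv: iv[1])  # sort the group by end
    alive = 0  # group[alive:] are the members that have not ended yet
    for k in range(lo, len(other)):
        o = other[k]
        # Members ending before o starts can never match o or anything after it.
        while alive < len(group) and group[alive][1] < o[0]:
            alive += 1
        if alive == len(group):
            break
        for g in group[alive:]:
            emit(g, o)  # no endpoint comparison needed per emitted pair


def gfs_join(R: List[Interval], S: List[Interval]) -> List[Tuple[Interval, Interval]]:
    R, S = sorted(R), sorted(S)
    out: List[Tuple[Interval, Interval]] = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            g = i
            while g < len(R) and R[g][0] <= S[j][0]:
                g += 1  # group of consecutive R intervals starting no later than S[j]
            _scan_with_group(R[i:g], S, j, lambda r, s: out.append((r, s)))
            i = g
        else:
            g = j
            while g < len(S) and S[g][0] < R[i][0]:
                g += 1  # group of consecutive S intervals starting before R[i]
            _scan_with_group(S[j:g], R, i, lambda s, r: out.append((r, s)))
            j = g
    return out


gfs_pairs = gfs_join([(1, 5), (4, 9)], [(0, 2), (6, 7), (10, 12)])
```

Sorting the group by end means that, as the scan moves right, group members drop out in order and the surviving members always form a suffix of the group.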

19 PARALLEL PROCESSING

22 Hash-based partitioning [Piatov et al., ICDE 16]
[Figure: hash function h splits R into R_1, R_2, ..., R_k and S into S_1, S_2, ..., S_k]
Idea
  Randomly split each input into k partitions using hash function h
  Evaluate k^2 independent partition joins
Pros
  Simple
  Load balancing
Cons
  Endpoint comparisons rise: 2·k·(|R| + |S|) for EBI/LEBI, k·(|R| + |S|) for FS
  Degree of parallelism: n CPU cores → k = √n partitions
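
A minimal Python sketch of the hash-based scheme follows. It assumes closed (start, end) intervals, uses the position modulo k as a stand-in for the hash function h, and evaluates each partition join with a naive pairwise test purely to stay short; the names `hash_partition` and `hash_partitioned_join` are hypothetical.

```python
from itertools import product
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval (assumption)


def overlaps(r: Interval, s: Interval) -> bool:
    return r[0] <= s[1] and s[0] <= r[1]


def hash_partition(items: List[Interval], k: int) -> List[List[Interval]]:
    """Spread the intervals over k partitions regardless of their position in the domain."""
    parts: List[List[Interval]] = [[] for _ in range(k)]
    for idx, iv in enumerate(items):
        parts[idx % k].append(iv)  # stand-in for the hash function h (assumption)
    return parts


def hash_partitioned_join(R: List[Interval], S: List[Interval], k: int):
    R_parts, S_parts = hash_partition(R, k), hash_partition(S, k)
    out = []
    # Every R partition must meet every S partition: k * k independent tasks.
    for a, b in product(range(k), repeat=2):
        out.extend((r, s) for r in R_parts[a] for s in S_parts[b] if overlaps(r, s))
    return out


hp_pairs = hash_partitioned_join([(1, 5), (4, 9)], [(0, 2), (6, 7), (10, 12)], k=2)
```

Each (r, s) pair lands in exactly one of the k * k tasks, so the tasks can run on separate cores without coordination and no duplicates arise.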

25 Domain-based partitioning
[Figure: the domain split into tiles t_1, t_2, ..., t_k holding partitions R_1, S_1, R_2, S_2, ..., R_k, S_k]
Idea
  Split the domain into k tiles
  Replicate intervals
  Evaluate k independent partition joins
Pros
  Degree of parallelism: n CPU cores → k = n partitions
  Automatic duplicate elimination
Cons
  Replication
  Load balancing
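
The domain-based scheme can be sketched in Python as below. The sketch assumes closed (start, end) intervals, uniform tiles, non-empty inputs, and a common duplicate-avoidance rule (report a pair only in the tile that contains the later of the two start points); that rule and all function names are assumptions for illustration, not necessarily how the authors implement it.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval (assumption)


def overlaps(r: Interval, s: Interval) -> bool:
    return r[0] <= s[1] and s[0] <= r[1]


def tile_of(x: float, lo: float, hi: float, k: int) -> int:
    """Index of the uniform tile of [lo, hi] that contains point x."""
    width = (hi - lo) / k or 1.0  # guard against a degenerate domain
    return min(int((x - lo) / width), k - 1)


def domain_partition(items: List[Interval], lo: float, hi: float, k: int):
    tiles: List[List[Interval]] = [[] for _ in range(k)]
    for iv in items:
        # Replicate the interval into every tile it overlaps.
        for t in range(tile_of(iv[0], lo, hi, k), tile_of(iv[1], lo, hi, k) + 1):
            tiles[t].append(iv)
    return tiles


def domain_partitioned_join(R: List[Interval], S: List[Interval], k: int):
    lo = min(iv[0] for iv in R + S)
    hi = max(iv[1] for iv in R + S)
    R_tiles = domain_partition(R, lo, hi, k)
    S_tiles = domain_partition(S, lo, hi, k)
    out = []
    for t in range(k):  # k independent partition joins, one per tile
        for r in R_tiles[t]:
            for s in S_tiles[t]:
                # Report the pair only in the tile holding the later start point.
                if overlaps(r, s) and tile_of(max(r[0], s[0]), lo, hi, k) == t:
                    out.append((r, s))
    return out


dp_pairs = domain_partitioned_join([(1, 5), (4, 9)], [(0, 2), (6, 7), (10, 12)], k=3)
```

Only R_t is joined with S_t, so k tasks suffice, at the price of replicating long intervals and of uneven tile loads when the data is skewed.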

29 Mini-joins breakdown
3 types of intervals → 9 types of mini-joins between R_i and S_i
Mini-joins (1)-(5)
  Same complexity as original join
  Sweep only one input
  No comparisons, cross product
Mini-joins (6)-(9)
  Duplicate results elimination

31 Greedy scheduling
Idea
  Distribute (k-1) mini-joins to different cores
  Evenly distribute load → minimize max load
  NP-hard problem
  Greedy approximation algorithm
[Figure: mini-joins assigned to Core 1, Core 2, ..., Core n]
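
One classic greedy heuristic for this kind of load balancing is longest-processing-time-first: take the tasks in decreasing order of estimated cost and always give the next one to the least loaded core. The Python sketch below illustrates that heuristic; whether the authors use exactly this rule is an assumption, and the cost values and the name `greedy_schedule` are made up for illustration.

```python
import heapq
from typing import List, Tuple


def greedy_schedule(costs: List[int], n_cores: int) -> Tuple[List[List[int]], List[int]]:
    """Assign task indices to cores, largest estimated cost first, least loaded core first."""
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(n_cores)]
    for task in sorted(range(len(costs)), key=lambda t: -costs[t]):
        load, core = heapq.heappop(heap)  # least loaded core so far
        assignment[core].append(task)
        heapq.heappush(heap, (load + costs[task], core))
    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


# Hypothetical cost estimates for five mini-joins, scheduled on two cores.
assignment, loads = greedy_schedule([7, 5, 4, 3, 1], n_cores=2)
```

In this example both cores end up with a load of 10, so the maximum load equals the average and no core sits idle.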

33 Adaptive partitioning
Idea
  Create an initial uniform partitioning
  Employ a very fine tiling: granules
  Move load between neighboring tiles
    Move granules between neighboring tiles
    Reposition borders of tiles
[Figure: tiles t_1, t_2, ..., t_n before and after their borders are repositioned]
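
The slide gives only the outline of the adaptive step, so the following Python sketch is one possible local search built on that outline, not the authors' procedure. It assumes each granule has a known load estimate and that there are at least as many granules as tiles; the name `adaptive_borders` and the stopping rule are hypothetical.

```python
from typing import List


def adaptive_borders(granule_loads: List[int], n_tiles: int, max_rounds: int = 1000) -> List[int]:
    """Tile t owns granules borders[t] .. borders[t + 1] - 1; shift borders to even out load."""
    g = len(granule_loads)
    borders = [t * g // n_tiles for t in range(n_tiles + 1)]  # initial uniform partitioning

    def load(t: int) -> int:
        return sum(granule_loads[borders[t]:borders[t + 1]])

    for _ in range(max_rounds):
        moved = False
        for t in range(n_tiles - 1):
            left, right = load(t), load(t + 1)
            if left > right and borders[t + 1] - borders[t] > 1:
                gran = granule_loads[borders[t + 1] - 1]  # last granule of the left tile
                if max(left - gran, right + gran) < left:
                    borders[t + 1] -= 1  # hand it to the right neighbor
                    moved = True
            elif right > left and borders[t + 2] - borders[t + 1] > 1:
                gran = granule_loads[borders[t + 1]]  # first granule of the right tile
                if max(right - gran, left + gran) < right:
                    borders[t + 1] += 1  # hand it to the left neighbor
                    moved = True
        if not moved:
            break
    return borders


# A skewed load: the first granule is far heavier than the rest.
borders = adaptive_borders([9, 1, 1, 1, 1, 1, 1, 1], n_tiles=2)
```

Starting from borders [0, 4, 8] with tile loads 12 and 4, the border moves left until the loads are 9 and 7; a granule moves only when it lowers the larger of the two neighboring loads, so the search cannot cycle.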

34 EXPERIMENTAL ANALYSIS

35 Setup
Hardware
  Dual [..]-core Intel(R) Xeon(R) CPU E5-2687W v3 @ 3.[..] GHz with 128 GB of RAM
  Hyper-threading enabled, up to 40 threads
Software
  Workload [ICDE 16] → XOR of start
  Loop unrolling forced, OpenMP for multi-threading
Datasets
  WEBKIT: git repo, interval = period of time a file remained unchanged
  BOOKS: Aarhus libraries, interval = period of time a book was lent
  Synthetic: interval duration follows an exponential distribution; uniformly distributed start, plus peaks
Experiments
  Execution time, # comparisons, memory footprint
  Both self joins and non-self joins
  Vary |R| / |S|, # cores (threads)

36 Optimizing FS
[Figure: execution time [secs] and endpoint comparisons [%] of FS, gFS and bgFS against |R| / |S| on 1 core; dataset WEBKIT]

37 Single-threaded processing
[Figure: execution time [secs] of bgFS, DIP, OIP and LEBI against |R| / |S| on 1 core; panels include WEBKIT, BOOKS and GREEND]
DIP: [Cafagna and Böhlen, VLDBJ 17]
OIP: [Dignös et al., SIGMOD 14]

38 Optimizing Domain-based Paradigm
[Figure 6: Optimizing the domain-based partitioning paradigm: bgFS on WEBKIT. Panels (a) and (c) are plotted against |R| / |S| on 16 cores; panels (b) and (d) against # cores with |R| = |S| (4, 9, 16, 25 HT, 36 HT). Y-axes: execution time [secs] and avg idle time ratio [%]. Setups shown include atomic/uniform, mj+atomic/uniform, atomic/adaptive, static/fixed, mj+static/fixed, mj+greedy/fixed, mj+greedy/uniform and mj+greedy/adaptive.]
Aug 31, 17

39 Parallel processing
[Figure: speedup [x] against # cores (up to 36 HT) for theoretical, h-LEBI and d-LEBI, and for theoretical, h-bgFS and d-bgFS; execution time [secs] of d-LEBI and d-bgFS against # cores and against |R| / |S| on 16 cores; dataset WEBKIT]

40 To sum up
Contributions
  Efficient evaluation of interval joins
  Single-threaded processing
    Optimized bgFS, competitive with state-of-the-art EBI/LEBI
    Lower memory footprint
  Parallel processing
    Novel domain-based partitioning paradigm
    Higher speedup
Future work
  Other types of temporal joins
  Other types of temporal operators
  Parallel processing
    Data-level parallelism, share data between threads

41 Questions?

42 EXTRAS

46 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]
Endpoint indices (interval identifiers not recoverable)
  EI_R = {_.start, _.start, _.end, _.end}
  EI_S = {_.start, _.end, _.start, _.end, _.start, _.end, _.start, _.start, _.end, _.end}
Active sets
  A_R and A_S, both initially empty
Result
  Initially empty
[Animation: the sweep advances over the endpoint indices, updating the active sets A_R and A_S and adding pairs to the result]

47 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]
Pros
  No endpoint comparisons when producing results
  Tailored to modern hardware
  Main-memory, cache-aware
  Fast
Cons
  Special data structure needed for active sets, with support for efficient updates and scans
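
The endpoint-based sweep can be sketched in Python as below. The sketch assumes closed (start, end) intervals, merges both endpoint indices into one sorted event list, and uses plain Python sets where EBI/LEBI rely on a specialized structure; the names `endpoint_index` and `ebi_join` are hypothetical and the result holds index pairs (position in R, position in S).

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), closed interval (assumption)
START, END = 0, 1  # START sorts before END at equal values, so touching intervals match


def endpoint_index(items: List[Interval], tag: str):
    """Both endpoints of every interval, tagged with the input name and interval position."""
    events = []
    for idx, (start, end) in enumerate(items):
        events.append((start, START, tag, idx))
        events.append((end, END, tag, idx))
    return events


def ebi_join(R: List[Interval], S: List[Interval]) -> List[Tuple[int, int]]:
    events = sorted(endpoint_index(R, "R") + endpoint_index(S, "S"))
    active = {"R": set(), "S": set()}  # stand-ins for the active sets A_R and A_S
    out = []
    for _, kind, tag, idx in events:
        if kind == START:
            other = "S" if tag == "R" else "R"
            # Every open interval of the other input overlaps the one starting now,
            # so pairs are produced without comparing endpoints.
            for o in active[other]:
                out.append((idx, o) if tag == "R" else (o, idx))
            active[tag].add(idx)
        else:
            active[tag].discard(idx)  # the interval closes and leaves its active set
    return out


ebi_pairs = ebi_join([(1, 5), (4, 9)], [(0, 2), (6, 7), (10, 12)])
```

The cost of the approach sits in the active sets, which must support fast insertion, deletion and full scans; that is the role of the special data structure listed under Cons.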

50 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]
Sorted inputs (interval identifiers not recoverable)
  R holds 2 intervals, S holds 5 intervals
Result
  Grows from empty to one pair and then to three pairs as the scan proceeds

51 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]
Pros
  Simple
  No special structure needed
Cons
  Each join result requires an endpoint comparison, |R| + |S| + |R ⋈ S| comparisons in total
