A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins

Size: px
Start display at page:

Download "A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins"

Transcription

1 A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins Panagio;s Bouro and Nikos Mamouli 1 Aarhus University, Denmark 2 University of Ioannina, Greece

2 Interval Joins employee start end John Mary employee start end Jane Bob Hugo Helen Tom rd Interna;onal Conference on Very Large Data Base

3 Interval Joins employee start end John Mary employee start end Jane Bob Hugo Jane John Mary Bob Hugo Helen Tom Helen year Tom rd Interna;onal Conference on Very Large Data Base

4 Interval Joins employee start end employee start end John Mary John Jane Jane Mary Bob Bob Hugo Helen Hugo Tom Helen Tom year Find all pair of employees whose period of work on D1 and D2 intersect 43rd Interna;onal Conference on Very Large Data Base

5 Interval Joins employee start end John Mary employee start end Jane Bob Hugo Helen Tom Jane John Mary Bob Hugo Helen Tom Find all pair of employees whose period of work on D1 and D2 intersect Applica;ons Temporal databases Mul;dimensional data management Uncertain data management year 43rd Interna;onal Conference on Very Large Data Base

6 Our focus Efficient evalua;on of interval joins Single-threaded processing Simple plane sweep based method Compe;;ve to state-of-the-art Parallel processing Par;;oning-based join Share nothing 43rd Interna;onal Conference on Very Large Data Base

7 SINGLE-THREADED PROCESSING 43rd Interna;onal Conference on Very Large Data Base

8 Related work Nested loops & Sort-merge join [Segev and Gunadhi, VLDB 89] [Gunadhi and Segev, ICDE 91] Index-based [Zhang et al., ICDE 02] [Enderle et al., SIGMOD 04] ParCConingbased [Soo et al., ICDE 94] [Dignös et al., SIGMOD 14] [Cafagna and Böhlen, VLDBJ 17] Plane-sweep based [Brinkhoff al., SIGMOD 93] [Arge et al., VLDB 98] [Piatov et al., ICDE 16] 43rd Interna;onal Conference on Very Large Data Bases 4

9 Related work Nested loops & Sort-merge join [Segev and Gunadhi, VLDB 89] [Gunadhi and Segev, ICDE 91] Index-based [Zhang et al., ICDE 02] [Enderle et al., SIGMOD 04] ParCConingbased [Soo et al., ICDE 94] [Dignös et al., SIGMOD 14] [Cafagna and Böhlen, VLDBJ 17] Plane-sweep based [Brinkhoff al., SIGMOD 93] [Arge et al., VLDB 98] [Piatov et al., ICDE 16] 43rd Interna;onal Conference on Very Large Data Bases 4

10 Plane sweep methods Endpoint-Based Join (EBI/LEBI) Sweep line stops both on start and end Backwards scan, Gapless hash map buffers open intervals No endpoint comparisons Cons Tailored to modern hardware, main Special memory structure cache-aware needed Fast Special structure needed Pros ü No -point comparisons ü Tailored to modern hardware ü Main memory cache-aware ü Fast [Piatov et al., ICDE 16] Forward Scan based (FS) Sweep lines stops only on start Simple, no special structure needed R + S + R S comparisons in total [Brinkhoff et al., SIGMOD 93] 43rd Interna;onal Conference on Very Large Data Bases 5

11 Plane sweep methods Endpoint-Based Join (EBI/LEBI) Sweep line stops both on start and end Backwards scan, Gapless hash map buffers open intervals No endpoint comparisons Cons Tailored to modern hardware, main Special memory structure cache-aware needed Fast Special structure needed Pros ü No -point comparisons ü Tailored to modern hardware ü Main memory cache-aware ü Fast Forward Scan based (FS) Sweep lines stops only on start Simple, no special structure needed Cons R + S + R S comp Pros ü Simple ü No special structure needed [Piatov et al., ICDE 16] [Brinkhoff et al., SIGMOD 93] R + S + R S comparisons in total 43rd Interna;onal Conference on Very Large Data Bases 5

12 Plane sweep methods Endpoint-Based Join (EBI/LEBI) Sweep line stops both on start and end Backwards scan, Gapless hash map buffers open intervals No endpoint comparisons Cons Tailored to modern hardware, main Special memory structure cache-aware needed Fast Special structure needed Pros ü No -point comparisons ü Tailored to modern hardware ü Main memory cache-aware ü Fast Forward Scan based (FS) Sweep lines stops only on start Simple, no special structure needed Cons R + S + R S comp Pros ü Simple ü No special structure needed [Piatov et al., ICDE 16] [Brinkhoff et al., SIGMOD 93] R + S + R S comparisons in total 43rd Interna;onal Conference on Very Large Data Bases 5

13 Op;mizing FS FS s scanned area s scanned area 43rd Interna;onal Conference on Very Large Data Bases 6

14 Op;mizing FS FS s scanned area (, ),(, ) s scanned area (, ),(, ), (, ) 43rd Interna;onal Conference on Very Large Data Bases 6

15 Op;mizing FS FS s scanned area (, ),(, ) s scanned area (, ),(, ), (, ) FS grouping Sort by end G R s scanned area Sort by end G R s scanned area 43rd Interna;onal Conference on Very Large Data Bases 6

16 Op;mizing FS FS s scanned area (, ),(, ) s scanned area (, ),(, ), (, ) FS grouping Sort by end G R s scanned area (, ),(, ), (, ),(, ) Sort by end G R s scanned area (, ) 43rd Interna;onal Conference on Very Large Data Bases 6

17 Op;mizing FS FS s scanned area (, ),(, ) s scanned area (, ),(, ), (, ) FS grouping Sort by end G R s scanned area (, ),(, ), (, ),(, ) Sort by end G R s scanned area (, ) FS grouping bucke;ng Sort by end G R s scanned area Sort by end G R s scanned area {s BI S 1 } {, } {, } { } {, } {, } 43rd Interna;onal Conference on Very Large Data Bases 6

18 Op;mizing FS FS s scanned area (, ),(, ) s scanned area (, ),(, ), (, ) FS grouping Sort by end G R s scanned area (, ),(, ), (, ),(, ) Sort by end G R s scanned area (, ) FS grouping bucke;ng Sort by end G R s scanned area (, ),(, ), (, ),(, ) Sort by end G R s scanned area (, ) {s BI S 1 } {, } {, } { } {, } {, } 43rd Interna;onal Conference on Very Large Data Bases 6

19 PARALLEL PROCESSING 43rd Interna;onal Conference on Very Large Data Bases 7

20 Hash-based par;;oning [Piatov et al., ICDE 16] R 1 S 1 R h R 2 S 2 h S R k Idea Randomly split each input into k par;;ons using hash func;on h Evaluate k 2 independent par;;on joins S k 43rd Interna;onal Conference on Very Large Data Bases 8

21 Hash-based par;;oning [Piatov et al., ICDE 16] R 1 S 1 R h R 2 S 2 h S R k Idea Randomly split each input into k par;;ons using hash func;on h Evaluate k 2 independent par;;on joins Pros ü Simple ü Load balancing S k 43rd Interna;onal Conference on Very Large Data Bases 8

22 Hash-based par;;oning [Piatov et al., ICDE 16] R 1 S 1 R h R 2 S 2 h S R k Idea Randomly split each input into k par;;ons using hash func;on h Evaluate k 2 independent par;;on joins S k Pros ü Simple ü Load balancing Cons Domain-point comparisons rise 2. k. ( R + S ) for EBI/LEBI, k. ( R + S ) for FS Degree of parallelism n CPU cores à k = n par;;ons 43rd Interna;onal Conference on Very Large Data Bases 8

23 Domain-based par;;oning t 1 t 2 t k R 1 S 1 R 2 S 2 R k S k Idea Split into k ;les Replicate intervals Evaluate k independent par;;on joins 43rd Interna;onal Conference on Very Large Data Bases 9

24 Domain-based par;;oning Pros ü Degree of parallelism - n CPU cores à k = n par;;ons ü Automa;c duplicate elimina;on t 1 t 2 t k R 1 S 1 R 2 S 2 R k S k Idea Split into k ;les Replicate intervals Evaluate k independent par;;on joins 43rd Interna;onal Conference on Very Large Data Bases 9

25 Domain-based par;;oning t 1 t 2 t k R 1 S 1 R 2 S 2 R k S k Idea Split into k ;les Replicate intervals Evaluate k independent par;;on joins Pros ü Degree of parallelism - n CPU cores à k = n par;;ons ü Automa;c duplicate elimina;on Cons Replica;on Load balancing 43rd Interna;onal Conference on Very Large Data Bases 9

26 Domain-based par;;oning t 1 t 2 t k R 1 S 1 R 2 S 2 R k S k Idea Split into k ;les Replicate intervals Evaluate k independent par;;on joins Pros ü Degree of parallelism - n CPU cores à k = n par;;ons ü Automa;c duplicate elimina;on Cons Replica;on Load balancing 43rd Interna;onal Conference on Very Large Data Bases 9

27 Mini-joins break down 3 types of intervals à 9 types of mini-tasks (1) (2) (3) (4) (5) R i S i R i S i R i S i R i S i R i S i (6) (7) (8) (9) R i S i R i S i R i S i R i S i 43rd Interna;onal Conference on Very Large Data Base0

28 Mini-joins break down 3 types of intervals à 9 types of mini-tasks (1) (2) (3) (4) (5) R i S i R i S i R i S i R i S i R i S i (6) (7) (8) (9) R i S i R i S i R i S i R i S i Duplicate results elimina;on 43rd Interna;onal Conference on Very Large Data Base0

29 Mini-joins break down 3 types of intervals à 9 types of mini-tasks (1) (2) (3) (4) (5) R i S i R i S i R i S i R i S i R i S i Same complexity as original join Sweep only one input No comparisons cross product (6) (7) (8) (9) R i S i R i S i R i S i R i S i Duplicate results elimina;on 43rd Interna;onal Conference on Very Large Data Base0

30 Greedy scheduling Idea Distribute (k-1) mini-joins to different cores Evenly distribute load à minimize max load NP-hard problem Greedy approxima;on algorithm 43rd Interna;onal Conference on Very Large Data Base1

31 Greedy scheduling Idea Distribute (k-1) mini-joins to different cores Evenly distribute load à minimize max load NP-hard problem Greedy approxima;on algorithm Core 1 Core 2 Core n 43rd Interna;onal Conference on Very Large Data Base1

32 Adap;ve par;;oning Idea Create an ini;al uniform par;;oning Employ a very fine ;ling granules Move load between neighboring ;les Move granules between neighboring ;les Reposi;on borders of ;les 43rd Interna;onal Conference on Very Large Data Base2

33 Adap;ve par;;oning Idea Create an ini;al uniform par;;oning Employ a very fine ;ling granules Move load between neighboring ;les Move granules between neighboring ;les Reposi;on borders of ;les t 1 t 2 t n t 1 t 2 t n 43rd Interna;onal Conference on Very Large Data Base2

34 EXPERIMENTAL ANALYSIS 43rd Interna;onal Conference on Very Large Data Base3

35 Setup Execution time [secs] FS gfs bgfs Speedup [x] theoretical h-bgfs d-bgfs HT 36 HT Hardware dual -core Intel(R) Xeon(R) CPU E5-2687W 3. GHz with 128 GBs of RAM Hyper-threading enabled, up to 40 threads Sooware Workload [ICDE 16] à XOR of start Loop unrolling forced, OpenMP for mul;-threading Datasets WEBKIT git repo, interval = period of ;me file unchanged BOOKS Aarhus libraries, interval = period of ;me book lent Synthe;c Interval dura;on follows exponen;al distribu;on, uniformly distributed start plus peaks Experiments Execu;on ;me, # comparisons, memory footprint Both self joins and non-self joins Vary R / S, # cores (threads) 43rd Interna;onal Conference on Very Large Data Base4

36 Op;mizing FS Execution time [secs] FS gfs bgfs Endpoint comparisons [%] FS gfs bgfs R / S [1 core] R / S [1 core] WEBKIT WEBKIT 43rd Interna;onal Conference on Very Large Data Base5

37 WEBKIT BOOKS atomic/uniform static/fixed static/ mj+static/fixed mj+greedy mj+static/ mj+gre 25 static/ada mj+greedy 0.75 Execution time [secs] Execution time [secs] Execution time [secs] Avg idle time ratio [%] Execution [secs] Execution timetime [secs] 000 Avg idle time ratio Avg idle time rati Execution time [secs] Execution time [secs] Execution time [secs] Execution time [secs] DIP bgfsdip bgfs DIP OIP bgfs LEBI OIP LEBI OIP LEBI DIP 000 bgfs DIP 000bgFS 80 OIP LEBI OIP LEBI static/fixed 70 mj+static/fixed (a) R / S [1 core] (b) R / S [1 core] 0.25 Execution time [secs] Execution time [secs] Single-threaded processing static/ada (a) R (c) R (a) R / S [1(a)core] R / S 0 Figure 6: 1 Avg idle time ratio [%] Execution time [secs] Execution time [secs] Execution time [secs] 80 F static/fixed mj+gre 0.6 R / S [1 core] R / S [1 core] 70 mj+static/fixed mj+greedy GREEND WEBKIT d [Cafagna and Bölhen, VLDBJ 17] 0 DIP: [Cafagna and Bölhen, VLDBJ 17] databases. Similar to planq databases. Simila OIP: [Dignös et al., SIGMOD 14] 15 quire the two input collec 0.2 c quire the two inp 43rd Interna;onal Conference on Very Large Data Bases 16 computation is sub-optimt 0.1 computation is su tees at most R + S en tees 0 at most R p produce results

38 Avg idle ti Avg idle time ratio Execution time [secs] Avg idle time r Avg idle time rati Execution time [secs] Avg idle tim Avg idle time r Avg idle time ratio Avg idle time rati (a) R / S 0[1 core] HT(b) R / (a) R / S [1 core] (a) R / S (a) [1 R / S core] [1 core] (b) R / S (b) [1 core] static/fixed mj+greedy/fixed 80 R / S [1 core] 00 static/fixed bgfsdip bgfs static/fixed mj+greedy/fixed mj+static/fixed mj+greedy/adaptive static/fixed static/fixed bgfs LEBI LEBI OIP LEBI bgfs atomic/uniform mj+atomic/uniform atomic/adaptive mj+static/fixed mj+greedy/adaptive static/fixed mj+greedy/fixed 25 static/fixed 00 static/fixed mj+greedy/fixed static/fixed LEBI mj+static/fixed mj+greedy/adaptive static/fixed static/fixed mj+greedy/uniform mj+greedy/adaptive mj+static/fixed mj+greedy/adaptive static/fixed mj+greedy/fixed static/fixed 60 mj+static/fixed mj+greedy/adaptive [1 core] (c) R / S HT (d) R / 0 [1 core] (c) R / S HT 36HT 9 25HT [1 core]36ht (c) [1 core] 16 (d) R / S (b) R / S [1 core] R / S 0.75(c) R / S [1 core] 4 (d) R / S [1 core] 5 Figure 6: Optimizing the -based partiti (a) R / S [1 core] (b) R / S [1 core] BOOKS (a) R / S [16 cores] [ R = S ]the -based partitioning paradigm: bg Figure(b)6:# cores Optimizing 000 Figure 6: Optimizing the 80 -based partitioning paradigm: bgfs on WEBKIT 080-based 0 80 Figure 6: Optimizing the partitioning paradigm: bgfs on WEBKI static/fixed mj+greedy/fixed 4 9 static/fixed 16 25HT 36HT static/fixed mj+static/fixed mj+greedy/adaptive [16 cores] 60 R / S coressweep, merge join algorithms re databases. Similar to #plane [16] plane databases. Similar to sweep, merge join algorithms re[16] models each interval 50 0 quire theretwo input collections to interval be sorted, however, join(r.start,r.e divr databases. Similar to plane sweep, merge join algorithms [16] models each r as a 2D point quire the two input collections to be sorted, join to divides the points spa( 7040 algorithms databases. Similar to plane sweep, merge join re- however, [16] models each interval r as a 2Dinto point computation is sub-optimal compared FS, which guaranone however, quire the two to be sorted, join divides the points into spatial regions. Again, a part input collections computation is sub-optimal which guaranoneinto collection should be quire the two input collections to be tees sorted, however, divides the points spatial regions. Agj compared 60guaranat most R tojoin +FS, S collection endpoint comparisons that domultiple not the 60 computation is sub-optimal compared to FS, which one should be joined with parti tees at most R + S endpoint comparisons that do not the other collection. [Note computation is sub-optimal compared to 50 FS, which guaranone collection should be joined with mult that produce results. comparisons do not the other collection. [Note: remove this one if wespa run 50 tees at most R + S endpoint 1 produce results. space] 5 tees at most R + S endpoint comparisons that do not the other collection. [Note: remove this on produce results. space] 25HT 36HT algorithms produce0.25 results. space] Index-based Enderle et al. [7] propose interme 0 (d) R / S [core] Index-based algorithms. Enderle et al. [7] propose intermethods based on pla (c) R / S cores] 1 et al. [7] propose (d) #join cores [ R 4= S ] 9 operate 16on Index-based val algorithms, which RI-trees [14] HT that Inte [16Enderle HT sweep. algorithms. intermethods based ontwo plane The Endpoin Execution time [secs] Execution time [secs] Execution timeexecution [secs] time [secs] Avg idle time ratio [%] Execution time [secs] Execution time [secs] Avg idle time ratio [%] Execution time [secs] Execution time [secs] Execution time [secs] Execution time [secs] Execution time [secs] Avg idle time ratio [%] Avg idle time ratio [%] GREEND [4] Avg idle time ratio [%] Avg idle time ratio [%] Execution time [secs] Execution time [secs] Execution time [secs] Op;mizing Domain-based Paradigm val join algorithms, which operate on two RI-trees [14] that Interval (EBI) Join is the m Index-based algorithms. Enderle etindex al.[7]the propose Methods based on plane sweep. The input intercollections. Zhang etisal. [23] focus on approach findside (c) algorithms, R / S [1 core] (d) R / S [1Join core] whichindex operate on two RI-trees [14] that Interval (EBI) the most recent and val join the input collections. Zhang [14] et al. [23] focus on findsider it tomost be the state-of-t val join algorithms, which operate on two RI-trees that Interval (EBI) Join is the recent appr ing pairs of records in a temporal database that intersect in Sec 7: Optimizing the -based partitioning: bgfsit on input collections. Zhang al. [23]in focus on find-database sider be the state-of-the-art. EBI (reviewed in d or single-threaded processing index thefigure pairs ofetrecords a temporal thatto intersect in be thesection 2.1) is an EBI efficient index the input ing collections. Zhang et al. [23] focus on findsider it to state-of-the-art. (rev the (key, time) space (i.e., a problem similar to that studwhi ing pairs of records in a temporal database that intersect in Section 2.1) is an efficient implementation of plane 6:WEBKIT Optimizing the partitioning paradigm: on WEBKIT the -based (key, time) space (i.e.,0that a problem similar that studwhich is based on a special 0 Figure ing pairs of records in a temporal database intersect in tobgfs Section 2.1) of is an implementation ied in [, ]), proposing an extension theefficient multi-version the (key, time) space (i.e., a problem similar to that studwhich is based on a specialized gapless hash map ture dat mini-joins to the same core, and 0.25 the0.5(key, time) WEBKIT ied 1 in (i.e., [, ]), proposing an extension of the multi-version ture for managing the has act 0.75 space HT is based HT on a problem similar to that studwhich a specialized gapless B-tree [2]. lazy ied in [, ]), proposing an extension of the multi-version ture#for managing the active sets of intervals. EBI cores terms of the execution time the average idle time ratio) but when B-tree [2]. lazy version (LEBI) were s ptive) uniform initial partitioning of R / S ied in [16 [,cores] ]), proposing an extension of the multi-version ture for managing the active sets of inter the B-tree [2]. lazy version (LEBI) were shown17to significantly outp 8 Aug 31, 17 43rd Interna;onal Conference on Very Large Data Bases the best activated setup, adaptive parti-partitioning-based B-tree [2].on top of the mj+greedy/uniform lazy version (LEBI) were partitioning-base shown to signific wing setups: Partitioning-based algorithms. Apoint partitioning-based the best approach [6]apand to per also databases. Similar to plane sweep, merge join algorithms re[16] models A each interval r as a 2D (r.start,r.end) and Partitioning-based algorithms. partitioning-based apperior to another plane sw the best partitioning-based approach [6] a proach for interval joins was proposed in [22]. The tioning enhances the join evaluation when the number of cores is Partitioning-based algorithms. A partitioning-based appro perior another planeagain, sweepaimplementation quire the two input collections to be sorted, however, join joins divides the points into to spatial regions. partition of EBI [1]. proach for interval was proposed in [22]. The eline -based partitioning of Partitioning-based algorithms. A partitioning-based approach similar to is u perior toebi another plane sweep implement isone split to disjoint ranges. Each to interval assigned to HANA the proach for interval proposed in [22]. The low, e.g., 4joins orto 9; was notice how faster isranges. the mj+greedy/adaptive setup kno proach is isused in SAP computation is sub-optimal compared FS, which guarancollection should besimilar joined partitions of [12]. is split towas disjoint Each interval is assigned to with the multiple proach for interval joins proposed in [22]. The zations deactivated; knowledge, no previous wo

39 Parallel processing Speedup [x] Execution time [secs] theoretical h-lebi d-lebi HT 36 HT # cores d-lebi d-bgfs HT 36 HT WEBKIT # cores R / S [16 cores] 1 43rd Interna;onal Conference on Very Large Data Bases Speedup [x] Execution time [secs] theoretical h-bgfs d-bgfs HT 36 HT # cores d-lebi d-bgfs 18

40 To sum up Contribu;ons Efficient evalua;on of interval joins Single-threaded processing Op;mized bgfs, compe;;ve to state-of-the-art EBI/LEBI Lower memory footprint Parallel processing Novel -based par;;oning paradigm Higher speedup Future work Other types of temporal joins Other types of temporal operators Parallel processing Data-level parallelism, share data between threads 43rd Interna;onal Conference on Very Large Data Base9

41 Ques;ons? 43rd Interna;onal Conference on Very Large Data Base0

42 EXTRAS 43rd Interna;onal Conference on Very Large Data Base1

43 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16] Endpoint indices EI R = {.start,.start,.end,.end} EI S = {.start,.end,.start,.end,.start,.end,.start,.start,.end,.end} AcCve sets A R = {} A S = {} Result {} 43rd Interna;onal Conference on Very Large Data Bases 22

44 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16] Endpoint indices EI R = {.start,.start,.end,.end} EI S = {.start,.end,.start,.end,.start,.end,.start,.start,.end,.end} AcCve sets A R = {} A S = { } Result {} 43rd Interna;onal Conference on Very Large Data Bases 22

45 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16] Endpoint indices EI R = {.start,.start,.end,.end} EI S = {.start,.end,.start,.end,.start,.end,.start,.start,.end,.end} AcCve sets A R = {} A S = { } Result {(, )} 43rd Interna;onal Conference on Very Large Data Bases 22

46 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16] Endpoint indices EI R = {.start,.start,.end,.end} EI S = {.start,.end,.start,.end,.start,.end,.start,.start,.end,.end} AcCve sets A R = {} A S = { } Result {(, )} 43rd Interna;onal Conference on Very Large Data Bases 22

47 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16] Pros ü No -point comparisons when producing results ü Tailored to modern hardware ü Main memory cache-aware ü Fast Cons Special data structure needed for ac;ve sets, support for efficient updates and scans 43rd Interna;onal Conference on Very Large Data Bases 23

48 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93] Sorted inputs R = {, } S = {,,,, } Result {} 43rd Interna;onal Conference on Very Large Data Bases 24

49 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93] Sorted inputs R = {, } S = {,,,, } Result {(, )} 43rd Interna;onal Conference on Very Large Data Bases 24

50 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93] Sorted inputs R = {, } S = {,,,, } Result {(, ), (, ), (, )} 43rd Interna;onal Conference on Very Large Data Bases 24

51 Forward Scan based (FS) Pros ü Simple ü No special structure needed [Brinkhoff et al., SIGMOD 93] Cons Each join result requires a -point comparison, R + S + R S comparisons in total 43rd Interna;onal Conference on Very Large Data Bases 24

52 Forward Scan based (FS) Pros ü Simple ü No special structure needed [Brinkhoff et al., SIGMOD 93] Cons Each join result requires a -point comparison, R + S + R S comparisons in total 43rd Interna;onal Conference on Very Large Data Bases 25

An Interval Join Optimized for Modern Hardware

An Interval Join Optimized for Modern Hardware An Interval Join Optimized for Modern Hardware Danila Piatov, Sven Helmer and Anton Dignös Faculty of Computer Science Free University of Bozen-Bolzano, Italy Email: firstname.lastname@unibz.it Abstract

More information

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross

Track Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk

More information

Mondrian Mul+dimensional K Anonymity

Mondrian Mul+dimensional K Anonymity Mondrian Mul+dimensional K Anonymity Kristen Lefevre, David J. DeWi

More information

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc

FuxiSort. Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc Fuxi Jiamang Wang, Yongjun Wu, Hua Cai, Zhipeng Tang, Zhiqiang Lv, Bin Lu, Yangyu Tao, Chao Li, Jingren Zhou, Hong Tang Alibaba Group Inc {jiamang.wang, yongjun.wyj, hua.caihua, zhipeng.tzp, zhiqiang.lv,

More information

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies

Informa)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:

More information

Temporal Aggregation and Join

Temporal Aggregation and Join TSDB15, SL05 1/49 A. Dignös Temporal and Spatial Database 2014/2015 2nd semester Temporal Aggregation and Join SL05 Temporal aggregation Span, instant, moving window aggregation Aggregation tree, balanced

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Join Processing for Flash SSDs: Remembering Past Lessons

Join Processing for Flash SSDs: Remembering Past Lessons Join Processing for Flash SSDs: Remembering Past Lessons Jaeyoung Do, Jignesh M. Patel Department of Computer Sciences University of Wisconsin-Madison $/MB GB Flash Solid State Drives (SSDs) Benefits of

More information

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC

GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC GPU ACCELERATED SELF-JOIN FOR THE DISTANCE SIMILARITY METRIC MIKE GOWANLOCK NORTHERN ARIZONA UNIVERSITY SCHOOL OF INFORMATICS, COMPUTING & CYBER SYSTEMS BEN KARSIN UNIVERSITY OF HAWAII AT MANOA DEPARTMENT

More information

DNA Interaction Network

DNA Interaction Network Social Network Web Network Social Network DNA Interaction Network Follow Network User-Product Network Nonuniform network comm costs Contentiousness of the memory subsystems Nonuniform comp requirement

More information

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it

CS 590: High Performance Computing. Parallel Computer Architectures. Lab 1 Starts Today. Already posted on Canvas (under Assignment) Let s look at it Lab 1 Starts Today Already posted on Canvas (under Assignment) Let s look at it CS 590: High Performance Computing Parallel Computer Architectures Fengguang Song Department of Computer Science IUPUI 1

More information

Efficient Aggregation for Graph Summarization

Efficient Aggregation for Graph Summarization Efficient Aggregation for Graph Summarization Yuanyuan Tian (University of Michigan) Richard A. Hankins (Nokia Research Center) Jignesh M. Patel (University of Michigan) Motivation Graphs are everywhere

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Chapter 13: Query Processing

Chapter 13: Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Handling heterogeneous storage devices in clusters

Handling heterogeneous storage devices in clusters Handling heterogeneous storage devices in clusters André Brinkmann University of Paderborn Toni Cortes Barcelona Supercompu8ng Center Randomized Data Placement Schemes n Randomized Data Placement Schemes

More information

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques Evaluation of Relational Operations: Other Techniques Chapter 14, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections Cost depends on #qualifying

More information

Main-Memory Databases 1 / 25

Main-Memory Databases 1 / 25 1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low

More information

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 13: Query Processing Basic Steps in Query Processing

Chapter 13: Query Processing Basic Steps in Query Processing Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large

! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!

More information

Chapter 20: Parallel Databases

Chapter 20: Parallel Databases Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Chapter 20: Parallel Databases. Introduction

Chapter 20: Parallel Databases. Introduction Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!

More information

Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin KaH

Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin KaH Cli$anger: Scaling Performance Cliffs in Memory Caches [NSDI 2016] Cache OS: Data Center Dynamic Cache Management Asaf Cidon, Assaf Eisenman, Mohammad Alizadeh and Sachin KaH 1 Key-Value Caches are Essen1al

More information

Accelerating Analytical Workloads

Accelerating Analytical Workloads Accelerating Analytical Workloads Thomas Neumann Technische Universität München April 15, 2014 Scale Out in Big Data Analytics Big Data usually means data is distributed Scale out to process very large

More information

Re- op&mizing Data Parallel Compu&ng

Re- op&mizing Data Parallel Compu&ng Re- op&mizing Data Parallel Compu&ng Sameer Agarwal Srikanth Kandula, Nicolas Bruno, Ming- Chuan Wu, Ion Stoica, Jingren Zhou UC Berkeley A Data Parallel Job can be a collec/on of maps, A Data Parallel

More information

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques Evaluation of Relational Operations: Other Techniques [R&G] Chapter 14, Part B CS4320 1 Using an Index for Selections Cost depends on #qualifying tuples, and clustering. Cost of finding qualifying data

More information

PageForge: A Near-Memory Content- Aware Page-Merging Architecture

PageForge: A Near-Memory Content- Aware Page-Merging Architecture PageForge: A Near-Memory Content- Aware Page-Merging Architecture Dimitrios Skarlatos, Nam Sung Kim, and Josep Torrellas University of Illinois at Urbana-Champaign MICRO-50 @ Boston Motivation: Server

More information

Argo: Architecture- Aware Graph Par33oning

Argo: Architecture- Aware Graph Par33oning Argo: Architecture- Aware Graph Par33oning Angen Zheng Alexandros Labrinidis, Panos K. Chrysanthis, and Jack Lange Department of Computer Science, University of PiCsburgh hcp://db.cs.pic.edu/group/ hcp://www.prognosgclab.org/

More information

NetSlices: Scalable Mul/- Core Packet Processing in User- Space

NetSlices: Scalable Mul/- Core Packet Processing in User- Space NetSlices: Scalable Mul/- Core Packet Processing in - Space Tudor Marian, Ki Suh Lee, Hakim Weatherspoon Cornell University Presented by Ki Suh Lee Packet Processors Essen/al for evolving networks Sophis/cated

More information

Chapter 18: Parallel Databases

Chapter 18: Parallel Databases Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery

More information

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction

Chapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from

More information

Energy- Aware Time Change Detec4on Using Synthe4c Aperture Radar On High- Performance Heterogeneous Architectures: A DDDAS Approach

Energy- Aware Time Change Detec4on Using Synthe4c Aperture Radar On High- Performance Heterogeneous Architectures: A DDDAS Approach Energy- Aware Time Change Detec4on Using Synthe4c Aperture Radar On High- Performance Heterogeneous Architectures: A DDDAS Approach Sanjay Ranka (PI) Sartaj Sahni (Co- PI) Mark Schmalz (Co- PI) University

More information

Database System Concepts

Database System Concepts Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth

More information

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique

Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Chisel++: Handling Partitioning Skew in MapReduce Framework Using Efficient Range Partitioning Technique Prateek Dhawalia Sriram Kailasam D. Janakiram Distributed and Object Systems Lab Dept. of Comp.

More information

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1

Introduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1 Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip

More information

MapReduce. Tom Anderson

MapReduce. Tom Anderson MapReduce Tom Anderson Last Time Difference between local state and knowledge about other node s local state Failures are endemic Communica?on costs ma@er Why Is DS So Hard? System design Par??oning of

More information

MapReduce. Cloud Computing COMP / ECPE 293A

MapReduce. Cloud Computing COMP / ECPE 293A Cloud Computing COMP / ECPE 293A MapReduce Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, In Proceedings of the 6th conference on Symposium on Opera7ng Systems

More information

Columnstore and B+ tree. Are Hybrid Physical. Designs Important?

Columnstore and B+ tree. Are Hybrid Physical. Designs Important? Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 B+ tree & Columnstore on same table = Hybrid design 4? C O L C O L B+ tree B+ tree ? C O L C O L B+ tree B+ tree

More information

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores

SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu

More information

Evaluation of Relational Operations: Other Techniques

Evaluation of Relational Operations: Other Techniques Evaluation of Relational Operations: Other Techniques Chapter 12, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections v Cost depends on #qualifying

More information

Evaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing

Evaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing Evaluating Queries Query Processing Query Processing: Overview Query Processing: Example select lname from employee where ssn = 123456789 ; query expression tree? π lname σ ssn= 123456789 employee physical

More information

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,

More information

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD

Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Op#mizing PGAS overhead in a mul#-locale Chapel implementa#on of CoMD Riyaz Haque and David F. Richards This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore

More information

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin

Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between

More information

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016

Datenbanksysteme II: Modern Hardware. Stefan Sprenger November 23, 2016 Datenbanksysteme II: Modern Hardware Stefan Sprenger November 23, 2016 Content of this Lecture Introduction to Modern Hardware CPUs, Cache Hierarchy Branch Prediction SIMD NUMA Cache-Sensitive Skip List

More information

Leveraging Set Relations in Exact Set Similarity Join

Leveraging Set Relations in Exact Set Similarity Join Leveraging Set Relations in Exact Set Similarity Join Xubo Wang, Lu Qin, Xuemin Lin, Ying Zhang, and Lijun Chang University of New South Wales, Australia University of Technology Sydney, Australia {xwang,lxue,ljchang}@cse.unsw.edu.au,

More information

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4

Indexing Bi-temporal Windows. Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 Indexing Bi-temporal Windows Chang Ge 1, Martin Kaufmann 2, Lukasz Golab 1, Peter M. Fischer 3, Anil K. Goel 4 1 2 3 4 Outline Introduction Bi-temporal Windows Related Work The BiSW Index Experiments Conclusion

More information

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism

Parallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large

More information

Hash Joins for Multi-core CPUs. Benjamin Wagner

Hash Joins for Multi-core CPUs. Benjamin Wagner Hash Joins for Multi-core CPUs Benjamin Wagner Joins fundamental operator in query processing variety of different algorithms many papers publishing different results main question: is tuning to modern

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

9/23/2009 CONFERENCES CONTINUOUS NEAREST NEIGHBOR SEARCH INTRODUCTION OVERVIEW PRELIMINARY -- POINT NN QUERIES

9/23/2009 CONFERENCES CONTINUOUS NEAREST NEIGHBOR SEARCH INTRODUCTION OVERVIEW PRELIMINARY -- POINT NN QUERIES CONFERENCES Short Name SIGMOD Full Name Special Interest Group on Management Of Data CONTINUOUS NEAREST NEIGHBOR SEARCH Yufei Tao, Dimitris Papadias, Qiongmao Shen Hong Kong University of Science and Technology

More information

Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn

Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn Amol Deshpande, University of Maryland Lisa Hellerstein, Polytechnic University, Brooklyn Mo>va>on: Parallel Query Processing Increasing parallelism in compu>ng Shared nothing clusters, mul> core technology,

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007.

Sandor Heman, Niels Nes, Peter Boncz. Dynamic Bandwidth Sharing. Cooperative Scans: Marcin Zukowski. CWI, Amsterdam VLDB 2007. Cooperative Scans: Dynamic Bandwidth Sharing in a DBMS Marcin Zukowski Sandor Heman, Niels Nes, Peter Boncz CWI, Amsterdam VLDB 2007 Outline Scans in a DBMS Cooperative Scans Benchmarks DSM version VLDB,

More information

Chapter 17: Parallel Databases

Chapter 17: Parallel Databases Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems

More information

DATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland

DATABASES AND THE CLOUD. Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland DATABASES AND THE CLOUD Gustavo Alonso Systems Group / ECC Dept. of Computer Science ETH Zürich, Switzerland AVALOQ Conference Zürich June 2011 Systems Group www.systems.ethz.ch Enterprise Computing Center

More information

Line Segment Intersection Dmitriy V'jukov

Line Segment Intersection Dmitriy V'jukov Line Segment Intersection Dmitriy V'jukov 1. Problem Statement Write a threaded code to find pairs of input line segments that intersect within three-dimensional space. Line segments are defined by 6 integers

More information

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage

On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage On Smart Query Routing: For Distributed Graph Querying with Decoupled Storage Arijit Khan Nanyang Technological University (NTU), Singapore Gustavo Segovia ETH Zurich, Switzerland Donald Kossmann Microsoft

More information

Evaluation of relational operations

Evaluation of relational operations Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book

More information

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm

Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Improving Memory Space Efficiency of Kd-tree for Real-time Ray Tracing Byeongjun Choi, Byungjoon Chang, Insung Ihm Department of Computer Science and Engineering Sogang University, Korea Improving Memory

More information

Starchart*: GPU Program Power/Performance Op7miza7on Using Regression Trees

Starchart*: GPU Program Power/Performance Op7miza7on Using Regression Trees Starchart*: GPU Program Power/Performance Op7miza7on Using Regression Trees Wenhao Jia, Princeton University Kelly A. Shaw, University of Richmond Margaret Martonosi, Princeton University *Sta7s7cal Tuning

More information

Exadata Implementation Strategy

Exadata Implementation Strategy Exadata Implementation Strategy BY UMAIR MANSOOB 1 Who Am I Work as Senior Principle Engineer for an Oracle Partner Oracle Certified Administrator from Oracle 7 12c Exadata Certified Implementation Specialist

More information

Network Coding: Theory and Applica7ons

Network Coding: Theory and Applica7ons Network Coding: Theory and Applica7ons PhD Course Part IV Tuesday 9.15-12.15 18.6.213 Muriel Médard (MIT), Frank H. P. Fitzek (AAU), Daniel E. Lucani (AAU), Morten V. Pedersen (AAU) Plan Hello World! Intra

More information

Implementation of Relational Operations: Other Operations

Implementation of Relational Operations: Other Operations Implementation of Relational Operations: Other Operations Module 4, Lecture 2 Database Management Systems, R. Ramakrishnan 1 Simple Selections SELECT * FROM Reserves R WHERE R.rname < C% Of the form σ

More information

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:

6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing: Lab 2 -- Due Weds. Project meeting sign ups 6.830 Lecture 8 10/2/2017 Recap column stores & paper Join Processing: S = {S} tuples, S pages R = {R} tuples, R pages R < S M pages of memory Types of joins

More information

The Logic of Physical Garbage Collection in Deduplicating Storage

The Logic of Physical Garbage Collection in Deduplicating Storage The Logic of Physical Garbage Collection in Deduplicating Storage Fred Douglis Abhinav Duggal Philip Shilane Tony Wong Dell EMC Shiqin Yan University of Chicago Fabiano Botelho Rubrik 1 Deduplication in

More information

Parallel Merge Sort with Double Merging

Parallel Merge Sort with Double Merging Parallel with Double Merging Ahmet Uyar Department of Computer Engineering Meliksah University, Kayseri, Turkey auyar@meliksah.edu.tr Abstract ing is one of the fundamental problems in computer science.

More information

Effect of memory latency

Effect of memory latency CACHE AWARENESS Effect of memory latency Consider a processor operating at 1 GHz (1 ns clock) connected to a DRAM with a latency of 100 ns. Assume that the processor has two ALU units and it is capable

More information

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair

External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair External-Memory Algorithms with Applications in GIS - (L. Arge) Enylton Machado Roberto Beauclair {machado,tron}@visgraf.impa.br Theoretical Models Random Access Machine Memory: Infinite Array. Access

More information

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server

What We Have Already Learned. DBMS Deployment: Local. Where We Are Headed Next. DBMS Deployment: 3 Tiers. DBMS Deployment: Client/Server What We Have Already Learned CSE 444: Database Internals Lectures 19-20 Parallel DBMSs Overall architecture of a DBMS Internals of query execution: Data storage and indexing Buffer management Query evaluation

More information

Do Query Op5mizers Need to be SSD- aware?

Do Query Op5mizers Need to be SSD- aware? Do Query Op5mizers Need to be SSD- aware? Steven Pelley, Kristen LeFevre, Thomas F. Wenisch, University of Michigan CSE ADMS 2011 1 Enterprise flash: a new hope Disk Flash Model WD 10Krpm OCZ RevoDrive

More information

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System

A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System A Distributed Data- Parallel Execu3on Framework in the Kepler Scien3fic Workflow System Ilkay Al(ntas and Daniel Crawl San Diego Supercomputer Center UC San Diego Jianwu Wang UMBC WorDS.sdsc.edu Computa3onal

More information

High Performance Computing on MapReduce Programming Framework

High Performance Computing on MapReduce Programming Framework International Journal of Private Cloud Computing Environment and Management Vol. 2, No. 1, (2015), pp. 27-32 http://dx.doi.org/10.21742/ijpccem.2015.2.1.04 High Performance Computing on MapReduce Programming

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Portland State University ECE 588/688 Introduction to Parallel Computing Reference: Lawrence Livermore National Lab Tutorial https://computing.llnl.gov/tutorials/parallel_comp/ Copyright by Alaa Alameldeen

More information

Cost Models for Query Processing Strategies in the Active Data Repository

Cost Models for Query Processing Strategies in the Active Data Repository Cost Models for Query rocessing Strategies in the Active Data Repository Chialin Chang Institute for Advanced Computer Studies and Department of Computer Science University of Maryland, College ark 272

More information

Computational Geometry

Computational Geometry Motivation Motivation Polygons and visibility Visibility in polygons Triangulation Proof of the Art gallery theorem Two points in a simple polygon can see each other if their connecting line segment is

More information

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs

Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Execu&on Templates: Caching Control Plane Decisions for Strong Scaling of Data Analy&cs Omid Mashayekhi Hang Qu Chinmayee Shah Philip Levis July 13, 2017 2 Cloud Frameworks SQL Streaming Machine Learning

More information

ECSE 425 Lecture 25: Mul1- threading

ECSE 425 Lecture 25: Mul1- threading ECSE 425 Lecture 25: Mul1- threading H&P Chapter 3 Last Time Theore1cal and prac1cal limits of ILP Instruc1on window Branch predic1on Register renaming 2 Today Mul1- threading Chapter 3.5 Summary of ILP:

More information

Max-Count Aggregation Estimation for Moving Points

Max-Count Aggregation Estimation for Moving Points Max-Count Aggregation Estimation for Moving Points Yi Chen Peter Revesz Dept. of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA Abstract Many interesting problems

More information

Crescando: Predictable Performance for Unpredictable Workloads

Crescando: Predictable Performance for Unpredictable Workloads Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing

More information

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS

A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS A FILTERING TECHNIQUE FOR FRAGMENT ASSEMBLY- BASED PROTEINS LOOP MODELING WITH CONSTRAINTS F. Campeotto 1,2 A. Dal Palù 3 A. Dovier 2 F. Fioretto 1 E. Pontelli 1 1. Dept. Computer Science, NMSU 2. Dept.

More information

TiDA: High Level Programming Abstrac8ons for Data Locality Management

TiDA: High Level Programming Abstrac8ons for Data Locality Management h#p://parcorelab.ku.edu.tr TiDA: High Level Programming Abstrac8ons for Data Locality Management Didem Unat, Muhammed Nufail Farooqi, Burak Bastem Koç University, Turkey Tan Nguyen, Weiqun Zhang, George

More information

Firebird in 2011/2012: Development Review

Firebird in 2011/2012: Development Review Firebird in 2011/2012: Development Review Dmitry Yemanov mailto:dimitr@firebirdsql.org Firebird Project http://www.firebirdsql.org/ Packages Released in 2011 Firebird 2.1.4 March 2011 96 bugs fixed 4 improvements,

More information

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017

Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store. Wei Xie TTU CS Department Seminar, 3/7/2017 Toward Energy-efficient and Fault-tolerant Consistent Hashing based Data Store Wei Xie TTU CS Department Seminar, 3/7/2017 1 Outline General introduction Study 1: Elastic Consistent Hashing based Store

More information

Hypergraph Sparsifica/on and Its Applica/on to Par//oning

Hypergraph Sparsifica/on and Its Applica/on to Par//oning Hypergraph Sparsifica/on and Its Applica/on to Par//oning Mehmet Deveci 1,3, Kamer Kaya 1, Ümit V. Çatalyürek 1,2 1 Dept. of Biomedical Informa/cs, The Ohio State University 2 Dept. of Electrical & Computer

More information

DBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki

DBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki DBMS Data Loading: An Analysis on Modern Hardware Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki Data loading: A necessary evil Volume => Expensive 4 zettabytes

More information

Implementing Joins 1

Implementing Joins 1 Implementing Joins 1 Last Time Selection Scan, binary search, indexes Projection Duplicate elimination: sorting, hashing Index-only scans Joins 2 Tuple Nested Loop Join foreach tuple r in R do foreach

More information

Simultaneous Multithreading on Pentium 4

Simultaneous Multithreading on Pentium 4 Hyper-Threading: Simultaneous Multithreading on Pentium 4 Presented by: Thomas Repantis trep@cs.ucr.edu CS203B-Advanced Computer Architecture, Spring 2004 p.1/32 Overview Multiple threads executing on

More information

Architecture of a Real-Time Operational DBMS

Architecture of a Real-Time Operational DBMS Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.

More information

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests Mingxing Tan 1 2, Gai Liu 1, Ritchie Zhao 1, Steve Dai 1, Zhiru Zhang 1 1 Computer Systems Laboratory, Electrical and Computer

More information

Introduction to Spatial Database Systems

Introduction to Spatial Database Systems Introduction to Spatial Database Systems by Cyrus Shahabi from Ralf Hart Hartmut Guting s VLDB Journal v3, n4, October 1994 Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated

More information

Weaving Relations for Cache Performance

Weaving Relations for Cache Performance VLDB 2001, Rome, Italy Best Paper Award Weaving Relations for Cache Performance Anastassia Ailamaki David J. DeWitt Mark D. Hill Marios Skounakis Presented by: Ippokratis Pandis Bottleneck in DBMSs Processor

More information

Evaluation of Relational Operations

Evaluation of Relational Operations Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection

More information

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa

Camdoop Exploiting In-network Aggregation for Big Data Applications Paolo Costa Camdoop Exploiting In-network Aggregation for Big Data Applications costa@imperial.ac.uk joint work with Austin Donnelly, Antony Rowstron, and Greg O Shea (MSR Cambridge) MapReduce Overview Input file

More information

Dremel: Interactive Analysis of Web-Scale Datasets

Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis Presented by: Sameer Agarwal sameerag@cs.berkeley.edu

More information

C has been and will always remain on top for performancecritical

C has been and will always remain on top for performancecritical Check out this link: http://spectrum.ieee.org/static/interactive-the-top-programminglanguages-2016 C has been and will always remain on top for performancecritical applications: Implementing: Databases

More information

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery

Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark. by Nkemdirim Dockery Challenges in large-scale graph processing on HPC platforms and the Graph500 benchmark by Nkemdirim Dockery High Performance Computing Workloads Core-memory sized Floating point intensive Well-structured

More information

Analysis in the Big Data Era

Analysis in the Big Data Era Analysis in the Big Data Era Massive Data Data Analysis Insight Key to Success = Timely and Cost-Effective Analysis 2 Hadoop MapReduce Ecosystem Popular solution to Big Data Analytics Java / C++ / R /

More information