A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins
1 A Forward Scan based Plane Sweep Algorithm for Parallel Interval Joins. Panagiotis Bouros (1) and Nikos Mamoulis (2). (1) Aarhus University, Denmark. (2) University of Ioannina, Greece.
2-5 Interval Joins. Two tables with columns (employee, start, end): one lists John and Mary, the other lists Jane, Bob, Hugo, Helen and Tom; their intervals are plotted on a year axis. Task: find all pairs of employees whose periods of work on D1 and D2 intersect. Applications: temporal databases, multidimensional data management, uncertain data management. 43rd International Conference on Very Large Data Bases
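As a baseline for the task on this slide, the join can be written as a plain nested loop over both tables. This is only a sketch: the years below are hypothetical (the values on the slide did not survive transcription), the function name `interval_join` is made up for illustration, and intervals are assumed to be closed on both ends.

```python
def interval_join(d1, d2):
    """Return all (a, b) name pairs whose closed intervals intersect."""
    return [
        (a, b)
        for a, (a_start, a_end) in d1.items()
        for b, (b_start, b_end) in d2.items()
        # Two closed intervals intersect iff each starts before the other ends.
        if a_start <= b_end and b_start <= a_end
    ]

# Hypothetical employment periods (year started, year ended).
D1 = {"John": (1998, 2004), "Mary": (2006, 2010)}
D2 = {"Jane": (1996, 1999), "Bob": (2001, 2007), "Tom": (2011, 2014)}

pairs = interval_join(D1, D2)
```

This costs |R|·|S| comparisons, which is what the plane sweep methods on the following slides try to avoid.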
6 Our focus: efficient evaluation of interval joins. Single-threaded processing: a simple plane sweep based method, competitive to the state of the art. Parallel processing: partitioning-based join, share nothing.
7 SINGLE-THREADED PROCESSING
8-9 Related work. Nested loops and sort-merge join: [Segev and Gunadhi, VLDB 89], [Gunadhi and Segev, ICDE 91]. Index-based: [Zhang et al., ICDE 02], [Enderle et al., SIGMOD 04]. Partitioning-based: [Soo et al., ICDE 94], [Dignös et al., SIGMOD 14], [Cafagna and Böhlen, VLDBJ 17]. Plane-sweep based: [Brinkhoff et al., SIGMOD 93], [Arge et al., VLDB 98], [Piatov et al., ICDE 16].
10-12 Plane sweep methods. Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]: the sweep line stops on both start and end endpoints; backward scan; a gapless hash map buffers the open intervals. Pros: no endpoint comparisons; tailored to modern hardware; main-memory, cache-aware; fast. Cons: special structure needed. Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]: the sweep line stops only on start endpoints. Pros: simple; no special structure needed. Cons: |R| + |S| + |R ⋈ S| comparisons in total.
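The forward scan idea can be sketched in a few lines. This is an illustrative reading of FS, not the authors' code: intervals are assumed to be closed `(start, end)` tuples, and the function name `forward_scan_join` is hypothetical. Both inputs are sorted by start; the sweep stops at each start endpoint and scans forward in the other input until the current interval's end is passed.

```python
def forward_scan_join(R, S):
    """Plane sweep join that stops only on start endpoints."""
    R, S = sorted(R), sorted(S)          # sort both inputs by start
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            r, k = R[i], j
            # Forward scan: every s that starts before r ends intersects r.
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            s, k = S[j], i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out

R = [(1, 5), (6, 9)]
S = [(2, 3), (4, 7), (8, 10), (11, 12)]
result = forward_scan_join(R, S)
```

Each reported pair costs one endpoint comparison in the inner loop, which is where the |R ⋈ S| term in the comparison count comes from.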
13-18 Optimizing FS. FS: for each interval, the figure marks the area of the other input that it scans and lists the result pairs found there. FS grouping (gFS): consecutive intervals of R form a group G_R that is sorted by end and shares a single scanned area. FS grouping + bucketing (bgFS): as gFS, with the intervals of S additionally organized by a bucket index BI_S.
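The grouping step can be sketched as follows. This is one possible reading of the slide, under stated assumptions: closed `(start, end)` tuples, hypothetical function names, and no bucketing. Consecutive intervals from one input that start before the next interval of the other input are collected into a group and sorted by end; the other input is then scanned once for the whole group, and each scanned interval is compared only against the group member with the smallest remaining end.

```python
def _scan_group(group, other, start, out, flip):
    """Join a group (sorted by end) with other[start:], scanned once."""
    lo = 0                                # group[lo:] still reach the sweep line
    k = start
    while k < len(other) and lo < len(group):
        o = other[k]
        # Drop group members that end before o starts.
        while lo < len(group) and group[lo][1] < o[0]:
            lo += 1
        # All remaining members intersect o; no further comparisons needed.
        for g in group[lo:]:
            out.append((o, g) if flip else (g, o))
        k += 1

def grouped_forward_scan_join(R, S):
    R, S = sorted(R), sorted(S)
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            end = i
            while end < len(R) and R[end][0] <= S[j][0]:
                end += 1
            group = sorted(R[i:end], key=lambda r: r[1])
            _scan_group(group, S, j, out, flip=False)
            i = end
        else:
            end = j
            while end < len(S) and S[end][0] < R[i][0]:
                end += 1
            group = sorted(S[j:end], key=lambda s: s[1])
            _scan_group(group, R, i, out, flip=True)
            j = end
    return out

R = [(1, 5), (2, 3), (6, 9), (6, 6)]
S = [(2, 3), (4, 7), (6, 10), (11, 12)]
result = grouped_forward_scan_join(R, S)
```

The output is the same set of pairs as plain FS; the saving is in endpoint comparisons, since a whole suffix of the group is emitted per scanned interval.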
19 PARALLEL PROCESSING
20-22 Hash-based partitioning [Piatov et al., ICDE 16]. Idea: randomly split each input into k partitions (R_1, ..., R_k and S_1, ..., S_k) using a hash function h; evaluate k² independent partition joins. Pros: simple; load balancing. Cons: endpoint comparisons rise to 2·k·(|R| + |S|) for EBI/LEBI and k·(|R| + |S|) for FS; degree of parallelism: n CPU cores → k = √n partitions.
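A minimal sketch of the hash-based scheme, assuming closed `(start, end)` tuples. The helper names are hypothetical, Python's built-in `hash` stands in for h, and a nested loop stands in for the per-partition join (any of the plane sweep variants could be used instead). The point it illustrates is that every R partition must meet every S partition, giving k·k tasks.

```python
def hash_partition(intervals, k):
    """Split intervals into k partitions by hashing (stand-in for h)."""
    parts = [[] for _ in range(k)]
    for iv in intervals:
        parts[hash(iv) % k].append(iv)
    return parts

def partition_join(Ri, Sj):
    """Placeholder partition join; a plane sweep would go here."""
    return [(r, s) for r in Ri for s in Sj if r[0] <= s[1] and s[0] <= r[1]]

def hash_partitioned_join(R, S, k):
    R_parts, S_parts = hash_partition(R, k), hash_partition(S, k)
    # Every pair of partitions is an independent task: k * k in total.
    tasks = [(Ri, Sj) for Ri in R_parts for Sj in S_parts]
    out = []
    for Ri, Sj in tasks:                  # each task could run on its own core
        out.extend(partition_join(Ri, Sj))
    return out, len(tasks)

R = [(1, 5), (2, 3), (6, 9), (10, 12)]
S = [(2, 3), (4, 7), (6, 10), (11, 12)]
result, n_tasks = hash_partitioned_join(R, S, k=3)
```

Because each input interval takes part in k partition joins, its endpoints are swept k times, which matches the growth in endpoint comparisons listed under Cons.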
23-26 Domain-based partitioning. Idea: split the domain into k tiles t_1, ..., t_k; replicate intervals that span several tiles; evaluate k independent partition joins R_i ⋈ S_i. Pros: degree of parallelism, n CPU cores → k = n partitions; automatic duplicate elimination. Cons: replication; load balancing.
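The tiling and replication steps can be sketched as below, assuming integer endpoints inside a known domain `[lo, hi]` and closed intervals. All names are hypothetical. To avoid duplicates, the sketch reports a pair only in the tile where the later of the two starts falls; this is one standard rule and may differ in detail from what the slide calls automatic duplicate elimination.

```python
from bisect import bisect_right

def tile_bounds(lo, hi, k):
    """Tile t covers [bounds[t], bounds[t + 1])."""
    return [lo + (hi - lo + 1) * t // k for t in range(k + 1)]

def assign_to_tiles(intervals, bounds):
    tiles = [[] for _ in range(len(bounds) - 1)]
    for iv in intervals:
        first = bisect_right(bounds, iv[0]) - 1
        last = bisect_right(bounds, iv[1]) - 1
        for t in range(first, last + 1):  # replicate into every tile it spans
            tiles[t].append(iv)
    return tiles

def domain_partitioned_join(R, S, lo, hi, k):
    bounds = tile_bounds(lo, hi, k)
    R_tiles, S_tiles = assign_to_tiles(R, bounds), assign_to_tiles(S, bounds)
    out = []
    for t in range(k):                    # k independent partition joins
        t_start = bounds[t]
        for r in R_tiles[t]:
            for s in S_tiles[t]:
                intersect = r[0] <= s[1] and s[0] <= r[1]
                # Skip pairs where both intervals started in an earlier tile.
                if intersect and max(r[0], s[0]) >= t_start:
                    out.append((r, s))
    return out

R = [(0, 3), (2, 9), (5, 6), (10, 11)]
S = [(1, 1), (3, 7), (4, 11), (8, 8)]
result = domain_partitioned_join(R, S, lo=0, hi=11, k=3)
```

Only k tasks are produced, so n cores need just k = n tiles, at the price of replicating long intervals.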
27-29 Mini-joins breakdown. 3 types of intervals → 9 types of mini-tasks, (1) to (9), for each partition join R_i ⋈ S_i. Among (1) to (5): same complexity as the original join; sweep only one input; no comparisons (cross product). Mini-tasks (6) to (9) are dropped by duplicate results elimination.
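The breakdown can be illustrated for a single tile. The slide does not spell out the three interval types, so the sketch assumes they are: starts inside the tile; started earlier and ends inside the tile; started earlier and ends after the tile. The numbering of the mini-joins in the comments and all function names are likewise assumptions.

```python
def split_by_type(intervals, t_start, t_end):
    """Classify intervals assigned to the tile [t_start, t_end]."""
    starts_in, ends_in, spans = [], [], []
    for iv in intervals:
        if iv[0] >= t_start:
            starts_in.append(iv)          # originates in this tile
        elif iv[1] <= t_end:
            ends_in.append(iv)            # replica that ends in this tile
        else:
            spans.append(iv)              # replica covering the whole tile
    return starts_in, ends_in, spans

def tile_mini_joins(R_t, S_t, t_start, t_end):
    R1, R2, R3 = split_by_type(R_t, t_start, t_end)
    S1, S2, S3 = split_by_type(S_t, t_start, t_end)
    out = []
    # (1) both start in the tile: a regular interval join.
    out += [(r, s) for r in R1 for s in S1 if r[0] <= s[1] and s[0] <= r[1]]
    # (2) s started earlier and ends in the tile: one check per pair.
    out += [(r, s) for r in R1 for s in S2 if r[0] <= s[1]]
    # (3) s spans the tile: every r starting here intersects it.
    out += [(r, s) for r in R1 for s in S3]
    # (4), (5): the symmetric cases with the roles of R and S swapped.
    out += [(r, s) for r in R2 for s in S1 if s[0] <= r[1]]
    out += [(r, s) for r in R3 for s in S1]
    # (6)-(9): replica x replica pairs were already found in an earlier tile.
    return out

R_t = [(4, 5), (2, 6), (1, 9)]
S_t = [(6, 7), (3, 4), (0, 8)]
result = tile_mini_joins(R_t, S_t, t_start=4, t_end=7)
```

Under these assumptions the four replica-with-replica combinations produce only duplicates, which is why they can be skipped without any bookkeeping.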
30-31 Greedy scheduling. Idea: distribute (k-1) mini-joins to different cores (Core 1, Core 2, ..., Core n). Evenly distribute the load → minimize the maximum load. This is an NP-hard problem, so a greedy approximation algorithm is used.
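The slide does not say which greedy rule is used. A common choice for minimizing the maximum load is the longest-processing-time-first heuristic, sketched here with hypothetical task costs and a hypothetical function name: tasks are taken in decreasing cost order and each goes to the currently least loaded core.

```python
import heapq

def greedy_schedule(task_costs, n_cores):
    """Assign tasks to cores, largest first, always to the lightest core."""
    heap = [(0, core) for core in range(n_cores)]   # (load, core id)
    assignment = [[] for _ in range(n_cores)]
    by_cost = sorted(enumerate(task_costs), key=lambda tc: -tc[1])
    for task, cost in by_cost:
        load, core = heapq.heappop(heap)            # least loaded core
        assignment[core].append(task)
        heapq.heappush(heap, (load + cost, core))
    loads = [0] * n_cores
    for load, core in heap:
        loads[core] = load
    return assignment, loads

# Hypothetical estimated costs of five mini-joins, scheduled on two cores.
assignment, loads = greedy_schedule([7, 5, 4, 3, 1], n_cores=2)
```

In practice the cost of a mini-join would have to be estimated, for example from the sizes of its inputs; that estimate is outside the scope of this sketch.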
32-33 Adaptive partitioning. Idea: create an initial uniform partitioning; employ a very fine tiling (granules); move load between neighboring tiles by moving granules between them, which repositions the borders of the tiles t_1, t_2, ..., t_n.
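One way to picture the border repositioning is a simple local search over granule loads. This is an illustrative sketch, not the algorithm from the talk: the per-granule loads, the stopping rule and the name `rebalance` are all assumptions. A border moves by one granule toward the heavier neighbor whenever that lowers the larger of the two neighboring tile loads.

```python
def rebalance(granule_load, k, max_rounds=1000):
    """Return tile borders (granule indices) after local rebalancing."""
    m = len(granule_load)
    borders = [m * t // k for t in range(k + 1)]    # initial uniform tiling

    def load(t):
        return sum(granule_load[borders[t]:borders[t + 1]])

    for _ in range(max_rounds):
        moved = False
        for b in range(1, k):                       # each internal border
            left, right = load(b - 1), load(b)
            if left > right and borders[b] - borders[b - 1] > 1:
                w = granule_load[borders[b] - 1]    # last granule of left tile
                if right + w < left:
                    borders[b] -= 1                 # hand it to the right tile
                    moved = True
            elif right > left and borders[b + 1] - borders[b] > 1:
                w = granule_load[borders[b]]        # first granule of right tile
                if left + w < right:
                    borders[b] += 1                 # hand it to the left tile
                    moved = True
        if not moved:
            break
    return borders

# Hypothetical loads of 8 granules, split into 2 tiles.
loads = [10, 10, 10, 10, 1, 1, 1, 1]
borders = rebalance(loads, k=2)
```

With these loads the uniform split gives tile loads of 40 and 4; after two moves the tiles carry 20 and 24.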
34 EXPERIMENTAL ANALYSIS
35 Setup. Hardware: dual -core Intel(R) Xeon(R) CPU E5-2687W 3. GHz with 128 GBs of RAM; hyper-threading enabled, up to 40 threads. Software: workload [ICDE 16] → XOR of start; loop unrolling forced; OpenMP for multi-threading. Datasets: WEBKIT, git repo, interval = period of time a file is unchanged; BOOKS, Aarhus libraries, interval = period of time a book is lent; Synthetic, interval duration follows an exponential distribution, uniformly distributed start plus peaks. Experiments: execution time, # comparisons, memory footprint; both self joins and non-self joins; vary |R|/|S| and # cores (threads).
36 Optimizing FS (figure): execution time [secs] and endpoint comparisons [%] of FS, gFS and bgFS versus |R|/|S| on 1 core, WEBKIT dataset.
37 Single-threaded processing (figure): execution time [secs] versus |R|/|S| on 1 core for bgFS, LEBI, OIP [Dignös et al., SIGMOD 14] and DIP [Cafagna and Böhlen, VLDBJ 17]; panels for WEBKIT, BOOKS and GREEND.
38 Optimizing Domain-based Paradigm (figure). Figure 6: Optimizing the domain-based partitioning paradigm: bgFS on WEBKIT. Panels plot execution time [secs] and average idle time ratio [%] versus |R|/|S| on 16 cores and versus # cores with |R| = |S|, for setups labelled atomic/uniform, mj+atomic/uniform, atomic/adaptive, mj+greedy/uniform and mj+greedy/adaptive. Aug 31, 17.
39 Parallel processing (figure): speedup [x] versus # cores for h-LEBI and d-LEBI, and for h-bgFS and d-bgFS, each against the theoretical speedup; execution time [secs] of d-LEBI and d-bgFS versus # cores and versus |R|/|S| on 16 cores; WEBKIT dataset.
40 To sum up. Contributions: efficient evaluation of interval joins. Single-threaded processing: optimized bgFS, competitive to state-of-the-art EBI/LEBI, with a lower memory footprint. Parallel processing: novel domain-based partitioning paradigm, with higher speedup. Future work: other types of temporal joins; other types of temporal operators; parallel processing with data-level parallelism and data shared between threads.
41 Questions?
42 EXTRAS
43-46 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]. Endpoint indices: EI_R and EI_S list the start and end endpoints of the intervals in R and S. Active sets: A_R and A_S, both initially empty. Result: initially empty; as the sweep advances, intervals enter and leave the active sets and result pairs are added.
47 Endpoint-Based Join (EBI/LEBI) [Piatov et al., ICDE 16]. Pros: no endpoint comparisons when producing results; tailored to modern hardware; main-memory, cache-aware; fast. Cons: a special data structure is needed for the active sets, with support for efficient updates and scans.
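The endpoint-based sweep can be sketched with ordinary Python containers. This is a simplified illustration, not the implementation from [Piatov et al., ICDE 16]: the two endpoint indices are merged into one sorted event list, plain sets stand in for the gapless hash map, intervals are assumed closed, and the interval identifiers and function name are hypothetical.

```python
def endpoint_based_join(R, S):
    """R, S: dicts mapping an interval id to a closed (start, end) pair."""
    events = []
    for side, coll in (("R", R), ("S", S)):
        for key, (start, end) in coll.items():
            events.append((start, 0, side, key))    # 0 = start endpoint
            events.append((end, 1, side, key))      # 1 = end endpoint
    # At equal timestamps starts sort before ends, so touching intervals join.
    events.sort()
    active = {"R": set(), "S": set()}
    out = []
    for _, kind, side, key in events:
        if kind == 0:
            other = "S" if side == "R" else "R"
            # Every open interval of the other input intersects the new one;
            # no endpoint comparison is needed to emit these pairs.
            for o in active[other]:
                out.append((key, o) if side == "R" else (o, key))
            active[side].add(key)
        else:
            active[side].discard(key)
    return out

R = {"r1": (1, 5), "r2": (6, 9)}
S = {"s1": (2, 3), "s2": (4, 7), "s3": (9, 12), "s4": (13, 14)}
result = endpoint_based_join(R, S)
```

The cost moves from comparisons to maintaining the active sets, which is why the slide lists the need for a structure with efficient updates and scans as the drawback.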
48-50 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]. Sorted inputs: R holds two intervals and S holds five. Result: initially empty; it grows to one pair and then to three pairs as the scan proceeds.
51-52 Forward Scan based (FS) [Brinkhoff et al., SIGMOD 93]. Pros: simple; no special structure needed. Cons: each join result requires an endpoint comparison, |R| + |S| + |R ⋈ S| comparisons in total.
More informationColumnstore and B+ tree. Are Hybrid Physical. Designs Important?
Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 B+ tree & Columnstore on same table = Hybrid design 4? C O L C O L B+ tree B+ tree ? C O L C O L B+ tree B+ tree
More informationSRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores
SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 12, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections v Cost depends on #qualifying
More informationEvaluating Queries. Query Processing: Overview. Query Processing: Example 6/2/2009. Query Processing
Evaluating Queries Query Processing Query Processing: Overview Query Processing: Example select lname from employee where ssn = 123456789 ; query expression tree? π lname σ ssn= 123456789 employee physical