Architecture-Conscious Database Systems
1 Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI)
2 Sources: Thank You!
- Database Architectures for New Hardware, VLDB 2004 tutorial, Anastassia Ailamaki
- Query co-processing on Commodity Processors, VLDB 2006 tutorial (first half), Anastassia Ailamaki, Stavros Harizopoulos
VLDB 2009 Summer School Shanghai Architecture-Conscious Database Techniques
3 Focus: How can we explore new hardware to run database workloads efficiently?
4 Overview
- Introduction
- Interaction computer architecture and DBMS
- Query processing
- Perf Breakdowns
- Bottlenecks
- Future Directions
- Architecture-Conscious Data Management
- Limitations and opportunities
5 Processor/Memory Speed Gap: 2x processor speed every 18 months. Larger but not as much faster memories.
6 CPU Architecture Elements:
- Storage
  - CPU caches L1/L2/L3
  - Registers
- Execution Unit(s)
  - Pipelined
  - SIMD
7 CPU Metrics
8 DRAM Metrics
9 The Memory Wall: Trip to memory = 1000s of instructions!
10 Memory Hierarchies: + Translation Lookaside Buffer (TLB): cache for VM address translation => only 64 entries!
11 Cache Implementation: Important parameters: cache size, cache line size, cache associativity
12 Cache Associativity: lower associativity => faster lookup
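
The three parameters named on the Cache Implementation slide (cache size, line size, associativity) fix how an address maps onto the cache. The following is a minimal sketch of that mapping, not taken from the slides: the `cache_geom` struct and the function names are hypothetical, and it assumes a plain set-associative cache indexed by the low address bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cache geometry; the field values are up to the caller. */
typedef struct {
    uint32_t size_bytes;  /* total capacity */
    uint32_t line_bytes;  /* cache line size */
    uint32_t ways;        /* associativity: 1 = direct-mapped */
} cache_geom;

/* Number of sets = capacity / (line size * associativity). */
uint32_t cache_num_sets(cache_geom g)
{
    assert(g.line_bytes > 0 && g.ways > 0);
    return g.size_bytes / (g.line_bytes * g.ways);
}

/* Set an address maps to: drop the line offset, then take modulo #sets. */
uint32_t cache_set_index(cache_geom g, uint64_t addr)
{
    return (uint32_t)((addr / g.line_bytes) % cache_num_sets(g));
}

/* Tag kept with the line to tell apart addresses that share a set. */
uint64_t cache_tag(cache_geom g, uint64_t addr)
{
    return (addr / g.line_bytes) / cache_num_sets(g);
}
```

With an assumed 32 KiB, 64-byte-line, 8-way cache there are 64 sets, so addresses 0 and 4096 land in the same set and differ only in their tag. With lower associativity, fewer such lines can coexist before a conflict miss occurs.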
13 Miss Classification: Cold misses are unavoidable. Capacity, conflict, and coherence misses can be reduced.
14 Super-Scalar Execution (pipelining)
15 >ILP: Superscalar Out-Of-Order. DB: only 1.5x faster than in-order [KPH98, RGA98]. Limited ILP opportunity.
16 >>ILP: Branch Prediction. DB programs: long code paths => mispredictions
17 Database Workloads on UniProcs: DB apps heavily under-utilize hardware
18 Parallelism: Multiple Threads
- Simultaneous Multithreading (SMT)
  - Pursue multiple threads in parallel within a processor pipeline
  - Store multiple contexts in different register sets
  - Share function units between threads
  - Fast (hardware) context switching amongst threads
- Chip Multiprocessors (CMPs)
  - >1 complete processors on a chip
  - Every functional unit of a processor is replicated
19 MultiThreading Compared [figure: thread scheduling around cache misses on RS64-IV (IBM), UltraSparc T1 (Sun), POWER5 (IBM)]. Speedup: OLTP 3x, DSS 1.5x [LBE98]
20 The Case for CMPs
- Getting diminishing returns
  - From a single core, though powerful (OoO, superscalar, multithreaded)
  - n-core CMP outperforms n-thread SMT
  - CMPs offer productivity advantages
- Moore's law: 2x transistors every 18 months
  - More, not faster
Exponentially more cores: "Will you still need me, will you still feed me, when I'm 64?" (Beatles)
21 A Chip Multiprocessor [figure; label: Main Memory]
22 Current CMP Technology: IBM Power 6, Sun Niagara T2, AMD Barcelona, Intel Nehalem
23 Summary: Trends & DB workloads
- Hardware: continuously evolving
  - Superscalar, OoO, SMT, CMP
  - Processor/memory speed gap: growing
- Software: one processor does not fit all
  - At most 50% CPU utilization
  - Heavy reuse vs. sequential scan vs. random access loops
- Opportunities for Architectural Study
  - On real conventional processors
  - On simulators (hard to find/build, slow)
  - On co-processors: NPUs and GPUs
24 DB execution time breakdown: At least 50% of cycles on stalls. Memory is the major bottleneck.
25 DSS/OLTP Basics: Memory. Bottlenecks: data in L2, instructions in L1. Random access (OLTP): L1I-bound
26 Why not increase L1-I sizes? L1I size is stable. L2 size increase: effect on performance?
27 Increasing L2 Cache Size: Larger L2: trade-off for OLTP
28 Summary: workload analysis
- Database workloads: more than 50% stalls
  - Mostly due to memory delays
  - Cannot always reduce stalls by increasing cache size
- Crucial Bottlenecks
  - Data accesses to the L2 cache (especially DSS)
  - Instruction accesses to the L1 cache (especially OLTP)
Goal 1: Eliminate unnecessary misses
Goal 2: Hide latency of cold misses
29 Classic Data Layout on Disk Pages
30 NSM In Memory Hierarchy
31 Decomposition Storage Model (DSM)
32 Decomposition Storage Model (DSM)
33 DSM In Memory Hierarchy
34 Partition Attributes Across (PAX)
35 PAX In Memory Hierarchy
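
The layout slides above survive only as titles, so here is a rough sketch of how the three page layouts differ when a scan touches a single attribute. Everything in it is an illustrative assumption rather than content from the deck: the three-attribute record, the tiny `ROWS` capacity, and all type and function names. DSM is simplified to one array per attribute, with the array index standing in for the record id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 4 };  /* hypothetical page capacity, tiny for illustration */

/* NSM: complete records stored back to back (array of structs). */
typedef struct { int32_t id; int32_t age; double salary; } nsm_row;
typedef struct { nsm_row rows[ROWS]; } nsm_page;

/* DSM (simplified): each attribute lives in its own array, possibly on
 * different pages; the array index plays the role of the record id. */
typedef struct {
    const int32_t *id;
    const int32_t *age;
    const double  *salary;
    size_t n;
} dsm_table;

/* PAX: the page still holds all attributes of its records, but the values
 * of each attribute are grouped into one contiguous minipage. */
typedef struct {
    int32_t id[ROWS];
    int32_t age[ROWS];
    double  salary[ROWS];
} pax_page;

/* NSM scan: strides over whole records, dragging unused fields into cache. */
int64_t sum_age_nsm(const nsm_page *p)
{
    assert(p != NULL);
    int64_t s = 0;
    for (size_t i = 0; i < ROWS; i++) s += p->rows[i].age;
    return s;
}

/* DSM scan: reads only the age column. */
int64_t sum_age_dsm(const dsm_table *t)
{
    assert(t != NULL);
    int64_t s = 0;
    for (size_t i = 0; i < t->n; i++) s += t->age[i];
    return s;
}

/* PAX scan: reads only the age minipage inside the page. */
int64_t sum_age_pax(const pax_page *p)
{
    assert(p != NULL);
    int64_t s = 0;
    for (size_t i = 0; i < ROWS; i++) s += p->age[i];
    return s;
}
```

All three scans return the same sum; what changes is how many cache lines each one pulls in. In this sketch the NSM loop steps over one whole record per value, while the DSM and PAX loops read densely packed `age` values.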
36 PAX Performance Results
37 Summary (no replication)
38 Clotho: memory stores PAX minipages. New buffer pool manager handles sharing.
39 Fractured Mirrors
40 What about the rest of misses? Prefetching hides cache miss latency. Efficiently used in pointer-chasing lookups!
41 > Prefetching B+ Trees: >2x better search AND update performance. Approach complementary to CSB+-trees!
42 >> Prefetching B+ Trees: pB+-trees: 8x speedup over B+-trees. Fractal pB+-trees: 80% faster in-mem search
43 Bulk Lookups: 2x speedup with enough concurrency
44 Hiding Latencies: Summary. Lots more to be done in this area: consider interference and scarce resources
45 Cache-Conscious Joins: Radix-Cluster
- Radix-Partitioned Hash Join
  - create partitions << CPU cache
  - small partitions => many partitions
  - many partitions => multiple passes needed
(Database Architecture Optimized for the New Bottleneck: Memory Access, VLDB 99; Generic Database Cost Models for Hierarchical Memory Systems, VLDB 02; all Manegold, Boncz, Kersten)
46 Cache-Conscious Joins: Radix-Cluster
- Radix-Partitioned Hash Join
  - create partitions << CPU cache
  - small partitions => many partitions
  - many partitions => multiple passes needed
- Radix-Cluster
  - Radix-Sort with early stopping
  - Each pass looks at B_i higher-most radix bits
  - Splitting each input cluster into 2^(B_i) output clusters
  - leaves relation partially ordered
(Database Architecture Optimized for the New Bottleneck: Memory Access, VLDB 99; Generic Database Cost Models for Hierarchical Memory Systems, VLDB 02; all Manegold, Boncz, Kersten)
47 Cache-Conscious Joins: Radix-Cluster
- Multiple clustering passes
  - Limit number of clusters per pass
  - Avoid cache/TLB thrashing
  - Trade memory cost for CPU cost
  - Any data type (hashing)
(Database Architecture Optimized for the New Bottleneck: Memory Access, VLDB 99; Generic Database Cost Models for Hierarchical Memory Systems, VLDB 02; all Manegold, Boncz, Kersten)
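
The multi-pass clustering described on the Radix-Cluster slides can be sketched as a radix sort on the highest bits first that stops once the chosen number of radix bits is consumed. This is a simplified illustration, not the authors' implementation: the function names and signatures are hypothetical, bare 32-bit keys stand in for hash values of whole tuples, and each pass uses a histogram, prefix sum, and scatter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One pass over src[0..n): scatter into dst by `bits` radix bits starting at
 * bit `shift`. bounds must hold 2^bits + 1 zeroed entries; afterwards
 * cluster c occupies dst[bounds[c] .. bounds[c+1]). */
static void cluster_pass(const uint32_t *src, uint32_t *dst, size_t n,
                         unsigned shift, unsigned bits, size_t *bounds)
{
    size_t fanout = (size_t)1 << bits;
    uint32_t mask = (uint32_t)(fanout - 1);
    for (size_t i = 0; i < n; i++)              /* histogram */
        bounds[((src[i] >> shift) & mask) + 1]++;
    for (size_t c = 0; c < fanout; c++)         /* prefix sum */
        bounds[c + 1] += bounds[c];
    size_t *cur = malloc(fanout * sizeof *cur); /* one write cursor per cluster */
    assert(cur != NULL);
    memcpy(cur, bounds, fanout * sizeof *cur);
    for (size_t i = 0; i < n; i++)              /* scatter */
        dst[cur[(src[i] >> shift) & mask]++] = src[i];
    free(cur);
}

/* Cluster keys[0..n) on their lowest total_bits bits, using at most
 * bits_per_pass bits (hence 2^bits_per_pass clusters) per pass.
 * tmp is scratch space of the same length as keys. */
void radix_cluster(uint32_t *keys, uint32_t *tmp, size_t n,
                   unsigned total_bits, unsigned bits_per_pass)
{
    assert(bits_per_pass > 0 && total_bits <= 32);
    if (total_bits == 0 || n < 2) return;
    unsigned b = total_bits < bits_per_pass ? total_bits : bits_per_pass;
    unsigned shift = total_bits - b;            /* higher-most remaining bits */
    size_t fanout = (size_t)1 << b;
    size_t *bounds = calloc(fanout + 1, sizeof *bounds);
    assert(bounds != NULL);
    cluster_pass(keys, tmp, n, shift, b, bounds);
    memcpy(keys, tmp, n * sizeof *keys);
    for (size_t c = 0; c < fanout; c++)         /* refine each cluster */
        radix_cluster(keys + bounds[c], tmp + bounds[c],
                      bounds[c + 1] - bounds[c], shift, bits_per_pass);
    free(bounds);
}
```

Capping `bits_per_pass` keeps the number of simultaneously written clusters small, which is the "limit number of clusters per pass" point: each pass touches only as many output regions as the cache and TLB can hold, at the price of extra passes over the data.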
48 Cache-Conscious Joins: Accurate Cache Miss Cost Modeling
49 Cache-Conscious Joins: Partitioned Hash Join
50 Cache-Conscious Joins: Radix-Clustered Hash-Join: overall perf (64,000,000 tuples)
51 Cache-Conscious Joins: Pre-Projection vs. Post-Projection
52 Cache-Conscious Joins: Pre-Projection vs. Post-Projection
- Radix-Decluster = cache-conscious post-projection
53 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
Join Index! (Join Indices, Valduriez, TODS 87)
54 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
- Cluster on Left
55 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
- Cluster on Left
56 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
- Cluster on Left
- Project Left
57 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
- Cluster on Left
- Project Left
- Cluster on Right
Destination in final result. Good locality for fetch.
58 Cache-Conscious Joins: Post-Projection (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Partitioned Hash-Join
- Cluster on Left
- Project Left
- Cluster on Right
- Project Right => Radix-Decluster
59 Cache-Conscious Joins: Radix-Decluster (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Red column forms a dense domain
  - {0,1,2,...,N-1}
- All subsequences are ordered
  - e.g. [figure]
- Task: merge subsequences into dense sequence
60 Cache-Conscious Joins: Radix-Decluster (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Red column forms a dense domain
  - {0,1,2,...,N-1}
- All subsequences are ordered
  - e.g. [figure]
- Task: merge subsequences into dense sequence
- Approach 1: merge H = 2^B lists
  - drawback: cost O(N log(H))
  - many (H) cursors needed => cache thrashing
61 Cache-Conscious Joins: Radix-Decluster (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Red column forms a dense domain
  - {0,1,2,...,N-1}
- All subsequences are ordered
  - e.g. [figure]
- Task: merge subsequences into dense sequence
- Approach 1: merge H = 2^B lists
  - drawback: cost O(N log(H))
  - many (H) cursors needed => cache thrashing
- Approach 2: insert by position
  - drawback: many (H) sparse passes over the result => no cache reuse
62 Cache-Conscious Joins: Radix-Decluster (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Red column forms a dense domain
  - {0,1,2,...,N-1}
- All subsequences are ordered
  - e.g. [figure]
- Task: merge subsequences into dense sequence
- Approach 1: merge H = 2^B lists
  - drawback: cost O(N log(H))
  - many (H) cursors needed => cache thrashing
- Approach 2: insert by position
  - drawback: many (H) sparse passes over the result => no cache reuse
- Radix-Decluster: insert by position with sliding window
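
The "insert by position with sliding window" idea can be sketched as follows. This is an illustrative reading of the slide rather than the published algorithm's code: the function name, the argument layout (`vals`, `pos`, `bounds`), and the fixed `window` parameter are assumptions. It relies on the two properties listed above: destination positions form the dense domain 0..N-1, and they are ascending inside every cluster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * vals[i]  projection value, stored in clustered order
 * pos[i]   destination of vals[i] in the result (a permutation of 0..n-1)
 * bounds   cluster c occupies [bounds[c], bounds[c+1]); pos ascends within it
 * window   insertion window size in tuples, assumed to fit in the cache
 */
void radix_decluster(const int32_t *vals, const size_t *pos,
                     const size_t *bounds, size_t nclusters,
                     size_t n, size_t window, int32_t *result)
{
    assert(window > 0);
    if (n == 0 || nclusters == 0) return;
    size_t *cur = malloc(nclusters * sizeof *cur);  /* one cursor per cluster */
    assert(cur != NULL);
    for (size_t c = 0; c < nclusters; c++) cur[c] = bounds[c];

    for (size_t lo = 0; lo < n; lo += window) {
        size_t hi = lo + window;   /* result positions [lo, hi) are "open" */
        for (size_t c = 0; c < nclusters; c++) {
            /* Drain from cluster c everything that falls into the window.
             * Because pos ascends within a cluster, we can stop at the
             * first position at or beyond hi. */
            while (cur[c] < bounds[c + 1] && pos[cur[c]] < hi) {
                result[pos[cur[c]]] = vals[cur[c]];
                cur[c]++;
            }
        }
    }
    free(cur);
}
```

Each cluster is read sequentially through its cursor, and the only random writes go into the current window of the result, so the scattered accesses stay inside a cache-sized region.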
63 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
64 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04): 3 clusters
65 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04): destination positions
66 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04): Projection column (still) in wrong order
67 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04): Result column to fill; insertion window of size 2
68 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
69 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
70 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
71 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
72 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
73 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
74 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
75 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
76 Cache-Conscious Joins: Radix-Decluster In Action (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
77 Cache-Conscious Joins: Radix-Decluster Memory Access Pattern (Cache-Conscious Radix-Decluster Projections, Manegold, Boncz, Nes, VLDB 04)
- Random access only to sliding window (<< cache size)
- Only compulsory misses => cache-conscious
78 Cache-Conscious Joins: Radix-Decluster Performance Tradeoff
- Radix-Decluster prefers small W (i.e. vertical fragmentation)
Read Also:
- Jive Join
- Slam Join
(Fast Joins Using Join Indices, Ross, Lei, VLDBJ 98)
79 Hash Aggregation (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
80 Hash Aggregation (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
81 Hash Aggregation (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
82 Multithreaded Hash Aggregation (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
83 Option 1: Independent Tables (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
84 Option 2: Global Tables With Mutex(es) (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
85 Option 3: Global Table + Atomic Instructions (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
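
As a rough illustration of Option 3, the sketch below keeps one shared aggregation table and updates it with C11 atomic read-modify-write operations instead of a mutex. It is not the paper's code: the table layout, the names `agg_table`, `agg_init` and `agg_chunk`, and the assumption that group keys are already dense ids in 0..NGROUPS-1 (so no collision handling is needed) are all simplifications made here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { NGROUPS = 4 };  /* hypothetical: group keys are dense ids 0..NGROUPS-1 */

/* One global table shared by all worker threads. */
typedef struct {
    atomic_llong sum[NGROUPS];
    atomic_llong count[NGROUPS];
} agg_table;

void agg_init(agg_table *t)
{
    assert(t != NULL);
    for (size_t g = 0; g < NGROUPS; g++) {
        atomic_init(&t->sum[g], 0);
        atomic_init(&t->count[g], 0);
    }
}

/* Body each worker would run on its own chunk of the input. */
void agg_chunk(agg_table *t, const unsigned *group,
               const long long *value, size_t n)
{
    assert(t != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t g = group[i] % NGROUPS;
        /* Atomic updates: no lock is taken, but threads hitting the same
         * group still contend for that cache line. */
        atomic_fetch_add_explicit(&t->sum[g], value[i], memory_order_relaxed);
        atomic_fetch_add_explicit(&t->count[g], 1, memory_order_relaxed);
    }
}
```

In a real run each thread would call `agg_chunk` on a disjoint slice of the input and the results would be read after all threads have joined. Because `sum` and `count` are updated by separate atomic operations, a reader running concurrently with the workers could observe them out of step.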
86 Option 4: Hybrid (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
87 Performance Results (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
88 Dynamic Choice (Adaptive Aggregation on Chip Multiprocessors, Cieslewicz, Ross, VLDB 07)
89 Query Processing Algorithms: Adapt query processing algorithms to caches. Related work includes:
- Improving data cache performance
  - Sorting and Hash-Join
- Improving Instruction Cache performance
  - DSS and OLTP applications
90 DB Operators Using SIMD
91 DB Operators Using SIMD: branch predication: result[pos] = n; pos += (x[n] < 10);
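
The one-liner on the slide above is the predicated form of a selection loop. The sketch below places it next to the branching form for comparison. The function names and the `bound` parameter are illustrative additions, and both versions are plain scalar C rather than SIMD code.

```c
#include <assert.h>
#include <stddef.h>

/* Branching selection: one conditional jump per tuple, which is hard to
 * predict when about half of the tuples qualify. */
size_t select_lt_branch(const int *x, size_t n, int bound, size_t *result)
{
    assert(x != NULL && result != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] < bound)
            result[pos++] = i;
    }
    return pos;
}

/* Predicated selection, as on the slide: always write the candidate index,
 * then advance the output cursor by the 0/1 outcome of the comparison. */
size_t select_lt_pred(const int *x, size_t n, int bound, size_t *result)
{
    assert(x != NULL && result != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        result[pos] = i;
        pos += (x[i] < bound);
    }
    return pos;
}
```

Both functions return the number of qualifying tuples and leave their indexes in `result`, which must have room for `n` entries because the predicated version writes a slot before knowing whether the tuple qualifies. The predicated loop trades an unconditional store per tuple for the absence of a data-dependent branch.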
92 Optimizing NSM Hash Join: Idea: exploit inter-tuple parallelism
93 Group Prefetching
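
Since the Group Prefetching slide survives only as a title, here is a hedged sketch of the general idea: process probe tuples in small groups, first issuing prefetches for every tuple's hash bucket, then visiting the buckets once the misses have had time to overlap. The open-addressing table, the `GROUP` size, and all names are assumptions made for this example. `__builtin_prefetch` is a GCC/Clang builtin, so the code is compiler-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GROUP = 8 };        /* hypothetical group size; needs tuning */
#define EMPTY UINT32_MAX   /* sentinel: unused slot / no match */

typedef struct { uint32_t key; uint32_t payload; } slot;
/* Open-addressing table; capacity = mask + 1 must be a power of two and
 * the table must never be completely full. */
typedef struct { slot *slots; size_t mask; } hash_table;

static size_t hash_slot(const hash_table *ht, uint32_t key)
{
    return (size_t)(key * 2654435761u) & ht->mask;  /* multiplicative hash */
}

void ht_insert(hash_table *ht, uint32_t key, uint32_t payload)
{
    assert(ht != NULL && key != EMPTY);
    size_t s = hash_slot(ht, key);
    while (ht->slots[s].key != EMPTY) s = (s + 1) & ht->mask;
    ht->slots[s].key = key;
    ht->slots[s].payload = payload;
}

/* Probe n keys; out[i] receives the payload or EMPTY. Returns #matches. */
size_t probe_group_prefetch(const hash_table *ht, const uint32_t *keys,
                            size_t n, uint32_t *out)
{
    size_t matches = 0;
    for (size_t base = 0; base < n; base += GROUP) {
        size_t len = n - base < GROUP ? n - base : GROUP;
        size_t start[GROUP];
        /* Stage 1: compute every bucket in the group and prefetch it, so
         * the cache misses of different tuples overlap. */
        for (size_t j = 0; j < len; j++) {
            start[j] = hash_slot(ht, keys[base + j]);
            __builtin_prefetch(&ht->slots[start[j]], 0, 1);
        }
        /* Stage 2: visit the (hopefully cached) buckets. */
        for (size_t j = 0; j < len; j++) {
            uint32_t k = keys[base + j];
            size_t s = start[j];
            out[base + j] = EMPTY;
            while (ht->slots[s].key != EMPTY) {
                if (ht->slots[s].key == k) {
                    out[base + j] = ht->slots[s].payload;
                    matches++;
                    break;
                }
                s = (s + 1) & ht->mask;
            }
        }
    }
    return matches;
}
```

The prefetch is only a hint, so the result is the same with or without it; any benefit depends on the table being much larger than the cache and on a group size that matches the memory latency.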
94 Software Pipelining
95 Prefetched Hash Join: Results. Warning: prefetch distance hard to tune, your mileage may vary; does not work for DSM
96 Instruction Related Stalls: Impossible to overlap I-cache delays
97 Call Graph Prefetching: Beneficial for predictable DSS streams
98 DSS: Reducing I-misses with buffering: 12% speedup for simple TPC-H queries
99 STEPS: Cache-Resident OLTP
100 Parallelizing Transactions
- Intra-query parallelism
  - Used for long-running queries (DSS)
  - Does not work for short queries
  - Short queries dominate in OLTP workloads
101 Parallelizing Transactions
- Intra-transaction parallelism
  - Each thread spans multiple queries
  - Hard to add to existing systems!
Thread-Level Speculation (TLS) makes parallelization easier
102 Thread-Level Speculation (TLS)
103 Thread-Level Speculation (TLS): Data dependencies limit performance
104 Removing Bottleneck Dependencies: 2x lower latency with 4 CPUs. Useful for non-TLS parallelism as well
105 Summary: Memory Affinity
- Cache-aware data placement
  - Eliminates unnecessary trips to memory
  - Minimizes conflict/capacity misses
- What about compulsory (cold) misses?
  - Can't avoid, but can hide latency with prefetching and grouping
  - Techniques for B-trees, Hash-Joins
- Query Processing Algorithms
  - For sorting and hash-joins, addressing D- and I-stalls
- Low-level instruction cache optimizations
  - For DSS: call graph prefetching, branch predication
  - OLTP: STEPS (explicit transaction scheduling)
106 References: Workload studies (Simulation)
107 References: Workload studies (Real Machines)
108 References: Architecture-Conscious Data Placement
109 References: Architecture-Conscious Access Methods
110 References: Architecture-Conscious Query Processing
111 References: Instruction Stream Optimizations and DBMS Architectures
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing
More informationOutline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.
Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions
More informationLecture 13: March 25
CISC 879 Software Support for Multicore Architectures Spring 2007 Lecture 13: March 25 Lecturer: John Cavazos Scribe: Ying Yu 13.1. Bryan Youse-Optimization of Sparse Matrix-Vector Multiplication on Emerging
More informationCopyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more