DATABASE CRACKING: Fancy Scan, not Poor Man's Sort! Holger Pirk, Eleni Petraki, Stratos Idreos
1 DATABASE CRACKING: Fancy Scan, not Poor Man's Sort! Hardware Folks & Cracking Folks: Holger Pirk, Eleni Petraki, Stratos Idreos, Stefan Manegold, Martin Kersten
2 EVALUATING RANGE PREDICATES
3 COMPLEXITY ON PAPER: Scanning: O(n). Sorting: O(n log n). Cracking: O(n), essentially a single quicksort partitioning step.
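To make the "on paper" comparison concrete, here is a minimal C sketch, assuming 32-bit integer values and a single `value < pivot` predicate; the function names `scan_count_below` and `crack_in_two` are our own. The scan only reads each value, while the crack reorders the array in place with one quicksort-style partitioning pass, so both are O(n).

```c
#include <stddef.h>
#include <stdint.h>

/* Scan: count the values that satisfy value < pivot. Reads every value once: O(n). */
size_t scan_count_below(const int32_t *a, size_t n, int32_t pivot) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] < pivot)
            count++;
    return count;
}

/* Crack: one quicksort-style partitioning pass, in place: O(n).
   Afterwards a[0..k) < pivot and a[k..n) >= pivot; returns k. */
size_t crack_in_two(int32_t *a, size_t n, int32_t pivot) {
    size_t lo = 0, hi = n;   /* invariant: a[0..lo) < pivot, a[hi..n) >= pivot */
    while (lo < hi) {
        if (a[lo] < pivot) {
            lo++;                          /* already on the correct side */
        } else {
            hi--;                          /* move it into the upper part */
            int32_t tmp = a[lo];
            a[lo] = a[hi];
            a[hi] = tmp;
        }
    }
    return lo;
}
```

Both loops contain a data-dependent branch per value; the crack additionally writes to the array, which is where the two diverge in practice.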
4 COSTS IN REALITY: Implemented microbenchmarks; 1 billion uniformly distributed random integer values; pivot in the middle of the value range; workstation machine (16 GB RAM, 4 Sandy Bridge cores)
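The data setup can be sketched as follows. This is an illustration under our own assumptions, not the authors' harness: the xorshift64 generator, the hypothetical `VALUE_RANGE` constant, and the function names are ours; the slide only states uniform random integers with the pivot in the middle of the range, which makes roughly half of the values qualify.

```c
#include <stddef.h>
#include <stdint.h>

#define VALUE_RANGE 1000000000u  /* hypothetical value domain [0, VALUE_RANGE) */

/* Marsaglia's xorshift64 generator (shift triple 13, 7, 17); state must be nonzero. */
static uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill a[0..n) with approximately uniform values in [0, VALUE_RANGE). */
void fill_uniform(int32_t *a, size_t n, uint64_t seed) {
    uint64_t state = seed ? seed : 1;   /* avoid the all-zero state */
    for (size_t i = 0; i < n; i++)
        a[i] = (int32_t)(xorshift64(&state) % VALUE_RANGE);
}

/* Pivot in the middle of the value range, so about half the values satisfy value < pivot. */
int32_t middle_pivot(void) {
    return (int32_t)(VALUE_RANGE / 2);
}
```

A 50% selectivity is also the worst case for branch prediction, because the outcome of `value < pivot` is then essentially a coin flip.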
5 COSTS IN REALITY [Figure: wall-clock time in s for Parallel Scanning, Cracking, and Parallel Sorting]
6 SO: WHAT'S GOING ON?
7 CACHE MISSES? [Figure: L1I, L1D, L2, and L3 cache misses for Scanning, Cracking, and Sorting] NOPE!
8 CPU COSTS [Figure: decision tree. Micro-ops issued? If no, allocation stall? (no: Frontend Bound; yes: Backend Bound). If yes, does the micro-op ever retire? (no: Bad Speculation; yes: Retiring). Backend Bound splits into Cache Miss Stalls and Other Stalls.]
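This decision tree is the first level of Intel's top-down analysis. As a sketch, assuming the published level-1 formulas for a 4-wide core such as Sandy Bridge (no Hyper-Threading correction) and raw counts for the named hardware events, the four categories can be computed as below; the struct and function names are hypothetical. The charts that follow split the backend further (e.g. into data stalls), which this level-1 sketch does not attempt.

```c
/* Raw event counts; field comments give the Intel event each one is assumed to hold. */
typedef struct {
    double cycles;              /* CPU_CLK_UNHALTED.THREAD */
    double uops_not_delivered;  /* IDQ_UOPS_NOT_DELIVERED.CORE */
    double uops_issued;         /* UOPS_ISSUED.ANY */
    double retire_slots;        /* UOPS_RETIRED.RETIRE_SLOTS */
    double recovery_cycles;     /* INT_MISC.RECOVERY_CYCLES */
} counters;

/* Fractions of all issue slots; they sum to 1. */
typedef struct {
    double frontend, bad_speculation, retiring, backend;
} topdown;

topdown topdown_level1(counters c) {
    const double width = 4.0;            /* issue slots per cycle (assumed 4-wide) */
    double slots = width * c.cycles;
    topdown t;
    /* Slots where the frontend delivered no micro-op: Frontend Bound. */
    t.frontend = c.uops_not_delivered / slots;
    /* Issued but never retired, plus slots lost recovering from mispredictions. */
    t.bad_speculation =
        (c.uops_issued - c.retire_slots + width * c.recovery_cycles) / slots;
    /* Slots that did useful, retired work. */
    t.retiring = c.retire_slots / slots;
    /* Everything else: issue was stalled by the backend (including cache misses). */
    t.backend = 1.0 - t.frontend - t.bad_speculation - t.retiring;
    return t;
}
```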
9 CPU COSTS [Figure: normalized breakdown of CPU cycles into Retiring, Bad Speculation, Pipeline Frontend, Pipeline Backend, and Data Stalls for Scanning, Cracking, and Sorting]
10 CPU COSTS [Figure: as on slide 9] 14%!!!
11 CPU COSTS [Figure: as on slide 9] Lots of Potential
12 WHAT CAN WE DO ABOUT IT?
13 INCREASING CPU EFFICIENCY
14 PREDICATION
Branching version:
for (i = 0; i < size; i++)
    if (input[i] < pivot) {
        output[outi] = input[i];
        outi++;
    }
Predicated version:
for (i = 0; i < size; i++) {
    output[outi] = input[i];
    outi += (input[i] < pivot);
}
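Both loops from the slide, wrapped as self-contained C functions (the names `select_branching` and `select_predicated` are ours) so they can be compared directly. The predicated version always writes the current value and advances the output cursor by the 0/1 result of the comparison, so a non-qualifying value is simply overwritten by the next one.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching selection: the CPU must predict the if; with a pivot in the
   middle of random data the outcome is unpredictable. */
size_t select_branching(const int32_t *input, size_t size, int32_t pivot,
                        int32_t *output) {
    size_t outi = 0;
    for (size_t i = 0; i < size; i++) {
        if (input[i] < pivot) {
            output[outi] = input[i];
            outi++;
        }
    }
    return outi;
}

/* Predicated selection: unconditional write, cursor advanced by the
   comparison result. output must have room for size elements. */
size_t select_predicated(const int32_t *input, size_t size, int32_t pivot,
                         int32_t *output) {
    size_t outi = 0;
    for (size_t i = 0; i < size; i++) {
        output[outi] = input[i];
        outi += (size_t)(input[i] < pivot);
    }
    return outi;
}
```

Note that an optimizing compiler may if-convert the branching loop on its own, so a fair measurement may require checking the generated assembly.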
15 PREDICATION: Turns control dependencies into data dependencies; eliminates branch mispredictions; causes unconditional (potentially unnecessary) I/O (limited to caches); works only for out-of-place algorithms
16 PREDICATED CRACKING
17 PREDICATED CRACKING [Figure: array with pivot 5 and the active and backup registers]
18 PREDICATED CRACKING [Figure: pivot, active, backup]
19 PREDICATED CRACKING: State before iteration [Figure: pivot, cmp, active, backup]
20 PREDICATED CRACKING: Evaluate predicate & write [Figure: pivot, cmp, active, backup]
21 PREDICATED CRACKING: Advance cursor [Figure: pivot, cmp, active, backup]
22 PREDICATED CRACKING: Read next element [Figure: pivot, cmp, active, backup]
23 PREDICATED CRACKING [Figure: pivot, cmp, backup, active]
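The walkthrough can be approximated by the following branch-free, in-place partitioning loop. This is our own reconstruction from the slide labels (evaluate predicate & write, advance cursor, read next element), not the authors' code; `crack_predicated` is a hypothetical name. The invariant we chose is that the slots under both cursors are free, because their original values are held in `active` and `backup`, so writing the active value to both ends never loses data.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free in-place crack (sketch).
   Returns k such that a[0..k) < pivot and a[k..n) >= pivot. */
size_t crack_predicated(int32_t *a, size_t n, int32_t pivot) {
    if (n == 0)
        return 0;
    size_t lo = 0, hi = n - 1;
    int32_t active = a[lo];   /* value currently being placed */
    int32_t backup = a[hi];   /* value saved from the upper end */
    while (lo < hi) {
        int32_t cmp = active < pivot;      /* evaluate predicate: 0 or 1 */
        a[lo] = active;                    /* write to both free slots; the one */
        a[hi] = active;                    /* whose cursor stays is rewritten later */
        lo += (size_t)cmp;                 /* advance exactly one cursor */
        hi -= (size_t)(1 - cmp);
        /* read the next element from the end that just advanced */
        active = cmp * a[lo] + (1 - cmp) * a[hi];
    }
    a[lo] = backup;                        /* the last free slot gets the backup */
    return lo + (size_t)(backup < pivot);
}
```

The multiply-add selection reads both ends on every iteration; all reads stay inside the array, and the value read after the final step is simply discarded.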
24 PREDICATED CRACKING: Predication for in-place algorithms; no branching, so no branch mispredictions; somewhat intricate; lots of copying values around (copying at integer granularity is inefficient); bulk copying would be more efficient
25 VECTORIZED CRACKING
26 VECTORIZED CRACKING: Turns in-place cracking into out-of-place cracking; copies vector-sized chunks and cracks them into the array; makes vanilla predication possible; uses SIMD copying for vector copies. Challenge: ensure that values aren't accidentally overwritten.
27 VECTORIZED CRACKING [Figure: alternating copy and partition steps]
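One way to realize this idea is sketched below. It is our own construction, not the paper's implementation: values are copied out of the array in chunks of a hypothetical size `VEC` into a small ring buffer (plain `memcpy` stands in for SIMD copies), and each buffered value is then cracked out of place with a predicated write to both ends. The overwrite challenge is handled by refilling from a side as soon as it has no free slot left; since each side then holds at most `VEC` free slots, at most `2 * VEC` values are ever buffered.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC 16  /* hypothetical chunk size in elements */

/* Returns k such that a[0..k) < pivot and a[k..n) >= pivot. */
size_t crack_vectorized(int32_t *a, size_t n, int32_t pivot) {
    int32_t buf[2 * VEC];            /* ring buffer of copied-out values */
    size_t head = 0, tail = 0, held = 0;
    size_t wl = 0, rl = 0;           /* [0,wl) cracked (< pivot), [wl,rl) free */
    size_t rh = n, wh = n;           /* [rh,wh) free, [wh,n) cracked (>= pivot) */
                                     /* [rl,rh) not yet read */
    while (held > 0 || rl < rh) {
        /* Refill a side with no free slot, so no write can hit unread data. */
        if (rl < rh && (rl == wl || rh == wh)) {
            size_t take = rh - rl < VEC ? rh - rl : VEC;
            if (rl == wl) {                      /* copy a chunk from the left */
                memcpy(buf + tail, a + rl, take * sizeof *a);
                rl += take;
            } else {                             /* copy a chunk from the right */
                rh -= take;
                memcpy(buf + tail, a + rh, take * sizeof *a);
            }
            tail = (tail + take) % (2 * VEC);
            held += take;
            continue;
        }
        /* Predicated out-of-place step: write to both ends, advance one cursor. */
        int32_t v = buf[head];
        head = (head + 1) % (2 * VEC);
        held--;
        size_t cmp = (size_t)(v < pivot);
        a[wl] = v;
        a[wh - 1] = v;
        wl += cmp;
        wh -= 1 - cmp;
    }
    return wl;
}
```

The per-value refill check is a branch, but it is rarely taken and therefore well predicted; the data-dependent part of the loop is branch-free.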
28 RESULTS
29 RESULTS [Figure: normalized breakdown into Data Stalls, Retiring, Bad Speculation, Pipeline Frontend, and Pipeline Backend for Vectorized, Predicated, and Original cracking]
30 RESULTS: WORKSTATION [Figure: wall-clock time in s for Scan, Vectorized, Predicated (Register), Predicated (Cache), and Original]
31 RESULTS: SERVER [Figure: wall-clock time in s for Scan, Vectorized, Predicated (Register), Predicated (Cache), and Original] Not there yet!
32 PARALLELIZATION
33 PARALLELIZATION Obvious Solution: Partitioning
34 CRACK & MERGE: Partition [Figure: x1 y1 | x2 y2 | x3 y3 | x4 y4]
35 CRACK & MERGE: Merge [Figure: x1 y1 | x2 y2 | x3 y3 | x4 y4]
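The crack-and-merge scheme can be simulated sequentially as below, reading x_i as the part of chunk i below the pivot and y_i as the rest. This is a sketch under our own assumptions: equal-sized chunks, a hypothetical `MAX_PARTS` cap, and a plain loop standing in for one thread per chunk in the partition phase. The merge then swaps each x_i forward past the y pieces that precede it.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PARTS 64  /* hypothetical upper bound on the number of chunks */

/* Local crack of one chunk: a[0..k) < pivot, a[k..n) >= pivot; returns k. */
static size_t crack_chunk(int32_t *a, size_t n, int32_t pivot) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        if (a[lo] < pivot) {
            lo++;
        } else {
            hi--;
            int32_t t = a[lo];
            a[lo] = a[hi];
            a[hi] = t;
        }
    }
    return lo;
}

/* Returns the global split point k: a[0..k) < pivot, a[k..n) >= pivot. */
size_t crack_and_merge(int32_t *a, size_t n, int32_t pivot, size_t parts) {
    size_t xlen[MAX_PARTS];
    if (parts == 0) parts = 1;
    if (parts > MAX_PARTS) parts = MAX_PARTS;

    /* Partition phase: every chunk is cracked independently (parallelizable). */
    for (size_t p = 0; p < parts; p++) {
        size_t start = n * p / parts, end = n * (p + 1) / parts;
        xlen[p] = crack_chunk(a + start, end - start, pivot);
    }

    /* Merge phase: a[0..b) holds x pieces, a[b..start) holds y pieces. */
    size_t b = 0;
    for (size_t p = 0; p < parts; p++) {
        size_t start = n * p / parts, k = xlen[p];
        size_t h = start - b;            /* y values between b and x_p */
        size_t m = k < h ? k : h;        /* only m swaps are needed */
        for (size_t j = 0; j < m; j++) {
            int32_t t = a[b + j];
            a[b + j] = a[start + k - 1 - j];
            a[start + k - 1 - j] = t;
        }
        b += k;
    }
    return b;
}
```

Here the merge runs sequentially and moves up to min(|x_i|, number of preceding y values) elements per chunk, which is why shrinking the merge (as in the refined variant) matters.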
36 REFINED CRACK & MERGE: Partition [Figure: x1 x2 x3 x4 y4 y3 y2 y1]
37 REFINED CRACK & MERGE: Smaller Merge [Figure: x1 x2 x3 x4 y4 y3 y2 y1]
38 RESULTS: WORKSTATION [Figure: wall-clock time in seconds for Scan, RVPCrack, RPCrack, PVCrack, PCrack, and Vectorized]
39 RESULTS: SERVER [Figure: wall-clock time in seconds for Scan, RVPCrack, RPCrack, PVCrack, PCrack, and Vectorized]
40 IMPACT OF SELECTIVITY: WORKSTATION [Figure: wall-clock time in s vs. qualifying tuples/pivot for Vectorized, Partition & Merge, Vectorized Partition & Merge, Refined Partition & Merge, Vectorized Refined Partition & Merge, and Scanning]
41 IMPACT OF SELECTIVITY: SERVER [Figure: wall-clock time in s vs. qualifying tuples/pivot, same series as slide 40]
42 CONCLUSIONS