Overview on Hardware Optimizations for Database Engines

1 Overview on Hardware Optimizations for Database Engines Annett Ungethüm, Dirk Habich, Tomas Karnagel, Sebastian Haas, Eric Mier, Gerhard Fettweis, Wolfgang Lehner BTW 2017, Stuttgart, Germany,

2 Interaction DB-Engine and Hardware
Applications/Database Engines
Well-Known Challenge: Exploit hardware technology by specific data management techniques (indexing, data storage, query & transaction processing)
Modern Hardware: Main Memory, CPU
[Plot: memory (KByte) and #cores, logarithmic axis]

3 Era of Dark Silicon
MOORE'S LAW: Number of transistors in a dense integrated circuit doubles approximately every two years.
DARK SILICON: We can no longer power the transistors that Moore is giving us.
[Plot: #transistors (x1000) and process (nm), logarithmic axis]

4 HW/SW Co-Design for DB-Engines
Applications/Database Engines
Challenge: HW/SW Co-Design for Database Engines
Specialization of Hardware to overcome Dark Silicon
Modern Hardware

5 Outline
HARDWARE FOUNDATION
INTELLIGENT DMA CONTROLLER
EXTENSIONS FOR PROCESSING ELEMENTS

6 Hardware Foundation
TOMAHAWK PLATFORM

7 Hardware Foundation Zoom In

8 Hardware Foundation Zoom In (2)
CORE MANAGER (CM)
  Extended Xtensa-LX5 from Tensilica (now Cadence)
  32KB for code
  64KB for data
PROCESSING ELEMENTS (PE)
  Xtensa-LX5 from Tensilica (now Cadence)
  32KB for code
  2x32KB for data on PE
APPLICATION CORE (APP)
  570T core from Tensilica (now Cadence)
[Diagram label: Control-Plane]

9 Outline
PART I: EXTENSIONS OF PROCESSING ELEMENTS

10 Development Flow
DEVELOPMENT OF INSTRUCTION SET EXTENSIONS WITH TENSILICA TOOLS
  Tensilica Instruction Extension (TIE) language
  C/TIE compiler
  Cycle accurate simulator/debugger
  Processor generator
  Plain C:        int res = (v0 + v1 + v2) >> shift8;
  With extension: int res = add3_shift(v0, v1, v2);   // shift8 -> internal state
SYNTHESIS OF RTL CODE
  Synopsys Design Compiler, PrimeTime PX
  TSMC CMOS LP 65nm libraries
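The idea on this slide, replacing a short sequence of base instructions by one application-specific instruction that keeps a parameter in an internal state, can be modelled in plain C. The following is only a software sketch: the global shift8_state and the helper add3_shift_plain are hypothetical names introduced here, and the value 8 for the shift amount is an assumption. On the real core the state would be declared in TIE, not in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an application-specific processor state.
   The value 8 is an assumption made for this sketch. */
static unsigned shift8_state = 8;

/* Software model of the fused instruction: add three operands and
   shift the sum right by the amount held in the internal state. */
static inline int32_t add3_shift(int32_t v0, int32_t v1, int32_t v2) {
    assert(shift8_state < 32);
    return (v0 + v1 + v2) >> shift8_state;
}

/* Baseline: the same computation with the shift amount passed
   explicitly, as in the plain C line of the slide. */
static inline int32_t add3_shift_plain(int32_t v0, int32_t v1, int32_t v2,
                                       unsigned shift8) {
    return (v0 + v1 + v2) >> shift8;
}
```

Both functions compute the same value for non-negative sums. The extension saves instructions because the two additions and the shift are issued as one operation and the shift amount no longer occupies a register operand.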

11 Investigated Database Primitives 2014
Bitmap Compression and Processing (AND, OR, XOR): WAH, PLWAH, COMPAX
Hashing: Hash + Lookup, Hash + Insert, Hash Keys, Hash Sampling, CityHash32
Sorted Set Operations: Merge Sort, Intersection, Union, Difference, Sort-Merge Join, Sort-Merge Aggregation (SUM)

12 General Approach for all Extensions
Extended Tensilica LX5 Processor
  Instruction Set: Basic RISC Instruction Set, Application-Specific Instruction Set
  Instruction fetch: 64 bit, Local Instruction Memory
  Register Files: Basic Registers, Application-Specific Registers, Application-Specific States
  Data Prefetcher
  Load-Store Unit, Local Data Memory 0
  Load-Store Unit, Local Data Memory 1
  Interconnect

13 Bitmap Primitives
BITMAPS ARE A SPECIAL KIND OF INDEX
  Bit length equals number of tuples
  Example: table T with a bitmap index on column X, bitmaps b1, b2, b3, b4 for the values of X (=0, =1, =2, ...)
  select * from T where X < 2 is answered with a bit-wise OR
BITMAP COMPRESSION
  WORD-ALIGNED HYBRID (WAH) CODE
  Stateless compression
  Run-length encoding (RLE): runs of 0s and 1s
  WAH bitmaps contain RLE-compressed fills and uncompressed literals
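How a bit-wise OR answers the predicate X < 2 can be sketched with uncompressed bitmaps. The function names build_bitmap and select_x_less_than_2 are hypothetical, and the sketch is limited to 32 tuples so that one machine word holds a whole bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build the bitmap for "column[i] == value": bit i is set when tuple i
   matches. Limited to 32 tuples to keep the sketch short. */
static uint32_t build_bitmap(const int *column, size_t n, int value) {
    assert(n <= 32);
    uint32_t bits = 0;
    for (size_t i = 0; i < n; i++) {
        if (column[i] == value) {
            bits |= (uint32_t)1 << i;
        }
    }
    return bits;
}

/* Answer "X < 2" by OR-ing the bitmaps for X = 0 and X = 1. */
static uint32_t select_x_less_than_2(const int *column, size_t n) {
    return build_bitmap(column, n, 0) | build_bitmap(column, n, 1);
}
```

In a real index the per-value bitmaps are built once and stored, so a query only performs the OR. The sketch rebuilds them on every call purely for brevity.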

14 Bit-Wise OR on Compressed Bitmaps
32 bit words, in hex
WAH word types: Literal; 0 fill: 10<runlength>; 1 fill: 11<runlength>
Logical operations (AND, OR, XOR) on two compressed bitmaps
1) Load WAH word(s)
2) Calculate output (Fill-Fill, Literal-Fill, Literal-Literal)
3) Combine output
[Figure: word-wise OR of WAH bitmaps b1 and b2; output words shown: 7FFFFFFF 7FFFFFFF 7C0001E0 3FE]
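Step 2 above depends on telling fills from literals. A minimal sketch of that classification for 32-bit WAH words follows, using the layout named on the slide (top bit set marks a fill, the next bit is the fill value, the remaining 30 bits are the run length). All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WAH_FILL_FLAG    0x80000000u  /* top bit set: word is a fill   */
#define WAH_FILL_BIT     0x40000000u  /* fill value: run of 0s or 1s   */
#define WAH_RUN_MASK     0x3FFFFFFFu  /* 30-bit run length             */
#define WAH_LITERAL_MASK 0x7FFFFFFFu  /* 31 payload bits of a literal  */

static int wah_is_fill(uint32_t w) {
    return (w & WAH_FILL_FLAG) != 0;
}

static int wah_fill_bit(uint32_t w) {
    assert(wah_is_fill(w));
    return (w & WAH_FILL_BIT) != 0;
}

static uint32_t wah_run_length(uint32_t w) {
    assert(wah_is_fill(w));
    return w & WAH_RUN_MASK;
}

/* Literal-Fill case of OR for one 31-bit group: a 1-fill forces all
   payload bits to 1, a 0-fill leaves the literal unchanged. */
static uint32_t wah_or_literal_fill(uint32_t literal, uint32_t fill) {
    assert(!wah_is_fill(literal) && wah_is_fill(fill));
    return wah_fill_bit(fill) ? WAH_LITERAL_MASK : literal;
}
```

The helper returns the 31-bit payload only. A full implementation would still decide whether an all-ones or all-zeros payload should be emitted as a fill word instead of a literal.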

15 C-Code
  while (Xidx != Xsize && Yidx != Ysize) {
      // new X or Y? Calculate new fill count
      if (XisFill == 1 && YisFill == 1) {
          // Fill-Fill: consume the shorter of the two runs
          if (XfillWords < YfillWords) min = XfillWords;
          else min = YfillWords;
          writeFill(comprResultBI, &Zidx, X[Xidx] | Y[Yidx], min);
          XfillWords -= min;
          YfillWords -= min;
      } else if ((XisFill == 1 && YisFill == 0) || (XisFill == 0 && YisFill == 1)) {
          // Literal-Fill: a 1-fill wins, a 0-fill passes the literal through
          if (XisFill == 1) {
              XfillWords--;
              if ((X[Xidx] & 0xC0000000) == 0xC0000000)
                  writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
              else { comprResultBI[Zidx] = Y[Yidx]; Zidx++; }
          }
          if (YisFill == 1) {
              YfillWords--;
              if ((Y[Yidx] & 0xC0000000) == 0xC0000000)
                  writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
              else { comprResultBI[Zidx] = X[Xidx]; Zidx++; }
          }
      } else {
          // Literal-Literal: emit a fill if the result is all 1s or all 0s
          result = X[Xidx] | Y[Yidx];
          if ((result & 0x7FFFFFFF) == 0x7FFFFFFF)
              writeFill(comprResultBI, &Zidx, 0xC0000000, 1);
          else if ((result & 0x7FFFFFFF) == 0)
              writeFill(comprResultBI, &Zidx, 0x80000000, 1);
          else { comprResultBI[Zidx] = X[Xidx] | Y[Yidx]; Zidx++; }
      }
  }
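The slide code calls writeFill without showing its body. One plausible shape, written here purely as an assumption, appends a fill word or extends the previous output word when it is a fill of the same type. The parameter types (in particular uint32_t for the output index) are chosen for this sketch and may differ from the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical writeFill: append `count` fill groups of the type given
   by the two top bits of fillWord, merging with the previous output
   word when that word is a fill of the same type. */
static void writeFill(uint32_t *out, uint32_t *idx, uint32_t fillWord,
                      uint32_t count) {
    assert((fillWord & 0x80000000u) != 0);
    assert(count > 0 && count <= 0x3FFFFFFFu);

    uint32_t type = fillWord & 0xC0000000u;      /* fill flag + fill bit */

    if (*idx > 0 && (out[*idx - 1] & 0xC0000000u) == type) {
        uint32_t run = out[*idx - 1] & 0x3FFFFFFFu;
        if (run + count <= 0x3FFFFFFFu) {        /* counter still fits */
            out[*idx - 1] = type | (run + count);
            return;
        }
    }
    out[*idx] = type | count;                    /* start a new fill word */
    (*idx)++;
}
```

Merging adjacent fills keeps the output compressed: two consecutive 1-fill writes of lengths 1 and 2 leave a single word 0xC0000003 in the output instead of two words.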

16 Processing with PE Extension
Stages: Initial Load, Load, Prepare (preprocessing), Operation, Store (postprocessing); data moves between Memory 0/1 and the application-specific states
  Align to 128-bit lines: ldxstream(), ldystream()
  Is word fill or literal? If fill, overwrite input words
  Perform operation (OR)
  Write to output stream: append, or overwrite previous word with increased fill counter
  Proceed to next word (4x): 4 x WAHinst()
  Buffer result in Memory 0/1

17 Bit-Wise OR on Compressed Bitmaps
[Figure: same OR example as on slide 14; output words: 7FFFFFFF 7FFFFFFF 7C0001E0 3FE00000]
Code with Extension
  do {
      ldxstream();
      ldystream();
      WAHinst(); WAHinst(); WAHinst();
  } while (WAHinst());

18 Many More Extensions
Primitives:
  Bitmap Compression and Processing (AND, OR, XOR): WAH, PLWAH, COMPAX
  Hashing: Hash + Lookup, Hash + Insert, Hash Keys, Hash Sampling, CityHash32
  Sorted Set Operations: Merge Sort, Intersection, Union, Difference, Sort-Merge Join, Sort-Merge Aggregation (SUM)
Extension processors (number of primitives marked X): BitiX (3), HASHI (5), Titan3D (7), Tomahawk DBA (6)

19 Evaluation
REFERENCE PROCESSORS
  Tomahawk without DBA: Basic Xtensa LX5 without instruction set extensions, 1 LSU, 32-bit memory interface
  Tomahawk with DBA: Set of different DB-Extensions for WAH-Compression, Hashing, and Sorted-Set Operations
  Comparison, Intel i7-6500U: Low-power Intel 2-core processor based on Skylake architecture, 4MB L3 cache
Table columns: Processor, Description, Technology [nm], A total [mm²], f MAX [GHz], P MAX f MAX
Values for the Intel i7-6500U: 14, 99*

20 Evaluation - Bitmaps

21 Outline
PART 2: INTELLIGENT DMA CONTROLLER

22 Problem Statement
Problem: Many round-trips for key lookups
Approach: Teach B-trees to the memory controller
[Diagram: T2 RISC cores with local memories; APP (Tensilica 570T) with cache and CM (LX4-ISA_E) connected through the NoC to the memory controller (Synopsys DWC DDR2) and the memory (Micron DDR2 SDRAM); latency components t AN, t NA, t NMc, t McN, t McM, t MMc, t APP]

23 Intelligent Main Memory Controller (idma)
Vision (and first simulations): Intelligent memory controller
  Is aware of the semantics of memory layout
  Implements core operations (e.g. lookup)
Implementation (not yet in silicon)
  0.183 mm² PE with 200 MHz
[Diagram: cores with local memories, APP with cache and CM connected through the NoC to a Pointer Chaser in front of the memory controllers (Synopsys DWC DDR2) and the memory (Micron DDR2 SDRAM); latency components t NC, t CN, t NP, t PN, t PMc, t McP, t McM, t MMc]
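Why a key lookup costs many round-trips when a core executes it can be seen in a small B-tree walk that counts node fetches. The node layout, the fan-out of 4, and the names Node, NO_CHILD and btree_lookup are assumptions made for this sketch and are not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define FANOUT   4       /* hypothetical node fan-out */
#define NO_CHILD (-1)    /* marks a missing child (leaf level) */

typedef struct {
    int      nkeys;               /* number of keys in use        */
    uint32_t keys[FANOUT - 1];    /* sorted keys of this node     */
    int      child[FANOUT];       /* node indices or NO_CHILD     */
} Node;

/* Walk from the root towards the key. *fetches counts node reads,
   one per tree level visited. Returns 1 if the key is found. */
static int btree_lookup(const Node *mem, int root, uint32_t key,
                        int *fetches) {
    assert(mem != 0 && fetches != 0);
    int cur = root;
    *fetches = 0;
    while (cur != NO_CHILD) {
        const Node *n = &mem[cur];      /* one memory access per level */
        (*fetches)++;
        int i = 0;
        while (i < n->nkeys && key > n->keys[i]) {
            i++;
        }
        if (i < n->nkeys && key == n->keys[i]) {
            return 1;
        }
        cur = n->child[i];
    }
    return 0;
}
```

When this loop runs on a core, every counted fetch crosses the NoC and the memory controller and back. If the same loop runs next to the memory, as the pointer chaser is meant to do, the core pays for one request and one response regardless of the tree height.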

24 First idma Design

25 Evaluation using Simulator

26 Summary
HARDWARE FOUNDATION
INTELLIGENT DMA CONTROLLER
EXTENSIONS FOR PROCESSING ELEMENTS


An MPSoC for Energy-Efficient Database Query Processing

An MPSoC for Energy-Efficient Database Query Processing Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis An MPSoC for Energy-Efficient Database Query Processing TensilicaDay 2016 Sebastian Haas Emil Matúš Gerhard Fettweis 09.02.2016

More information

Overview on Hardware Optimizations for Database Engines

Overview on Hardware Optimizations for Database Engines B. Mitschang et al. (Hrsg.): Datenbanksysteme für Business, Technologie und Web (BTW 2017), Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2017 383 Overview on Hardware Optimizations

More information

HW/SW-Database-CoDesign for Compressed Bitmap Index Processing

HW/SW-Database-CoDesign for Compressed Bitmap Index Processing HW/SW-Database-CoDesign for Compressed Bitmap Index Processing Sebastian Haas, Tomas Karnagel, Oliver Arnold, Erik Laux 1, Benjamin Schlegel 2, Gerhard Fettweis, Wolfgang Lehner Vodafone Chair Mobile Communications

More information

A Database Accelerator for Energy-Efficient Query Processing and Optimization

A Database Accelerator for Energy-Efficient Query Processing and Optimization A Database Accelerator for Energy-Efficient Query Processing and Optimization Sebastian Haas, Oliver Arnold, Stefan Scholze, Sebastian Höppner, Georg Ellguth, Andreas Dixius, Annett Ungethüm, Eric Mier,

More information

Conflict Detection-based Run-Length Encoding AVX-512 CD Instruction Set in Action

Conflict Detection-based Run-Length Encoding AVX-512 CD Instruction Set in Action Conflict Detection-based Run-Length Encoding AVX-512 CD Instruction Set in Action Annett Ungethüm, Johannes Pietrzyk, Patrick Damme, Dirk Habich, Wolfgang Lehner HardBD & Active'18 Workshop in Paris, France

More information

Meet the Walkers! Accelerating Index Traversals for In-Memory Databases"

Meet the Walkers! Accelerating Index Traversals for In-Memory Databases Meet the Walkers! Accelerating Index Traversals for In-Memory Databases Onur Kocberber Boris Grot, Javier Picorel, Babak Falsafi, Kevin Lim, Parthasarathy Ranganathan Our World is Data-Driven! Data resides

More information

Adaptive Query Processing on Prefix Trees Wolfgang Lehner

Adaptive Query Processing on Prefix Trees Wolfgang Lehner Adaptive Query Processing on Prefix Trees Wolfgang Lehner Fachgruppentreffen, 22.11.2012 TU München Prof. Dr.-Ing. Wolfgang Lehner > Challenges for Database Systems Three things are important in the database

More information

Storage hierarchy. Textbook: chapters 11, 12, and 13

Storage hierarchy. Textbook: chapters 11, 12, and 13 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow Very small Small Bigger Very big (KB) (MB) (GB) (TB) Built-in Expensive Cheap Dirt cheap Disks: data is stored on concentric circular

More information

Main-Memory Databases 1 / 25

Main-Memory Databases 1 / 25 1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low

More information

Column Scan Acceleration in Hybrid CPU-FPGA Systems

Column Scan Acceleration in Hybrid CPU-FPGA Systems Column Scan Acceleration in Hybrid CPU-FPGA Systems ABSTRACT Nusrat Jahan Lisa, Annett Ungethüm, Dirk Habich, Wolfgang Lehner Technische Universität Dresden Database Systems Group Dresden, Germany {firstname.lastname}@tu-dresden.de

More information

Highlights. FP51 (FPGA based 1T 8051 core)

Highlights. FP51 (FPGA based 1T 8051 core) Copyright 2017 PulseRain Technology, LLC. FP51 (FPGA based 1T 8051 core) 10555 Scripps Trl, San Diego, CA 92131 858-877-3485 858-408-9550 http://www.pulserain.com Highlights 1T 8051 Core Intel MCS-51 Compatible

More information

Compression of the Stream Array Data Structure

Compression of the Stream Array Data Structure Compression of the Stream Array Data Structure Radim Bača and Martin Pawlas Department of Computer Science, Technical University of Ostrava Czech Republic {radim.baca,martin.pawlas}@vsb.cz Abstract. In

More information

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation

Lab. Course Goals. Topics. What is VLSI design? What is an integrated circuit? VLSI Design Cycle. VLSI Design Automation Course Goals Lab Understand key components in VLSI designs Become familiar with design tools (Cadence) Understand design flows Understand behavioral, structural, and physical specifications Be able to

More information

Keywords: CRC, CRC-7, cyclic redundancy check, industrial output, PLC, programmable logic controller, C code, CRC generation, microprocessor, switch

Keywords: CRC, CRC-7, cyclic redundancy check, industrial output, PLC, programmable logic controller, C code, CRC generation, microprocessor, switch Keywords: CRC, CRC-7, cyclic redundancy check, industrial output, PLC, programmable logic controller, C code, CRC generation, microprocessor, switch APPLICATION NOTE 6002 CRC PROGRAMMING FOR THE MAX14900E

More information

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14

HICAMP Bitmap. A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 HICAMP Bitmap A Space-Efficient Updatable Bitmap Index for In-Memory Databases! Bo Wang, Heiner Litz, David R. Cheriton Stanford University DAMON 14 Database Indexing Databases use precomputed indexes

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems 2009 VLDB Summer School Shanghai Peter Boncz (CWI) Sources Thank You! l l l l Database Architectures for New Hardware VLDB 2004 tutorial, Anastassia Ailamaki Query

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25 Indexing Jan Chomicki University at Buffalo Jan Chomicki () Indexing 1 / 25 Storage hierarchy Cache Main memory Disk Tape Very fast Fast Slower Slow (nanosec) (10 nanosec) (millisec) (sec) Very small Small

More information

FABRICATION TECHNOLOGIES

FABRICATION TECHNOLOGIES FABRICATION TECHNOLOGIES DSP Processor Design Approaches Full custom Standard cell** higher performance lower energy (power) lower per-part cost Gate array* FPGA* Programmable DSP Programmable general

More information

Configurable Processors for SOC Design. Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc.

Configurable Processors for SOC Design. Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc. Configurable s for SOC Design Contents crafted by Technology Evangelist Steve Leibson Tensilica, Inc. Why Listen to This Presentation? Understand how SOC design techniques, now nearly 20 years old, are

More information

PACE: Power-Aware Computing Engines

PACE: Power-Aware Computing Engines PACE: Power-Aware Computing Engines Krste Asanovic Saman Amarasinghe Martin Rinard Computer Architecture Group MIT Laboratory for Computer Science http://www.cag.lcs.mit.edu/ PACE Approach Energy- Conscious

More information

Survey. Motivation 29.5 / 40 class is required

Survey. Motivation 29.5 / 40 class is required Survey Motivation 29.5 / 40 class is required Concerns 6 / 40 not good at examination That s why we have 3 examinations 6 / 40 this class sounds difficult 8 / 40 understand the instructor Want class to

More information

Chapter 12: Query Processing. Chapter 12: Query Processing

Chapter 12: Query Processing. Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join

More information

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table

Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table Fast IP Routing Lookup with Configurable Processor and Compressed Routing Table H. Michael Ji, and Ranga Srinivasan Tensilica, Inc. 3255-6 Scott Blvd Santa Clara, CA 95054 Abstract--In this paper we examine

More information

THE NVIDIA DEEP LEARNING ACCELERATOR

THE NVIDIA DEEP LEARNING ACCELERATOR THE NVIDIA DEEP LEARNING ACCELERATOR INTRODUCTION NVDLA NVIDIA Deep Learning Accelerator Developed as part of Xavier NVIDIA s SOC for autonomous driving applications Optimized for Convolutional Neural

More information

Adaptable Intelligence The Next Computing Era

Adaptable Intelligence The Next Computing Era Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion

More information

Functioning Hardware from Functional Specifications

Functioning Hardware from Functional Specifications Functioning Hardware from Functional Specifications Stephen A. Edwards Martha A. Kim Richard Townsend Kuangya Zhai Lianne Lairmore Columbia University IBM PL Day, November 18, 2014 Where s my 10 GHz processor?

More information

Data Blocks: Hybrid OLTP and OLAP on compressed storage

Data Blocks: Hybrid OLTP and OLAP on compressed storage Data Blocks: Hybrid OLTP and OLAP on compressed storage Ben Brümmer Technische Universität München Fürstenfeldbruck, 26. November 208 Ben Brümmer 26..8 Lehrstuhl für Datenbanksysteme Problem HDD/Archive/Tape-Storage

More information

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design

INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 ELEC : Computer Architecture and Design INTEL Architectures GOPALAKRISHNAN IYER FALL 2009 GBI0001@AUBURN.EDU ELEC 6200-001: Computer Architecture and Design Silicon Technology Moore s law Moore's Law describes a long-term trend in the history

More information

TMS320C6678 Memory Access Performance

TMS320C6678 Memory Access Performance Application Report Lit. Number April 2011 TMS320C6678 Memory Access Performance Brighton Feng Communication Infrastructure ABSTRACT The TMS320C6678 has eight C66x cores, runs at 1GHz, each of them has

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE

Understanding Sources of Inefficiency in General-Purpose Chips. Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE Understanding Sources of Inefficiency in General-Purpose Chips Hameed, Rehan, et al. PRESENTED BY: XIAOMING GUO SIJIA HE 1 Outline Motivation H.264 Basics Key ideas Implementation & Evaluation Summary

More information

Computer Architecture Spring 2016

Computer Architecture Spring 2016 omputer Architecture Spring 2016 Lecture 09: Prefetching Shuai Wang Department of omputer Science and Technology Nanjing University Prefetching(1/3) Fetch block ahead of demand Target compulsory, capacity,

More information

Introduction to Microprocessor

Introduction to Microprocessor Introduction to Microprocessor Slide 1 Microprocessor A microprocessor is a multipurpose, programmable, clock-driven, register-based electronic device That reads binary instructions from a storage device

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

Chapter 12: Query Processing

Chapter 12: Query Processing Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join

More information

Computer Organization. 8th Edition. Chapter 5 Internal Memory

Computer Organization. 8th Edition. Chapter 5 Internal Memory William Stallings Computer Organization and Architecture 8th Edition Chapter 5 Internal Memory Semiconductor Memory Types Memory Type Category Erasure Write Mechanism Volatility Random-access memory (RAM)

More information

Dynamic Memory Management for Real-Time Multiprocessor System-on-a-Chip

Dynamic Memory Management for Real-Time Multiprocessor System-on-a-Chip Dynamic Memory Management for Real-Time Multiprocessor System-on-a-Chip Mohamed A. Shalan Dissertation Advisor Vincent J. Mooney III School of Electrical and Computer Engineering Agenda Introduction &

More information

Flash Memory Summit 2011

Flash Memory Summit 2011 1 Billion cores Memory Summit 2011 Session 302: Nonvolatile Design Challenges and Methodologies The Processor s role in maximizing performance and reducing energy consumption Neil Robinson Tensilica At

More information

Xtensa. Andrew Mihal 290A Fall 2002

Xtensa. Andrew Mihal 290A Fall 2002 Xtensa Andrew Mihal 290A Fall 2002 1 Outline Introduction Single processor Xtensa system architecture Exporting a programming model for single processor Multiple processor system architecture Exporting

More information

Query Processing Models

Query Processing Models Query Processing Models Holger Pirk Holger Pirk Query Processing Models 1 / 43 Purpose of this lecture By the end, you should Understand the principles of the different Query Processing Models Be able

More information

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed

and data combined) is equal to 7% of the number of instructions. Miss Rate with Second- Level Cache, Direct- Mapped Speed 5.3 By convention, a cache is named according to the amount of data it contains (i.e., a 4 KiB cache can hold 4 KiB of data); however, caches also require SRAM to store metadata such as tags and valid

More information

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems

Query Processing with Indexes. Announcements (February 24) Review. CPS 216 Advanced Database Systems Query Processing with Indexes CPS 216 Advanced Database Systems Announcements (February 24) 2 More reading assignment for next week Buffer management (due next Wednesday) Homework #2 due next Thursday

More information

Walking Four Machines by the Shore

Walking Four Machines by the Shore Walking Four Machines by the Shore Anastassia Ailamaki www.cs.cmu.edu/~natassa with Mark Hill and David DeWitt University of Wisconsin - Madison Workloads on Modern Platforms Cycles per instruction 3.0

More information

In-Memory Data Structures and Databases Jens Krueger

In-Memory Data Structures and Databases Jens Krueger In-Memory Data Structures and Databases Jens Krueger Enterprise Platform and Integration Concepts Hasso Plattner Intitute What to take home from this talk? 2 Answer to the following questions: What makes

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part VII Lecture 15, March 17, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VI Algorithms for Relational Operations Today s Session: DBMS

More information

Microarchitecture Overview. Performance

Microarchitecture Overview. Performance Microarchitecture Overview Prof. Scott Rixner Duncan Hall 3028 rixner@rice.edu January 18, 2005 Performance 4 Make operations faster Process improvements Circuit improvements Use more transistors to make

More information

An overview of standard cell based digital VLSI design

An overview of standard cell based digital VLSI design An overview of standard cell based digital VLSI design Implementation of the first generation AsAP processor Zhiyi Yu and Tinoosh Mohsenin VCL Laboratory UC Davis Outline Overview of standard cellbased

More information

Column Stores vs. Row Stores How Different Are They Really?

Column Stores vs. Row Stores How Different Are They Really? Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background

More information

Lecture 1: What is a Computer? Lecture for CPSC 2105 Computer Organization by Edward Bosworth, Ph.D.

Lecture 1: What is a Computer? Lecture for CPSC 2105 Computer Organization by Edward Bosworth, Ph.D. Lecture 1: What is a Computer? Lecture for CPSC 2105 Computer Organization by Edward Bosworth, Ph.D. An Older Computer The figure at right is an older computer, called a PDP-11/20. It was designed in the

More information

Why do we care about parallel?

Why do we care about parallel? Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The

More information

Workload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013

Workload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Workload Optimized Systems: The Wheel of Reincarnation Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Outline Definition Technology Minicomputers Prime Workstations Apollo Graphics

More information

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,

More information

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

Database Applications (15-415)

Database Applications (15-415) Database Applications (15-415) DBMS Internals- Part V Lecture 15, March 15, 2015 Mohammad Hammoud Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+ Tree) and Hash-based (i.e., Extendible

More information

Jim Keller. Digital Equipment Corp. Hudson MA

Jim Keller. Digital Equipment Corp. Hudson MA Jim Keller Digital Equipment Corp. Hudson MA ! Performance - SPECint95 100 50 21264 30 21164 10 1995 1996 1997 1998 1999 2000 2001 CMOS 5 0.5um CMOS 6 0.35um CMOS 7 0.25um "## Continued Performance Leadership

More information

An Overview of Standard Cell Based Digital VLSI Design

An Overview of Standard Cell Based Digital VLSI Design An Overview of Standard Cell Based Digital VLSI Design With examples taken from the implementation of the 36-core AsAP1 chip and the 1000-core KiloCore chip Zhiyi Yu, Tinoosh Mohsenin, Aaron Stillmaker,

More information

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources

This Unit: Putting It All Together. CIS 501 Computer Architecture. What is Computer Architecture? Sources This Unit: Putting It All Together CIS 501 Computer Architecture Unit 12: Putting It All Together: Anatomy of the XBox 360 Game Console Application OS Compiler Firmware CPU I/O Memory Digital Circuits

More information

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich

Data Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application

More information

CPU Architecture system clock

CPU Architecture system clock CPU Architecture system clock Memory 64-bit adder Every CPU architecture is implemented using digital logic. In each cycle of the system clock, logic is executed and results are saved. System designers

More information

AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-based Multi- and Many-core Processors

AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-based Multi- and Many-core Processors AAlign: A SIMD Framework for Pairwise Sequence Alignment on x86-based Multi- and Many-core Processors Kaixi Hou, Hao Wang, Wu-chun Feng {kaixihou,hwang121,wfeng}@vt.edu Pairwise Sequence Alignment Algorithms

More information

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes B-KD rees for Hardware Accelerated Ray racing of Dynamic Scenes Sven Woop Gerd Marmitt Philipp Slusallek Saarland University, Germany Outline Previous Work B-KD ree as new Spatial Index Structure DynR

More information

A Hardware Accelerator for Computing an Exact Dot Product. Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović

A Hardware Accelerator for Computing an Exact Dot Product. Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović A Hardware Accelerator for Computing an Exact Dot Product Jack Koenig, David Biancolin, Jonathan Bachrach, Krste Asanović 1 Challenges with Floating Point Addition and multiplication are not associative

More information
