AN ELASTIC MULTI-CORE ALLOCATION MECHANISM FOR DATABASE SYSTEMS
|
|
- Kory Weaver
- 5 years ago
- Views:
Transcription
1 1 AN ELASTIC MULTI-CORE ALLOCATION MECHANISM FOR DATABASE SYSTEMS SIMONE DOMINICO 1, JORGE A. MEIRA 2, MARCO A. Z. ALVES 1, EDUARDO C. DE ALMEIDA 1 FEDERAL UNIVERSITY OF PARANÁ, BRAZIL 1, UNIVERSITY OF LUXEMBOURG 2
2 2 OUTLINE Introduction NUMA effect in OLAP PetriNet Multi-Core Allocation Mechanism Evaluation Conclusion
3 INTRODUCTION 3
4 NON-UNIFORM MEMORY ACCESS (NUMA) Avoid memory Interconnect link contention in multicore machines C C1 C C5 Node: Portion of C2 C3 C6 C7 memory and LLC LLC associated processor DRAM DRAM Shared L3 Cache Node Node 1
5 NON-UNIFORM MEMORY ACCESS (NUMA) Avoid memory Interconnect link contention in multicore machines Node: Portion of memory and Local C C1 C2 C3 LLC Remote C C5 C6 C7 LLC associated processor DRAM DRAM Shared L3 Cache Fast local RAM and slow remote RAM Node Node 1
6 5 NON-UNIFORM MEMORY ACCESS (NUMA) Threads may run in any Interconnect link core over time T1 T9 T2 LLC LLC DRAM DRAM Shared L3 Cache Node Node 1
7 5 NON-UNIFORM MEMORY ACCESS (NUMA) Threads may run in any Interconnect link core over time Local Data is mapped at thread startup T1 T9 T2 LLC LLC DRAM DRAM Shared L3 Cache Node Node 1
8 5 NON-UNIFORM MEMORY ACCESS (NUMA) Threads may run in any Interconnect link core over time Local Remote Data is mapped at T1 T2 thread startup T9 Remote access through LLC LLC the interconnection DRAM DRAM Shared L3 Cache Node Node 1
9 5 NON-UNIFORM MEMORY ACCESS (NUMA) Threads may run in any Interconnect link core over time Local Data is mapped at T1 thread startup T2 T9 Remote access through LLC LLC the interconnection DRAM DRAM Shared L3 Cache Migration: load balance, Node Node 1 reduce intercon. usage
10 NUMA EFFECT IN OLAP 6
11 7 PARALLEL QUERY PROCESSING IN NUMA SELECT * FROM A, B WHERE A.ID = B.ID AND B.DATE >= A.ID = B.ID σ B.DATE >= A B
12 7 PARALLEL QUERY PROCESSING IN NUMA SELECT * FROM A, B WHERE A.ID = B.ID AND B.DATE >= for t1 in left.getnext(): buildhashtable(t1) for t2 in right.getnext(): if probe(t2): emit(t1 t2) Parallelism is baked in the query plan at planning time σ for t in child.getnext(): if evalpred(t):emit(t) Batch work is assigned to threads statically (tuples or columns) Ex. VOLCANO and MONETDB A for t in A: emit(t) B for t in B: emit(t) Encapsulation of parallelism in the volcano query processing system G. Graefe Sigmod, 199 NUMA obliviousness through memory mapping M. Gawadi, M. Kersten Damon@Sigmod, 215
13 8 NOT NUMA-FRIENDLY PARALLELISM TPC-H Query 6 on 1GB database -node Quad-Core AMD Opteron 8 Series (6Gb RAM/Node) MonetDB (v )
14 8 NOT NUMA-FRIENDLY PARALLELISM TPC-H Query 6 on 1GB database -node Quad-Core AMD Opteron 8 Series (6Gb RAM/Node) MonetDB (v )
15 9 IMPACT OF REMOTE ACCESS TPC-H Q6: SQL vs. C language (raw performance) OS/C MonetDB Interconnect Traffic MB/s (1^3) Users
16 9 IMPACT OF REMOTE ACCESS TPC-H Q6: SQL vs. C language (raw performance) Dense/C Sparse/C OS/C MonetDB Interconnect Traffic MB/s (1^3) Not NUMA-friendly processing The OS isn t, either Users
17 9 IMPACT OF REMOTE ACCESS TPC-H Q6: SQL vs. C language (raw performance) Dense/C Sparse/C OS/C MonetDB Interconnect Traffic MB/s (1^3) Specific cores are allocated threads pinned to cores Users
18 9 IMPACT OF REMOTE ACCESS SQL query vs. C language (raw performance) GOAL: Dense/C Sparse/C OS/C MonetDB MITIGATE THE DATA MOVEMENT HANDING 8 OUT TO THE OS THE LOCAL OPTIMUM 6 NUMBER OF CORES IN SPECIFIC NODES Interconnect Traffic MB/s (1^3) Users DB is not NUMA-friendly OS isn t, too threads pinned to cores
19 9 IMPACT OF REMOTE ACCESS SQL query vs. C language (raw performance) CHALLENGES: LOCAL 8 OPTIMUM NUMBER OF CORES CPU-CORE ALLOCATION 6 Interconnect Traffic MB/s (1^3) 2 Dense/C Sparse/C OS/C MonetDB Users DB is not NUMA-friendly OS isn t, too threads pinned to cores
20 1 PETRINET MULTI-CORE ALLOCATION MECHANISM
21 11 OVERVIEW OF OUR MULTI-CORE ALLOCATION MECHANISM 2 Monitoring usage 1 Monitoring threads
22 11 OVERVIEW OF OUR MULTI-CORE ALLOCATION MECHANISM 2 Monitoring usage 1 Monitoring threads 3 Priority allocation
23 11 OVERVIEW OF OUR MULTI-CORE ALLOCATION MECHANISM Allocate cores
24 12 LOCAL OPTIMUM NUMBER OF CORES 1 CPU (%) # of Cores 8 CPU (%) # of Cores State Transition
25 12 LOCAL OPTIMUM NUMBER OF CORES 1 CPU (%) # of Cores 8 CPU (%) # of Cores State Transition
26 12 LOCAL OPTIMUM NUMBER OF CORES 1 CPU (%) # of Cores 8 CPU (%) # of Cores State Transition
27 12 LOCAL OPTIMUM NUMBER OF CORES CPU (%) Overloaded state CPU (%) # of Cores # of Cores State Transition Idle state
28 13 PETRINET MULTI-CORE ALLOCATION MECHANISM (DEFINITIONS) Places={Stable, Idle, Overload, Provision, Checks} Transitions={t,..., t7 }, (i.e., conditions) Arcs=<pi, tj > or <tj, pi>, (i.e., I/O functions)
29 1 OVERLOADED STATE Monitoring CPU (%) # of Cores State Transition
30 1 OVERLOADED STATE Overloaded state CPU (%) # of Cores State Transition
31 1 OVERLOADED STATE Overloaded state Adding cores CPU (%) # of Cores Action: allocate cores State Transition
32 15 ALLOCATION MODES Sparse Dense Node Node 1 Node Node Node 2 Node 3 Node 2 Node 3 Time Time allocated next to allocate
33 15 ALLOCATION MODES Sparse Dense Node Node 1 Node Node Node 2 Node 3 Node 2 Node 3 Time Time allocated next to allocate
34 16 ADAPTIVE MODE Node Node 1 Release Priority queue Allocate # of DB Thread Pages + Node 2 Node 3 /proc/[pid]/numa_maps allocated next
35 16 ADAPTIVE MODE Node Node 1 Release Priority queue Allocate # of DB Thread Pages + Node 2 Node 3 /proc/[pid]/numa_maps allocated next
36 16 ADAPTIVE MODE Node Node 1 Release Priority queue Allocate # of DB Thread Pages + Node 2 Node 3 /proc/[pid]/numa_maps allocated next
37 EXPERIMENTS 17
38 18 SETUP TPC-H 1GB and 1GB database up to 256 users -node Quad-Core AMD Opteron 8 Series (6GB RAM/Node), Agr. Bandwidth 1.6 Gb/s, L3 (6MB) Debian Linux 8 Jessie Prototype in C language Database systems: MonetDB (v ) NUMA-Aware Microsoft SQL Server (v217 Dev. RC2)
39 T5 T5 T6 19 IMPACT OF THE MIGRATION OF THREADS (TPC-H QUERY 6, 1 USER, 1 GB DATABASE, MONETDB) T6 T7 T8 T9 T1 T11 T12 T13 T1 T15 T16 T17 T18 T19 T2 Threads Core Number Timestamp (ms) OS/MonetDB Sparse (c) (a) T7 T8 T9 T1 T11 T12 T13 T1 T15 T16 T17 T18 T19 T2 Dense (b) Adaptive (d)
40 2 CONCURRENT EXECUTION OF TPC-H (256 USERS, 1 GB DATABASE, MONETDB/SQLSERVER) Mem TP (GB/s) Mem TP (GB/s) Mem TP (GB/s) Mem TP (GB/s) (a) (b) OS/MonetDB Adaptive/MonetDB OS/SQL Server (c) 3 Adaptive/SQL Server Time (s) (d) S S1 S2 S3 MonetDB SQL Server
41 2 CONCURRENT EXECUTION OF TPC-H Mem TP (GB/s) Mem TP (GB/s) Mem TP (GB/s) Mem TP (GB/s) (256 USERS, 1 GB DATABASE, MONETDB/SQLSERVER) (a) (b) OS/MonetDB Adaptive/MonetDB OS/SQL Server (c) 3 Adaptive/SQL Server Time (s) (d) S S1 S2 S3 Less cores
42 21 NUMA-FRIENDLINESS: INTERCONNECTION RATIO,9 (256 USERS, 1 GB DATABASE, MONETDB, TPC-H) OS/MonetDB Dense Sparse Adaptive HT/IMC Ratio,6,3 Better Q9 Q13 Q19 Q22
43 22 NUMA-FRIENDLINESS: INTERCONNECTION RATIO,3 (256 USERS, 1 GB DATABASE, SQL SERVER, TPC-H) OS/MonetDB Dense Sparse Adaptive HT/IMC Ratio,2,1 Better Q9 Q13 Q19 Q22
44 23 AN ELASTIC MULTI-CORE ALLOCATION MECHANISM FOR DATABASE SYSTEMS CONCLUSIONS
45 2 CONCLUSIONS Abstract model for the allocation of cores Thread scheduling works better with less cores (local optimum number of cores) Less thread migrations with less data movement: up to 3.87x of reduction on traffic ratio Up to 26% on energy savings
46 THANKS! ACKNOWLEDGMENTS: FUNDAÇAO ARAUCARIA, CAPES, CNPQ AND FUNDO NACIONAL DE DESENVOLVIMENTO DA EDUCAÇÃO (FNDE).
47 23 NUMA-FRIENDLINESS: INTERCONNECTION RATIO (MONETDB, 256 USERS, 1 GB DATABASE, TPC-H) Execution Speedup (Adaptive Mode) HT/IMC ratio Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 Q11 Q12 Q13 Queries Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 OS/MonetDB Dense Sparse Adaptive
48 23 NUMA-FRIENDLINESS: INTERCONNECTION RATIO (SQL SERVER, 256 USERS, 1 GB DATABASE, TPC-H) Execution Speedup (Adaptive Mode) HT/IMC ratio Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 Q11 Q12 Q13 Queries Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 OS/SQL server Dense Sparse Adaptive
49 23 ENERGY SAVINGS (MONETDB, 256 USERS, 1 GB DATABASE, TPC-H) 18 26% 16 Energy (J) Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 Q13 Q12 Q11 Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 Q13 Q12 Q11 Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 OS scheduler/monetdb Adaptive (a) CPU HT (b)
50 23 ENERGY SAVINGS (SQL SERVER, 256 USERS, 1 GB DATABASE, TPC-H) 7 11% 6 Energy (J) Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 Q13 Q12 Q11 Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 Q22 Q21 Q2 Q19 Q18 Q17 Q16 Q15 Q1 Q13 Q12 Q11 Q1 Q9 Q8 Q7 Q6 Q5 Q Q3 Q2 Q1 OS scheduler/sql Server Adaptive CPU HT
51 12 PETRINET MULTI-CORE ALLOCATION MECHANISM (DEFINITION: LOCAL OPTIMUM NUMBER OF CORES) w is the current workload of the database threads nalloc is the number of allocated CPU cores (nalloc ntotal )
52 12 PETRINET MULTI-CORE ALLOCATION MECHANISM (DEFINITION: LOCAL OPTIMUM NUMBER OF CORES) u is the average resource usage of database threads between minimum and maximum thresholds ntotal is the number of available CPU cores p(x) is the performance function, where x assumes the number of CPU cores nalloc or ntotal
Virtual SQL Servers. Actual Performance. 2016
@kleegeek davidklee.net heraflux.com linkedin.com/in/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture Health
More informationE-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems
E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, Michael
More informationdavidklee.net gplus.to/kleegeek linked.com/a/davidaklee
@kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture
More informationParallel In-situ Data Processing with Speculative Loading
Parallel In-situ Data Processing with Speculative Loading Yu Cheng, Florin Rusu University of California, Merced Outline Background Scanraw Operator Speculative Loading Evaluation SAM/BAM Format More than
More informationTales of the Tail Hardware, OS, and Application-level Sources of Tail Latency
Tales of the Tail Hardware, OS, and Application-level Sources of Tail Latency Jialin Li, Naveen Kr. Sharma, Dan R. K. Ports and Steven D. Gribble February 2, 2015 1 Introduction What is Tail Latency? What
More informationRack-scale Data Processing System
Rack-scale Data Processing System Jana Giceva, Darko Makreshanski, Claude Barthels, Alessandro Dovis, Gustavo Alonso Systems Group, Department of Computer Science, ETH Zurich Rack-scale Data Processing
More informationMaximizing Six-Core AMD Opteron Processor Performance with RHEL
Maximizing Six-Core AMD Opteron Processor Performance with RHEL Bhavna Sarathy Red Hat Technical Lead, AMD Sanjay Rao Senior Software Engineer, Red Hat Sept 4, 2009 1 Agenda Six-Core AMD Opteron processor
More informationHyPer-sonic Combined Transaction AND Query Processing
HyPer-sonic Combined Transaction AND Query Processing Thomas Neumann Technische Universität München December 2, 2011 Motivation There are different scenarios for database usage: OLTP: Online Transaction
More informationSRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores
SRM-Buffer: An OS Buffer Management SRM-Buffer: An OS Buffer Management Technique toprevent Last Level Cache from Thrashing in Multicores Xiaoning Ding The Ohio State University dingxn@cse.ohiostate.edu
More informationAccelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh
Accelerating Pointer Chasing in 3D-Stacked : Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary
More informationMorsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age. Presented by Dennis Grishin
Morsel- Drive Parallelism: A NUMA- Aware Query Evaluation Framework for the Many- Core Age Presented by Dennis Grishin What is the problem? Efficient computation requires distribution of processing between
More informationUsing Transparent Compression to Improve SSD-based I/O Caches
Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationDBMS Data Loading: An Analysis on Modern Hardware. Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki
DBMS Data Loading: An Analysis on Modern Hardware Adam Dziedzic, Manos Karpathiotakis*, Ioannis Alagiannis, Raja Appuswamy, Anastasia Ailamaki Data loading: A necessary evil Volume => Expensive 4 zettabytes
More informationParallel In-situ Data Processing Techniques
Parallel In-situ Data Processing Techniques Florin Rusu, Yu Cheng University of California, Merced Outline Background SCANRAW Operator Speculative Loading Evaluation Astronomy: FITS Format Sloan Digital
More informationWhat s New in VMware vsphere 4.1 Performance. VMware vsphere 4.1
What s New in VMware vsphere 4.1 Performance VMware vsphere 4.1 T E C H N I C A L W H I T E P A P E R Table of Contents Scalability enhancements....................................................................
More informationSANDPIPER: BLACK-BOX AND GRAY-BOX STRATEGIES FOR VIRTUAL MACHINE MIGRATION
SANDPIPER: BLACK-BOX AND GRAY-BOX STRATEGIES FOR VIRTUAL MACHINE MIGRATION Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif * University of Massachusetts Amherst * Intel, Portland Data
More informationNon-uniform memory access (NUMA)
Non-uniform memory access (NUMA) Memory access between processor core to main memory is not uniform. Memory resides in separate regions called NUMA domains. For highest performance, cores should only access
More informationCOEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence
1 COEN-4730 Computer Architecture Lecture 08 Thread Level Parallelism and Coherence Cristinel Ababei Dept. of Electrical and Computer Engineering Marquette University Credits: Slides adapted from presentations
More informationTransparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching
Transparent Throughput Elas0city for IaaS Cloud Storage Using Guest- Side Block- Level Caching Bogdan Nicolae (IBM Research, Ireland) Pierre Riteau (University of Chicago, USA) Kate Keahey (Argonne National
More informationHyPer-sonic Combined Transaction AND Query Processing
HyPer-sonic Combined Transaction AND Query Processing Thomas Neumann Technische Universität München October 26, 2011 Motivation - OLTP vs. OLAP OLTP and OLAP have very different requirements OLTP high
More informationMARACAS: A Real-Time Multicore VCPU Scheduling Framework
: A Real-Time Framework Computer Science Department Boston University Overview 1 2 3 4 5 6 7 Motivation platforms are gaining popularity in embedded and real-time systems concurrent workload support less
More informationTaming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems
Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems
More informationThe Case for Heterogeneous HTAP
The Case for Heterogeneous HTAP Raja Appuswamy, Manos Karpathiotakis, Danica Porobic, and Anastasia Ailamaki Data-Intensive Applications and Systems Lab EPFL 1 HTAP the contract with the hardware Hybrid
More informationMiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces
MiAMI: Multi-Core Aware Processor Affinity for TCP/IP over Multiple Network Interfaces Hye-Churn Jang Hyun-Wook (Jin) Jin Department of Computer Science and Engineering Konkuk University Seoul, Korea {comfact,
More informationThe Power of Batching in the Click Modular Router
The Power of Batching in the Click Modular Router Joongi Kim, Seonggu Huh, Keon Jang, * KyoungSoo Park, Sue Moon Computer Science Dept., KAIST Microsoft Research Cambridge, UK * Electrical Engineering
More informationParallel Algorithm Engineering
Parallel Algorithm Engineering Kenneth S. Bøgh PhD Fellow Based on slides by Darius Sidlauskas Outline Background Current multicore architectures UMA vs NUMA The openmp framework and numa control Examples
More informationDetecting Memory-Boundedness with Hardware Performance Counters
Center for Information Services and High Performance Computing (ZIH) Detecting ory-boundedness with Hardware Performance Counters ICPE, Apr 24th 2017 (daniel.molka@tu-dresden.de) Robert Schöne (robert.schoene@tu-dresden.de)
More informationMoneta: A High-performance Storage Array Architecture for Nextgeneration, Micro 2010
Moneta: A High-performance Storage Array Architecture for Nextgeneration, Non-volatile Memories Micro 2010 NVM-based SSD NVMs are replacing spinning-disks Performance of disks has lagged NAND flash showed
More informationDatabase Replication in Tashkent. CSEP 545 Transaction Processing Sameh Elnikety
Database Replication in Tashkent CSEP 545 Transaction Processing Sameh Elnikety Replication for Performance Expensive Limited scalability DB Replication is Challenging Single database system Large, persistent
More informationSharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment
SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Enterprise Intranet Collaboration Environment This document is provided as-is. Information and views expressed in this document, including
More informationLecture 17. NUMA Architecture and Programming
Lecture 17 NUMA Architecture and Programming Announcements Extended office hours today until 6pm Weds after class? Partitioning and communication in Particle method project 2012 Scott B. Baden /CSE 260/
More informationWorkload Optimized Systems: The Wheel of Reincarnation. Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013
Workload Optimized Systems: The Wheel of Reincarnation Michael Sporer, Netezza Appliance Hardware Architect 21 April 2013 Outline Definition Technology Minicomputers Prime Workstations Apollo Graphics
More informationPerformance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference
The 2017 IEEE International Symposium on Workload Characterization Performance Characterization, Prediction, and Optimization for Heterogeneous Systems with Multi-Level Memory Interference Shin-Ying Lee
More informationNUMA-aware Graph-structured Analytics
NUMA-aware Graph-structured Analytics Kaiyuan Zhang, Rong Chen, Haibo Chen Institute of Parallel and Distributed Systems Shanghai Jiao Tong University, China Big Data Everywhere 00 Million Tweets/day 1.11
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationLecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter
Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)
More informationSharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Social Environment
SharePoint 2010 Technical Case Study: Microsoft SharePoint Server 2010 Social Environment This document is provided as-is. Information and views expressed in this document, including URL and other Internet
More informationImpact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance
Impact of Dell FlexMem Bridge on Microsoft SQL Server Database Performance A Dell Technical White Paper Dell Database Solutions Engineering Jisha J Leena Basanthi October 2010 THIS WHITE PAPER IS FOR INFORMATIONAL
More informationDream Report Version 4.83
Release Notes Dream Report Version 4.83 User Friendly Programming-Free Reporting for Automation Contents Release Notes Summary of the New Functionality and Enhancements (compared to the original version
More informationContents Overview of the Compression Server White Paper... 5 Business Problem... 7
P6 Professional Compression Server White Paper for On-Premises Version 17 July 2017 Contents Overview of the Compression Server White Paper... 5 Business Problem... 7 P6 Compression Server vs. Citrix...
More informationSix-Core AMD Opteron Processor
What s you should know about the Six-Core AMD Opteron Processor (Codenamed Istanbul ) Six-Core AMD Opteron Processor Versatility Six-Core Opteron processors offer an optimal mix of performance, energy
More informationModel-Driven Geo-Elasticity In Database Clouds
Model-Driven Geo-Elasticity In Database Clouds Tian Guo, Prashant Shenoy College of Information and Computer Sciences University of Massachusetts, Amherst This work is supported by NSF grant 1345300, 1229059
More informationA Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs?
1 A Performance Puzzle: B-Tree Insertions are Slow on SSDs or What Is a Performance Model for SSDs? Bradley C. Kuszmaul MIT CSAIL, & Tokutek 3 iibench - SSD Insert Test 25 2 Rows/Second 15 1 5 2,, 4,,
More informationA Row Buffer Locality-Aware Caching Policy for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
A Row Buffer Locality-Aware Caching Policy for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Overview Emerging memories such as PCM offer higher density than
More informationQuad-core Press Briefing First Quarter Update
Quad-core Press Briefing First Quarter Update AMD Worldwide Server/Workstation Marketing C O N F I D E N T I A L Outstanding Dual-core Performance Toady Average of scores places AMD ahead by 2% Average
More informationMultiprocessor Systems. Chapter 8, 8.1
Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor
More informationReactive design patterns for microservices on multicore
Reactive Software with elegance Reactive design patterns for microservices on multicore Reactive summit - 22/10/18 charly.bechara@tredzone.com Outline Microservices on multicore Reactive Multicore Patterns
More informationOptimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service
Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service * Kshitij Sudan* Sadagopan Srinivasan Rajeev Balasubramonian* Ravi Iyer Executive Summary Goal: Co-schedule N applications
More informationComputer Architecture and OS. EECS678 Lecture 2
Computer Architecture and OS EECS678 Lecture 2 1 Recap What is an OS? An intermediary between users and hardware A program that is always running A resource manager Manage resources efficiently and fairly
More informationOperating System. Chapter 4. Threads. Lynn Choi School of Electrical Engineering
Operating System Chapter 4. Threads Lynn Choi School of Electrical Engineering Process Characteristics Resource ownership Includes a virtual address space (process image) Ownership of resources including
More informationTHE FUTURE OF GPU DATA MANAGEMENT. Michael Wolfe, May 9, 2017
THE FUTURE OF GPU DATA MANAGEMENT Michael Wolfe, May 9, 2017 CPU CACHE Hardware managed What data to cache? Where to store the cached data? What data to evict when the cache fills up? When to store data
More informationMULTI-THREADED QUERIES
15-721 Project 3 Final Presentation MULTI-THREADED QUERIES Wendong Li (wendongl) Lu Zhang (lzhang3) Rui Wang (ruiw1) Project Objective Intra-operator parallelism Use multiple threads in a single executor
More informationNUMA replicated pagecache for Linux
NUMA replicated pagecache for Linux Nick Piggin SuSE Labs January 27, 2008 0-0 Talk outline I will cover the following areas: Give some NUMA background information Introduce some of Linux s NUMA optimisations
More informationMicrosoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage
Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage A Dell Technical White Paper Dell Database Engineering Solutions Anthony Fernandez April 2010 THIS
More informationMULTIPROCESSORS AND THREAD-LEVEL. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationMULTIPROCESSORS AND THREAD-LEVEL PARALLELISM. B649 Parallel Architectures and Programming
MULTIPROCESSORS AND THREAD-LEVEL PARALLELISM B649 Parallel Architectures and Programming Motivation behind Multiprocessors Limitations of ILP (as already discussed) Growing interest in servers and server-performance
More informationWorkload Characterization and Optimization of TPC-H Queries on Apache Spark
Workload Characterization and Optimization of TPC-H Queries on Apache Spark Tatsuhiro Chiba and Tamiya Onodera IBM Research - Tokyo April. 17-19, 216 IEEE ISPASS 216 @ Uppsala, Sweden Overview IBM Research
More informationMain-Memory Databases 1 / 25
1 / 25 Motivation Hardware trends Huge main memory capacity with complex access characteristics (Caches, NUMA) Many-core CPUs SIMD support in CPUs New CPU features (HTM) Also: Graphic cards, FPGAs, low
More informationData Processing on Modern Hardware
Data Processing on Modern Hardware Jens Teubner, TU Dortmund, DBIS Group jens.teubner@cs.tu-dortmund.de Summer 2014 c Jens Teubner Data Processing on Modern Hardware Summer 2014 1 Part V Execution on Multiple
More informationZhengyang Liu! Oct 25, Supported by NSF Grant OCI
SDCI Net: Collaborative Research: An integrated study of datacenter networking and 100 GigE wide-area networking in support of distributed scientific computing Zhengyang Liu! Oct 25, 2013 Supported by
More informationOperating System Supports for SCM as Main Memory Systems (Focusing on ibuddy)
2011 NVRAMOS Operating System Supports for SCM as Main Memory Systems (Focusing on ibuddy) 2011. 4. 19 Jongmoo Choi http://embedded.dankook.ac.kr/~choijm Contents Overview Motivation Observations Proposal:
More informationData Transformation and Migration in Polystores
Data Transformation and Migration in Polystores Adam Dziedzic, Aaron Elmore & Michael Stonebraker September 15th, 2016 Agenda Data Migration for Polystores: What & Why? How? Acceleration of physical data
More informationEnabling Cost-effective Data Processing with Smart SSD
Enabling Cost-effective Data Processing with Smart SSD Yangwook Kang, UC Santa Cruz Yang-suk Kee, Samsung Semiconductor Ethan L. Miller, UC Santa Cruz Chanik Park, Samsung Electronics Efficient Use of
More informationCPU Scheduling. Operating Systems (Fall/Winter 2018) Yajin Zhou ( Zhejiang University
Operating Systems (Fall/Winter 2018) CPU Scheduling Yajin Zhou (http://yajin.org) Zhejiang University Acknowledgement: some pages are based on the slides from Zhi Wang(fsu). Review Motivation to use threads
More informationResource-Conscious Scheduling for Energy Efficiency on Multicore Processors
Resource-Conscious Scheduling for Energy Efficiency on Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe
More information[MS10987A]: Performance Tuning and Optimizing SQL Databases
[MS10987A]: Performance Tuning and Optimizing SQL Databases Length : 4 Days Audience(s) : IT Professionals Level : 300 Technology : Microsoft SQL Server Delivery Method : Instructor-led (Classroom) Course
More informationIn-Memory Data Management for Enterprise Applications. BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP)
In-Memory Data Management for Enterprise Applications BigSys 2014, Stuttgart, September 2014 Johannes Wust Hasso Plattner Institute (now with SAP) What is an In-Memory Database? 2 Source: Hector Garcia-Molina
More informationCHOP: Integrating DRAM Caches for CMP Server Platforms. Uliana Navrotska
CHOP: Integrating DRAM Caches for CMP Server Platforms Uliana Navrotska 766785 What I will talk about? Problem: in the era of many-core architecture at the heart of the trouble is the so-called memory
More informationChapter 13: I/O Systems
Chapter 13: I/O Systems Chapter 13: I/O Systems I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance 13.2 Silberschatz, Galvin
More informationZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency
ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr
More informationFirst-In-First-Out (FIFO) Algorithm
First-In-First-Out (FIFO) Algorithm Reference string: 7,0,1,2,0,3,0,4,2,3,0,3,0,3,2,1,2,0,1,7,0,1 3 frames (3 pages can be in memory at a time per process) 15 page faults Can vary by reference string:
More informationCourse Outline. Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led
Performance Tuning and Optimizing SQL Databases Course 10987B: 4 days Instructor Led About this course This four-day instructor-led course provides students who manage and maintain SQL Server databases
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationHIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS
HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS Proven Companies and Products Fusion-io Leader in PCIe enterprise flash platforms Accelerates mission-critical applications
More informationAdvanced Topics UNIT 2 PERFORMANCE EVALUATIONS
Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors
More informationHewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE
Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE Digital transformation is taking place in businesses of all sizes Big Data and Analytics Mobility Internet of Things
More informationRow Buffer Locality Aware Caching Policies for Hybrid Memories. HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu
Row Buffer Locality Aware Caching Policies for Hybrid Memories HanBin Yoon Justin Meza Rachata Ausavarungnirun Rachael Harding Onur Mutlu Executive Summary Different memory technologies have different
More informationScaling PortfolioCenter on a Network using 64-Bit Computing
Scaling PortfolioCenter on a Network using 64-Bit Computing Alternate Title: Scaling PortfolioCenter using 64-Bit Servers As your office grows, both in terms of the number of PortfolioCenter users and
More informationMultiprocessor System. Multiprocessor Systems. Bus Based UMA. Types of Multiprocessors (MPs) Cache Consistency. Bus Based UMA. Chapter 8, 8.
Multiprocessor System Multiprocessor Systems Chapter 8, 8.1 We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than
More informationSystem Design for a Million TPS
System Design for a Million TPS Hüsnü Sensoy Global Maksimum Data & Information Technologies Global Maksimum Data & Information Technologies Focused just on large scale data and information problems. Complex
More informationMultiprocessor Systems. COMP s1
Multiprocessor Systems 1 Multiprocessor System We will look at shared-memory multiprocessors More than one processor sharing the same memory A single CPU can only go so fast Use more than one CPU to improve
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationMaking Storage Smarter Jim Williams Martin K. Petersen
Making Storage Smarter Jim Williams Martin K. Petersen Agenda r Background r Examples r Current Work r Future 2 Definition r Storage is made smarter by exchanging information between the application and
More informationComputer Systems Research in the Post-Dennard Scaling Era. Emilio G. Cota Candidacy Exam April 30, 2013
Computer Systems Research in the Post-Dennard Scaling Era Emilio G. Cota Candidacy Exam April 30, 2013 Intel 4004, 1971 1 core, no cache 23K 10um transistors Intel Nehalem EX, 2009 8c, 24MB cache 2.3B
More informationReducing Network Contention with Mixed Workloads on Modern Multicore Clusters
Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Matthew Koop 1 Miao Luo D. K. Panda matthew.koop@nasa.gov {luom, panda}@cse.ohio-state.edu 1 NASA Center for Computational
More informationCS370 Operating Systems
CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 10 Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Chapter 6: CPU Scheduling Basic Concepts
More informationThe benefits and costs of writing a POSIX kernel in a high-level language
1 / 38 The benefits and costs of writing a POSIX kernel in a high-level language Cody Cutler, M. Frans Kaashoek, Robert T. Morris MIT CSAIL Should we use high-level languages to build OS kernels? 2 / 38
More informationReal-Time Cache Management for Multi-Core Virtualization
Real-Time Cache Management for Multi-Core Virtualization Hyoseung Kim 1,2 Raj Rajkumar 2 1 University of Riverside, California 2 Carnegie Mellon University Benefits of Multi-Core Processors Consolidation
More informationData Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions
Data Partitioning on Heterogeneous Multicore and Multi-GPU systems Using Functional Performance Models of Data-Parallel Applictions Ziming Zhong Vladimir Rychkov Alexey Lastovetsky Heterogeneous Computing
More informationCSE 544: Principles of Database Systems
CSE 544: Principles of Database Systems Anatomy of a DBMS, Parallel Databases 1 Announcements Lecture on Thursday, May 2nd: Moved to 9am-10:30am, CSE 403 Paper reviews: Anatomy paper was due yesterday;
More informationIntroducing the Cray XMT. Petr Konecny May 4 th 2007
Introducing the Cray XMT Petr Konecny May 4 th 2007 Agenda Origins of the Cray XMT Cray XMT system architecture Cray XT infrastructure Cray Threadstorm processor Shared memory programming model Benefits/drawbacks/solutions
More informationStaged Memory Scheduling
Staged Memory Scheduling Rachata Ausavarungnirun, Kevin Chang, Lavanya Subramanian, Gabriel H. Loh*, Onur Mutlu Carnegie Mellon University, *AMD Research June 12 th 2012 Executive Summary Observation:
More informationThe Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith
The Multikernel: A new OS architecture for scalable multicore systems Baumann et al. Presentation: Mark Smith Review Introduction Optimizing the OS based on hardware Processor changes Shared Memory vs
More informationGPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE)
GPU ACCELERATION OF WSMP (WATSON SPARSE MATRIX PACKAGE) NATALIA GIMELSHEIN ANSHUL GUPTA STEVE RENNICH SEID KORIC NVIDIA IBM NVIDIA NCSA WATSON SPARSE MATRIX PACKAGE (WSMP) Cholesky, LDL T, LU factorization
More informationChapter 05. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1
Chapter 05 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 5.1 Basic structure of a centralized shared-memory multiprocessor based on a multicore chip.
More informationSCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS
SCHEDULING ALGORITHMS FOR MULTICORE SYSTEMS BASED ON APPLICATION CHARACTERISTICS 1 JUNG KYU PARK, 2* JAEHO KIM, 3 HEUNG SEOK JEON 1 Department of Digital Media Design and Applications, Seoul Women s University,
More informationSOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG
SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG 1 WHO ARE THOSE GUYS Accela Zhao, Technologist at EMC OCTO, active Openstack community contributor, experienced in cloud scheduling
More informationArchitecture and Performance Implications
VMWARE WHITE PAPER VMware ESX Server 2 Architecture and Performance Implications ESX Server 2 is scalable, high-performance virtualization software that allows consolidation of multiple applications in
More informationChapter 8: Virtual Memory. Operating System Concepts
Chapter 8: Virtual Memory Silberschatz, Galvin and Gagne 2009 Chapter 8: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating
More informationLoad-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise)
Load-Sto-Meter: Generating Workloads for Persistent Memory Damini Chopra, Doug Voigt Hewlett Packard (Enterprise) Application vs. Pure Workloads Benchmarks that reproduce application workloads Assist in
More information