Prefetch Threads for Database Operations on a Simultaneous Multi-threaded Processor


Kostas Papadopoulos
December 11, 2005

Abstract

Simultaneous Multi-threading (SMT) has been developed to increase instruction-level parallelism by allowing instructions from a different thread to run during a stall. Inter-thread cache interference, however, might limit the benefit of running multiple independent threads. SMT processors can also be used in a different model, where a helper thread is used to prefetch cache blocks for the main execution thread. Physical experimentation with low-level, compiler-generated prefetch threads has been tried, with mixed results. Memory-resident databases spend as much as 50% of their time in stalls, and memory prefetching has been shown to have a positive effect in some situations. In this paper we present an experiment with an abstracted database operation which uses a high-level, synchronized thread mechanism to prefetch memory. The experiment ran on an Intel Pentium 4 processor with Hyper-Threading, and its focus was L2 cache performance. The results show a substantial decrease in L2 misses as reported by the Pentium 4 performance monitoring counters. Additionally, a main/prefetch thread pair can be made to run 15% faster than a single thread, in spite of the high synchronization cost.

1 Introduction

Commercial relational database systems have been shown to suffer from memory stalls due to under-utilization of the L2 cache [2][6]. Methods of overcoming this limitation have been proposed, including different data layouts [3], algorithmic optimizations [4], and prefetching [6][7]. SMT processors offer an ideal architecture for memory prefetching, as the additional logical processor can be used to run a prefetch thread [5]. Database access methods can generally be grouped into indexed and sequential. Indexed methods use complex structures, usually B+-trees but also hash-based indexes, to look up the relevant tuple.
Sequential methods scan the full set of tuples in a database relation and perform an operation, such as selection, projection, or aggregation, on each tuple. In this paper we present an experiment on a physical processor to measure the impact of a prefetch thread on a database operation. In the experiment, we abstracted a table sequential scan operation as executed by PostgreSQL. The prefetch thread was created and synchronized using a POSIX thread library. The threads were synchronized so that the prefetch thread would always fetch data within a predefined range of memory pages ahead of the main thread. The threads were affined to the logical processors on a Linux system; the OS scheduler was otherwise allowed to schedule them as it saw fit. The results show a reduction in L2 misses by as much as a factor of 27, as reported by the Pentium 4. The main/prefetch pair execution times were 15% lower than that of a single thread, but 10% higher than that of two simultaneous main threads; the difference must be attributed to the high synchronization cost. The rest of the paper is organized as follows: Section 2 describes the experimental setup and methodology, Section 3 presents the results, and Section 4 discusses related work. The paper ends with notes on future work in Section 5 and the conclusion in Section 6.

2 Experimental Setup and Methodology

2.1 Environment

The experiment ran on a Pentium 4 processor with Hyper-Threading [12]. The Pentium 4 has two logical processors sharing the L1 and L2 caches. The processor characteristics are listed in Table 1. The processor has performance monitoring counters which can report, among other measurements, the L1 and L2 load misses.

Processor: Intel Pentium 4
SMT: Intel Hyper-Threading with two logical CPUs
Frequency: 2.8 GHz
L2 Cache Size: 512K
L2 Cache Line Size: 64B
L1 Data Cache Size: 32K

Table 1: Processor Characteristics

The operating system was Linux. Linux recognizes the logical CPUs as real CPUs; processes and threads are scheduled on each of the logical processors. The scheduler also understands sibling CPUs, giving them special consideration in the scheduling algorithm [14]. The Performance Application Programming Interface (PAPI) [13] was used to obtain the Pentium 4 performance measurements. PAPI allows the recording of performance events at suitable locations in the program execution, at the expense of a small amount of additional coding. Note that PAPI needs a specially built Linux kernel to run.

2.2 Database Operation and Synchronization Algorithm

A table sequential scan is the simplest form of selection that a database must perform. In the experiment, we abstracted a table sequential scan operation as executed by PostgreSQL. PostgreSQL, like most database systems, organizes its data so as to optimize disk I/O: a disk block becomes a memory buffer. Each buffer holds a varying number of tuples, depending on the tuple size, and holds tuples for only a single relation. Buffers occupy most of the memory used by the database system; memory is generally allocated using one or more shared memory segments. Although a relation might be scanned sequentially, its disk blocks are usually scattered among the available memory buffers, and a hash table (or another suitable construct) is used for the mapping. PostgreSQL performs a sequential scan as follows:

1. Create a lookup key.
2. Look up the buffer in a hash table.
3. Check the conditions for each tuple in the buffer and return the tuples that match.
4. Go to 1.

The above algorithm was replicated in a small stand-alone program following the design principles detailed above. The memory was allocated in a shared memory segment. A tuple was constructed to hold 5 attributes of 32 bytes in total. The buffer size was set at 4K and each buffer held 127 randomly filled tuples. Buffers were fetched in a predefined random order to simulate the scattering of disk blocks in memory. A simple aggregate operation, count(*), was performed on the tuples by comparing a double numeric value. The prefetch thread was created using the Native POSIX Thread Library [10] which is, as the name suggests, a POSIX thread implementation for Linux. Thread synchronization uses POSIX thread primitives, which are assumed to be fast under Linux [11]. The prefetch thread executed the same algorithm as the main thread, except that it did not perform the operation. The main and prefetch threads were synchronized on the number of buffers that the prefetch thread may be ahead of the main thread, called the synchronization distance. For a given distance N, the prefetch thread was kept between [-N, 2N] buffers ahead of the main thread. If either thread fell outside this range, it would sleep waiting for the other to catch up. At either end of the range [0, N] the threads would signal each other to continue. The synchronization distance was varied during the experiment. Table 2 lists the steps taken by each thread; a more complete listing is given in the Appendix.
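The tuple and buffer layout described above can be sketched as C structures. This is an illustration, not the paper's actual code: the attribute names and header fields are hypothetical, and only the sizes (5 attributes totalling 32 bytes, 4K buffers holding 127 tuples each) come from the text.

```c
#include <stddef.h>

#define BUFFER_SIZE       4096   /* one disk block = one memory buffer */
#define TUPLES_PER_BUFFER 127

/* Hypothetical 32-byte tuple with 5 attributes; net_weight stands in
   for the double value compared by the count(*) aggregate. */
typedef struct {
    int    id;            /*  4 bytes */
    int    flags;         /*  4 bytes */
    double net_weight;    /*  8 bytes */
    double gross_weight;  /*  8 bytes */
    char   tag[8];        /*  8 bytes */
} Tuple;                  /* 32 bytes total */

/* A buffer holds tuples of a single relation plus a small header,
   fitting in one 4K block: 8 + 127 * 32 = 4072 <= 4096 bytes. */
typedef struct {
    int   relation_id;    /* which relation the buffer belongs to */
    int   ntuples;        /* how many tuples are in use */
    Tuple data[TUPLES_PER_BUFFER];
} Buffer;
```

With 32-byte tuples, a 4K buffer could hold 128 tuples if it had no header; the 127-tuple figure in the text leaves room for exactly such per-buffer metadata.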

Main Thread:
  wait for Prefetch thread to start
  wait for the Prefetch to read N buffers
  for all buffers:
    if Prefetch is behind, wake Prefetch
    if Prefetch is more than N behind, wait for Prefetch
    process tuples in current buffer
  wait for Prefetch to finish
  return matches

Prefetch Thread:
  wait for Main thread to start
  for all buffers:
    if Main is more than N ahead, wake Main
    if Main is more than 2N ahead, wait for Main
  wait for Main to finish
  return

Table 2: The synchronization of Main and Prefetch threads

3 Experimental Results

We ran the tests for a single thread, two main threads, and a main/prefetch pair. For the main/prefetch pair, we varied the synchronization distance from 2 to 300 buffers. The measurements were taken after a full scan of 5000 buffers (about 20MB of memory). The scan was repeated 1000 times in each case. The measurements are for one logical processor and exclude kernel-level events.

Figure 1: L1 and L2 misses for a single thread
Figure 2: L1 and L2 misses for two main threads
Figure 3: L1 and L2 misses for a main/prefetch pair

Figures 1 and 2 show the L1 and L2 misses for a single thread and for two main threads respectively. The two-thread test shows interesting patterns in L2 misses: as the threads run, they fall in and out of
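The distance checks that Table 2 prescribes for the main thread can be written as two small predicates. The sketch below is ours; the conditions and counter names are distilled from the appendix listing, with N being the synchronization distance (prefetch_sync in the code).

```c
/* The prefetcher has fallen behind the scan position: wake it up. */
int main_should_signal(long main_read, long prefetch_read)
{
    return prefetch_read < main_read;
}

/* The prefetcher is more than N buffers behind: block until it
   catches up, since its loads no longer run ahead of the scan. */
int main_should_wait(long main_read, long prefetch_read, long n)
{
    return prefetch_read < main_read - n;
}
```

Keeping the checks this cheap matters: they run once per synchronization step, under the mutex, so any extra work here adds directly to the synchronization cost measured below.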

synch, periodically thrashing each other's L2 cache. Nonetheless, the two-thread run shows only a slight increase, of less than 10%, in the average L2 miss count. At the same time the average execution time falls from 8.45s to 6.46s, a decrease of about 25%.

Figure 4: L2 misses per synchronization distance
Figure 5: Real time per synchronization distance
Figure 6: Relation of execution time to other parameters

Figure 3 shows the L1 and L2 misses for a main/prefetch pair. The synchronization distance is 6, which gave the best results. It can be seen that the average number of L2 misses falls to about 920, a 27-fold reduction; relative to the two-thread run the reduction is about 29-fold. Figure 4 shows the average L2 misses as the synchronization distance is varied. A distance of 6 gives the best results. As the distance increases and the threads are allowed to drift further apart, the number of misses increases; at 300 buffers there is still a 2.5-fold reduction, which is fairly impressive considering how little synchronization there is between the two threads. Figure 5 shows the average real time as the synchronization distance is varied. The execution time is compared to that of a single thread, at 8.45 secs, and the average time of two threads, at 6.47 secs. The execution time drops below that of the single thread at 200 buffers and levels off at 7.2 secs between 4 and 50 buffers. The pair time is about 10% slower than the average time of the two-thread test. The algorithm for the main/prefetch pair is an extension of the single-thread algorithm with the synchronization logic.
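The timing comparisons above can be cross-checked with a little arithmetic; the three times (8.45 s single thread, 6.47 s two main threads, 7.2 s pair) are taken from the measurements just quoted.

```c
/* Percentage change between two reported execution times: positive
   means "to" is faster than "from". */
double percent_change(double from, double to)
{
    return (from - to) / from * 100.0;
}
```

percent_change(8.45, 7.2) gives about 14.8%, matching the claimed 15% gain over the single thread; percent_change(6.47, 7.2) gives about -11.3%, i.e. the pair is roughly 10-12% slower than two independent threads, consistent with the figures quoted in the text.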
So there are two main factors that influence the execution time: the improved L2 utilization and the synchronization cost. Part of the synchronization cost is the time the main thread spends waiting. We consider the number of times the main thread waits on the condition variable, together with the corresponding number for the prefetch thread, as an indication of the synchronization cost. In order to examine these factors, Figure 6 shows the execution time along with the wait counts for the main and prefetch threads and the L2 misses. We observe that all quantities rise sharply below a distance of 4: the threads spend considerable time waiting, which impacts both the L2 misses and the execution time. Above 5 the L2 miss count rises almost linearly, while the wait count for the main thread drops close to zero and the count for the prefetch thread decreases. The execution time remains constant during this period. This leads to the conclusion that synchronization cost and L2 improvements play an equal part in the overall performance. Additionally, execution gains can be achieved with relatively little synchronization, at least on a dedicated system such as the one studied here.

4 Related Work

In [1] Kihm et al. examine the problem of inter-thread cache interference for independent threads on SMT systems. They show that it can limit the benefit of running multiple independent threads, and propose methods by which the operating system scheduler can be made aware of these problems and schedule threads accordingly. In [5] Kim et al. use static analysis to identify delinquent loads, the loads in the code responsible for most cache misses, and then modify the compiler to generate helper threads for the loops containing them. They use two mechanisms to synchronize the threads, one based on Windows XP system calls and one a custom-built hardware solution. The synchronization is done in three ways: static loop-based, in which the threads synchronize once every loop; static sample trigger, in which the threads synchronize every few iterations of the loop; and dynamic trigger, in which the generated code monitors the cache behavior and synchronizes accordingly. Their results show some performance gains using these methods of synchronization, with dynamic trigger synchronization being the most promising. The authors also expect better results if the synchronization cost can be lowered.

5 Future Work

In this experiment we tested a simple database operation, a sequential scan with a simple aggregate, and examined only the synchronization distance. Other parameters that might influence the usefulness of this method must be analyzed, such as the type of operation performed on each tuple, the synchronization algorithm, the structure of the data and the database access method (e.g. index methods), and the relation of the prefetch thread to the cache line size and the hardware prefetch logic of the processor.
Tests involving multiple main/prefetch pairs must also be carried out, as well as a model with multiple main threads per prefetch thread. Additionally, the synchronization needs to be minimized or eliminated. This can be done either by designing a better synchronization algorithm or by adding explicit operating system support. In the first case, a never-block algorithm could give good results if it can maintain a relatively close distance between the threads. In the second case, the OS could be modified to schedule the main and prefetch threads in the same time slot, in a gang-scheduling fashion, so as to eliminate or minimize the need for synchronization.

6 Conclusions

Databases are known to suffer from poor cache performance, leading to excessive processor stalls. We present here an experiment where a prefetch thread is used to assist a database operation on an SMT processor. The results show a dramatic decrease in L2 cache misses, while the overall performance was about 15% better compared to a single thread and about 12% worse compared to two independent threads; the performance penalty must be attributed to the high synchronization cost. Overall the results are promising: the experiment shows that it is possible to improve L2 cache performance with high-level synchronization.

References

[1] Joshua Kihm, Alex Settle, Andrew Janiszewski, Dan Connors, Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors, Journal of Instruction-Level Parallelism 7 (2005) 1-28.
[2] Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, David A. Wood, DBMSs On A Modern Processor: Where Does Time Go?, Proceedings of the 25th VLDB Conference, Edinburgh, Scotland.

[3] Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, Marios Skounakis, Weaving Relations for Cache Performance.
[4] Kenneth A. Ross, Jingren Zhou, Buffering Database Operations for Enhanced Instruction Cache Performance.
[5] Dongkeun Kim et al., Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors, International Symposium on Code Generation and Optimization (CGO 04), p. 27.
[6] Pedro Trancoso, Josep-L. Larriba-Pey, Zheng Zhang, Josep Torrellas, The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors.
[7] Pedro Trancoso, Josep Torrellas, The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding.
[8] Lawrence Spracklen, Yuan Chou, Santosh G. Abraham, Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications, Proceedings of the 11th International Symposium on High-Performance Computer Architecture, 2005.
[9] Bruce Momjian, PostgreSQL Internals, Software Research Associates, December 2001.
[10] U. Drepper, I. Molnar, The Native POSIX Thread Library for Linux.
[11] Ulrich Drepper, Futexes Are Tricky, drepper/futex.pdf.
[12] Intel, IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide.
[13] Performance Application Programming Interface (PAPI).
[14] Josh Aas, Understanding the Linux CPU Scheduler.

A Appendix

We list below a simplified version of the scan method of the main thread. As explained above, the prefetch thread follows the same algorithm. Initialization and error checking have been removed for clarity.

int scan() {
    // wait for prefetch to start
    pthread_mutex_lock(&mut);
    main_status = started;
    while (prefetch_status != started) {
        pthread_cond_signal(&go_cond);
        pthread_cond_wait(&done_cond, &mut);
    }
    // wait for the first N buffers
    while (prefetch_buffers_read < prefetch_sync) {
        pthread_cond_signal(&go_cond);
        pthread_cond_wait(&done_cond, &mut);
    }
    main_buffers_read = 0;
    pthread_mutex_unlock(&mut);

    for (i = 0; i < buffers; i++) {
        // synchronize every N=prefetch_sync buffers
        if (i % prefetch_sync == 0) {
            pthread_mutex_lock(&mut);
            main_buffers_read = i;
            if (prefetch_buffers_read < main_buffers_read) {
                // wake helper thread
                pthread_cond_signal(&go_cond);
                if (prefetch_buffers_read < main_buffers_read - prefetch_sync) {
                    // we are too far ahead; wait...
                    pthread_cond_wait(&done_cond, &mut);
                }
            }
            pthread_mutex_unlock(&mut);
        }
        int j;
        for (j = 0; j < tuples; j++) {
            if (relation[order[i]].data[j].net_weight < target_weight) {
                matches++;
            }
        }
    }

    // make sure the helper thread is not left waiting
    pthread_mutex_lock(&mut);
    main_status = finished;
    while (prefetch_status != finished2) {
        pthread_cond_signal(&go_cond);
        pthread_cond_wait(&done_cond, &mut);
    }
    pthread_mutex_unlock(&mut);
    return matches;
}
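One refinement suggested by the future-work discussion of cache line size: the prefetch thread need not read every tuple as the main thread does. Touching one byte per 64-byte cache line is enough to pull a buffer into L2 with minimal work. This variant is our sketch, not part of the experiment above.

```c
#define CACHE_LINE 64

/* Zero-filled 4K demonstration buffer standing in for one disk block. */
static unsigned char demo_page[4096];

/* Touch one byte per cache line so every 64-byte line of the buffer is
   loaded; the accumulated sum keeps the compiler from optimizing the
   loads away, and volatile forces each load to actually happen. */
long touch_buffer(const volatile unsigned char *buf, long size)
{
    long sum = 0;
    long off;
    for (off = 0; off < size; off += CACHE_LINE)
        sum += buf[off];
    return sum;
}
```

A prefetch thread built on touch_buffer would cut the prefetcher's per-buffer work from 127 tuple reads to 64 single-byte loads, which could reduce the contention it causes on the shared execution resources of the Hyper-Threaded core.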


More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems

Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi University of Kansas 1 Why? High-Performance Multicores for Real-Time Systems

More information

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed.

CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. CHAPTER 11: IMPLEMENTING FILE SYSTEMS (COMPACT) By I-Chen Lin Textbook: Operating System Concepts 9th Ed. File-System Structure File structure Logical storage unit Collection of related information File

More information

CSE 120 Principles of Operating Systems

CSE 120 Principles of Operating Systems CSE 120 Principles of Operating Systems Spring 2018 Lecture 15: Multicore Geoffrey M. Voelker Multicore Operating Systems We have generally discussed operating systems concepts independent of the number

More information

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts

Memory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of

More information

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy

COMPUTER ARCHITECTURE. Virtualization and Memory Hierarchy COMPUTER ARCHITECTURE Virtualization and Memory Hierarchy 2 Contents Virtual memory. Policies and strategies. Page tables. Virtual machines. Requirements of virtual machines and ISA support. Virtual machines:

More information

Chapter 8: Main Memory

Chapter 8: Main Memory Chapter 8: Main Memory Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and 64-bit Architectures Example:

More information

Memory Management. Memory

Memory Management. Memory Memory Management These slides are created by Dr. Huang of George Mason University. Students registered in Dr. Huang s courses at GMU can make a single machine readable copy and print a single copy of

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu Outline Parallel computing? Multi-core architectures Memory hierarchy Vs. SMT Cache coherence What is parallel computing? Using multiple processors in parallel to

More information

Software-Controlled Multithreading Using Informing Memory Operations

Software-Controlled Multithreading Using Informing Memory Operations Software-Controlled Multithreading Using Informing Memory Operations Todd C. Mowry Computer Science Department University Sherwyn R. Ramkissoon Department of Electrical & Computer Engineering University

More information

Cache-Aware Database Systems Internals. Chapter 7

Cache-Aware Database Systems Internals. Chapter 7 Cache-Aware Database Systems Internals Chapter 7 Data Placement in RDBMSs A careful analysis of query processing operators and data placement schemes in RDBMS reveals a paradox: Workloads perform sequential

More information

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto

Ricardo Rocha. Department of Computer Science Faculty of Sciences University of Porto Ricardo Rocha Department of Computer Science Faculty of Sciences University of Porto Slides based on the book Operating System Concepts, 9th Edition, Abraham Silberschatz, Peter B. Galvin and Greg Gagne,

More information

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY 1 Memories: Review SRAM: value is stored on a pair of inverting gates very fast but takes up more space than DRAM (4 to 6 transistors) DRAM: value is stored

More information

The Gap Between the Virtual Machine and the Real Machine. Charles Forgy Production Systems Tech

The Gap Between the Virtual Machine and the Real Machine. Charles Forgy Production Systems Tech The Gap Between the Virtual Machine and the Real Machine Charles Forgy Production Systems Tech How to Improve Performance Use better algorithms. Use parallelism. Make better use of the hardware. Argument

More information

CPE300: Digital System Architecture and Design

CPE300: Digital System Architecture and Design CPE300: Digital System Architecture and Design Fall 2011 MW 17:30-18:45 CBC C316 Virtual Memory 11282011 http://www.egr.unlv.edu/~b1morris/cpe300/ 2 Outline Review Cache Virtual Memory Projects 3 Memory

More information

Scheduling the Intel Core i7

Scheduling the Intel Core i7 Third Year Project Report University of Manchester SCHOOL OF COMPUTER SCIENCE Scheduling the Intel Core i7 Ibrahim Alsuheabani Degree Programme: BSc Software Engineering Supervisor: Prof. Alasdair Rawsthorne

More information

Chapter 8: Memory-Management Strategies

Chapter 8: Memory-Management Strategies Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and

More information

CS3600 SYSTEMS AND NETWORKS

CS3600 SYSTEMS AND NETWORKS CS3600 SYSTEMS AND NETWORKS NORTHEASTERN UNIVERSITY Lecture 11: File System Implementation Prof. Alan Mislove (amislove@ccs.neu.edu) File-System Structure File structure Logical storage unit Collection

More information

CS370 Operating Systems

CS370 Operating Systems CS370 Operating Systems Colorado State University Yashwant K Malaiya Fall 2017 Lecture 20 Main Memory Slides based on Text by Silberschatz, Galvin, Gagne Various sources 1 1 Pages Pages and frames Page

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI CHEN TIANZHOU SHI QINGSONG JIANG NING College of Computer Science Zhejiang University College of Computer Science

More information

Tradeoff between coverage of a Markov prefetcher and memory bandwidth usage

Tradeoff between coverage of a Markov prefetcher and memory bandwidth usage Tradeoff between coverage of a Markov prefetcher and memory bandwidth usage Elec525 Spring 2005 Raj Bandyopadhyay, Mandy Liu, Nico Peña Hypothesis Some modern processors use a prefetching unit at the front-end

More information

Multi-core Architectures. Dr. Yingwu Zhu

Multi-core Architectures. Dr. Yingwu Zhu Multi-core Architectures Dr. Yingwu Zhu What is parallel computing? Using multiple processors in parallel to solve problems more quickly than with a single processor Examples of parallel computing A cluster

More information

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition

Chapter 8: Memory- Management Strategies. Operating System Concepts 9 th Edition Chapter 8: Memory- Management Strategies Operating System Concepts 9 th Edition Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation

More information

Operating Systems. Introduction & Overview. Outline for today s lecture. Administrivia. ITS 225: Operating Systems. Lecture 1

Operating Systems. Introduction & Overview. Outline for today s lecture. Administrivia. ITS 225: Operating Systems. Lecture 1 ITS 225: Operating Systems Operating Systems Lecture 1 Introduction & Overview Jan 15, 2004 Dr. Matthew Dailey Information Technology Program Sirindhorn International Institute of Technology Thammasat

More information

Lecture 2: September 9

Lecture 2: September 9 CMPSCI 377 Operating Systems Fall 2010 Lecture 2: September 9 Lecturer: Prashant Shenoy TA: Antony Partensky & Tim Wood 2.1 OS & Computer Architecture The operating system is the interface between a user

More information

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor.

Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. CS 320 Ch. 18 Multicore Computers Multicore computer: Combines two or more processors (cores) on a single die. Also called a chip-multiprocessor. Definitions: Hyper-threading Intel's proprietary simultaneous

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Flash Drive Emulation

Flash Drive Emulation Flash Drive Emulation Eric Aderhold & Blayne Field aderhold@cs.wisc.edu & bfield@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison Abstract Flash drives are becoming increasingly

More information

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research

Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Reducing the SPEC2006 Benchmark Suite for Simulation Based Computer Architecture Research Joel Hestness jthestness@uwalumni.com Lenni Kuff lskuff@uwalumni.com Computer Science Department University of

More information

Chapter 9 Memory Management Main Memory Operating system concepts. Sixth Edition. Silberschatz, Galvin, and Gagne 8.1

Chapter 9 Memory Management Main Memory Operating system concepts. Sixth Edition. Silberschatz, Galvin, and Gagne 8.1 Chapter 9 Memory Management Main Memory Operating system concepts. Sixth Edition. Silberschatz, Galvin, and Gagne 8.1 Chapter 9: Memory Management Background Swapping Contiguous Memory Allocation Segmentation

More information

Virtual Memory. control structures and hardware support

Virtual Memory. control structures and hardware support Virtual Memory control structures and hardware support 1 Hardware and Control Structures Memory references are dynamically translated into physical addresses at run time A process may be swapped in and

More information

Virtual Memory. CS61, Lecture 15. Prof. Stephen Chong October 20, 2011

Virtual Memory. CS61, Lecture 15. Prof. Stephen Chong October 20, 2011 Virtual Memory CS6, Lecture 5 Prof. Stephen Chong October 2, 2 Announcements Midterm review session: Monday Oct 24 5:3pm to 7pm, 6 Oxford St. room 33 Large and small group interaction 2 Wall of Flame Rob

More information

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES OBJECTIVES Detailed description of various ways of organizing memory hardware Various memory-management techniques, including paging and segmentation To provide

More information

A Cache Hierarchy in a Computer System

A Cache Hierarchy in a Computer System A Cache Hierarchy in a Computer System Ideally one would desire an indefinitely large memory capacity such that any particular... word would be immediately available... We are... forced to recognize the

More information

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System

A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI, CHEN TIANZHOU, SHI QINGSONG, JIANG NING College of Computer Science Zhejiang University College of Computer

More information

A Scalable Event Dispatching Library for Linux Network Servers

A Scalable Event Dispatching Library for Linux Network Servers A Scalable Event Dispatching Library for Linux Network Servers Hao-Ran Liu and Tien-Fu Chen Dept. of CSIE National Chung Cheng University Traditional server: Multiple Process (MP) server A dedicated process

More information

Architecture-Conscious Database Systems

Architecture-Conscious Database Systems Architecture-Conscious Database Systems Anastassia Ailamaki Ph.D. Examination November 30, 2000 A DBMS on a 1980 Computer DBMS Execution PROCESSOR 10 cycles/instruction DBMS Data and Instructions 6 cycles

More information

Performance and Optimization Issues in Multicore Computing

Performance and Optimization Issues in Multicore Computing Performance and Optimization Issues in Multicore Computing Minsoo Ryu Department of Computer Science and Engineering 2 Multicore Computing Challenges It is not easy to develop an efficient multicore program

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

Chapter 8: Main Memory. Operating System Concepts 9 th Edition Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel

More information

Chapter 05. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1

Chapter 05. Authors: John Hennessy & David Patterson. Copyright 2011, Elsevier Inc. All rights Reserved. 1 Chapter 05 Authors: John Hennessy & David Patterson Copyright 2011, Elsevier Inc. All rights Reserved. 1 Figure 5.1 Basic structure of a centralized shared-memory multiprocessor based on a multicore chip.

More information

OPERATING SYSTEM. Chapter 12: File System Implementation

OPERATING SYSTEM. Chapter 12: File System Implementation OPERATING SYSTEM Chapter 12: File System Implementation Chapter 12: File System Implementation File-System Structure File-System Implementation Directory Implementation Allocation Methods Free-Space Management

More information

Show Me the $... Performance And Caches

Show Me the $... Performance And Caches Show Me the $... Performance And Caches 1 CPU-Cache Interaction (5-stage pipeline) PCen 0x4 Add bubble PC addr inst hit? Primary Instruction Cache IR D To Memory Control Decode, Register Fetch E A B MD1

More information

How to Optimize the Scalability & Performance of a Multi-Core Operating System. Architecting a Scalable Real-Time Application on an SMP Platform

How to Optimize the Scalability & Performance of a Multi-Core Operating System. Architecting a Scalable Real-Time Application on an SMP Platform How to Optimize the Scalability & Performance of a Multi-Core Operating System Architecting a Scalable Real-Time Application on an SMP Platform Overview W hen upgrading your hardware platform to a newer

More information

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst Operating Systems CMPSCI 377 Spring 2017 Mark Corner University of Massachusetts Amherst Last Class: Intro to OS An operating system is the interface between the user and the architecture. User-level Applications

More information

Adapted from David Patterson s slides on graduate computer architecture

Adapted from David Patterson s slides on graduate computer architecture Mei Yang Adapted from David Patterson s slides on graduate computer architecture Introduction Ten Advanced Optimizations of Cache Performance Memory Technology and Optimizations Virtual Memory and Virtual

More information

HPC VT Machine-dependent Optimization

HPC VT Machine-dependent Optimization HPC VT 2013 Machine-dependent Optimization Last time Choose good data structures Reduce number of operations Use cheap operations strength reduction Avoid too many small function calls inlining Use compiler

More information

Memory Hierarchy. Slides contents from:

Memory Hierarchy. Slides contents from: Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory

More information

Chapter 8 Main Memory

Chapter 8 Main Memory COP 4610: Introduction to Operating Systems (Spring 2014) Chapter 8 Main Memory Zhi Wang Florida State University Contents Background Swapping Contiguous memory allocation Paging Segmentation OS examples

More information

What Operating Systems Do An operating system is a program hardware that manages the computer provides a basis for application programs acts as an int

What Operating Systems Do An operating system is a program hardware that manages the computer provides a basis for application programs acts as an int Operating Systems Lecture 1 Introduction Agenda: What Operating Systems Do Computer System Components How to view the Operating System Computer-System Operation Interrupt Operation I/O Structure DMA Structure

More information

Design Patterns for Real-Time Computer Music Systems

Design Patterns for Real-Time Computer Music Systems Design Patterns for Real-Time Computer Music Systems Roger B. Dannenberg and Ross Bencina 4 September 2005 This document contains a set of design patterns for real time systems, particularly for computer

More information

Chapter 8: Main Memory

Chapter 8: Main Memory Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel

More information

Chapter 11: Implementing File Systems

Chapter 11: Implementing File Systems Chapter 11: Implementing File Systems Operating System Concepts 99h Edition DM510-14 Chapter 11: Implementing File Systems File-System Structure File-System Implementation Directory Implementation Allocation

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2)

Multiple Processor Systems. Lecture 15 Multiple Processor Systems. Multiprocessor Hardware (1) Multiprocessors. Multiprocessor Hardware (2) Lecture 15 Multiple Processor Systems Multiple Processor Systems Multiprocessors Multicomputers Continuous need for faster computers shared memory model message passing multiprocessor wide area distributed

More information

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches

Background. 20: Distributed File Systems. DFS Structure. Naming and Transparency. Naming Structures. Naming Schemes Three Main Approaches Background 20: Distributed File Systems Last Modified: 12/4/2002 9:26:20 PM Distributed file system (DFS) a distributed implementation of the classical time-sharing model of a file system, where multiple

More information