Algorithm Analysis Techniques for Single Chip Computer Systems

Matthew Frank
MIT LCS
Cambridge, MA
December 2, 1998

Abstract

Circuit fabrication techniques have advanced to the point where it is possible to put an entire computer system, including processor, cache, and memory, on a single chip. This complete integration dictates changes in the basic assumptions that can be made about system latencies. In particular, in single chip systems wire delays dominate all other costs, so memory access times increase as memory size grows. The result is that, to achieve the best possible performance, an algorithm design needs to account for the geometry of data layout. This paper provides a case study for algorithm analysis where memory latency grows as the square root of memory size, consistent with the real limitations found in the 2-dimensional VLSI implementation of a single chip computer system. We study divide-and-conquer sorting algorithms, and find that while a traditional implementation would require asymptotic time $O(N^{3/2} \log N)$, caching techniques can be used to reduce this cost to $O(N^{3/2})$. A similar analysis of a tiled matrix multiplication algorithm shows that an uncached implementation would require time $O(M^4)$ for $M \times M$ matrices, while caching reduces the cost to $O(M^{3.5})$.

1 Memory Access Costs

Before 1980 computers were constructed from thousands of chips, each chip containing just a few logic gates. Since the delay through one of these chips was greater than the propagation delay of 10 meters of wire (the size of the room containing the computer), a reasonable engineering approximation was to assume that the distance between components was irrelevant.

The situation will be reversed in the next generation of computer systems, which will fit entirely on a single integrated circuit. In systems being built today, the wire delay across 2 cm of silicon is greater than 10 gate delays. In five to ten years a 2 cm wire delay will be in the range of hundreds to thousands of gate delays. A reasonable engineering approximation is to assume that gate delays are irrelevant and that the distances between various system components are all that matter. Geometry dominates.

This paper is a first attempt at analyzing algorithm behavior for systems like single chip computers, where wire delays are dominant. We begin with the assumption that a memory of size $N$ has access time $\sqrt{N}$. We show for two algorithms, divide-and-conquer sort and matrix multiply, that while caching techniques help, they cannot completely hide the cost of accessing memory. For sorting, we demonstrate a caching scheme that reduces average memory latency to $O(\sqrt{N}/\log N)$. For matrix multiply, we show a caching scheme that reduces average memory latency from $\sqrt{N}$ to $O(\sqrt[4]{N})$.

In the next section we discuss the basic caching model. Section 3 provides an analysis of divide-and-conquer sorting. Section 4 presents the analysis of matrix multiply. Section 5 discusses some of the broader consequences of memory access costs that grow as memory size increases.

2 Caching

A cache is a small memory that is used as a scratchpad during computation. Since the cache is smaller than the main memory its access time is smaller. The hope is that commonly used data elements can be copied into the smaller memory, and then accessed multiple times at the smaller cost.

Suppose we have a cache of size $\alpha N$, where $\alpha$, $0 < \alpha < 1$, is a fraction that represents the cache size as a fraction of the main memory size. Then the cost of accessing the cache is $\sqrt{\alpha N}$. Suppose also that some fraction, $m(\alpha)$, of memory accesses miss in the cache and must be satisfied from main memory at a cost $\sqrt{N}$, while the remaining $1 - m(\alpha)$ accesses hit in the cache at the lower cost. Every access pays the cache cost and each miss additionally pays the main memory cost, so the average memory latency, $A(\alpha)$, is given by:

$$A(\alpha) = \sqrt{\alpha N} + m(\alpha)\sqrt{N}. \tag{1}$$

Note that this equation implies a tradeoff between cache miss rate and cache access time. As the cache grows, the miss rate decreases but the access time grows. We can minimize the average memory latency by taking the derivative of $A(\alpha)$ and setting it equal to 0.

$$\frac{dA(\alpha)}{d\alpha} = \frac{\sqrt{N}}{2\sqrt{\alpha}} + \sqrt{N}\,\frac{dm(\alpha)}{d\alpha} = 0. \tag{2}$$

3 Divide-and-Conquer Sorting

A basic divide-and-conquer sort of an $N$ element array performs $\log N$ steps, each of which touches all $N$ array elements. If memory costs $\sqrt{N}$ then such an algorithm requires $O(N^{3/2} \log N)$ time.

Suppose, however, that we are provided with a cache of size $\alpha N$ where $0 < \alpha < 1$. Then the cost of a cache access will be $\sqrt{\alpha N}$. The sorting algorithm can leverage this faster memory by dividing the array into $1/\alpha$ chunks. Each chunk is copied into the cache, sorted in the cache with the smaller memory cost, and then copied back to main memory. Finally the sorted chunks in main memory are merged together, unfortunately incurring the higher memory cost.

Given a cache of size $\alpha N$, the number of accesses that can be performed in the cache is $N \log(\alpha N)$, and the number of accesses to main memory is $N \log N - N \log(\alpha N) = -N \log \alpha$. The miss rate for sorting, $m_{\text{sort}}(\alpha)$, is then the number of accesses in main memory divided by the total number of accesses:

$$m_{\text{sort}}(\alpha) = \frac{N \log N - N \log(\alpha N)}{N \log N} = -\frac{\log \alpha}{\log N}. \tag{3}$$

Now we can combine the cache access cost and the miss rate to calculate the average memory latency, $A_{\text{sort}}(\alpha)$, for an $N$ element sort with cache of size $\alpha N$.

$$A_{\text{sort}}(\alpha) = \sqrt{\alpha N} + m_{\text{sort}}(\alpha)\sqrt{N} = \sqrt{N}\left(\sqrt{\alpha} - \frac{\log \alpha}{\log N}\right). \tag{4}$$

Now we find the minimal value for $A_{\text{sort}}$ given $N$ by taking the derivative with respect to $\alpha$ and setting it equal to 0. The ratio $\log \alpha / \log N$ does not depend on the base of the logarithm, so we use natural logarithms when differentiating.

$$\frac{dA_{\text{sort}}(\alpha)}{d\alpha} = \sqrt{N}\left(\frac{1}{2\sqrt{\alpha}} - \frac{1}{\alpha \ln N}\right) = 0. \tag{5}$$

Solving this equation gives the optimal value for $\alpha$:

$$\alpha^* = \frac{4}{\ln^2 N}. \tag{6}$$

Finally, we can plug $\alpha^*$ back into $A_{\text{sort}}$ to get the optimal average memory access time, given as equation (7).
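The optimum in equation (6) can be checked numerically. The following is a minimal Python sketch, assuming the latency model of equation (4) with the miss rate of equation (3); the function names and the grid search are illustrative choices, not part of the original analysis.

```python
import math


def sort_latency(alpha: float, n: float) -> float:
    """Modeled average latency: sqrt(alpha*n) + m_sort(alpha) * sqrt(n)."""
    # m_sort(alpha) = -log(alpha) / log(n); the ratio is base independent.
    miss_rate = -math.log(alpha) / math.log(n)
    return math.sqrt(alpha * n) + miss_rate * math.sqrt(n)


def best_alpha_closed_form(n: float) -> float:
    """Stationary point of sort_latency: alpha* = 4 / ln(n)^2."""
    return 4.0 / math.log(n) ** 2


def best_alpha_by_search(n: float, steps: int = 200_000) -> float:
    """Brute-force the alpha on a uniform grid over (0, 1] with lowest latency."""
    candidates = (k / steps for k in range(1, steps + 1))
    return min(candidates, key=lambda a: sort_latency(a, n))


if __name__ == "__main__":
    n = 2.0 ** 30  # hypothetical problem size, chosen only for illustration
    closed = best_alpha_closed_form(n)
    searched = best_alpha_by_search(n)
    print(f"closed-form alpha* = {closed:.6f}")
    print(f"grid-search alpha* = {searched:.6f}")
    print(f"latency at alpha*  = {sort_latency(closed, n):.1f}")
    print(f"latency, no cache  = {math.sqrt(n):.1f}")
```

The grid search and the closed form should agree to within the grid spacing, and the modeled latency at the optimum should fall well below the uncached cost of $\sqrt{N}$.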

$$A_{\text{sort}}(\alpha^*) = \sqrt{N}\left(\frac{2}{\ln N} + \frac{\ln(\ln^2 N / 4)}{\ln N}\right) = \frac{2\sqrt{N}}{\ln N}\left(1 + \ln \ln N - \ln 2\right) = O\!\left(\frac{\sqrt{N}\,\log(\log N)}{\log N}\right). \tag{7}$$

Since the entire algorithm requires $O(N \log N)$ memory accesses, the total running time of divide-and-conquer sorting is:

$$O(N \log N) \cdot A_{\text{sort}}(\alpha^*) = O\!\left(N^{3/2} \log(\log N)\right). \tag{8}$$

The extra factor of $\log(\log N)$ can be eliminated by using a multilevel cache hierarchy. For example if we provide $\log_4 N$ memories, each 4 times the size of the previous, then exactly $2N$ references will be satisfied from each memory. Each memory has an access cost of 2 times the previous. The total running time is then:

$$\sum_{i=1}^{\log_4 N} 2N \cdot 2^i = O(N^{3/2}). \tag{9}$$

4 Tiled Matrix Multiply

The technique for analyzing tiled matrix multiplication is similar to the technique we used in the previous section. The algorithm we examine is as follows:

    for i = 1 to M by T
      for j = 1 to M by T
        for k = 1 to M by T
          for ii = i to i+T-1
            for jj = j to j+T-1
              c = C[ii,jj]
              for kk = k to k+T-1
                c = c + A[ii,kk] * B[kk,jj]
              C[ii,jj] = c

This algorithm uses a tiling factor $T$. Each $T \times T$ submatrix of the $M \times M$ matrices A and B is brought into the cache. Each element is accessed from the cache $T$ times before being replaced. Since the main memory size is $N = M^2$ the memory access time is $M$. Since the cache size is $T^2$ the cache access time is $T$. We can then calculate the average memory latency, $A_{\text{mm}}(T)$, for matrix multiply.

$$A_{\text{mm}}(T) = T + \frac{M}{T}. \tag{10}$$

To find the optimal tiling factor, $T^*$, given $M$, we must take the derivative and set it equal to 0.

$$\frac{dA_{\text{mm}}(T)}{dT} = 1 - \frac{M}{T^2} = 0. \tag{11}$$

The solution to this equation yields the optimal value for $T$.

$$T^* = \sqrt{M}. \tag{12}$$

When we insert $T^*$ into $A_{\text{mm}}(T)$ we get the optimal average memory access time:

$$A_{\text{mm}}(T^*) = \sqrt{M} + \frac{M}{\sqrt{M}} = 2\sqrt{M}. \tag{13}$$

Thus, matrix multiplication, which would be an $O(M^4)$ algorithm without caching, is improved to $O(M^{3.5})$ with a tile of size $\sqrt{M} \times \sqrt{M}$. This is a factor of $\sqrt{M}$ greater than would be found in an analysis assuming memory costs of $O(1)$.

5 Implications

The results of this paper strongly indicate that single chip computer systems, even those with just a single processor, should be treated as distributed systems. This is excellent news, since there is a large body of established techniques for dealing with latency problems in distributed systems. The most promising of these are using prefetching to leverage the large available communication bandwidth to overlap multiple latencies, and distributing computation by putting some processing resources near each portion of memory so that the data doesn't need to be moved at all.

On the flip side, these results call into question the efficacy of traditional area-time tradeoffs. In single chip computer systems, distance and time are equivalent so adding area adds time. The problem with this tradeoff becomes even more apparent in the energy domain. While this paper has focused on application speed, it could have just as well focused on energy consumption. In single chip computer systems, the energy consumed is also proportional to the sum of distances that signals need to travel. While prefetching can trade off increased bandwidth requirements to overlap high latency costs, it does not reduce the application energy costs. Only geometric optimizations that reduce the signal propagation distance can improve energy consumption.

Finally, these results suggest that parallel applications may not be as inefficient as is traditionally believed. The extra latency factors that we observe in the memory analyses in Sections 3 and 4 seem similar to the extra factors that are often observed in applications parallelized onto mesh based communication networks. The results in this paper indicate that these additional factors are not overheads from parallelization, but may actually represent a fundamental cost of computing in finite dimensional space.
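The tiled loop nest of Section 4 and its latency model can be exercised with a short Python sketch. It assumes square matrices stored as lists of lists and the modeled latency $T + M/T$ for tile size $T$ on $M \times M$ matrices; the function names, the zero-based indexing, and the handling of partial edge tiles are illustrative additions rather than details taken from the paper.

```python
def tiled_matmul(a, b, tile):
    """Return a * b for square matrices, iterating over tile x tile blocks."""
    m = len(a)
    c = [[0.0] * m for _ in range(m)]
    for i in range(0, m, tile):
        for j in range(0, m, tile):
            for k in range(0, m, tile):
                # min() clips the last tile when tile does not divide m.
                for ii in range(i, min(i + tile, m)):
                    for jj in range(j, min(j + tile, m)):
                        acc = c[ii][jj]
                        for kk in range(k, min(k + tile, m)):
                            acc += a[ii][kk] * b[kk][jj]
                        c[ii][jj] = acc
    return c


def mm_latency(tile, m):
    """Modeled average memory latency for tile size `tile`: T + M/T."""
    return tile + m / tile


def best_tile(m):
    """Integer tile size in 1..m with the lowest modeled latency."""
    return min(range(1, m + 1), key=lambda t: mm_latency(t, m))


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(tiled_matmul(a, b, tile=1))
    m = 64  # hypothetical matrix dimension, chosen only for illustration
    t = best_tile(m)
    print(f"best tile for M={m}: {t}, modeled latency {mm_latency(t, m)}")
```

The tile size changes only the order in which the products are accumulated, so every choice of `tile` yields the same product; the exhaustive search over tile sizes is a way to confirm the square-root optimum without relying on the derivative.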


Basic Communication Operations (Chapter 4) Basic Communication Operations (Chapter 4) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 17 13 March 2008 Review of Midterm Exam Outline MPI Example Program:

More information

2. TOPOLOGICAL PATTERN ANALYSIS

2. TOPOLOGICAL PATTERN ANALYSIS Methodology for analyzing and quantifying design style changes and complexity using topological patterns Jason P. Cain a, Ya-Chieh Lai b, Frank Gennari b, Jason Sweis b a Advanced Micro Devices, 7171 Southwest

More information

Addition and Subtraction

Addition and Subtraction PART Looking Back At: Grade Number and Operations 89 Geometry 9 Fractions 94 Measurement 9 Data 9 Number and Operations 96 Geometry 00 Fractions 0 Measurement 02 Data 0 Looking Forward To: Grade Number

More information

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems)

EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) EI338: Computer Systems and Engineering (Computer Architecture & Operating Systems) Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building

More information

Math 9 Final Exam Review and Outline

Math 9 Final Exam Review and Outline Math 9 Final Exam Review and Outline Your Final Examination in Mathematics 9 is a comprehensive final of all material covered in the course. It is broken down into the three sections: Number Sense, Patterns

More information

5 th Grade LEUSD Learning Targets in Mathematics

5 th Grade LEUSD Learning Targets in Mathematics 5 th Grade LEUSD Learning Targets in Mathematics 6/24/2015 The learning targets below are intended to provide a guide for teachers in determining whether students are exhibiting characteristics of being

More information

DCSD Common Core State Standards Math Pacing Guide 5th Grade. Trimester 1

DCSD Common Core State Standards Math Pacing Guide 5th Grade. Trimester 1 Trimester 1 CCSS Mathematical Practices 1.Make sense of problems and persevere in solving them. 2.Reason abstractly and quantitatively. 3.Construct viable arguments and critique the reasoning of others.

More information

Multiprocessor Cache Coherency. What is Cache Coherence?

Multiprocessor Cache Coherency. What is Cache Coherence? Multiprocessor Cache Coherency CS448 1 What is Cache Coherence? Two processors can have two different values for the same memory location 2 1 Terminology Coherence Defines what values can be returned by

More information

I/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13

I/O Model. Cache-Oblivious Algorithms : Algorithms in the Real World. Advantages of Cache-Oblivious Algorithms 4/9/13 I/O Model 15-853: Algorithms in the Real World Locality II: Cache-oblivious algorithms Matrix multiplication Distribution sort Static searching Abstracts a single level of the memory hierarchy Fast memory

More information

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.

Embedded processors. Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto. Embedded processors Timo Töyry Department of Computer Science and Engineering Aalto University, School of Science timo.toyry(at)aalto.fi Comparing processors Evaluating processors Taxonomy of processors

More information

CCBC Math 081 Geometry Section 2.2

CCBC Math 081 Geometry Section 2.2 2.2 Geometry Geometry is the study of shapes and their mathematical properties. In this section, we will learn to calculate the perimeter, area, and volume of a few basic geometric shapes. Perimeter We

More information

Fundamentals of Quantitative Design and Analysis

Fundamentals of Quantitative Design and Analysis Fundamentals of Quantitative Design and Analysis Dr. Jiang Li Adapted from the slides provided by the authors Computer Technology Performance improvements: Improvements in semiconductor technology Feature

More information

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy

Chapter 5A. Large and Fast: Exploiting Memory Hierarchy Chapter 5A Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) Fast, expensive Dynamic RAM (DRAM) In between Magnetic disk Slow, inexpensive Ideal memory Access time of SRAM

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design Edited by Mansour Al Zuair 1 Introduction Programmers want unlimited amounts of memory with low latency Fast

More information

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM

Overview. Memory Classification Read-Only Memory (ROM) Random Access Memory (RAM) Functional Behavior of RAM. Implementing Static RAM Memories Overview Memory Classification Read-Only Memory (ROM) Types of ROM PROM, EPROM, E 2 PROM Flash ROMs (Compact Flash, Secure Digital, Memory Stick) Random Access Memory (RAM) Types of RAM Static

More information

Transistors and Wires

Transistors and Wires Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis Part II These slides are based on the slides provided by the publisher. The slides

More information

5.OA.1 5.OA.2. The Common Core Institute

5.OA.1 5.OA.2. The Common Core Institute Operations and Algebraic Thinking The Common Core Institute Cluster: Write and interpret numerical expressions. 5.OA.1: Use parentheses, brackets, or braces in numerical expressions, and evaluate expressions

More information

Combinational hazards

Combinational hazards Combinational hazards We break down combinational hazards into two major categories, logic hazards and function hazards. A logic hazard is characterized by the fact that it can be eliminated by proper

More information

Copyright 2012, Elsevier Inc. All rights reserved.

Copyright 2012, Elsevier Inc. All rights reserved. Computer Architecture A Quantitative Approach, Fifth Edition Chapter 1 Fundamentals of Quantitative Design and Analysis 1 Computer Technology Performance improvements: Improvements in semiconductor technology

More information

Advanced Caching Techniques

Advanced Caching Techniques Advanced Caching Approaches to improving memory system performance eliminate memory accesses/operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide

More information

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS

Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors

More information

Mathematics Assessment Anchors and Eligible Content

Mathematics Assessment Anchors and Eligible Content Mathematics Assessment Anchors and Eligible Content Aligned to Pennsylvania Common Core Standards www.pdesas.org www.education.state.pa.us 2012 Pennsylvania System of School Assessment The Assessment Anchors,

More information

ECE 669 Parallel Computer Architecture

ECE 669 Parallel Computer Architecture ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate

More information

th Grade Math Curriculum Map

th Grade Math Curriculum Map Standards Quarter 1 Dates Taught (For Teacher Use) Number and Operations in Base Ten Understand the place value system (Major Work) 5.NBT.1 Recognize that in a multi-digit number, a digit in one place

More information

Project Proposals. 1 Project 1: On-chip Support for ILP, DLP, and TLP in an Imagine-like Stream Processor

Project Proposals. 1 Project 1: On-chip Support for ILP, DLP, and TLP in an Imagine-like Stream Processor EE482C: Advanced Computer Organization Lecture #12 Stream Processor Architecture Stanford University Tuesday, 14 May 2002 Project Proposals Lecture #12: Tuesday, 14 May 2002 Lecturer: Students of the class

More information

LECTURE 5: MEMORY HIERARCHY DESIGN

LECTURE 5: MEMORY HIERARCHY DESIGN LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive

More information

Cache Memory and Performance

Cache Memory and Performance Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)

More information

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015

CS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015 CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable

More information

Digital Electronics 27. Digital System Design using PLDs

Digital Electronics 27. Digital System Design using PLDs 1 Module -27 Digital System Design 1. Introduction 2. Digital System Design 2.1 Standard (Fixed function) ICs based approach 2.2 Programmable ICs based approach 3. Comparison of Digital System Design approaches

More information

10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache

10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is

More information

MA 323 Geometric Modelling Course Notes: Day 21 Three Dimensional Bezier Curves, Projections and Rational Bezier Curves

MA 323 Geometric Modelling Course Notes: Day 21 Three Dimensional Bezier Curves, Projections and Rational Bezier Curves MA 323 Geometric Modelling Course Notes: Day 21 Three Dimensional Bezier Curves, Projections and Rational Bezier Curves David L. Finn Over the next few days, we will be looking at extensions of Bezier

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 Advance Encryption Standard (AES) Rijndael algorithm is symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256

More information

6. Parallel Volume Rendering Algorithms

6. Parallel Volume Rendering Algorithms 6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks

More information

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

EITF20: Computer Architecture Part 5.1.1: Virtual Memory EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache

More information