Algorithm Analysis Techniques for Single Chip Computer Systems
Matthew Frank
MIT LCS, Cambridge, MA
December 2, 1998

Abstract

Circuit fabrication techniques have advanced to the point where it is possible to put an entire computer system, including processor, cache, and memory, on a single chip. This complete integration dictates changes in the basic assumptions that can be made about system latencies. In particular, in single chip systems wire delays dominate all other costs, so memory access times increase as memory size grows. The result is that, to achieve the best possible performance, an algorithm design needs to account for the geometry of data layout. This paper provides a case study for algorithm analysis where memory latency grows as the square root of memory size, consistent with the real limitations found in the 2-dimensional VLSI implementation of a single chip computer system. We study divide-and-conquer sorting algorithms, and find that while a traditional implementation would require asymptotic time $O(N^{3/2} \log N)$, caching techniques can be used to reduce this cost to $O(N^{3/2})$. A similar analysis of a tiled matrix multiplication algorithm shows that an uncached implementation would require $O(N^2)$ time while caching reduces the cost to $O(N^{1.75})$.

1 Memory Access Costs

Before 1980 computers were constructed from thousands of chips, each chip containing just a few logic gates. Since the delay through one of these chips was greater than the propagation delay of 10 meters of wire (the size of the room containing the computer), a reasonable engineering approximation was to assume that the distance between components was irrelevant.
The situation will be reversed in the next generation of computer systems, which will fit entirely on a single integrated circuit. In systems being built today, the wire delay across 2 cm of silicon is greater than 10 gate delays. In five to ten years a 2 cm wire delay will be in the range of hundreds to thousands of gate delays. A reasonable engineering approximation is to assume that gate delays are irrelevant and that the distances between various system components are all that matter. Geometry dominates.

This paper is a first attempt at analyzing algorithm behavior for systems like single chip computers, where wire delays are dominant. We begin with the assumption that a memory of size $N$ has access time $\sqrt{N}$. We show for two algorithms, divide-and-conquer sort and matrix multiply, that while caching techniques help, they cannot completely hide the cost of accessing memory. For sorting, we demonstrate a caching scheme that reduces average memory latency to $O(\sqrt{N} / \log N)$. For matrix multiply, we show a caching scheme that reduces average memory latency from $O(\sqrt{N})$ to $O(N^{1/4})$.

In the next section we discuss the basic caching model. Section 3 provides an analysis of divide-and-conquer sorting. Section 4 presents the analysis of matrix multiply. Section 5 discusses some of the broader consequences of memory access costs that grow as memory size increases.

2 Caching

A cache is a small memory that is used as a scratchpad during computation. Since the cache is smaller than the main memory, its access time is smaller. The hope is that commonly used data elements can be copied into the smaller memory, and then accessed multiple times at the smaller cost.

Suppose we have a cache of size $fN$, where $f$, $0 < f < 1$, represents the cache size as a fraction of the main memory size. Then the cost of accessing the cache is $\sqrt{fN}$. Suppose also that some fraction, $m(f, N)$, of memory accesses miss in the cache and must be satisfied from main memory at a cost $\sqrt{N}$, while the remaining $1 - m(f, N)$ accesses hit in the cache at the lower cost. Then the average memory latency, $L(f, N)$, is given by:

$$L(f, N) = \sqrt{fN} + m(f, N)\sqrt{N} \qquad (1)$$

Note that this equation implies a tradeoff between cache miss rate and cache access time. As the cache grows, the miss rate decreases but the access time grows. We can minimize the average memory latency by taking the derivative of $L(f, N)$ with respect to $f$ and setting it equal to 0.
$$\frac{\partial L(f, N)}{\partial f} = \frac{\sqrt{N}}{2\sqrt{f}} + \sqrt{N}\,\frac{\partial m(f, N)}{\partial f} = 0 \qquad (2)$$

3 Divide-and-Conquer Sorting

A basic divide-and-conquer sort of an $N$ element array performs $\log N$ steps, each of which touches all $N$ array elements. If memory costs $\sqrt{N}$ then such an algorithm requires $O(N^{3/2} \log N)$ time.

Suppose, however, that we are provided with a cache of size $fN$ where $0 < f < 1$. Then the cost of a cache access will be $\sqrt{fN}$. The sorting algorithm can leverage this faster memory by dividing the array into $1/f$ chunks. Each chunk is copied into the cache, sorted in the cache with the smaller memory cost, and then copied back to main memory. Finally the sorted chunks in main memory are merged together, unfortunately incurring the higher memory cost.

Given a cache of size $fN$, the number of accesses that can be performed in the cache is $N \log(fN)$, and the number of accesses to main memory is $N \log N - N \log(fN) = -N \log f$. The miss rate for sorting, $m_{sort}(f, N)$, is then the number of accesses in main memory divided by the total number of accesses:

$$m_{sort}(f, N) = \frac{N \log N - N \log(fN)}{N \log N} = -\frac{\log f}{\log N} \qquad (3)$$

Now we can combine the cache access cost and the miss rate to calculate the average memory latency, $L_{sort}(f, N)$, for an $N$ element sort with a cache of size $fN$.

$$L_{sort}(f, N) = \sqrt{fN} + m_{sort}(f, N)\sqrt{N} = \sqrt{fN} - \frac{\log f}{\log N}\sqrt{N} \qquad (4)$$

Now we find the minimal value of $L_{sort}$ for a given $N$ by taking the derivative with respect to $f$ and setting it equal to 0. The ratio $\log f / \log N$ does not depend on the base of the logarithm, so its derivative is exact when $\log$ is read as the natural logarithm.

$$\frac{\partial L_{sort}(f, N)}{\partial f} = \frac{\sqrt{N}}{2\sqrt{f}} - \frac{\sqrt{N}}{f \log N} = 0 \qquad (5)$$

Solving this equation gives the optimal value for $f$:

$$f_{opt} = \frac{4}{\log^2 N} \qquad (6)$$

Finally, we can plug $f_{opt}$ back into $L_{sort}$ to get the optimal average memory access time, given in equation (7).
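The closed-form optimum in equation (6) can be sanity-checked numerically. The following is a minimal Python sketch, not part of the original paper: the function names are hypothetical, logarithms are taken as natural logarithms, and the brute-force grid search is only an illustrative stand-in for the calculus in equation (5).

```python
import math


def sort_miss_rate(f, n):
    """Miss rate from equation (3): -log(f) / log(n)."""
    return -math.log(f) / math.log(n)


def sort_latency(f, n):
    """Average latency from equation (4): sqrt(f*n) + m_sort(f, n) * sqrt(n)."""
    return math.sqrt(f * n) + sort_miss_rate(f, n) * math.sqrt(n)


def closed_form_fraction(n):
    """Optimal cache fraction from equation (6), using natural logarithms."""
    return 4.0 / math.log(n) ** 2


def grid_search_fraction(n, steps=100_000):
    """Brute-force minimizer of sort_latency over f in (0, 1).

    Assumption: a uniform grid is fine enough for illustration.
    """
    candidates = (k / steps for k in range(1, steps))
    return min(candidates, key=lambda f: sort_latency(f, n))


if __name__ == "__main__":
    n = 2 ** 40
    f_closed = closed_form_fraction(n)
    f_numeric = grid_search_fraction(n)
    print("closed-form f:", f_closed)
    print("grid-search f:", f_numeric)
    print("cached latency / uncached latency:",
          sort_latency(f_closed, n) / math.sqrt(n))
```

Under these assumptions the grid search and the closed form should agree to within the grid spacing, and the cached latency should fall below the uncached cost of $\sqrt{N}$.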
$$L_{sort}(f_{opt}, N) = \frac{2\sqrt{N}}{\log N} + \frac{2 \log(\log N) - \log 4}{\log N}\sqrt{N} = O\!\left(\frac{\sqrt{N}\,\log(\log N)}{\log N}\right) \qquad (7)$$

Since the entire algorithm requires $N \log N$ memory accesses, the total running time of divide-and-conquer sorting is:

$$N \log N \cdot L_{sort}(f_{opt}, N) = O\!\left(N^{3/2} \log(\log N)\right) \qquad (8)$$

The extra factor of $\log(\log N)$ can be eliminated by using a multilevel cache hierarchy. For example, if we provide $\log_4 N$ memories, each 4 times the size of the previous, then the same number of references, $O(N)$, will be satisfied from each memory. Each memory has an access cost of 2 times the previous. The total running time is then:

$$\sum_{i=1}^{\log_4 N} O(N) \cdot 2^i = O(N^{3/2}) \qquad (9)$$

4 Tiled Matrix Multiply

The technique for analyzing tiled matrix multiplication is similar to the technique we used in the previous section. The algorithm we examine is as follows:

    for i = 1 to M by T
      for j = 1 to M by T
        for k = 1 to M by T
          for ii = i to i+T-1
            for jj = j to j+T-1
              c = C[ii,jj]
              for kk = k to k+T-1
                c = c + A[ii,kk] * B[kk,jj]
              C[ii,jj] = c

This algorithm uses a tiling factor $T$. Each $T \times T$ submatrix of the $M \times M$ matrices A and B is brought into the cache. Each element is accessed from the
cache $T$ times before being replaced, so the miss rate is $1/T$. Since the main memory size is $N = O(M^2)$, the memory access time is $M$, ignoring constant factors. Since the cache size is $T^2$, the cache access time is $T$. We can then calculate the average memory latency, $L_{mm}(M, T)$, for matrix multiply.

$$L_{mm}(M, T) = T + \frac{M}{T} \qquad (10)$$

To find the optimal tiling factor, $T$, given $M$, we must take the derivative with respect to $T$ and set it equal to 0.

$$\frac{\partial L_{mm}(M, T)}{\partial T} = 1 - \frac{M}{T^2} = 0 \qquad (11)$$

The solution to this equation yields the optimal value for $T$.

$$T_{opt} = \sqrt{M} \qquad (12)$$

When we insert $T_{opt}$ into $L_{mm}(M, T)$ we get the optimal average memory access time:

$$L_{mm}(M, T_{opt}) = \sqrt{M} + \frac{M}{\sqrt{M}} = 2\sqrt{M} \qquad (13)$$

Thus, matrix multiplication, which would be an $O(M^4) = O(N^2)$ algorithm without caching, is improved to $O(M^{3.5}) = O(N^{1.75})$ with a tile of size $\sqrt{M} \times \sqrt{M}$. This is a factor of $\sqrt{M}$ greater than would be found in an analysis assuming memory costs of $O(1)$.

5 Implications

The results of this paper strongly indicate that single chip computer systems, even those with just a single processor, should be treated as distributed systems. This is excellent news, since there is a large body of established techniques for dealing with latency problems in distributed systems. The most promising of these are using prefetching to leverage the large available communication bandwidth to overlap multiple latencies, and distributing computation by putting some processing resources near each portion of memory so that the data doesn't need to be moved at all.

On the flip side, these results call into question the efficacy of traditional area-time tradeoffs. In single chip computer systems, distance and time are equivalent, so adding area adds time. The problem with this tradeoff becomes even more
apparent in the energy domain. While this paper has focused on application speed, it could have just as well focused on energy consumption. In single chip computer systems, the energy consumed is also proportional to the sum of distances that signals need to travel. While prefetching can trade off increased bandwidth requirements to overlap high latency costs, it does not reduce the application energy costs. Only geometric optimizations that reduce the signal propagation distance can improve energy consumption.

Finally, these results suggest that parallel applications may not be as inefficient as is traditionally believed. The extra latency factors that we observe in the memory analyses in Sections 3 and 4 seem similar to the extra factors that are often observed in applications parallelized onto mesh based communication networks. The results in this paper indicate that these additional factors are not overheads from parallelization, but may actually represent a fundamental cost of computing in finite dimensional space.
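As an illustration of the Section 4 analysis, the tiled loop nest and its latency model can be sketched in Python. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`tiled_matmul`, `mm_latency`, `best_tile`) are hypothetical, indices are 0-based rather than 1-based, the matrix size need not be a multiple of the tile size, and the latency model ignores constant factors.

```python
def tiled_matmul(a, b, tile):
    """Multiply two square matrices (lists of lists) with tile x tile blocking.

    Follows the loop order of the Section 4 pseudocode: three tile loops
    (i, j, k) around three element loops (ii, jj, kk).
    """
    m = len(a)
    c = [[0] * m for _ in range(m)]
    for i in range(0, m, tile):
        for j in range(0, m, tile):
            for k in range(0, m, tile):
                for ii in range(i, min(i + tile, m)):
                    for jj in range(j, min(j + tile, m)):
                        acc = c[ii][jj]
                        for kk in range(k, min(k + tile, m)):
                            acc += a[ii][kk] * b[kk][jj]
                        c[ii][jj] = acc
    return c


def mm_latency(m, tile):
    """Model average latency: cache cost `tile` plus miss cost m / tile.

    Assumption: square-root access cost with constant factors dropped.
    """
    return tile + m / tile


def best_tile(m):
    """Integer tile size in 1..m that minimizes the modeled latency."""
    return min(range(1, m + 1), key=lambda t: mm_latency(m, t))


if __name__ == "__main__":
    m = 64
    t = best_tile(m)
    print("best tile for M = 64:", t)
    print("modeled latency at that tile:", mm_latency(m, t))
```

In this model the best integer tile for a perfect-square $M$ matches $T = \sqrt{M}$, and the modeled latency at that tile equals $2\sqrt{M}$.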
More informationAdvanced Caching Techniques
Advanced Caching Approaches to improving memory system performance eliminate memory accesses/operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide
More informationAdvanced Topics UNIT 2 PERFORMANCE EVALUATIONS
Advanced Topics UNIT 2 PERFORMANCE EVALUATIONS Structure Page Nos. 2.0 Introduction 4 2. Objectives 5 2.2 Metrics for Performance Evaluation 5 2.2. Running Time 2.2.2 Speed Up 2.2.3 Efficiency 2.3 Factors
More informationMathematics Assessment Anchors and Eligible Content
Mathematics Assessment Anchors and Eligible Content Aligned to Pennsylvania Common Core Standards www.pdesas.org www.education.state.pa.us 2012 Pennsylvania System of School Assessment The Assessment Anchors,
More informationECE 669 Parallel Computer Architecture
ECE 669 Parallel Computer Architecture Lecture 9 Workload Evaluation Outline Evaluation of applications is important Simulation of sample data sets provides important information Working sets indicate
More informationth Grade Math Curriculum Map
Standards Quarter 1 Dates Taught (For Teacher Use) Number and Operations in Base Ten Understand the place value system (Major Work) 5.NBT.1 Recognize that in a multi-digit number, a digit in one place
More informationProject Proposals. 1 Project 1: On-chip Support for ILP, DLP, and TLP in an Imagine-like Stream Processor
EE482C: Advanced Computer Organization Lecture #12 Stream Processor Architecture Stanford University Tuesday, 14 May 2002 Project Proposals Lecture #12: Tuesday, 14 May 2002 Lecturer: Students of the class
More informationLECTURE 5: MEMORY HIERARCHY DESIGN
LECTURE 5: MEMORY HIERARCHY DESIGN Abridged version of Hennessy & Patterson (2012):Ch.2 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more expensive
More informationCache Memory and Performance
Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)
More informationCS 31: Intro to Systems Caching. Kevin Webb Swarthmore College March 24, 2015
CS 3: Intro to Systems Caching Kevin Webb Swarthmore College March 24, 205 Reading Quiz Abstraction Goal Reality: There is no one type of memory to rule them all! Abstraction: hide the complex/undesirable
More informationDigital Electronics 27. Digital System Design using PLDs
1 Module -27 Digital System Design 1. Introduction 2. Digital System Design 2.1 Standard (Fixed function) ICs based approach 2.2 Programmable ICs based approach 3. Comparison of Digital System Design approaches
More information10/16/2017. Miss Rate: ABC. Classifying Misses: 3C Model (Hill) Reducing Conflict Misses: Victim Buffer. Overlapping Misses: Lockup Free Cache
Classifying Misses: 3C Model (Hill) Divide cache misses into three categories Compulsory (cold): never seen this address before Would miss even in infinite cache Capacity: miss caused because cache is
More informationMA 323 Geometric Modelling Course Notes: Day 21 Three Dimensional Bezier Curves, Projections and Rational Bezier Curves
MA 323 Geometric Modelling Course Notes: Day 21 Three Dimensional Bezier Curves, Projections and Rational Bezier Curves David L. Finn Over the next few days, we will be looking at extensions of Bezier
More informationCHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 Advance Encryption Standard (AES) Rijndael algorithm is symmetric block cipher that can process data blocks of 128 bits, using cipher keys with lengths of 128, 192, and 256
More information6. Parallel Volume Rendering Algorithms
6. Parallel Volume Algorithms This chapter introduces a taxonomy of parallel volume rendering algorithms. In the thesis statement we claim that parallel algorithms may be described by "... how the tasks
More informationEITF20: Computer Architecture Part 5.1.1: Virtual Memory
EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache
More information