Giving credit where credit is due
- Spencer McBride
- 5 years ago
Transcription
CSCE 230J Computer Organization
Cache Memories
Dr. Steve Goddard

Giving credit where credit is due
Most of the slides for this lecture are based on slides created by Drs. Bryant and O'Hallaron, Carnegie Mellon University. I have modified them and added new slides.

Topics
- Generic cache memory organization
- Direct mapped caches
- Set associative caches
- Impact of caches on performance

Cache Memories
Cache memories are small, fast SRAM-based memories managed automatically in hardware. They hold frequently accessed blocks of main memory. The CPU looks first for data in L1, then in L2, then in main memory.
[Figure: typical bus structure. The CPU chip contains the register file, ALU, L1 cache, and bus interface; the L2 cache sits on the cache bus; the system bus, I/O bridge, and memory bus connect the chip to main memory.]

Inserting an L1 Cache Between the CPU and Main Memory
- The transfer unit between the CPU register file and the cache is a 4-byte block.
- The transfer unit between the cache and main memory is a 4-word block (16 bytes).
- The tiny, very fast CPU register file has room for four 4-byte words.
- The small fast L1 cache has room for two 4-word blocks.
- The big slow main memory has room for many 4-word blocks.
[Figure: register file, L1 cache with two lines, and main memory holding 4-word blocks such as a b c d..., p q r s..., w x y z...]

General Org of a Cache Memory
Cache is an array of sets. Each set contains one or more lines. Each line holds a block of data.
- S = 2^s sets
- E lines per set
- 1 valid bit per line
- t tag bits per line
- B = 2^b bytes per cache block
Cache size: C = B x E x S data bytes
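The cache size formula from the "General Org of a Cache Memory" slide can be checked with a few lines of C. This is a minimal sketch, not code from the lecture: the `cache_geom` type and the function name are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a cache: S = 2^s sets, E lines per set,
   B = 2^b bytes per block (symbols as on the slide). */
typedef struct {
    unsigned s; /* number of set index bits */
    unsigned b; /* number of block offset bits */
    unsigned E; /* lines per set */
} cache_geom;

/* Cache size in data bytes: C = B x E x S. */
uint64_t cache_size_bytes(cache_geom g)
{
    assert(g.E > 0 && g.s + g.b < 64); /* keep the shifts well defined */
    uint64_t S = (uint64_t)1 << g.s;
    uint64_t B = (uint64_t)1 << g.b;
    return B * g.E * S;
}
```

For example, a direct-mapped cache with S = 4 sets and B = 2 bytes per block (s = 2, b = 1, E = 1) holds 8 data bytes, and a 4-way cache with 128 sets and 32-byte lines (s = 7, b = 5, E = 4) holds 16384 bytes.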
Addressing Caches
Address A (bits m-1 down to 0) is split into three fields: <tag> (t bits), <set index> (s bits), and <block offset> (b bits).
The word at address A is in the cache if the tag bits in one of the valid lines in set <set index> match <tag>.
The word contents begin at offset <block offset> bytes from the beginning of the block.

Direct-Mapped Cache
- Simplest kind of cache
- Characterized by exactly one line per set (E = 1 lines per set)

Accessing Direct-Mapped Caches
Set selection
- Use the set index bits to determine the set of interest.
Line matching and word selection
- Line matching: find a valid line in the selected set with a matching tag.
- Word selection: then extract the word.
(1) The valid bit must be set.
(2) The tag bits in the cache line must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects the starting byte.

Direct-Mapped Cache Simulation
M = 16 byte addresses, B = 2 bytes/block, S = 4 sets, E = 1 entry/set.
Address fields: t = 1 tag bit, s = 2 set index bits, b = 1 block offset bit.
Address trace (reads): 0 [0000_2], 1 [0001_2], 13 [1101_2], 8 [1000_2], 0 [0000_2]
(1) 0 [0000_2] (miss): loads M[0-1]
(3) 13 [1101_2] (miss): loads M[12-13]
(4) 8 [1000_2] (miss): loads M[8-9], replacing M[0-1]
(5) 0 [0000_2] (miss): loads M[0-1], replacing M[8-9]

Why Use Middle Bits as Index?
High-Order Bit Indexing
- Adjacent memory lines would map to the same cache entry
- Poor use of spatial locality
Middle-Order Bit Indexing
- Consecutive memory lines map to different cache lines
- Can hold a C-byte region of the address space in the cache at one time
[Figure: a 4-line cache shown with high-order bit indexing and with middle-order bit indexing.]
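The address trace from the "Direct-Mapped Cache Simulation" slide can be replayed in code. The following is a sketch under stated assumptions: the function name, the `dm_line` type, and the `MAX_SETS` bound are hypothetical, and the model tracks only valid bits and tags (no data), which is enough to count misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SETS 64 /* illustrative upper bound on S, not from the slides */

typedef struct {
    bool valid;
    unsigned tag;
} dm_line;

/* Count misses for a read trace on a direct-mapped cache (E = 1)
   with 2^s sets and 2^b bytes per block, starting from a cold cache. */
size_t count_misses_direct_mapped(const unsigned *trace, size_t n,
                                  unsigned s, unsigned b)
{
    dm_line sets[MAX_SETS] = {{false, 0}};
    unsigned num_sets = 1u << s;
    size_t misses = 0;

    assert(num_sets <= MAX_SETS);
    for (size_t i = 0; i < n; i++) {
        unsigned set = (trace[i] >> b) & (num_sets - 1); /* middle bits */
        unsigned tag = trace[i] >> (s + b);              /* high bits */
        if (!(sets[set].valid && sets[set].tag == tag)) {
            misses++;                /* miss: load the block into the line */
            sets[set].valid = true;
            sets[set].tag = tag;
        }
    }
    return misses;
}
```

Called with the trace 0, 1, 13, 8, 0 and s = 2, b = 1, the function reports four misses: only the read of address 1 hits, because it falls in the block already loaded for address 0. Addresses 0 and 8 share set 0 but differ in tag, so they keep evicting each other.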
Set Associative Caches
Characterized by more than one line per set (for example, E = 2 lines per set).

Accessing Set Associative Caches
Set selection
- Identical to direct-mapped cache.
Line matching and word selection
- Must compare the tag in each valid line in the selected set.
(1) The valid bit must be set.
(2) The tag bits in one of the cache lines must match the tag bits in the address.
(3) If (1) and (2), then cache hit, and block offset selects the starting byte.

Multi-Level Caches
Options: separate data and instruction caches, or a unified cache.
[Figure: hierarchy from processor registers through the L1 d-cache and L1 i-cache, the unified L2 cache, memory, and disk, annotated with size, speed, $/Mbyte, and line size for each level. Levels become larger, slower, and cheaper moving away from the processor.]

Intel Pentium Cache Hierarchy
- L1 data (on the processor chip): 16 KB, 4-way assoc, write-through, 32 B lines, 1 cycle latency
- L1 instruction (on the processor chip): 16 KB, 4-way, 32 B lines
- L2 unified: 128 KB to 2 MB, 4-way assoc, write-back, write allocate, 32 B lines
- Main memory: up to 4 GB

Cache Performance Metrics
Miss Rate
- Fraction of memory references not found in cache (misses/references)
- Typical numbers: 3-10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
Hit Time
- Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache)
- Typical numbers: 1 clock cycle for L1; 3-8 clock cycles for L2
Miss Penalty
- Additional time required because of a miss
- Typically 25-100 cycles for main memory
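The three metrics above are often combined into a single average cost per access. The sketch below uses the common textbook relation (hit time plus miss rate times miss penalty); that relation is an assumption added here for illustration and does not appear on the slides, and both function names are hypothetical.

```c
#include <assert.h>

/* Miss rate as defined on the slide: misses / references. */
double miss_rate(unsigned long misses, unsigned long references)
{
    assert(references > 0 && misses <= references);
    return (double)misses / (double)references;
}

/* Average cycles per access for a single cache level, using the
   textbook relation hit_time + rate * miss_penalty (an assumption,
   not taken from the slides). */
double avg_access_cycles(double hit_time, double rate, double miss_penalty)
{
    assert(rate >= 0.0 && rate <= 1.0);
    return hit_time + rate * miss_penalty;
}
```

With a 1-cycle hit time, a 25% miss rate, and a 100-cycle miss penalty, `avg_access_cycles` returns 26 cycles, which shows how quickly the miss penalty dominates once the miss rate is not small.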
Writing Cache Friendly Code
- Repeated references to variables are good (temporal locality)
- Stride-1 reference patterns are good (spatial locality)
Examples: cold cache, 4-byte words, 4-word cache blocks

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 1/4 = 25%

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Miss rate = 100%

The Memory Mountain
Read throughput (read bandwidth)
- Number of bytes read from memory per second (MB/s)
Memory mountain
- Measured read throughput as a function of spatial and temporal locality.
- Compact way to characterize memory system performance.

Memory Mountain Test Function

    /* The test function */
    void test(int elems, int stride)
    {
        int i, result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result; /* So compiler doesn't optimize away the loop */
    }

    /* Run test(elems, stride) and return read throughput (MB/s) */
    double run(int size, int stride, double Mhz)
    {
        double cycles;
        int elems = size / sizeof(int);

        test(elems, stride);                     /* warm up the cache */
        cycles = fcyc2(test, elems, stride, 0);  /* call test(elems,stride) */
        return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */
    }

Memory Mountain Main Routine

    /* mountain.c - Generate the memory mountain. */
    /* fcyc2, mhz, and init_data are helper routines not shown on these slides */
    #define MINBYTES (1 << 10)  /* Working set size ranges from 1 KB */
    #define MAXBYTES (1 << 23)  /* ... up to 8 MB */
    #define MAXSTRIDE 16        /* Strides range from 1 to 16 */
    #define MAXELEMS MAXBYTES/sizeof(int)

    int data[MAXELEMS];         /* The array we'll be traversing */

    int main()
    {
        int size;        /* Working set size (in bytes) */
        int stride;      /* Stride (in array elements) */
        double Mhz;      /* Clock frequency */

        init_data(data, MAXELEMS); /* Initialize each element in data to 1 */
        Mhz = mhz(0);              /* Estimate the clock frequency */
        for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {
            for (stride = 1; stride <= MAXSTRIDE; stride++)
                printf("%.1f\t", run(size, stride, Mhz));
            printf("\n");
        }
        exit(0);
    }

The Memory Mountain
Measured on a Pentium III Xeon, 550 MHz, with a 16 KB on-chip L1 d-cache, a 16 KB on-chip L1 i-cache, and a 512 KB off-chip unified L2 cache.
[Figure: read throughput (MB/s) as a function of stride (words) and working set size (bytes), showing ridges of temporal locality for L1, L2, and main memory, and slopes of spatial locality.]

Ridges of Temporal Locality
A slice through the memory mountain with stride = 1 illuminates the read throughputs of the different caches and memory.
[Figure: read throughput (MB/s) versus working set size (bytes), with a main memory region, an L2 cache region, and an L1 cache region.]
A Slope of Spatial Locality
A slice through the memory mountain with size = 256 KB shows the cache block size.
[Figure: read throughput (MB/s) versus stride (words) for strides s1 to s16; throughput levels off once there is one access per cache line.]

Matrix Multiplication Example
Major Cache Effects to Consider
- Total cache size: exploit temporal locality and keep the working set small (e.g., by using blocking)
- Block size: exploit spatial locality
Description
- Multiply N x N matrices
- O(N^3) total operations
- Accesses: N reads per source element; N values summed per destination, but may be able to hold in register

    /* ijk */
    for (i=0; i<n; i++) {
        for (j=0; j<n; j++) {
            sum = 0.0;              /* variable sum held in register */
            for (k=0; k<n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

Miss Rate Analysis for Matrix Multiply
Assume:
- Line size = 32B (big enough for 4 64-bit words)
- Matrix dimension (N) is very large: approximate 1/N as 0.0
- Cache is not even big enough to hold multiple rows
Analysis Method:
- Look at the access pattern of the inner loop: A is indexed by (i,k), B by (k,j), and C by (i,j).

Layout of C Arrays in Memory (review)
C arrays are allocated in row-major order: each row is in contiguous memory locations.
Stepping through columns in one row:

    for (i = 0; i < N; i++)
        sum += a[0][i];

- accesses successive elements
- if block size (B) > 4 bytes, exploit spatial locality
- compulsory miss rate = 4 bytes / B
Stepping through rows in one column:

    for (i = 0; i < n; i++)
        sum += a[i][0];

- accesses distant elements
- no spatial locality!
- compulsory miss rate = 1 (i.e. 100%)

Matrix Multiplication (ijk)

    /* ijk */
    for (i=0; i<n; i++) {
        for (j=0; j<n; j++) {
            sum = 0.0;
            for (k=0; k<n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

Inner loop: A (i,*) row-wise, B (*,j) column-wise, C (i,j) fixed.
Misses per inner loop iteration: A 0.25, B 1.0, C 0.0

Matrix Multiplication (jik)

    /* jik */
    for (j=0; j<n; j++) {
        for (i=0; i<n; i++) {
            sum = 0.0;
            for (k=0; k<n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

Inner loop: A (i,*) row-wise, B (*,j) column-wise, C (i,j) fixed.
Misses per inner loop iteration: A 0.25, B 1.0, C 0.0
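The two compulsory miss rates from the "Layout of C Arrays in Memory" slide can be reproduced with a small model. This is a deliberately pessimistic sketch, assuming a cache that remembers only the most recently touched block and an array that starts on a block boundary; the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Miss rate for walking a rows x cols row-major array of elem_bytes-sized
   elements, either row by row or column by column. The model keeps only
   the most recently touched block, so it cannot hold multiple rows. */
double miss_rate_2d(size_t rows, size_t cols, size_t elem_bytes,
                    size_t block_bytes, int walk_by_columns)
{
    size_t misses = 0, refs = 0;
    size_t last_block = (size_t)-1; /* cold cache: no block loaded yet */
    size_t outer = walk_by_columns ? cols : rows;
    size_t inner = walk_by_columns ? rows : cols;

    assert(rows > 0 && cols > 0 && elem_bytes > 0 && block_bytes > 0);
    for (size_t o = 0; o < outer; o++) {
        for (size_t n = 0; n < inner; n++) {
            size_t r = walk_by_columns ? n : o;
            size_t c = walk_by_columns ? o : n;
            size_t block = ((r * cols + c) * elem_bytes) / block_bytes;
            if (block != last_block) {
                misses++;
                last_block = block;
            }
            refs++;
        }
    }
    return (double)misses / (double)refs;
}
```

For a 4 x 8 array of 4-byte elements and 16-byte blocks, the row-wise walk misses once per block (rate 4 bytes / B = 0.25), while the column-wise walk jumps a full row between accesses and misses every time (rate 1.0).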
Matrix Multiplication (kij)

    /* kij */
    for (k=0; k<n; k++) {
        for (i=0; i<n; i++) {
            r = a[i][k];
            for (j=0; j<n; j++)
                c[i][j] += r * b[k][j];
        }
    }

Inner loop: A (i,k) fixed, B (k,*) row-wise, C (i,*) row-wise.

Matrix Multiplication (ikj)

    /* ikj */
    for (i=0; i<n; i++) {
        for (k=0; k<n; k++) {
            r = a[i][k];
            for (j=0; j<n; j++)
                c[i][j] += r * b[k][j];
        }
    }

Inner loop: A (i,k) fixed, B (k,*) row-wise, C (i,*) row-wise.

Matrix Multiplication (jki)

    /* jki */
    for (j=0; j<n; j++) {
        for (k=0; k<n; k++) {
            r = b[k][j];
            for (i=0; i<n; i++)
                c[i][j] += a[i][k] * r;
        }
    }

Inner loop: A (*,k) column-wise, B (k,j) fixed, C (*,j) column-wise.

Matrix Multiplication (kji)

    /* kji */
    for (k=0; k<n; k++) {
        for (j=0; j<n; j++) {
            r = b[k][j];
            for (i=0; i<n; i++)
                c[i][j] += a[i][k] * r;
        }
    }

Inner loop: A (*,k) column-wise, B (k,j) fixed, C (*,j) column-wise.

Summary of Matrix Multiplication
- ijk (& jik): 2 loads, 0 stores, misses/iter = 1.25
- kij (& ikj): 2 loads, 1 store, misses/iter = 0.5
- jki (& kji): 2 loads, 1 store, misses/iter = 2.0

Pentium Matrix Multiply Performance
Miss rates are helpful but not perfect predictors. Code scheduling matters, too.
[Figure: cycles/iteration versus array size (n) for the kji, jki, kij, ikj, jik, and ijk versions.]
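The loop orderings above compute the same product and differ only in how the inner loop walks memory. The sketch below puts two of them side by side so that can be checked. It assumes square matrices stored as flat row-major arrays of `double` (the slides use 2D indexing), and the function names are hypothetical.

```c
#include <assert.h>

/* ijk: inner loop reads a row of A and a column of B; sum stays in a register. */
void mm_ijk(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* kij: inner loop walks rows of B and C; C must start at zero. */
void mm_kij(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (int k = 0; k < n; k++) {
        for (int i = 0; i < n; i++) {
            double r = a[i * n + k];
            for (int j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];
        }
    }
}
```

The kij version trades the register-held `sum` for a store to C on every iteration, which matches the "2 loads, 1 store" entry in the summary, but both of its inner-loop streams are row-wise.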
Improving Temporal Locality by Blocking
Example: Blocked matrix multiplication
- "block" (in this context) does not mean "cache block". Instead, it means a sub-block within the matrix.
- Example: N = 8; sub-block size = 4

    \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \times \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}

    C_{11} = A_{11}B_{11} + A_{12}B_{21}
    C_{12} = A_{11}B_{12} + A_{12}B_{22}
    C_{21} = A_{21}B_{11} + A_{22}B_{21}
    C_{22} = A_{21}B_{12} + A_{22}B_{22}

Key idea: Sub-blocks (i.e., A_{xy}) can be treated just like scalars.

Blocked Matrix Multiply (bijk)

    for (jj=0; jj<n; jj+=bsize) {
        for (i=0; i<n; i++)
            for (j=jj; j < min(jj+bsize,n); j++)
                c[i][j] = 0.0;
        for (kk=0; kk<n; kk+=bsize) {
            for (i=0; i<n; i++) {
                for (j=jj; j < min(jj+bsize,n); j++) {
                    sum = 0.0;
                    for (k=kk; k < min(kk+bsize,n); k++) {
                        sum += a[i][k] * b[k][j];
                    }
                    c[i][j] += sum;
                }
            }
        }
    }

Blocked Matrix Multiply Analysis
- The innermost loop pair multiplies a 1 x bsize sliver of A by a bsize x bsize block of B and accumulates into a 1 x bsize sliver of C.
- The loop over i steps through n row slivers of A and C, using the same block of B.

    for (i=0; i<n; i++) {
        for (j=jj; j < min(jj+bsize,n); j++) {   /* innermost loop pair */
            sum = 0.0;
            for (k=kk; k < min(kk+bsize,n); k++) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] += sum;
        }
    }

[Figure: the row sliver of A is accessed bsize times, the block of B is reused n times in succession, and successive elements of the C sliver are updated.]

Pentium Blocked Matrix Multiply Performance
- Blocking (bijk and bikj) improves performance by a factor of two over the unblocked versions (ijk and jik)
- Relatively insensitive to array size
[Figure: cycles/iteration versus array size (n) for kji, jki, kij, ikj, jik, ijk, bijk (bsize = 25), and bikj (bsize = 25).]

Concluding Observations
Programmer can optimize for cache performance
- How data structures are organized
- How data are accessed: nested loop structure; blocking is a general technique
All systems favor cache friendly code
- Getting absolute optimum performance is very platform specific: cache sizes, line sizes, associativities, etc.
- Can get most of the advantage with generic code: keep the working set reasonably small (temporal locality) and use small strides (spatial locality)
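The bijk loop nest from the "Blocked Matrix Multiply" slide can be turned into a complete function. This is a sketch under stated assumptions: matrices are flat row-major arrays of `double` rather than 2D arrays, and `min_int` and `mm_bijk` are hypothetical names standing in for the `min` helper that the slide uses without defining.

```c
#include <assert.h>

/* Stand-in for the min() helper used on the slide. */
static int min_int(int x, int y)
{
    return x < y ? x : y;
}

/* Blocked (bijk) multiply of two n x n row-major matrices: c = a * b. */
void mm_bijk(int n, int bsize, const double *a, const double *b, double *c)
{
    assert(n > 0 && bsize > 0);
    for (int jj = 0; jj < n; jj += bsize) {
        /* Clear the column strip of C that this jj block will accumulate into. */
        for (int i = 0; i < n; i++)
            for (int j = jj; j < min_int(jj + bsize, n); j++)
                c[i * n + j] = 0.0;
        for (int kk = 0; kk < n; kk += bsize) {
            for (int i = 0; i < n; i++) {
                for (int j = jj; j < min_int(jj + bsize, n); j++) {
                    double sum = 0.0;
                    /* 1 x bsize sliver of A times one column of the B block */
                    for (int k = kk; k < min_int(kk + bsize, n); k++)
                        sum += a[i * n + k] * b[k * n + j];
                    c[i * n + j] += sum;
                }
            }
        }
    }
}
```

The `min_int` bounds let `bsize` be any positive value, including one that does not divide n. Choosing `bsize` so that a bsize x bsize block of B fits in the cache is the platform-specific part.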
Memory Hierarchy Instructor: Adam C. Champion, Ph.D. CSE 2431: Introduction to Operating Systems Reading: Chap. 6, [CSAPP] Motivation Up to this point we have relied on a simple model of a computer system
More informationCMPS 10 Introduction to Computer Science Lecture Notes
CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not
More informationAssembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.
IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language
More informationCache Memory and Performance
Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationLOOP ANALYSIS. determine all currents and Voltages in IT IS DUAL TO NODE ANALYSIS - IT FIRST DETERMINES ALL CURRENTS IN A CIRCUIT
LOOP ANALYSS The second systematic technique to determine all currents and oltages in a circuit T S DUAL TO NODE ANALYSS - T FRST DETERMNES ALL CURRENTS N A CRCUT AND THEN T USES OHM S LAW TO COMPUTE NECESSARY
More informationLecture 3: Computer Arithmetic: Multiplication and Division
8-447 Lecture 3: Computer Arthmetc: Multplcaton and Dvson James C. Hoe Dept of ECE, CMU January 26, 29 S 9 L3- Announcements: Handout survey due Lab partner?? Read P&H Ch 3 Read IEEE 754-985 Handouts:
More informationNotes on Organizing Java Code: Packages, Visibility, and Scope
Notes on Organzng Java Code: Packages, Vsblty, and Scope CS 112 Wayne Snyder Java programmng n large measure s a process of defnng enttes (.e., packages, classes, methods, or felds) by name and then usng
More informationImage Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline
mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and
More informationHarvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)
Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationL2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary
HY425 Lecture 13: Improving Cache Performance Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 25, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 13: Improving Cache Performance 1 / 40
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationActive Contours/Snakes
Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng
More informationReview of Basic Computer Architecture
of Basc Computer Archtecture 1 Computer Archtecture What s Computer Archtecture From Wkpeda, the free encyclopeda In computer scence and engneerng, computer archtecture refers to specfcaton of the relatonshp
More informationA fast algorithm for color image segmentation
Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au
More informationAssignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.
Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton
More informationHow to Write Fast Numerical Code
How to Write Fast Numerical Code Lecture: Memory hierarchy, locality, caches Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh Organization Temporal and spatial locality Memory
More informationRange images. Range image registration. Examples of sampling patterns. Range images and range surfaces
Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples
More informationCS429: Computer Organization and Architecture
CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: April 5, 2018 at 13:55 CS429 Slideset 19: 1 Cache Vocabulary Much
More informationSimulation Based Analysis of FAST TCP using OMNET++
Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months
More informationAMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain
AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references
More informationAgenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals
Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested
More informationMathematics 256 a course in differential equations for engineering students
Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the
More information