Cache Memories. Lecture 14 Cache Memories. Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory
- Elaine Day
Transcription
Lecture 14: Cache Memories

Topics
  Generic cache memory organization
  Direct mapped caches
  Set associative caches
  Impact of caches on performance

Cache Memories
  Cache memories are small, fast SRAM-based memories managed automatically in hardware.
    Hold frequently accessed blocks of main memory.
  CPU looks first for data in L1, then in L2, then in main memory.
  Typical bus structure:
  [Figure: CPU chip (register file, ALU, L1 cache, bus interface) connected to the L2 cache over the cache bus, and over the system bus to the I/O bridge, which reaches main memory over the memory bus.]

F4 2 Datorarkitektur 28

Inserting an L1 Cache Between the CPU and Main Memory
  The transfer unit between the CPU register file and the cache is a 4-byte block.
  The transfer unit between the cache and main memory is a 4-word block (16 bytes).
  The tiny, very fast CPU register file has room for four 4-byte words.
  The small fast L1 cache has room for two 4-word blocks.
  The big slow main memory has room for many 4-word blocks.
  [Figure: register file, an L1 cache with two lines, and main memory blocks holding the words a b c d, p q r s, and w x y z.]

General Org of a Cache Memory
  Cache is an array of sets.
  Each set contains one or more lines.
  Each line holds a block of data.
  Per line: 1 valid bit, t tag bits, and B = 2^b bytes per cache block.
  E lines per set.
  S = 2^s sets (set 0, set 1, ..., set S-1).
  Cache size: C = B x E x S data bytes
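The size formula above can be checked with a few lines of C. This is only a minimal sketch: the function names `cache_capacity` and `tag_bits` are hypothetical, and the tag-width relation t = m - (s + b) is an assumption read off the m-bit address layout (tag, set index, block offset) rather than something the slide states as a formula.

```c
#include <assert.h>

/* Data capacity in bytes: C = B x E x S (valid and tag bits not counted). */
unsigned long cache_capacity(unsigned long B, unsigned long E, unsigned long S)
{
    return B * E * S;
}

/* Tag width for an m-bit address with s set-index bits and b offset bits. */
unsigned tag_bits(unsigned m, unsigned s, unsigned b)
{
    assert(s + b <= m);   /* the index and offset fields must fit in the address */
    return m - (s + b);
}
```

For the small L1 cache described earlier (two 16-byte blocks, one line per set), `cache_capacity(16, 1, 2)` gives 32 data bytes.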
Addressing Caches
  Address A (bits m-1 ... 0): t bits <tag> | s bits <set index> | b bits <block offset>
  The word at address A is in the cache if the tag bits in one of the <valid> lines in set <set index> match <tag>.
  The word contents begin at offset <block offset> bytes from the beginning of the block.

Direct-Mapped Cache
  Simplest kind of cache.
  Characterized by exactly one line per set (E = 1 lines per set).

Accessing Direct-Mapped Caches
  Set selection
    Use the set index bits to determine the set of interest.
  Line matching and word selection
    Line matching: Find a valid line in the selected set with a matching tag.
    Word selection: Then extract the word.
    (1) The valid bit must be set.
    (2) The tag bits in the cache line must match the tag bits in the address.
    (3) If (1) and (2), then cache hit, and block offset selects starting byte.
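The address split used for set selection and line matching can be sketched in C with shifts and masks. The type `cache_addr` and the function `split_address` are hypothetical names introduced for illustration; the sketch assumes s + b is smaller than 64 so the shifts stay well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of an address. */
typedef struct {
    uint64_t tag;
    uint64_t set_index;
    uint64_t block_offset;
} cache_addr;

/* Split addr for a cache with S = 2^s sets and B = 2^b bytes per block. */
cache_addr split_address(uint64_t addr, unsigned s, unsigned b)
{
    cache_addr f;
    assert(s + b < 64);                                    /* keep shifts in range */
    f.block_offset = addr & ((UINT64_C(1) << b) - 1);      /* low b bits */
    f.set_index = (addr >> b) & ((UINT64_C(1) << s) - 1);  /* middle s bits */
    f.tag = addr >> (s + b);                               /* remaining high bits */
    return f;
}
```

With s = 2 and b = 1, address 13 (binary 1101) splits into tag 1, set index 2, and block offset 1.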
Direct-Mapped Cache Simulation
  M=16 byte addresses, B=2 bytes/block, S=4 sets, E=1 entry/set
  Address format: t=1 tag bit, s=2 set index bits, b=1 block offset bit (x xx x)
  Address trace (reads): 0 [0000₂], 1 [0001₂], 13 [1101₂], 8 [1000₂], 0 [0000₂]
  Cache state after each miss:
    (1) 0 [0000₂] (miss): set 0 holds tag 0, data M[0-1]
    (3) 13 [1101₂] (miss): set 2 holds tag 1, data M[12-13]
    (4) 8 [1000₂] (miss): set 0 holds tag 1, data M[8-9]
    (5) 0 [0000₂] (miss): set 0 holds tag 0, data M[0-1]

Why Use Middle Bits as Index?
  High-Order Bit Indexing
    Adjacent memory lines would map to same cache entry.
    Poor use of spatial locality.
  Middle-Order Bit Indexing
    Consecutive memory lines map to different cache lines.
    Can hold C-byte region of address space in cache at one time.
  [Figure: a 4-line cache with memory lines mapped under high-order and middle-order bit indexing.]

Set Associative Caches
  Characterized by more than one line per set (for example, E = 2 lines per set).

Accessing Set Associative Caches
  Set selection
    Identical to direct-mapped cache.
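The simulation above can be replayed in C. This is a rough sketch under stated assumptions: the read trace is taken to be 0, 1, 13, 8, 0 (the digits are partly lost in the transcription), and `struct line`, `access_cache`, and `count_hits` are hypothetical names. Only valid bits and tags are modeled, not the data bytes.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SETS 4      /* S = 4 sets, one line per set (E = 1) */
#define BLOCK_BYTES 2   /* B = 2 bytes per block */

/* One cache line: valid bit plus tag (data omitted in this sketch). */
struct line { int valid; unsigned tag; };

/* Returns 1 on a hit, 0 on a miss; a miss (re)fills the selected line. */
int access_cache(struct line cache[NUM_SETS], unsigned addr)
{
    unsigned block = addr / BLOCK_BYTES;   /* drop the block offset bits */
    unsigned set = block % NUM_SETS;       /* middle bits select the set */
    unsigned tag = block / NUM_SETS;       /* high bits form the tag */
    if (cache[set].valid && cache[set].tag == tag)
        return 1;
    cache[set].valid = 1;
    cache[set].tag = tag;
    return 0;
}

/* Replays a read trace on a cold cache and returns the number of hits. */
int count_hits(const unsigned *trace, int n)
{
    struct line cache[NUM_SETS] = {{0, 0}};
    int hits = 0;
    assert(trace != NULL);
    for (int i = 0; i < n; i++)
        hits += access_cache(cache, trace[i]);
    return hits;
}
```

On the assumed trace, only the read of address 1 hits; addresses 0 and 8 both map to set 0 with different tags, so they keep evicting each other.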
Accessing Set Associative Caches
  Line matching and word selection
    Must compare the tag in each valid line in the selected set.
    (1) The valid bit must be set.
    (2) The tag bits in one of the cache lines must match the tag bits in the address.
    (3) If (1) and (2), then cache hit, and block offset selects starting byte.

Multi-Level Caches
  Options: separate data and instruction caches, or a unified cache.

                Regs     L1 d-cache / i-cache   Unified L2 cache   Memory         Disk
    size:       200 B    8-64 KB                1-4 MB SRAM        128 MB DRAM    30 GB
    speed:      3 ns     3 ns                   6 ns               60 ns          8 ms
    $/Mbyte:                                    $100/MB            $1.50/MB       $0.05/MB
    line size:  8 B      32 B                   32 B               8 KB

  Moving away from the processor, each level is larger, slower, and cheaper.

Intel Pentium Cache Hierarchy
  Processor chip:
    Regs.
    L1 Data: 1 cycle latency, 16 KB, 4-way assoc, write-through, 32B lines
    L1 Instruction: 16 KB, 4-way, 32B lines
  L2 Unified: 128KB--2 MB, 4-way assoc, write-back, write allocate, 32B lines
  Main Memory: up to 4GB

Cache Performance Metrics
  Miss Rate
    Fraction of memory references not found in cache (misses/references).
    Typical numbers: 3-10% for L1; can be quite small (e.g., < 1%) for L2, depending on size, etc.
  Hit Time
    Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache).
    Typical numbers: 1 clock cycle for L1; 3-8 clock cycles for L2.
  Miss Penalty
    Additional time required because of a miss.
    Typically 25-100 cycles for main memory.
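The three metrics combine into an average cost per memory reference. The slides do not state this formula, so treat it as the standard textbook relation (hit time plus miss rate times miss penalty) applied to a single cache level in front of main memory. The function name `average_access_cycles` is hypothetical.

```c
#include <assert.h>

/* Average cycles per reference for one cache level in front of memory:
 * every reference pays the hit time, and the missing fraction also pays
 * the miss penalty. */
double average_access_cycles(double hit_time, double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);   /* a rate is a fraction */
    return hit_time + miss_rate * miss_penalty;
}
```

As an illustration with made-up inputs, a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty average out to about 6 cycles per reference, which shows how strongly a small miss rate can dominate.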
Writing Cache Friendly Code
  Repeated references to variables are good (temporal locality).
  Stride-1 reference patterns are good (spatial locality).
  Examples: cold cache, 4-byte words, 4-word cache blocks

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

  Miss rate = 1/4 = 25%

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

  Miss rate = 100%

The Memory Mountain
  Read throughput (read bandwidth)
    Number of bytes read from memory per second (MB/s).
  Memory mountain
    Measured read throughput as a function of spatial and temporal locality.
    Compact way to characterize memory system performance.

Memory Mountain Test Function

    /* The test function */
    void test(int elems, int stride)
    {
        int i, result = 0;
        volatile int sink;

        for (i = 0; i < elems; i += stride)
            result += data[i];
        sink = result; /* So compiler doesn't optimize away the loop */
    }

    /* Run test(elems, stride) and return read throughput (MB/s) */
    double run(int size, int stride, double Mhz)
    {
        double cycles;
        int elems = size / sizeof(int);

        test(elems, stride);                     /* warm up the cache */
        cycles = fcyc2(test, elems, stride, 0);  /* call test(elems,stride) */
        return (size / stride) / (cycles / Mhz); /* convert cycles to MB/s */
    }

Memory Mountain Main Routine

    /* mountain.c - Generate the memory mountain. */
    #define MINBYTES (1 << 10)  /* Working set size ranges from 1 KB */
    #define MAXBYTES (1 << 23)  /* ... up to 8 MB */
    #define MAXSTRIDE 16        /* Strides range from 1 to 16 */
    #define MAXELEMS MAXBYTES/sizeof(int)

    int data[MAXELEMS];         /* The array we'll be traversing */

    int main()
    {
        int size;        /* Working set size (in bytes) */
        int stride;      /* Stride (in array elements) */
        double Mhz;      /* Clock frequency */

        init_data(data, MAXELEMS); /* Initialize each element in data to 1 */
        Mhz = mhz(0);              /* Estimate the clock frequency */
        for (size = MAXBYTES; size >= MINBYTES; size >>= 1) {
            for (stride = 1; stride <= MAXSTRIDE; stride++)
                printf("%.1f\t", run(size, stride, Mhz));
            printf("\n");
        }
        exit(0);
    }
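The 25% and 100% miss rates quoted for the row-wise and column-wise sums can be reproduced with a small cache model. This is a sketch under assumptions not taken from the slides: a 64 x 64 array of 4-byte words starting at address 0, and a 1 KB direct-mapped cache with 16-byte blocks. The function name `miss_rate` and all constants are hypothetical.

```c
#define M 64            /* assumed number of rows */
#define N 64            /* assumed number of columns */
#define WORD_BYTES 4    /* 4-byte words, as in the example */
#define BLOCK_BYTES 16  /* 4-word cache blocks, as in the example */
#define NUM_SETS 64     /* assumed: 1 KB direct-mapped cache */

/* Miss rate of one full traversal of a[M][N] starting from a cold cache.
 * column_major == 0 walks rows (stride 1); nonzero walks columns (stride N). */
double miss_rate(int column_major)
{
    long tags[NUM_SETS];
    int valid[NUM_SETS] = {0};
    long misses = 0, refs = 0;
    int outer_n = column_major ? N : M;
    int inner_n = column_major ? M : N;

    for (int outer = 0; outer < outer_n; outer++) {
        for (int inner = 0; inner < inner_n; inner++) {
            int i = column_major ? inner : outer;
            int j = column_major ? outer : inner;
            long addr = (long)WORD_BYTES * ((long)i * N + j); /* row-major layout */
            long block = addr / BLOCK_BYTES;
            int set = (int)(block % NUM_SETS);
            long tag = block / NUM_SETS;
            if (!valid[set] || tags[set] != tag) {   /* cold or conflict miss */
                misses++;
                valid[set] = 1;
                tags[set] = tag;
            }
            refs++;
        }
    }
    return (double)misses / (double)refs;
}
```

With these sizes `miss_rate(0)` is 0.25 and `miss_rate(1)` is 1.0. The column-wise figure depends on the array being much larger than the cache; with an array that fits entirely in the cache, later columns would start to hit.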
6 The Memory Mountan read throughput (MB/s) Slopes of Spatal Localty s s3 strde (words) s5 s7 s9 s s3 mem s5 8m F4 2 Datorarktektur 28 xe L2 2m 52k L 28k 32k 8k 2k Pentum III Xeon 55 MHz 6 KB on-chp L d-cache 6 KB on-chp L -cache 52 KB off-chp unfed L2 cache Rdges of Temporal Localty workng set sze (bytes) Rdges of Temporal Localty Slce through the memory mountan wth strde= llumnates read throughputs of dfferent caches and memory read througput (MB/s) m man memory regon 4m 2m 24k 52k 256k L2 cache regon 28k workng set sze (bytes) F4 22 Datorarktektur 28 64k 32k 6k 8k L cache regon 4k 2k k A Slope of Spatal Localty Slce through memory mountan wth sze=256kb shows sze. read throughput (MB/s) s s2 s3 s4 s5 s6 s7 s8 s9 s s s2 s3 s4 s5 s6 strde (words) one access per cache lne F4 23 Datorarktektur 28 Matrx Multplcaton Example Major Cache Effects to Consder Total cache sze Explot temporal localty and keep the workng set small (e.g., by usng blockng) Block sze Descrpton: Explot spatal localty Multply N x N matrces O(N3) total operatons Accesses N reads per source element N alues summed per destnaton» but may be able to hold n regster /* jk */ Varable sum for (=; <n; ++) { held n regster for (j=; j<n; j++) { sum =.; for (k=; k<n; k++) c[][j] = sum; F4 24 Datorarktektur 28
Miss Rate Analysis for Matrix Multiply
  Assume:
    Line size = 32B (big enough for 4 64-bit words).
    Matrix dimension (N) is very large.
      Approximate 1/N as 0.0.
    Cache is not even big enough to hold multiple rows.
  Analysis Method:
    Look at access pattern of inner loop.
  [Figure: inner loop access pattern over A (indices i, k), B (indices k, j), and C (indices i, j).]

Layout of C Arrays in Memory (review)
  C arrays allocated in row-major order
    Each row in contiguous memory locations.
  Stepping through columns in one row:
    for (i = 0; i < N; i++)
      sum += a[0][i];
    Accesses successive elements.
    If block size (B) > 4 bytes, exploit spatial locality.
      Compulsory miss rate = 4 bytes / B
  Stepping through rows in one column:
    for (i = 0; i < n; i++)
      sum += a[i][0];
    Accesses distant elements.
    No spatial locality!
      Compulsory miss rate = 1 (i.e. 100%)

Matrix Multiplication (ijk)

    /* ijk */
    for (i=0; i<n; i++) {
      for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

  Inner loop: A (i,*) row-wise, B (*,j) column-wise, C (i,j) fixed
  Misses per Inner Loop Iteration: A 0.25, B 1.0, C 0.0

Matrix Multiplication (jik)

    /* jik */
    for (j=0; j<n; j++) {
      for (i=0; i<n; i++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

  Inner loop: A (i,*) row-wise, B (*,j) column-wise, C (i,j) fixed
  Misses per Inner Loop Iteration: A 0.25, B 1.0, C 0.0
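The analysis method can be written down as a small calculation: each matrix contributes misses according to how the inner loop walks it. This is a minimal sketch under the stated assumptions (32-byte lines, 8-byte elements, matrices far larger than the cache). The enum and function names are hypothetical.

```c
/* How the inner loop walks one matrix. */
enum access { FIXED, ROW_WISE, COLUMN_WISE };

/* Misses per inner-loop iteration contributed by one matrix, assuming the
 * matrix is far larger than the cache. */
double misses_for(enum access pattern, double elem_bytes, double line_bytes)
{
    switch (pattern) {
    case ROW_WISE:    return elem_bytes / line_bytes; /* one miss per line */
    case COLUMN_WISE: return 1.0;                     /* every access misses */
    default:          return 0.0;                     /* FIXED: same element each time */
    }
}

/* Total misses per inner-loop iteration over the three matrices A, B, C. */
double misses_per_iteration(enum access a, enum access b, enum access c,
                            double elem_bytes, double line_bytes)
{
    return misses_for(a, elem_bytes, line_bytes)
         + misses_for(b, elem_bytes, line_bytes)
         + misses_for(c, elem_bytes, line_bytes);
}
```

For the ijk and jik orderings, `misses_per_iteration(ROW_WISE, COLUMN_WISE, FIXED, 8, 32)` evaluates to 1.25.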
Matrix Multiplication (kij)

    /* kij */
    for (k=0; k<n; k++) {
      for (i=0; i<n; i++) {
        r = a[i][k];
        for (j=0; j<n; j++)
          c[i][j] += r * b[k][j];
      }
    }

  Inner loop: A (i,k) fixed, B (k,*) row-wise, C (i,*) row-wise
  Misses per Inner Loop Iteration: A 0.0, B 0.25, C 0.25

Matrix Multiplication (ikj)

    /* ikj */
    for (i=0; i<n; i++) {
      for (k=0; k<n; k++) {
        r = a[i][k];
        for (j=0; j<n; j++)
          c[i][j] += r * b[k][j];
      }
    }

  Inner loop: A (i,k) fixed, B (k,*) row-wise, C (i,*) row-wise
  Misses per Inner Loop Iteration: A 0.0, B 0.25, C 0.25

Matrix Multiplication (jki)

    /* jki */
    for (j=0; j<n; j++) {
      for (k=0; k<n; k++) {
        r = b[k][j];
        for (i=0; i<n; i++)
          c[i][j] += a[i][k] * r;
      }
    }

  Inner loop: A (*,k) column-wise, B (k,j) fixed, C (*,j) column-wise
  Misses per Inner Loop Iteration: A 1.0, B 0.0, C 1.0

Matrix Multiplication (kji)

    /* kji */
    for (k=0; k<n; k++) {
      for (j=0; j<n; j++) {
        r = b[k][j];
        for (i=0; i<n; i++)
          c[i][j] += a[i][k] * r;
      }
    }

  Inner loop: A (*,k) column-wise, B (k,j) fixed, C (*,j) column-wise
  Misses per Inner Loop Iteration: A 1.0, B 0.0, C 1.0
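The loop fragments above leave out declarations and initialization. As a self-contained sketch of the kij ordering, the version below assumes flat row-major arrays of `double` and a caller that zeroes `c` beforehand; the function name `matmul_kij` and the flat indexing are choices made here, not part of the slides.

```c
#include <stddef.h>

/* kij ordering: c += a * b for n x n row-major matrices stored as flat arrays.
 * The caller must zero c first, because this ordering accumulates in place. */
void matmul_kij(size_t n, const double *a, const double *b, double *c)
{
    for (size_t k = 0; k < n; k++) {
        for (size_t i = 0; i < n; i++) {
            double r = a[i * n + k];               /* fixed during the inner loop */
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += r * b[k * n + j];  /* b and c are walked row-wise */
        }
    }
}
```

Because both `b` and `c` are read with stride 1 in the inner loop, this ordering is the one the analysis predicts to have the fewest misses per iteration.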
Summary of Matrix Multiplication

  ijk (& jik): 2 loads, 0 stores, misses/iter = 1.25

    for (i=0; i<n; i++) {
      for (j=0; j<n; j++) {
        sum = 0.0;
        for (k=0; k<n; k++)
          sum += a[i][k] * b[k][j];
        c[i][j] = sum;
      }
    }

  kij (& ikj): 2 loads, 1 store, misses/iter = 0.5

    for (k=0; k<n; k++) {
      for (i=0; i<n; i++) {
        r = a[i][k];
        for (j=0; j<n; j++)
          c[i][j] += r * b[k][j];
      }
    }

  jki (& kji): 2 loads, 1 store, misses/iter = 2.0

    for (j=0; j<n; j++) {
      for (k=0; k<n; k++) {
        r = b[k][j];
        for (i=0; i<n; i++)
          c[i][j] += a[i][k] * r;
      }
    }

Pentium Matrix Multiply Performance
  Miss rates are helpful but not perfect predictors.
    Code scheduling matters, too.
  [Figure: cycles/iteration against array size (n) for kji, jki, kij, ikj, jik, and ijk.]

Improving Temporal Locality by Blocking
  Example: Blocked matrix multiplication
    "Block" (in this context) does not mean "cache block". Instead, it means a sub-block within the matrix.
    Example: N = 8; sub-block size = 4

      | A11 A12 |     | B11 B12 |     | C11 C12 |
      | A21 A22 |  X  | B21 B22 |  =  | C21 C22 |

    Key idea: Sub-blocks (i.e., Axy) can be treated just like scalars.

      C11 = A11 B11 + A12 B21
      C12 = A11 B12 + A12 B22
      C21 = A21 B11 + A22 B21
      C22 = A21 B12 + A22 B22

Blocked Matrix Multiply (bijk)

    for (jj=0; jj<n; jj+=bsize) {
      for (i=0; i<n; i++)
        for (j=jj; j < min(jj+bsize,n); j++)
          c[i][j] = 0.0;
      for (kk=0; kk<n; kk+=bsize) {
        for (i=0; i<n; i++) {
          for (j=jj; j < min(jj+bsize,n); j++) {
            sum = 0.0;
            for (k=kk; k < min(kk+bsize,n); k++) {
              sum += a[i][k] * b[k][j];
            }
            c[i][j] += sum;
          }
        }
      }
    }
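A self-contained version of the blocked loop nest makes it easy to confirm that blocking changes only the order of the arithmetic, not the result. This sketch assumes flat row-major arrays of `double`. The names `matmul_ijk`, `matmul_bijk`, and `min_size` are hypothetical, and the plain ijk product is included only as a reference to compare against.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference ijk product of n x n row-major matrices: c = a * b. */
void matmul_ijk(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* Blocked (bijk) product with the same loop structure as the blocked code. */
void matmul_bijk(size_t n, size_t bsize, const double *a, const double *b, double *c)
{
    assert(bsize > 0);                       /* a zero block size would never advance */
    for (size_t jj = 0; jj < n; jj += bsize) {
        size_t jend = min_size(jj + bsize, n);
        for (size_t i = 0; i < n; i++)       /* clear this column strip of c */
            for (size_t j = jj; j < jend; j++)
                c[i * n + j] = 0.0;
        for (size_t kk = 0; kk < n; kk += bsize) {
            size_t kend = min_size(kk + bsize, n);
            for (size_t i = 0; i < n; i++)
                for (size_t j = jj; j < jend; j++) {
                    double sum = 0.0;        /* partial dot product over one block */
                    for (size_t k = kk; k < kend; k++)
                        sum += a[i * n + k] * b[k * n + j];
                    c[i * n + j] += sum;
                }
        }
    }
}
```

The `min_size` bounds let `n` be any size, including one that is not a multiple of `bsize`. Which block size performs best depends on the cache, so it is left as a parameter here.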
Blocked Matrix Multiply Analysis
  Innermost loop pair multiplies a 1 X bsize sliver of A by a bsize X bsize block of B and accumulates into 1 X bsize sliver of C.
  Loop over i steps through n row slivers of A & C, using same B.

    for (i=0; i<n; i++) {
      for (j=jj; j < min(jj+bsize,n); j++) {    /* innermost loop pair */
        sum = 0.0;
        for (k=kk; k < min(kk+bsize,n); k++) {
          sum += a[i][k] * b[k][j];
        }
        c[i][j] += sum;
      }
    }

  [Figure: A, B, and C with the active regions marked. The row sliver of A is accessed bsize times, the block of B is reused n times in succession, and successive elements of the C sliver are updated.]

Pentium Blocked Matrix Multiply Performance
  Blocking (bijk and bikj) improves performance by a factor of two over unblocked versions (ijk and jik).
    Relatively insensitive to array size.
  [Figure: cycles/iteration against array size (n) for kji, jki, kij, ikj, jik, ijk, bijk (bsize = 25), and bikj (bsize = 25).]

Concluding Observations
  Programmer can optimize for cache performance
    How data structures are organized.
    How data are accessed.
      Nested loop structure.
      Blocking is a general technique.
  All systems favor "cache friendly code"
    Getting absolute optimum performance is very platform specific.
      Cache sizes, line sizes, associativities, etc.
    Can get most of the advantage with generic code.
      Keep working set reasonably small (temporal locality).
      Use small strides (spatial locality).
More informationDonn Morrison Department of Computer Science. TDT4255 Memory hierarchies
TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,
More informationCSCI-UA.0201 Computer Systems Organization Memory Hierarchy
CSCI-UA.0201 Computer Systems Organization Memory Hierarchy Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com Programmer s Wish List Memory Private Infinitely large Infinitely fast Non-volatile
More informationLLVM passes and Intro to Loop Transformation Frameworks
LLVM passes and Intro to Loop Transformaton Frameworks Announcements Ths class s recorded and wll be n D2L panapto. No quz Monday after sprng break. Wll be dong md-semester class feedback. Today LLVM passes
More informationAdvanced Caching Techniques
Advanced Caching Approaches to improving memory system performance eliminate memory operations decrease the number of misses decrease the miss penalty decrease the cache/memory access times hide memory
More informationSE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan Memory Hierarchy
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan govind@serc Memory Hierarchy 2 1 Memory Organization Memory hierarchy CPU registers few in number (typically 16/32/128) subcycle access
More informationAssembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.
IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language
More informationRESISTIVE CIRCUITS MULTI NODE/LOOP CIRCUIT ANALYSIS
RESSTE CRCUTS MULT NODE/LOOP CRCUT ANALYSS DEFNNG THE REFERENCE NODE S TAL 4 THESTATEMENT 4 S MEANNGLES UNTL THE REFERENCE PONT S DEFNED BY CONENTON THE GROUND SYMBOL SPECFES THE REFERENCE PONT. ALL NODE
More informationCS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.
CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit
More informationLecture 3: Computer Arithmetic: Multiplication and Division
8-447 Lecture 3: Computer Arthmetc: Multplcaton and Dvson James C. Hoe Dept of ECE, CMU January 26, 29 S 9 L3- Announcements: Handout survey due Lab partner?? Read P&H Ch 3 Read IEEE 754-985 Handouts:
More informationImage Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline
mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and
More informationHarvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)
Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst
More informationCS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches
CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates
More informationNotes on Organizing Java Code: Packages, Visibility, and Scope
Notes on Organzng Java Code: Packages, Vsblty, and Scope CS 112 Wayne Snyder Java programmng n large measure s a process of defnng enttes (.e., packages, classes, methods, or felds) by name and then usng
More informationLecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )
Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program
More informationA mathematical programming approach to the analysis, design and scheduling of offshore oilfields
17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and
More informationNachos Project 3. Speaker: Sheng-Wei Cheng 2010/12/16
Nachos Project Speaker: Sheng-We Cheng //6 Agenda Motvaton User Programs n Nachos Related Nachos Code for User Programs Project Assgnment Bonus Submsson Agenda Motvaton User Programs n Nachos Related Nachos
More informationwrite-through v. write-back write-through v. write-back write-through v. write-back option 1: write-through write 10 to 0xABCD CPU RAM Cache ABCD: FF
write-through v. write-back option 1: write-through 1 write 10 to 0xABCD CPU Cache ABCD: FF RAM 11CD: 42 ABCD: FF 1 2 write-through v. write-back option 1: write-through write-through v. write-back option
More informationL2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary
HY425 Lecture 13: Improving Cache Performance Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 25, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 13: Improving Cache Performance 1 / 40
More informationFEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur
FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents
More informationAdvanced Memory Organizations
CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU
More informationCache Performance II 1
Cache Performance II 1 cache operation (associative) 111001 index offset valid tag valid tag data data 1 10 1 00 00 11 AA BB tag 1 11 1 01 B4 B5 33 44 = data (B5) AND = AND OR is hit? (1) 2 cache operation
More informationReview of Basic Computer Architecture
of Basc Computer Archtecture 1 Computer Archtecture What s Computer Archtecture From Wkpeda, the free encyclopeda In computer scence and engneerng, computer archtecture refers to specfcaton of the relatonshp
More informationRoadmap. Java: Assembly language: OS: Machine code: Computer system:
Roadmap C: car *c = malloc(sizeof(car)); c->miles = 100; c->gals = 17; float mpg = get_mpg(c); free(c); Assembly language: Machine code: get_mpg: pushq movq... popq ret %rbp %rsp, %rbp %rbp 0111010000011000
More informationLecture 12. Memory Design & Caches, part 2. Christos Kozyrakis Stanford University
Lecture 12 Memory Design & Caches, part 2 Christos Kozyrakis Stanford University http://eeclass.stanford.edu/ee108b 1 Announcements HW3 is due today PA2 is available on-line today Part 1 is due on 2/27
More informationCS 240 Stage 3 Abstractions for Practical Systems
CS 240 Stage 3 Abstractions for Practical Systems Caching and the memory hierarchy Operating systems and the process model Virtual memory Dynamic memory allocation Victory lap Memory Hierarchy: Cache Memory
More informationR s s f. m y s. SPH3UW Unit 7.3 Spherical Concave Mirrors Page 1 of 12. Notes
SPH3UW Unt 7.3 Sphercal Concave Mrrors Page 1 of 1 Notes Physcs Tool box Concave Mrror If the reflectng surface takes place on the nner surface of the sphercal shape so that the centre of the mrror bulges
More informationParallel matrix-vector multiplication
Appendx A Parallel matrx-vector multplcaton The reduced transton matrx of the three-dmensonal cage model for gel electrophoress, descrbed n secton 3.2, becomes excessvely large for polymer lengths more
More informationContent Based Image Retrieval Using 2-D Discrete Wavelet with Texture Feature with Different Classifiers
IOSR Journal of Electroncs and Communcaton Engneerng (IOSR-JECE) e-issn: 78-834,p- ISSN: 78-8735.Volume 9, Issue, Ver. IV (Mar - Apr. 04), PP 0-07 Content Based Image Retreval Usng -D Dscrete Wavelet wth
More informationSystems Programming and Computer Architecture ( ) Timothy Roscoe
Systems Group Department of Computer Science ETH Zürich Systems Programming and Computer Architecture (252-0061-00) Timothy Roscoe Herbstsemester 2016 AS 2016 Caches 1 16: Caches Computer Architecture
More information3D vector computer graphics
3D vector computer graphcs Paolo Varagnolo: freelance engneer Padova Aprl 2016 Prvate Practce ----------------------------------- 1. Introducton Vector 3D model representaton n computer graphcs requres
More information