Giving credit where credit is due

Size: px
Start display at page:

Download "Giving credit where credit is due"

Transcription

1 CSCE 23J Computer Organzaton Cache Memores Dr. Stee Goddard Gng credt where credt s due Most of sldes for ths lecture are based on sldes created by Drs. Bryant and O Hallaron, Carnege Mellon Unersty. I hae modfed them and added new sldes. 2 Topcs Generc cache memory organzaton Drect mapped caches Set assocate caches Impact of caches on performance Cache Memores Cache memores are small, fast SRAM-based memores managed automatcally n hardware. Hold frequently accessed blocks of man memory CPU looks frst for data n L, then n L2, then n man memory. Typcal bus structure: CPU chp regster fle L ALU cache cache bus system bus memory bus L2 cache bus nterface I/O brdge man memory 3 4 Insertng an L Cache Between the CPU and Man Memory The transfer unt between the CPU regster fle and the cache s a 4-byte block. lne lne The transfer unt between the cache and man memory s a 4-word block (6 bytes). block block 2 block 3 a b c d... p q r s... w x y z... The tny, ery fast CPU regster fle has room for four 4-byte words. The small fast L cache has room for two 4-word blocks. The bg slow man memory has room for many 4-word blocks. 5 General Org of a Cache Memory Cache s an array of sets. Each set contans one or more lnes. Each lne holds a block of data. S = 2 s sets set : set : set S-: bt t bts per lne per lne B = 2 b bytes per B B B Cache sze: C = B x E x S data bytes B B B E lnes per set 6 Page

2 set : set : set S-: Addressng Caches B B B B B B Address A: t bts s bts b bts m- <> <set ndex> <block offset> The word at address A s n the cache f the bts n one of the <> lnes n set <set ndex> match <>. The word contents begn at offset <block offset> bytes from the begnnng of the block. Drect-Mapped Cache Smplest knd of cache Characterzed by exactly one lne per set. set : set : set S-: E= lnes per set 7 8 Accessng Drect-Mapped Caches Set selecton Use the set ndex bts to determne the set of nterest. Accessng Drect-Mapped Caches Lne matchng and word selecton Lne matchng: Fnd a lne n the selected set wth a matchng Word selecton: Then extract the word m- t bts selected set s bts b bts set ndex block offset set : set : set S-: selected set (): (2) The bts n the cache lne must match the =? bts n the address =? () The bt must be set w w w 2 w 3 t bts m s bts b bts set ndex block offset (3) If () and (2), then cache ht, and block offset selects startng byte. 9 Drect-Mapped Cache Smulaton t= s=2 b= x xx x () (4) M=6 byte addresses, B=2 bytes/block, S=4 sets, E= entry/set Address trace (reads): [ 2 ], [ 2 ], 3 [ 2 ], 8 [ 2 ], [ 2 ] [ 2 ] (mss) data m[] M[-] m[] 8 [ 2 ] (mss) data m[9] M[8-9] m[8] M[2-3] (3) (5) 3 [ 2 ] (mss) data m[] M[-] m[] m[3] M[2-3] m[2] [ 2 ] (mss) data m[] M[-] m[] m[3] M[2-3] m[2] Why Use Mddle Bts as Index? 4-lne Cache Hgh-Order Bt Indexng Adjacent memory lnes would map to same cache entry Poor use of spatal localty Mddle-Order Bt Indexng Consecute memory lnes map to dfferent cache lnes Can hold C-byte regon of address space n cache at one tme Hgh-Order Bt Indexng Mddle-Order Bt Indexng 2 Page 2

3 Set Assocate Caches Characterzed by more than one lne per set Accessng Set Assocate Caches Set selecton dentcal to drect-mapped cache set : E=2 lnes per set set : set : set S-: m- t bts Selected set s bts b bts set ndex block offset set : set S-: 3 4 Accessng Set Assocate Caches Lne matchng and word selecton must compare the n each lne n the selected set. Mult-Leel Caches Optons: separate data and nstructon caches, or a unfed cache selected set (): =? () The bt must be set w w w 2 w 3 Processor Regs L d-cache L -cache Unfed L2 Cache Memory dsk (2) The bts n one of the cache lnes must =? match the bts n the address t bts m- s bts b bts set ndex block offset (3) If () and (2), then cache ht, and block offset selects startng byte. sze: speed: $/Mbyte: lne sze: 2 B 3 ns 8-64 KB 3 ns 8 B 32 B 32 B larger, slower, cheaper -4MB SRAM 6 ns $/MB 28 MB DRAM 6 ns $.5/MB 8 KB 3 GB 8 ms $.5/MB 5 6 Intel Pentum Cache Herarchy Cache Performance Metrcs Regs. L Data cycle latency 6 KB 4-way assoc Wrte-through 32B lnes L Instructon 6 KB, 4-way 32B lnes Processor Chp L2 Unfed 28KB--2 MB 4-way assoc Wrte-back Wrte allocate 32B lnes Man Memory Up to 4GB Mss Rate Fracton of memory references not found n cache (msses/references) Typcal numbers: 3-% for L can be qute small (e.g., < %) for L2, dependng on sze, etc. Ht Tme Tme to deler a lne n the cache to the processor (ncludes tme to determne whether the lne s n the cache) Typcal numbers: clock cycle for L 3-8 clock cycles for L2 Mss Penalty Addtonal tme requred because of a mss Typcally 25- cycles for man memory 7 8 Page 3

4 Wrtng Cache Frendly Code Repeated references to arables are good (temporal localty) Strde- reference patterns are good (spatal localty) Examples: cold cache, 4-byte words, 4-word s nt sumarrayrows(nt a[m][n]) nt, j, sum = ; nt sumarraycols(nt a[m][n]) nt, j, sum = ; The Memory Mountan Read throughput (read bandwdth) Number of bytes read from memory per second (MB/s) Memory mountan Measured read throughput as a functon of spatal and temporal localty. Compact way to characterze memory system performance. for ( = ; < M; ++) for (j = ; j < N; j++) sum += a[][j]; return sum; for (j = ; j < N; j++) for ( = ; < M; ++) sum += a[][j]; return sum; Mss rate = /4 = 25% Mss rate = % 9 2 Memory Mountan Test Functon /* The test functon */ od test(nt elems, nt strde) nt, result = ; olatle nt snk; for ( = ; < elems; += strde) result += data[]; snk = result; /* So compler doesn't optmze away the loop */ /* Run test(elems, strde) and return read throughput (MB/s) */ double run(nt sze, nt strde, double Mhz) double cycles; nt elems = sze / szeof(nt); test(elems, strde); /* warm up the cache */ cycles = fcyc2(test, elems, strde, ); /* call test(elems,strde) */ return (sze / strde) / (cycles / Mhz); /* conert cycles to MB/s */ Memory Mountan Man Routne /* mountan.c - Generate the memory mountan. */ #defne MINBYTES ( << ) /* Workng set sze ranges from KB */ #defne MAXBYTES ( << 23) /*... up to 8 MB */ #defne MAXSTRIDE 6 /* Strdes range from to 6 */ #defne MAXELEMS MAXBYTES/szeof(nt) nt data[maxelems]; /* The array we'll be traersng */ nt man() nt sze; /* Workng set sze (n bytes) */ nt strde; /* Strde (n array elements) */ double Mhz; /* Clock frequency */ nt_data(data, MAXELEMS); /* Intalze each element n data to */ Mhz = mhz(); /* Estmate the clock frequency */ for (sze = MAXBYTES; sze >= MINBYTES; sze >>= ) for (strde = ; strde <= MAXSTRIDE; strde++) prntf("%.f\t", run(sze, strde, Mhz)); prntf("\n"); ext(); 2 22 The Memory Mountan Rdges of Temporal Localty read throughput (MB/s) Slopes of Spatal Localty xe L2 L Pentum III Xeon 55 MHz 6 KB on-chp L d-cache 6 KB on-chp L -cache 52 KB off-chp unfed L2 cache Rdges of Temporal Localty Slce through the memory mountan wth strde= llumnates read throughputs of dfferent caches and memory read througput (MB/s) man memory regon L2 cache regon L cache regon s s3 s5 mem 8k 2k 2 strde (words) s7 s9 s s3 s5 8m 2m 52k 28k 32k workng set sze (bytes) 23 8m 4m 2m 24k 52k 256k 28k 64k 32k 6k workng set sze (bytes) 8k 4k 2k k 24 Page 4

5 A Slope of Spatal Localty Slce through memory mountan wth sze=256kb shows sze. read throughput (MB/s) one access per cache lne s s2 s3 s4 s5 s6 s7 s8 s9 s s s2 s3 s4 s5 s6 strde (words) Matrx Multplcaton Example Major Cache Effects to Consder Total cache sze Explot temporal localty and keep the workng set small (e.g., by usng blockng) /* jk */ Varable sum Block sze for (=; <n; ++) held n regster Explot spatal localty for (j=; j<n; j++) sum =.; for (k=; k<n; k++) Descrpton: Multply N x N matrces c[][j] = sum; O(N3) total operatons Accesses N reads per source element N alues summed per destnaton» but may be able to hold n regster Mss Rate Analyss for Matrx Multply Assume: Lne sze = 32B (bg enough for 4 64-bt words) Matrx dmenson (N) s ery large Approxmate /N as. Cache s not een bg enough to hold multple rows Analyss Method: Look at access pattern of nner loop k A k j B j C Layout of C Arrays n Memory (reew) C arrays allocated n row-major order each row n contguous memory locatons Steppng through columns n one row: for ( = ; < N; ++) sum += a[][]; accesses successe elements f block sze (B) > 4 bytes, explot spatal localty compulsory mss rate = 4 bytes / B Steppng through rows n one column: for ( = ; < n; ++) sum += a[][]; accesses dstant elements no spatal localty! compulsory mss rate = (.e. %) Matrx Multplcaton (jk) Matrx Multplcaton (jk) /* jk */ for (=; <n; ++) for (j=; j<n; j++) sum =.; for (k=; k<n; k++) c[][j] = sum;.25.. (,j) Row-wse Fxed /* jk */ for (j=; j<n; j++) for (=; <n; ++) sum =.; for (k=; k<n; k++) c[][j] = sum.25.. (,j) Row-wse Fxed 29 3 Page 5

6 Matrx Multplcaton (kj) Matrx Multplcaton (kj) /* kj */ for (k=; k<n; k++) for (=; <n; ++) r = a[][k]; for (j=; j<n; j++) c[][j] += r * b[k][j]; (,k) (k,*) Fxed Row-wse Row-wse /* kj */ for (=; <n; ++) for (k=; k<n; k++) r = a[][k]; for (j=; j<n; j++) c[][j] += r * b[k][j]; (,k) (k,*) Fxed Row-wse Row-wse Matrx Multplcaton (jk) Matrx Multplcaton (kj) /* jk */ for (j=; j<n; j++) for (k=; k<n; k++) r = b[k][j]; for (=; <n; ++) c[][j] += a[][k] * r; (*,k) Column - wse... (k,j) Fxed /* kj */ for (k=; k<n; k++) for (j=; j<n; j++) r = b[k][j]; for (=; <n; ++) c[][j] += a[][k] * r;... (*,k) (k,j) Fxed Summary of Matrx Multplcaton Pentum Matrx Multply Performance jk (& jk): 2 loads, stores msses/ter =.25 kj (& kj): 2 loads, store msses/ter =.5 jk (& kj): 2 loads, store msses/ter = 2. Mss rates are helpful but not perfect predctors. Code schedulng matters, too. 6 for (=; <n; ++) for (j=; j<n; j++) for (k=; k<n; k++) for (=; <n; ++) for (j=; j<n; j++) for (k=; k<n; k++) 5 sum =.; for (k=; k<n; k++) c[][j] = sum; r = a[][k]; for (j=; j<n; j++) c[][j] += r * b[k][j]; r = b[k][j]; for (=; <n; ++) c[][j] += a[][k] * r; Cycles/teraton kj jk kj kj jk jk Array sze (n) 36 Page 6

7 Improng Temporal Localty by Blockng Example: Blocked matrx multplcaton block (n ths context) does not mean. Instead, t mean a sub-block wthn the matrx. Example: N = 8; sub-block sze = 4 A A 2 A 2 A 22 X B B 2 B 2 B 22 C = A B + A 2 B 2 C 2 = A B 2 + A 2 B 22 C 2 = A 2 B + A 22 B 2 C 22 = A 2 B 2 + A 22 B 22 = C C 2 C 2 C 22 Key dea: Sub-blocks (.e., A xy ) can be treated just lke scalars. Blocked Matrx Multply (bjk) for (jj=; jj<n; jj+=bsze) for (=; <n; ++) for (j=jj; j < mn(jj+bsze,n); j++) c[][j] =.; for (kk=; kk<n; kk+=bsze) for (=; <n; ++) for (j=jj; j < mn(jj+bsze,n); j++) sum =. for (k=kk; k < mn(kk+bsze,n); k++) c[][j] += sum; Blocked Matrx Multply Analyss Innermost loop par multples a X bsze sler of A by a bsze X bsze block of B and accumulates nto X bsze sler of C Loop oer steps through n row slers of A & C, usng same B for (=; <n; ++) for (j=jj; j < mn(jj+bsze,n); j++) sum =. for (k=kk; k < mn(kk+bsze,n); k++) Innermost c[][j] += sum; Loop Par kk jj jj kk row sler accessed Update successe bsze tmes block reused n elements of sler tmes n successon 39 Pentum Blocked Matrx Multply Performance Blockng (bjk and bkj) mproes performance by a factor of two oer unblocked ersons (jk and jk) Cycles/teraton relately nsenste to array sze Array sze (n) kj jk kj kj jk jk bjk (bsze = 25) bkj (bsze = 25) 4 Concludng Obseratons Programmer can optmze for cache performance How data structures are organzed How data are accessed Nested loop structure Blockng s a general technque All systems faor cache frendly code Gettng absolute optmum performance s ery platform specfc Cache szes, lne szes, assocattes, etc. Can get most of the adane wth generc code Keep workng set reasonably small (temporal localty) Use small strdes (spatal localty) 4 Page 7

Cache Memories. Lecture 14 Cache Memories. Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory

Cache Memories. Lecture 14 Cache Memories. Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory Topcs Lecture 4 Cache Memores Generc cache memory organzaton Drect mapped caches Set assocate caches Impact of caches on performance Cache Memores Cache memores are small, fast SRAM-based memores managed

More information

Giving credit where credit is due

Giving credit where credit is due CSCE 23J Computer Organization Cache Memories Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce23j Giving credit where credit is due Most of slides for this lecture are based

More information

Cache Memories. Cache Memories Oct. 10, Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory

Cache Memories. Cache Memories Oct. 10, Inserting an L1 Cache Between the CPU and Main Memory. General Org of a Cache Memory 5-23 The course that gies CMU its Zip! Topics Cache Memories Oct., 22! Generic cache memory organization! Direct mapped caches! Set associatie caches! Impact of caches on performance Cache Memories Cache

More information

ν Hold frequently accessed blocks of main memory 2 CISC 360, Fa09 Cache is an array of sets. Each set contains one or more lines.

ν Hold frequently accessed blocks of main memory 2 CISC 360, Fa09 Cache is an array of sets. Each set contains one or more lines. Topics CISC 36 Cache Memories Dec, 29 ν Generic cache memory organization ν Direct mapped caches ν Set associatie caches ν Impact of caches on performance Cache Memories Cache memories are small, fast

More information

Cache memories The course that gives CMU its Zip! Cache Memories Oct 11, General organization of a cache memory

Cache memories The course that gives CMU its Zip! Cache Memories Oct 11, General organization of a cache memory 5-23 The course that gies CMU its Zip! Cache Memories Oct, 2 Topics Generic cache memory organization Direct mapped caches Set associatie caches Impact of caches on performance Cache memories Cache memories

More information

CISC 360. Cache Memories Nov 25, 2008

CISC 360. Cache Memories Nov 25, 2008 CISC 36 Topics Cache Memories Nov 25, 28 Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Cache Memories Cache memories are small, fast SRAM-based

More information

Cache Memories. EL2010 Organisasi dan Arsitektur Sistem Komputer Sekolah Teknik Elektro dan Informatika ITB 2010

Cache Memories. EL2010 Organisasi dan Arsitektur Sistem Komputer Sekolah Teknik Elektro dan Informatika ITB 2010 Cache Memories EL21 Organisasi dan Arsitektur Sistem Komputer Sekolah Teknik Elektro dan Informatika ITB 21 Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of

More information

Cache Memories October 8, 2007

Cache Memories October 8, 2007 15-213 Topics Cache Memories October 8, 27 Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance The memory mountain class12.ppt Cache Memories Cache

More information

Systems I. Optimizing for the Memory Hierarchy. Topics Impact of caches on performance Memory hierarchy considerations

Systems I. Optimizing for the Memory Hierarchy. Topics Impact of caches on performance Memory hierarchy considerations Systems I Optimizing for the Memory Hierarchy Topics Impact of caches on performance Memory hierarchy considerations Cache Performance Metrics Miss Rate Fraction of memory references not found in cache

More information

Memory Hierarchy. Announcement. Computer system model. Reference

Memory Hierarchy. Announcement. Computer system model. Reference Announcement Memory Hierarchy Computer Organization and Assembly Languages Yung-Yu Chuang 26//5 Grade for hw#4 is online Please DO submit homework if you haen t Please sign up a demo time on /6 or /7 at

More information

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance

Cache Memories. Topics. Next time. Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Cache Memories Topics Generic cache memory organization Direct mapped caches Set associative caches Impact of caches on performance Next time Dynamic memory allocation and memory bugs Fabián E. Bustamante,

More information

Last class. Caches. Direct mapped

Last class. Caches. Direct mapped Memory Hierarchy II Last class Caches Direct mapped E=1 (One cache line per set) Each main memory address can be placed in exactly one place in the cache Conflict misses if two addresses map to same place

More information

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access

Cache Performance 3/28/17. Agenda. Cache Abstraction and Metrics. Direct-Mapped Cache: Placement and Access Agenda Cache Performance Samra Khan March 28, 217 Revew from last lecture Cache access Assocatvty Replacement Cache Performance Cache Abstracton and Metrcs Address Tag Store (s the address n the cache?

More information

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III.

Lecture 15: Memory Hierarchy Optimizations. I. Caches: A Quick Review II. Iteration Space & Loop Transformations III. Lecture 15: Memory Herarchy Optmzatons I. Caches: A Quck Revew II. Iteraton Space & Loop Transformatons III. Types of Reuse ALSU 7.4.2-7.4.3, 11.2-11.5.1 15-745: Memory Herarchy Optmzatons Phllp B. Gbbons

More information

Today Cache memory organization and operation Performance impact of caches

Today Cache memory organization and operation Performance impact of caches Cache Memories 1 Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging loops to improve spatial locality Using blocking to improve temporal locality

More information

Cache memories are small, fast SRAM based memories managed automatically in hardware.

Cache memories are small, fast SRAM based memories managed automatically in hardware. Cache Memories Cache memories are small, fast SRAM based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and

More information

211: Computer Architecture Summer 2016

211: Computer Architecture Summer 2016 211: Computer Architecture Summer 2016 Liu Liu Topic: Assembly Programming Storage - Assembly Programming: Recap - Call-chain - Factorial - Storage: - RAM - Caching - Direct - Mapping Rutgers University

More information

The course that gives CMU its Zip! Memory System Performance. March 22, 2001

The course that gives CMU its Zip! Memory System Performance. March 22, 2001 15-213 The course that gives CMU its Zip! Memory System Performance March 22, 2001 Topics Impact of cache parameters Impact of memory reference patterns memory mountain range matrix multiply Basic Cache

More information

Memory and I/O Organization

Memory and I/O Organization Memory and I/O Organzaton 8-1 Prncple of Localty Localty small proporton of memory accounts for most run tme Rule of thumb For 9% of run tme next nstructon/data wll come from 1% of program/data closest

More information

Agenda Cache memory organization and operation Chapter 6 Performance impact of caches Cache Memories

Agenda Cache memory organization and operation Chapter 6 Performance impact of caches Cache Memories Agenda Chapter 6 Cache Memories Cache memory organization and operation Performance impact of caches The memory mountain Rearranging loops to improve spatial locality Using blocking to improve temporal

More information

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6.

Cache Memories. From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Cache Memories From Bryant and O Hallaron, Computer Systems. A Programmer s Perspective. Chapter 6. Today Cache memory organization and operation Performance impact of caches The memory mountain Rearranging

More information

Today. Cache Memories. General Cache Concept. General Cache Organization (S, E, B) Cache Memories. Example Memory Hierarchy Smaller, faster,

Today. Cache Memories. General Cache Concept. General Cache Organization (S, E, B) Cache Memories. Example Memory Hierarchy Smaller, faster, Today Cache Memories CSci 2021: Machine Architecture and Organization November 7th-9th, 2016 Your instructor: Stephen McCamant Cache memory organization and operation Performance impact of caches The memory

More information

Denison University. Cache Memories. CS-281: Introduction to Computer Systems. Instructor: Thomas C. Bressoud

Denison University. Cache Memories. CS-281: Introduction to Computer Systems. Instructor: Thomas C. Bressoud Cache Memories CS-281: Introduction to Computer Systems Instructor: Thomas C. Bressoud 1 Random-Access Memory (RAM) Key features RAM is traditionally packaged as a chip. Basic storage unit is normally

More information

Example. How are these parameters decided?

Example. How are these parameters decided? Example How are these parameters decided? Comparing cache organizations Like many architectural features, caches are evaluated experimentally. As always, performance depends on the actual instruction mix,

More information

Carnegie Mellon. Cache Memories. Computer Architecture. Instructor: Norbert Lu1enberger. based on the book by Randy Bryant and Dave O Hallaron

Carnegie Mellon. Cache Memories. Computer Architecture. Instructor: Norbert Lu1enberger. based on the book by Randy Bryant and Dave O Hallaron Cache Memories Computer Architecture Instructor: Norbert Lu1enberger based on the book by Randy Bryant and Dave O Hallaron 1 Today Cache memory organiza7on and opera7on Performance impact of caches The

More information

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory

Virtual Memory. Background. No. 10. Virtual Memory: concept. Logical Memory Space (review) Demand Paging(1) Virtual Memory Background EECS. Operatng System Fundamentals No. Vrtual Memory Prof. Hu Jang Department of Electrcal Engneerng and Computer Scence, York Unversty Memory-management methods normally requres the entre process

More information

Computer Organization: A Programmer's Perspective

Computer Organization: A Programmer's Perspective Computer Architecture and The Memory Hierarchy Oren Kapah orenkapah.ac@gmail.com Typical Computer Architecture CPU chip PC (Program Counter) register file AL U Main Components CPU Main Memory Input/Output

More information

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management.

4/11/17. Agenda. Princeton University Computer Science 217: Introduction to Programming Systems. Goals of this Lecture. Storage Management. //7 Prnceton Unversty Computer Scence 7: Introducton to Programmng Systems Goals of ths Lecture Storage Management Help you learn about: Localty and cachng Typcal storage herarchy Vrtual memory How the

More information

Carnegie Mellon. Cache Memories

Carnegie Mellon. Cache Memories Cache Memories Thanks to Randal E. Bryant and David R. O Hallaron from CMU Reading Assignment: Computer Systems: A Programmer s Perspec4ve, Third Edi4on, Chapter 6 1 Today Cache memory organiza7on and

More information

Cache Memories /18-213/15-513: Introduction to Computer Systems 12 th Lecture, October 5, Today s Instructor: Phil Gibbons

Cache Memories /18-213/15-513: Introduction to Computer Systems 12 th Lecture, October 5, Today s Instructor: Phil Gibbons Cache Memories 15-213/18-213/15-513: Introduction to Computer Systems 12 th Lecture, October 5, 2017 Today s Instructor: Phil Gibbons 1 Today Cache memory organization and operation Performance impact

More information

CISC 360. Cache Memories Exercises Dec 3, 2009

CISC 360. Cache Memories Exercises Dec 3, 2009 Topics ν CISC 36 Cache Memories Exercises Dec 3, 29 Review of cache memory mapping Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. ν Hold frequently

More information

Memory Hierarchy. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska

Memory Hierarchy. Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3. Instructor: Joanna Klukowska Memory Hierarchy Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3 Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O Hallaron (CMU) Mohamed Zahran (NYU)

More information

Memory Hierarchy. Cache Memory Organization and Access. General Cache Concept. Example Memory Hierarchy Smaller, faster,

Memory Hierarchy. Cache Memory Organization and Access. General Cache Concept. Example Memory Hierarchy Smaller, faster, Memory Hierarchy Computer Systems Organization (Spring 2017) CSCI-UA 201, Section 3 Cache Memory Organization and Access Instructor: Joanna Klukowska Slides adapted from Randal E. Bryant and David R. O

More information

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization

High level vs Low Level. What is a Computer Program? What does gcc do for you? Program = Instructions + Data. Basic Computer Organization What s a Computer Program? Descrpton of algorthms and data structures to acheve a specfc ojectve Could e done n any language, even a natural language lke Englsh Programmng language: A Standard notaton

More information

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and

More information

Computer Architecture ELEC3441

Computer Architecture ELEC3441 Causes of Cache Msses: The 3 C s Computer Archtecture ELEC3441 Lecture 9 Cache (2) Dr. Hayden Kwo-Hay So Department of Electrcal and Electronc Engneerng Compulsory: frst reference to a lne (a..a. cold

More information

Assembler. Building a Modern Computer From First Principles.

Assembler. Building a Modern Computer From First Principles. Assembler Buldng a Modern Computer From Frst Prncples www.nand2tetrs.org Elements of Computng Systems, Nsan & Schocken, MIT Press, www.nand2tetrs.org, Chapter 6: Assembler slde Where we are at: Human Thought

More information

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to:

Motivation. EE 457 Unit 4. Throughput vs. Latency. Performance Depends on View Point?! Computer System Performance. An individual user wants to: 4.1 4.2 Motvaton EE 457 Unt 4 Computer System Performance An ndvdual user wants to: Mnmze sngle program executon tme A datacenter owner wants to: Maxmze number of Mnmze ( ) http://e-tellgentnternetmarketng.com/webste/frustrated-computer-user-2/

More information

Cache Memories : Introduc on to Computer Systems 12 th Lecture, October 6th, Instructor: Randy Bryant.

Cache Memories : Introduc on to Computer Systems 12 th Lecture, October 6th, Instructor: Randy Bryant. Cache Memories 15-213: Introduc on to Computer Systems 12 th Lecture, October 6th, 2016 Instructor: Randy Bryant 1 Today Cache memory organiza on and opera on Performance impact of caches The memory mountain

More information

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier

Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted from Hennessy & Patterson / 2003 Elsevier Some materal adapted from Mohamed Youns, UMBC CMSC 611 Spr 2003 course sldes Some materal adapted from Hennessy & Patterson / 2003 Elsever Scence Performance = 1 Executon tme Speedup = Performance (B)

More information

Array transposition in CUDA shared memory

Array transposition in CUDA shared memory Array transposton n CUDA shared memory Mke Gles February 19, 2014 Abstract Ths short note s nspred by some code wrtten by Jeremy Appleyard for the transposton of data through shared memory. I had some

More information

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation

Loop Transformations for Parallelism & Locality. Review. Scalar Expansion. Scalar Expansion: Motivation Loop Transformatons for Parallelsm & Localty Last week Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Scalar expanson for removng false dependences Loop nterchange Loop

More information

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden

Optimizing for Speed. What is the potential gain? What can go Wrong? A Simple Example. Erik Hagersten Uppsala University, Sweden Optmzng for Speed Er Hagersten Uppsala Unversty, Sweden eh@t.uu.se What s the potental gan? Latency dfference L$ and mem: ~5x Bandwdth dfference L$ and mem: ~x Repeated TLB msses adds a factor ~-3x Execute

More information

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz

Compiler Design. Spring Register Allocation. Sample Exercises and Solutions. Prof. Pedro C. Diniz Compler Desgn Sprng 2014 Regster Allocaton Sample Exercses and Solutons Prof. Pedro C. Dnz USC / Informaton Scences Insttute 4676 Admralty Way, Sute 1001 Marna del Rey, Calforna 90292 pedro@s.edu Regster

More information

ELEC 377 Operating Systems. Week 6 Class 3

ELEC 377 Operating Systems. Week 6 Class 3 ELEC 377 Operatng Systems Week 6 Class 3 Last Class Memory Management Memory Pagng Pagng Structure ELEC 377 Operatng Systems Today Pagng Szes Vrtual Memory Concept Demand Pagng ELEC 377 Operatng Systems

More information

Programming in Fortran 90 : 2017/2018

Programming in Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Programmng n Fortran 90 : 2017/2018 Exercse 1 : Evaluaton of functon dependng on nput Wrte a program who evaluate the functon f (x,y) for any two user specfed values

More information

Cache Memories. Andrew Case. Slides adapted from Jinyang Li, Randy Bryant and Dave O Hallaron

Cache Memories. Andrew Case. Slides adapted from Jinyang Li, Randy Bryant and Dave O Hallaron Cache Memories Andrew Case Slides adapted from Jinyang Li, Randy Bryant and Dave O Hallaron 1 Topics Cache memory organiza3on and opera3on Performance impact of caches 2 Cache Memories Cache memories are

More information

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont)

Loop Permutation. Loop Transformations for Parallelism & Locality. Legality of Loop Interchange. Loop Interchange (cont) Loop Transformatons for Parallelsm & Localty Prevously Data dependences and loops Loop transformatons Parallelzaton Loop nterchange Today Loop nterchange Loop transformatons and transformaton frameworks

More information

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

Memory Hierarchy. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University Memory Hierarchy Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Time (ns) The CPU-Memory Gap The gap widens between DRAM, disk, and CPU speeds

More information

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example

Locality. CS429: Computer Organization and Architecture. Locality Example 2. Locality Example Locality CS429: Computer Organization and Architecture Dr Bill Young Department of Computer Sciences University of Texas at Austin Principle of Locality: Programs tend to reuse data and instructions near

More information

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example

News. Recap: While Loop Example. Reading. Recap: Do Loop Example. Recap: For Loop Example Unversty of Brtsh Columba CPSC, Intro to Computaton Jan-Apr Tamara Munzner News Assgnment correctons to ASCIIArtste.java posted defntely read WebCT bboards Arrays Lecture, Tue Feb based on sldes by Kurt

More information

Storage Binding in RTL synthesis

Storage Binding in RTL synthesis Storage Bndng n RTL synthess Pe Zhang Danel D. Gajsk Techncal Report ICS-0-37 August 0th, 200 Center for Embedded Computer Systems Department of Informaton and Computer Scence Unersty of Calforna, Irne

More information

Chapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative

Chapter 6 Caches. Computer System. Alpha Chip Photo. Topics. Memory Hierarchy Locality of Reference SRAM Caches Direct Mapped Associative Chapter 6 s Topics Memory Hierarchy Locality of Reference SRAM s Direct Mapped Associative Computer System Processor interrupt On-chip cache s s Memory-I/O bus bus Net cache Row cache Disk cache Memory

More information

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide

Lobachevsky State University of Nizhni Novgorod. Polyhedron. Quick Start Guide Lobachevsky State Unversty of Nzhn Novgorod Polyhedron Quck Start Gude Nzhn Novgorod 2016 Contents Specfcaton of Polyhedron software... 3 Theoretcal background... 4 1. Interface of Polyhedron... 6 1.1.

More information

Memory hierarchies: caches and their impact on the running time

Memory hierarchies: caches and their impact on the running time Memory hierarchies: caches and their impact on the running time Irene Finocchi Dept. of Computer and Science Sapienza University of Rome A happy coincidence A fundamental property of hardware Different

More information

The Codesign Challenge

The Codesign Challenge ECE 4530 Codesgn Challenge Fall 2007 Hardware/Software Codesgn The Codesgn Challenge Objectves In the codesgn challenge, your task s to accelerate a gven software reference mplementaton as fast as possble.

More information

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision

SLAM Summer School 2006 Practical 2: SLAM using Monocular Vision SLAM Summer School 2006 Practcal 2: SLAM usng Monocular Vson Javer Cvera, Unversty of Zaragoza Andrew J. Davson, Imperal College London J.M.M Montel, Unversty of Zaragoza. josemar@unzar.es, jcvera@unzar.es,

More information

High-Performance Parallel Computing

High-Performance Parallel Computing High-Performance Parallel Computing P. (Saday) Sadayappan Rupesh Nasre Course Overview Emphasis on algorithm development and programming issues for high performance No assumed background in computer architecture;

More information

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search

Sequential search. Building Java Programs Chapter 13. Sequential search. Sequential search Sequental search Buldng Java Programs Chapter 13 Searchng and Sortng sequental search: Locates a target value n an array/lst by examnng each element from start to fnsh. How many elements wll t need to

More information

CS 61C: Great Ideas in Computer Architecture (Machine Structures)

CS 61C: Great Ideas in Computer Architecture (Machine Structures) CS 6C: Great Ideas in Computer Architecture (Machine Structures) Instructors: Randy H Katz David A PaHerson hhp://insteecsberkeleyedu/~cs6c/fa Direct Mapped (contnued) - Interface CharacterisTcs of the

More information

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers

Introduction to Programming. Lecture 13: Container data structures. Container data structures. Topics for this lecture. A basic issue with containers 1 2 Introducton to Programmng Bertrand Meyer Lecture 13: Contaner data structures Last revsed 1 December 2003 Topcs for ths lecture 3 Contaner data structures 4 Contaners and genercty Contan other objects

More information

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data

A Fast Content-Based Multimedia Retrieval Technique Using Compressed Data A Fast Content-Based Multmeda Retreval Technque Usng Compressed Data Borko Furht and Pornvt Saksobhavvat NSF Multmeda Laboratory Florda Atlantc Unversty, Boca Raton, Florda 3343 ABSTRACT In ths paper,

More information

Nachos Project 3. Speaker: Sheng-Wei Cheng 2010/12/16

Nachos Project 3. Speaker: Sheng-Wei Cheng 2010/12/16 Nachos Project Speaker: Sheng-We Cheng //6 Agenda Motvaton User Programs n Nachos Related Nachos Code for User Programs Project Assgnment Bonus Submsson Agenda Motvaton User Programs n Nachos Related Nachos

More information

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar

CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vidyanagar CHARUTAR VIDYA MANDAL S SEMCOM Vallabh Vdyanagar Faculty Name: Am D. Trved Class: SYBCA Subject: US03CBCA03 (Advanced Data & Fle Structure) *UNIT 1 (ARRAYS AND TREES) **INTRODUCTION TO ARRAYS If we want

More information

Overview. CSC 2400: Computer Systems. Pointers in C. Pointers - Variables that hold memory addresses - Using pointers to do call-by-reference in C

Overview. CSC 2400: Computer Systems. Pointers in C. Pointers - Variables that hold memory addresses - Using pointers to do call-by-reference in C CSC 2400: Comuter Systems Ponters n C Overvew Ponters - Varables that hold memory addresses - Usng onters to do call-by-reference n C Ponters vs. Arrays - Array names are constant onters Ponters and Strngs

More information

RESISTIVE CIRCUITS MULTI NODE/LOOP CIRCUIT ANALYSIS

RESISTIVE CIRCUITS MULTI NODE/LOOP CIRCUIT ANALYSIS RESSTE CRCUTS MULT NODE/LOOP CRCUT ANALYSS DEFNNG THE REFERENCE NODE S TAL 4 THESTATEMENT 4 S MEANNGLES UNTL THE REFERENCE PONT S DEFNED BY CONENTON THE GROUND SYMBOL SPECFES THE REFERENCE PONT. ALL NODE

More information

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) ,

VRT012 User s guide V0.1. Address: Žirmūnų g. 27, Vilnius LT-09105, Phone: (370-5) , Fax: (370-5) , VRT012 User s gude V0.1 Thank you for purchasng our product. We hope ths user-frendly devce wll be helpful n realsng your deas and brngng comfort to your lfe. Please take few mnutes to read ths manual

More information

MATHEMATICS FORM ONE SCHEME OF WORK 2004

MATHEMATICS FORM ONE SCHEME OF WORK 2004 MATHEMATICS FORM ONE SCHEME OF WORK 2004 WEEK TOPICS/SUBTOPICS LEARNING OBJECTIVES LEARNING OUTCOMES VALUES CREATIVE & CRITICAL THINKING 1 WHOLE NUMBER Students wll be able to: GENERICS 1 1.1 Concept of

More information

Parallelism for Nested Loops with Non-uniform and Flow Dependences

Parallelism for Nested Loops with Non-uniform and Flow Dependences Parallelsm for Nested Loops wth Non-unform and Flow Dependences Sam-Jn Jeong Dept. of Informaton & Communcaton Engneerng, Cheonan Unversty, 5, Anseo-dong, Cheonan, Chungnam, 330-80, Korea. seong@cheonan.ac.kr

More information

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS

CACHE MEMORY DESIGN FOR INTERNET PROCESSORS CACHE MEMORY DESIGN FOR INTERNET PROCESSORS WE EVALUATE A SERIES OF THREE PROGRESSIVELY MORE AGGRESSIVE ROUTING-TABLE CACHE DESIGNS AND DEMONSTRATE THAT THE INCORPORATION OF HARDWARE CACHES INTO INTERNET

More information

Sorting. Sorted Original. index. index

Sorting. Sorted Original. index. index 1 Unt 16 Sortng 2 Sortng Sortng requres us to move data around wthn an array Allows users to see and organze data more effcently Behnd the scenes t allows more effectve searchng of data There are MANY

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

K-means and Hierarchical Clustering

K-means and Hierarchical Clustering Note to other teachers and users of these sldes. Andrew would be delghted f you found ths source materal useful n gvng your own lectures. Feel free to use these sldes verbatm, or to modfy them to ft your

More information

Memory Hierarchy. Instructor: Adam C. Champion, Ph.D. CSE 2431: Introduction to Operating Systems Reading: Chap. 6, [CSAPP]

Memory Hierarchy. Instructor: Adam C. Champion, Ph.D. CSE 2431: Introduction to Operating Systems Reading: Chap. 6, [CSAPP] Memory Hierarchy Instructor: Adam C. Champion, Ph.D. CSE 2431: Introduction to Operating Systems Reading: Chap. 6, [CSAPP] Motivation Up to this point we have relied on a simple model of a computer system

More information

CMPS 10 Introduction to Computer Science Lecture Notes

CMPS 10 Introduction to Computer Science Lecture Notes CPS 0 Introducton to Computer Scence Lecture Notes Chapter : Algorthm Desgn How should we present algorthms? Natural languages lke Englsh, Spansh, or French whch are rch n nterpretaton and meanng are not

More information

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface.

Assembler. Shimon Schocken. Spring Elements of Computing Systems 1 Assembler (Ch. 6) Compiler. abstract interface. IDC Herzlya Shmon Schocken Assembler Shmon Schocken Sprng 2005 Elements of Computng Systems 1 Assembler (Ch. 6) Where we are at: Human Thought Abstract desgn Chapters 9, 12 abstract nterface H.L. Language

More information

Cache Memory and Performance

Cache Memory and Performance Cache Memory and Performance Cache Performance 1 Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP)

More information

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved.

CS 33. Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. CS 33 Architecture and Optimization (3) CS33 Intro to Computer Systems XVI 1 Copyright 2018 Thomas W. Doeppner. All rights reserved. Hyper Threading Instruction Control Instruction Control Retirement Unit

More information

LOOP ANALYSIS. determine all currents and Voltages in IT IS DUAL TO NODE ANALYSIS - IT FIRST DETERMINES ALL CURRENTS IN A CIRCUIT

LOOP ANALYSIS. determine all currents and Voltages in IT IS DUAL TO NODE ANALYSIS - IT FIRST DETERMINES ALL CURRENTS IN A CIRCUIT LOOP ANALYSS The second systematic technique to determine all currents and oltages in a circuit T S DUAL TO NODE ANALYSS - T FRST DETERMNES ALL CURRENTS N A CRCUT AND THEN T USES OHM S LAW TO COMPUTE NECESSARY

More information

Lecture 3: Computer Arithmetic: Multiplication and Division

Lecture 3: Computer Arithmetic: Multiplication and Division 8-447 Lecture 3: Computer Arthmetc: Multplcaton and Dvson James C. Hoe Dept of ECE, CMU January 26, 29 S 9 L3- Announcements: Handout survey due Lab partner?? Read P&H Ch 3 Read IEEE 754-985 Handouts:

More information

Notes on Organizing Java Code: Packages, Visibility, and Scope

Notes on Organizing Java Code: Packages, Visibility, and Scope Notes on Organzng Java Code: Packages, Vsblty, and Scope CS 112 Wayne Snyder Java programmng n large measure s a process of defnng enttes (.e., packages, classes, methods, or felds) by name and then usng

More information

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline

Image Representation & Visualization Basic Imaging Algorithms Shape Representation and Analysis. outline mage Vsualzaton mage Vsualzaton mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and Analyss outlne mage Representaton & Vsualzaton Basc magng Algorthms Shape Representaton and

More information

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6)

Harvard University CS 101 Fall 2005, Shimon Schocken. Assembler. Elements of Computing Systems 1 Assembler (Ch. 6) Harvard Unversty CS 101 Fall 2005, Shmon Schocken Assembler Elements of Computng Systems 1 Assembler (Ch. 6) Why care about assemblers? Because Assemblers employ some nfty trcks Assemblers are the frst

More information

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches

CS 61C: Great Ideas in Computer Architecture. Direct Mapped Caches CS 61C: Great Ideas in Computer Architecture Direct Mapped Caches Instructor: Justin Hsia 7/05/2012 Summer 2012 Lecture #11 1 Review of Last Lecture Floating point (single and double precision) approximates

More information

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program

More information

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields

A mathematical programming approach to the analysis, design and scheduling of offshore oilfields 17 th European Symposum on Computer Aded Process Engneerng ESCAPE17 V. Plesu and P.S. Agach (Edtors) 2007 Elsever B.V. All rghts reserved. 1 A mathematcal programmng approach to the analyss, desgn and

More information

L2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary

L2 cache provides additional on-chip caching space. L2 cache captures misses from L1 cache. Summary HY425 Lecture 13: Improving Cache Performance Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS November 25, 2011 Dimitrios S. Nikolopoulos HY425 Lecture 13: Improving Cache Performance 1 / 40

More information

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur

FEATURE EXTRACTION. Dr. K.Vijayarekha. Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur FEATURE EXTRACTION Dr. K.Vjayarekha Assocate Dean School of Electrcal and Electroncs Engneerng SASTRA Unversty, Thanjavur613 41 Jont Intatve of IITs and IISc Funded by MHRD Page 1 of 8 Table of Contents

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Active Contours/Snakes

Active Contours/Snakes Actve Contours/Snakes Erkut Erdem Acknowledgement: The sldes are adapted from the sldes prepared by K. Grauman of Unversty of Texas at Austn Fttng: Edges vs. boundares Edges useful sgnal to ndcate occludng

More information

Review of Basic Computer Architecture

Review of Basic Computer Architecture of Basc Computer Archtecture 1 Computer Archtecture What s Computer Archtecture From Wkpeda, the free encyclopeda In computer scence and engneerng, computer archtecture refers to specfcaton of the relatonshp

More information

A fast algorithm for color image segmentation

A fast algorithm for color image segmentation Unersty of Wollongong Research Onlne Faculty of Informatcs - Papers (Arche) Faculty of Engneerng and Informaton Scences 006 A fast algorthm for color mage segmentaton L. Dong Unersty of Wollongong, lju@uow.edu.au

More information

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009.

Assignment # 2. Farrukh Jabeen Algorithms 510 Assignment #2 Due Date: June 15, 2009. Farrukh Jabeen Algorthms 51 Assgnment #2 Due Date: June 15, 29. Assgnment # 2 Chapter 3 Dscrete Fourer Transforms Implement the FFT for the DFT. Descrbed n sectons 3.1 and 3.2. Delverables: 1. Concse descrpton

More information

How to Write Fast Numerical Code

How to Write Fast Numerical Code How to Write Fast Numerical Code Lecture: Memory hierarchy, locality, caches Instructor: Markus Püschel TA: Alen Stojanov, Georg Ofenbeck, Gagandeep Singh Organization Temporal and spatial locality Memory

More information

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces

Range images. Range image registration. Examples of sampling patterns. Range images and range surfaces Range mages For many structured lght scanners, the range data forms a hghly regular pattern known as a range mage. he samplng pattern s determned by the specfc scanner. Range mage regstraton 1 Examples

More information

CS429: Computer Organization and Architecture

CS429: Computer Organization and Architecture CS429: Computer Organization and Architecture Dr. Bill Young Department of Computer Sciences University of Texas at Austin Last updated: April 5, 2018 at 13:55 CS429 Slideset 19: 1 Cache Vocabulary Much

More information

Simulation Based Analysis of FAST TCP using OMNET++

Simulation Based Analysis of FAST TCP using OMNET++ Smulaton Based Analyss of FAST TCP usng OMNET++ Umar ul Hassan 04030038@lums.edu.pk Md Term Report CS678 Topcs n Internet Research Sprng, 2006 Introducton Internet traffc s doublng roughly every 3 months

More information

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain

AMath 483/583 Lecture 21 May 13, Notes: Notes: Jacobi iteration. Notes: Jacobi with OpenMP coarse grain AMath 483/583 Lecture 21 May 13, 2011 Today: OpenMP and MPI versons of Jacob teraton Gauss-Sedel and SOR teratve methods Next week: More MPI Debuggng and totalvew GPU computng Read: Class notes and references

More information

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals

Agenda & Reading. Simple If. Decision-Making Statements. COMPSCI 280 S1C Applications Programming. Programming Fundamentals Agenda & Readng COMPSCI 8 SC Applcatons Programmng Programmng Fundamentals Control Flow Agenda: Decsonmakng statements: Smple If, Ifelse, nested felse, Select Case s Whle, DoWhle/Untl, For, For Each, Nested

More information

Mathematics 256 a course in differential equations for engineering students

Mathematics 256 a course in differential equations for engineering students Mathematcs 56 a course n dfferental equatons for engneerng students Chapter 5. More effcent methods of numercal soluton Euler s method s qute neffcent. Because the error s essentally proportonal to the

More information