Cache Memory and Performance

Code and Caches

Many of the following slides are taken with permission from Complete Powerpoint Lecture Notes for Computer Systems: A Programmer's Perspective (CS:APP) by Randal E. Bryant and David R. O'Hallaron. The book is used explicitly in CS 2505 and CS 3214 and as a reference in CS

Computer Organization II

Locality Example (1)

Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.

Question: Which of these functions has good locality?

    int sumarrayrows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

    int sumarraycols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }

Layout of C Arrays in Memory

C arrays are allocated in contiguous memory locations, with addresses ascending with the array index:

    int32_t A[10] = {0, 1, 2, 3, 4, ..., 8, 9};

[Figure: the elements of A stored at successive ascending addresses.]

Two-dimensional Arrays in C

In C, a two-dimensional array is an array of arrays:

    int32_t A[3][5] = {
        { 0,  1,  2,  3,  4},    /* A[0] */
        {10, 11, 12, 13, 14},    /* A[1] */
        {20, 21, 22, 23, 24}     /* A[2] */
    };

In fact, if we print the values as pointers, we see something like this:

    A:    0x7fff22e41d30
    A[0]: 0x7fff22e41d30
    A[1]: 0x7fff22e41d44
    A[2]: 0x7fff22e41d58

Layout of C Arrays in Memory

Two-dimensional C arrays are allocated in row-major order, with each row in contiguous memory locations:

    int32_t A[3][5] = {
        { 0,  1,  2,  3,  4},
        {10, 11, 12, 13, 14},
        {20, 21, 22, 23, 24}
    };

    Address        Value
    7FFF22E41D30    0
    7FFF22E41D34    1
    7FFF22E41D38    2
    7FFF22E41D3C    3
    7FFF22E41D40    4
    7FFF22E41D44   10
    7FFF22E41D48   11
    7FFF22E41D4C   12
    7FFF22E41D50   13
    7FFF22E41D54   14
    7FFF22E41D58   20
    7FFF22E41D5C   21
    7FFF22E41D60   22
    7FFF22E41D64   23
    7FFF22E41D68   24
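The row-major rule can be checked directly. The following is a minimal sketch, assuming the 3 x 5 int32_t array from the slide: it computes the byte offset of A[i][j] as (i * 5 + j) * 4 and compares it with the offset taken from the element addresses the compiler actually assigns. The function names row_major_offset and measured_offset are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 3, COLS = 5 };

/* Byte offset of A[i][j] from the start of a ROWS x COLS int32_t array,
   computed from the row-major rule: skip i full rows, then j cells. */
size_t row_major_offset(size_t i, size_t j)
{
    assert(i < ROWS && j < COLS);
    return (i * COLS + j) * sizeof(int32_t);
}

/* The same offset, measured from the addresses of a real array. */
size_t measured_offset(size_t i, size_t j)
{
    static int32_t A[ROWS][COLS];
    assert(i < ROWS && j < COLS);
    return (size_t)((const char *)&A[i][j] - (const char *)&A[0][0]);
}
```

For example, row_major_offset(1, 0) is 20 bytes (0x14), which matches the gap between the addresses ending in D30 and D44 in the table.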

Layout of C Arrays in Memory

    int32_t A[3][5] = {
        { 0,  1,  2,  3,  4},
        {10, 11, 12, 13, 14},
        {20, 21, 22, 23, 24}
    };

Stepping through columns in one row:

    for (i = 0; i < 3; i++)
        for (j = 0; j < 5; j++)
            sum += A[i][j];

- accesses successive elements in memory
- if cache block size B > 4 bytes, exploits spatial locality: compulsory miss rate = 4 bytes / B

[Figure: the memory layout of A, visited in address order: the row for i = 0 first, then i = 1, then i = 2.]

Layout of C Arrays in Memory

    int32_t A[3][5] = {
        { 0,  1,  2,  3,  4},
        {10, 11, 12, 13, 14},
        {20, 21, 22, 23, 24}
    };

Stepping through rows in one column:

    for (j = 0; j < 5; j++)
        for (i = 0; i < 3; i++)
            sum += A[i][j];

- accesses distant elements
- no spatial locality!
- compulsory miss rate = 1 (i.e. 100%)

[Figure: the memory layout of A, with the accesses for j = 0 and j = 1 each jumping a full row between consecutive elements.]

Stride and Array Accesses

[Figure: the memory layout of A annotated with a stride-1 and a stride-4 access pattern.]

Writing Cache Friendly Code

- Repeated references to variables are good (temporal locality)
- Stride-1 reference patterns are good (spatial locality)

Assume an initially-empty cache with 16-byte cache blocks.

    int sumarrayrows(int a[M][N])
    {
        int row, col, sum = 0;

        for (row = 0; row < M; row++)
            for (col = 0; col < N; col++)
                sum += a[row][col];
        return sum;
    }

Each 16-byte block holds four 4-byte ints: the first block covers i = 0, j = 0 to i = 0, j = 3, and the next block begins at i = 0, j = 4. Only the first access to each block misses.

Miss rate = 1/4 = 25%

Writing Cache Friendly Code

Consider the previous slide, but assume that the cache uses a block size of 64 bytes instead of 16 bytes.

    int sumarrayrows(int a[M][N])
    {
        int row, col, sum = 0;

        for (row = 0; row < M; row++)
            for (col = 0; col < N; col++)
                sum += a[row][col];
        return sum;
    }

Each 64-byte block holds sixteen 4-byte ints, starting at i = 0, j = 0, so only one access in sixteen misses.

Miss rate = 1/16 = 6.25%
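The 25% and 6.25% figures both follow from one ratio: the number of bytes the traversal advances per access divided by the block size. The sketch below is an illustration under the same assumptions as the slides (initially empty cache, only compulsory misses, a constant stride); the function name stride_miss_rate and its parameters are hypothetical, and for strides that do not divide the block size evenly the result is a long-run average.

```c
#include <assert.h>
#include <stddef.h>

/* Long-run compulsory miss rate for a constant-stride scan.
   elem_bytes:  size of one array cell
   stride:      distance between consecutive accesses, in cells
   block_bytes: cache block size
   If each access advances by at least one block, every access misses. */
double stride_miss_rate(size_t elem_bytes, size_t stride, size_t block_bytes)
{
    assert(elem_bytes > 0 && stride > 0 && block_bytes > 0);
    size_t step = elem_bytes * stride;   /* bytes advanced per access */
    if (step >= block_bytes)
        return 1.0;
    return (double)step / (double)block_bytes;
}
```

With 4-byte ints, stride_miss_rate(4, 1, 16) gives 0.25 and stride_miss_rate(4, 1, 64) gives 0.0625, while a stride of 5 cells with 16-byte blocks gives 1.0.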

Writing Cache Friendly Code

"Skipping" accesses down the rows of a column do not provide good locality:

    int sumarraycols(int a[M][N])
    {
        int row, col, sum = 0;

        for (col = 0; col < N; col++)
            for (row = 0; row < M; row++)
                sum += a[row][col];
        return sum;
    }

Miss rate = 100%

(That's actually somewhat pessimistic, depending on cache geometry.)
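How pessimistic the 100% figure is depends on the cache. As a rough illustration, the sketch below models a tiny direct-mapped cache and replays the addresses touched by the row-wise and column-wise scans. The geometry (16 lines of 16 bytes), the 64 x 64 array, and all names here are assumptions chosen for the example, not values from the slides; with this geometry every row of a column maps to the same cache line, so the column-wise scan really does miss every time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: a 64 x 64 int32_t array, 256-byte direct-mapped cache. */
enum { M = 64, N = 64, BLOCK_BYTES = 16, NUM_LINES = 16 };

typedef struct {
    bool   valid[NUM_LINES];
    size_t tag[NUM_LINES];
    size_t accesses, misses;
} Cache;

/* Look up one byte address; on a miss, load the block into its line. */
static void cache_access(Cache *c, size_t byte_addr)
{
    size_t block = byte_addr / BLOCK_BYTES;
    size_t line  = block % NUM_LINES;
    size_t tag   = block / NUM_LINES;

    c->accesses++;
    if (!c->valid[line] || c->tag[line] != tag) {
        c->misses++;
        c->valid[line] = true;
        c->tag[line]   = tag;
    }
}

/* Miss rate for reading every a[row][col] once, in row-wise or
   column-wise order, starting from an empty cache. */
double miss_rate(bool column_wise)
{
    Cache c = {0};
    size_t outer = column_wise ? N : M;
    size_t inner = column_wise ? M : N;

    for (size_t o = 0; o < outer; o++) {
        for (size_t i = 0; i < inner; i++) {
            size_t row = column_wise ? i : o;
            size_t col = column_wise ? o : i;
            cache_access(&c, (row * N + col) * sizeof(int32_t));
        }
    }
    assert(c.accesses == (size_t)M * N);
    return (double)c.misses / (double)c.accesses;
}
```

Under these assumptions miss_rate(false) is 0.25 and miss_rate(true) is 1.0. A larger or more associative cache could keep some of the blocks alive between columns, which is the sense in which 100% is pessimistic.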

Layout of C Arrays in Memory

It's easy to write an array traversal and see the addresses at which the array elements are stored:

    int A[5] = {0, 1, 2, 3, 4};

    for (i = 0; i < 5; i++)
        printf("%d: %p\n", i, (void *)&A[i]);

We see there that for a 1D array, the index varies in a stride-1 pattern.

    i   address
    0:  28ABE0
    1:  28ABE4
    2:  28ABE8
    3:  28ABEC
    4:  28ABF0

stride-1: addresses differ by the size of an array cell (4 bytes, here)

Layout of C Arrays in Memory

    int B[3][5] = {... };

    for (i = 0; i < 3; i++)
        for (j = 0; j < 5; j++)
            printf("%d %3d: %p\n", i, j, (void *)&B[i][j]);

We see that for a 2D array, the second index varies in a stride-1 pattern.

i-j order (stride-1):

    i j   address
    0 0:  28ABA4
    0 1:  28ABA8
    0 2:  28ABAC
    0 3:  28ABB0
    0 4:  28ABB4
    1 0:  28ABB8
    1 1:  28ABBC
    1 2:  28ABC0

But the first index does not vary in a stride-1 pattern.

j-i order (stride-5: 0x14 / 4):

    i j   address
    0 0:  28CC9C
    1 0:  28CCB0
    2 0:  28CCC4
    0 1:  28CCA0
    1 1:  28CCB4
    2 1:  28CCC8
    0 2:  28CCA4
    1 2:  28CCB8

3D Arrays in C

    int32_t A[2][3][5] = {
        { {  0,   1,   2,   3,   4},
          { 10,  11,  12,  13,  14},
          { 20,  21,  22,  23,  24} },
        { {  0,   1,   2,   3,   4},
          {110, 111, 112, 113, 114},
          {220, 221, 222, 223, 224} }
    };

Locality Example (2)

Question: Can you permute the loops so that the function scans the 3D array a[][][] with a stride-1 reference pattern (and thus has good spatial locality)?

    int sumarray3d(int a[N][N][N])
    {
        int row, col, page, sum = 0;

        for (row = 0; row < N; row++)
            for (col = 0; col < N; col++)
                for (page = 0; page < N; page++)
                    sum += a[page][row][col];
        return sum;
    }

Layout of C Arrays in Memory

    int C[2][3][5] = {... };

    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            for (k = 0; k < 5; k++)
                printf("%3d %3d %3d: %p\n", i, j, k, (void *)&C[i][j][k]);

We see that for a 3D array, the third index varies in a stride-1 pattern:

i-j-k order:

    i j k   address
    0 0 0:  28CC1C
    0 0 1:  28CC20
    0 0 2:  28CC24
    0 0 3:  28CC28
    0 0 4:  28CC2C
    0 1 0:  28CC30
    0 1 1:  28CC34
    0 1 2:  28CC38

But if we change the order of access, we no longer have a stride-1 pattern:

k-j-i order (consecutive accesses 0x3C bytes apart):

    i j k   address
    0 0 0:  28CC24
    1 0 0:  28CC60
    0 1 0:  28CC38
    1 1 0:  28CC74
    0 2 0:  28CC4C
    1 2 0:  28CC88
    0 0 1:  28CC28
    1 0 1:  28CC64

i-k-j order:

    i j k   address
    0 0 0:  28CC24
    0 1 0:  28CC38
    0 2 0:  28CC4C
    0 0 1:  28CC28
    0 1 1:  28CC3C
    0 2 1:  28CC50
    0 0 2:  28CC2C
    0 1 2:  28CC40

Locality Example (2)

Question: Can you permute the loops so that the function scans the 3D array a[][][] with a stride-1 reference pattern (and thus has good spatial locality)?

    int sumarray3d(int a[N][N][N])
    {
        int i, j, k, sum = 0;

        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    sum += a[k][i][j];
        return sum;
    }

This code does not yield good locality at all. The inner loop is varying the first index, the worst case!
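One possible answer to the question, offered as a sketch rather than as the slides' own solution: nest the loops in the same order as the subscripts, so that the innermost loop varies the last index. The dimension N = 4 and the helper check_orders_agree are assumptions made only so the example can be compiled and checked.

```c
#include <assert.h>

enum { N = 4 };   /* small hypothetical dimension */

/* Loop order from the slide: the innermost loop varies the FIRST index,
   so consecutive accesses are N*N ints apart. */
int sumarray3d_slow(int a[N][N][N])
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                sum += a[k][i][j];
    return sum;
}

/* Permuted order: loops nested k, i, j to match a[k][i][j], so the
   innermost loop varies the LAST index and the scan is stride-1. */
int sumarray3d_stride1(int a[N][N][N])
{
    int sum = 0;
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                sum += a[k][i][j];
    return sum;
}

/* Fills a small array with 0, 1, 2, ... and checks that both loop
   orders compute the same sum; returns that sum. */
int check_orders_agree(void)
{
    static int a[N][N][N];
    int v = 0;
    for (int p = 0; p < N; p++)
        for (int r = 0; r < N; r++)
            for (int c = 0; c < N; c++)
                a[p][r][c] = v++;

    int sum = sumarray3d_stride1(a);
    assert(sum == sumarray3d_slow(a));
    return sum;
}
```

Both versions return the same value; only the order in which memory is touched changes.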

Locality Example (3)

Question: Which of these two exhibits better spatial locality?

    // struct of arrays
    struct soa {
        float *x;
        float *y;
        float *z;
        float *r;
    };

    void compute_r(struct soa s) {
        for (i = 0; ...) {
            s.r[i] = s.x[i] * s.x[i] + s.y[i] * s.y[i] + s.z[i] * s.z[i];
        }
    }

    // array of structs
    struct aos {
        float x;
        float y;
        float z;
        float r;
    };

    void compute_r(struct aos *s) {
        for (i = 0; ...) {
            s[i].r = s[i].x * s[i].x + s[i].y * s[i].y + s[i].z * s[i].z;
        }
    }

For the following discussions assume a cache block size of 32 bytes, and that the cache is not capable of holding all the blocks of the relevant structure at once.

Locality Example (3)

    // struct of arrays
    struct soa {
        float *x;
        float *y;
        float *z;
        float *r;
    };

    struct soa s;
    s.x = malloc(1000 * sizeof(float));

[Figure: the arrays laid out in memory, 4 bytes per cell, 1000 cells per array.]

Locality Example (3)

    // array of structs
    struct aos {
        float x;
        float y;
        float z;
        float r;
    };

    struct aos s[1000];

[Figure: the array laid out in memory, 16 bytes per cell, 1000 cells.]

Locality Example (3)

Describe the locality exhibited by this algorithm:

    // struct of arrays
    void compute_r(struct soa s) {
        for (int i = 0; i < 1000; i++) {
            s.r[i] = s.x[i] * s.x[i] + s.y[i] * s.y[i] + s.z[i] * s.z[i];
        }
    }

Each 32-byte block holds 8 cells (4 bytes per cell, 1000 cells per array):

    i = 0:        s.x[0] miss   s.y[0] miss   s.z[0] miss   s.r[0] miss
    i = 1:        s.x[1] hit    s.y[1] hit    s.z[1] hit    s.r[1] hit
    ...
    i = 7:        s.x[7] hit    s.y[7] hit    s.z[7] hit    s.r[7] hit
    i = 8:        s.x[8] miss   s.y[8] miss   s.z[8] miss   s.r[8] miss

Locality Example (3)

Describe the locality exhibited by this algorithm:

    // struct of arrays
    void compute_r(struct soa s) {
        for (int i = 0; i < 1000; i++) {
            s.r[i] = s.x[i] * s.x[i] + s.y[i] * s.y[i] + s.z[i] * s.z[i];
        }
    }

The pattern repeats for every group of 8 cells (32 bytes; 4 bytes per cell, 1000 cells per array):

    i = 8:        s.x[8] miss   s.y[8] miss   s.z[8] miss   s.r[8] miss
    i = 9:        s.x[9] hit    s.y[9] hit    s.z[9] hit    s.r[9] hit
    ...

For the arrays:

    Misses   = 4 * 1 * 125
    Hits     = 4 * 7 * 125
    Hit rate = 87.5%

Locality Example (3)

Describe the locality exhibited by this algorithm:

    // array of structs
    void compute_r(struct aos *s) {
        for (int i = 0; i < 1000; i++) {
            s[i].r = s[i].x * s[i].x + s[i].y * s[i].y + s[i].z * s[i].z;
        }
    }

Each 32-byte block holds two 16-byte structs, so one miss brings in eight cells:

    s[0].x miss   s[0].y hit   s[0].z hit   s[0].r hit
    s[1].x hit    s[1].y hit   s[1].z hit   s[1].r hit
    ...

Hit rate: 7/8 or 87.5%

Locality Example (4)

Describe the locality exhibited by this algorithm:

    // struct of arrays
    float sum_r(struct soa s) {
        float sum = 0;
        for (int i = 0; i < 1000; i++) {
            sum += s.r[i];
        }
        return sum;
    }

Locality Example (4)

Describe the locality exhibited by this algorithm:

    // array of structs
    float sum_r(struct aos *s) {
        float sum = 0;
        for (int i = 0; i < 1000; i++) {
            sum += s[i].r;
        }
        return sum;
    }
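One way to reason about the two single-field sums is to count how often consecutive reads fall into a new 32-byte block. The sketch below does that counting under the assumptions stated earlier (32-byte blocks, initially empty cache, block-aligned data, 1000 cells); the struct field names and the function field_sum_hit_rate are illustrative assumptions, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { CELLS = 1000, BLOCK_BYTES = 32 };

struct aos { float x, y, z, r; };   /* field names assumed for illustration */

/* Hit rate when one float is read from each of CELLS records laid out
   record_bytes apart. A read misses only when it lands in a block that
   the previous read did not touch. */
double field_sum_hit_rate(size_t record_bytes)
{
    assert(record_bytes >= sizeof(float));
    size_t misses = 0;
    size_t last_block = (size_t)-1;   /* no block loaded yet */

    for (size_t i = 0; i < CELLS; i++) {
        size_t block = (i * record_bytes) / BLOCK_BYTES;
        if (block != last_block) {
            misses++;
            last_block = block;
        }
    }
    return 1.0 - (double)misses / CELLS;
}
```

In the struct-of-arrays layout the records are 4 bytes apart, giving field_sum_hit_rate(sizeof(float)) = 0.875. In the array-of-structs layout they are 16 bytes apart, giving field_sum_hit_rate(sizeof(struct aos)) = 0.5, because only two of the eight floats in each block are ever used.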

Locality Example (5)

QTP: How would this compare to the previous two?

    // array of pointers to structs
    struct aops {
        float x;
        float y;
        float z;
        float r;
    };

    struct aops *apos[1000];

    for (i = 0; i < 1000; i++)
        apos[i] = malloc(sizeof(struct aops));
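To make the comparison concrete, here is a hedged sketch of summing one field through such a pointer array. It assumes the field is named r and adds allocation checks and cleanup that the slide omits; the function name sum_r_through_pointers is made up. Note that each element now costs two loads (the pointer, then the field), and where each struct lands in memory is up to the allocator, so the stride between structs is no longer under the programmer's control.

```c
#include <stdlib.h>

enum { CELLS = 1000 };

struct aops { float x, y, z, r; };   /* field names assumed for illustration */

/* Allocates CELLS separate structs, sets each r to 1.0f, sums the r
   fields through the pointer array, and frees everything.
   Returns -1.0f if any allocation fails. */
float sum_r_through_pointers(void)
{
    struct aops *apos[CELLS] = {0};
    float sum = 0.0f;
    int ok = 1;

    for (int i = 0; i < CELLS && ok; i++) {
        apos[i] = malloc(sizeof *apos[i]);
        if (apos[i] == NULL)
            ok = 0;
        else
            apos[i]->r = 1.0f;
    }

    if (ok) {
        for (int i = 0; i < CELLS; i++)
            sum += apos[i]->r;       /* load the pointer, then the field */
    }

    for (int i = 0; i < CELLS; i++)
        free(apos[i]);               /* free(NULL) is a no-op */

    return ok ? sum : -1.0f;
}
```

The pointer array itself is scanned with stride-1, but the structs it points to may be scattered, so the locality of the field reads can be anywhere between the array-of-structs case and a miss on every element.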

Writing Cache Friendly Code

Make the common case go fast
- Focus on the inner loops of the core functions

Minimize the misses in the inner loops
- Repeated references to variables are good (temporal locality)
- Stride-1 reference patterns are good (spatial locality)

Key idea: Our qualitative notion of locality is quantified through our understanding of cache memories.

Miss Rate Analysis for Matrix Multiply

Assume:
- Line size = 32B (big enough for four 64-bit words)
- Matrix dimension (N) is very large: approximate 1/N as 0.0
- Cache is not even big enough to hold multiple rows

Analysis method:
- Look at access pattern of inner loop

[Figure: matrices A, B and C, with A indexed by (i, k), B by (k, j), and C by (i, j).]

Matrix Multiplication Example

Description:
- Multiply N x N matrices
- O(N^3) total operations
- N reads per source element
- N values summed per destination

    /* ijk */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;                /* variable sum held in register */
            for (k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

Matrix Multiplication (ijk)

    /* ijk */
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

Inner loop:

    Matrix   Elements   Access pattern   Misses per inner loop iteration
    A        (i,*)      Row-wise         0.25
    B        (*,j)      Column-wise      1.0
    C        (i,j)      Fixed            0.0

Matrix Multiplication (kij)

    /* kij */
    for (k = 0; k < n; k++) {
        for (i = 0; i < n; i++) {
            r = a[i][k];
            for (j = 0; j < n; j++)
                c[i][j] += r * b[k][j];
        }
    }

Inner loop:

    Matrix   Elements   Access pattern   Misses per inner loop iteration
    A        (i,k)      Fixed            0.0
    B        (k,*)      Row-wise         0.25
    C        (i,*)      Row-wise         0.25

Matrix Multiplication (jki)

    /* jki */
    for (j = 0; j < n; j++) {
        for (k = 0; k < n; k++) {
            r = b[k][j];
            for (i = 0; i < n; i++)
                c[i][j] += a[i][k] * r;
        }
    }

Inner loop:

    Matrix   Elements   Access pattern   Misses per inner loop iteration
    A        (*,k)      Column-wise      1.0
    B        (k,j)      Fixed            0.0
    C        (*,j)      Column-wise      1.0

Summary of Matrix Multiplication

ijk (& jik): 2 loads, 0 stores, misses/iter = 1.25

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
    }

kij (& ikj): 2 loads, 1 store, misses/iter = 0.5

    for (k = 0; k < n; k++) {
        for (i = 0; i < n; i++) {
            r = a[i][k];
            for (j = 0; j < n; j++)
                c[i][j] += r * b[k][j];
        }
    }

jki (& kji): 2 loads, 1 store, misses/iter = 2.0

    for (j = 0; j < n; j++) {
        for (k = 0; k < n; k++) {
            r = b[k][j];
            for (i = 0; i < n; i++)
                c[i][j] += a[i][k] * r;
        }
    }
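The three loop orders differ only in how they walk through memory, not in what they compute. The sketch below wraps each one in a function and checks that they agree on a small test case. The dimension N = 8, the test data, and the helper count_mismatches are assumptions made for illustration; the kij and jki versions accumulate into c, so the caller must zero it first.

```c
#include <stddef.h>

enum { N = 8 };   /* small hypothetical dimension */

/* ijk: A scanned row-wise, B column-wise, C fixed in the inner loop. */
void matmul_ijk(double a[N][N], double b[N][N], double c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* kij: A fixed, B and C scanned row-wise. Expects c to be zeroed. */
void matmul_kij(double a[N][N], double b[N][N], double c[N][N])
{
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++) {
            double r = a[i][k];
            for (int j = 0; j < N; j++)
                c[i][j] += r * b[k][j];
        }
}

/* jki: B fixed, A and C scanned column-wise. Expects c to be zeroed. */
void matmul_jki(double a[N][N], double b[N][N], double c[N][N])
{
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++) {
            double r = b[k][j];
            for (int i = 0; i < N; i++)
                c[i][j] += a[i][k] * r;
        }
}

/* Runs all three orders on the same inputs and returns the number of
   result cells on which they disagree. */
int count_mismatches(void)
{
    static double a[N][N], b[N][N];
    static double c_ijk[N][N], c_kij[N][N], c_jki[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
            c_kij[i][j] = 0.0;
            c_jki[i][j] = 0.0;
        }

    matmul_ijk(a, b, c_ijk);
    matmul_kij(a, b, c_kij);
    matmul_jki(a, b, c_jki);

    int bad = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (c_ijk[i][j] != c_kij[i][j] || c_ijk[i][j] != c_jki[i][j])
                bad++;
    return bad;
}
```

Since the results match, choosing among the orders is purely a question of memory behaviour, which is what the misses-per-iteration figures capture.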

Core i7 Matrix Multiply Performance

[Figure: cycles per inner loop iteration versus array size (n) for the six loop orders, which fall into three groups: jki / kji, ijk / jik, and kij / ikj.]

Concluding Observations

Programmer can optimize for cache performance
- How data structures are organized
- How data are accessed
  - Nested loop structure
  - Blocking is a general technique

All systems favor cache friendly code
- Getting absolute optimum performance is very platform specific
  - Cache sizes, line sizes, associativities, etc.
- Can get most of the advantage with generic code
  - Keep working set reasonably small (temporal locality)
  - Use small strides (spatial locality)
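The slides name blocking without showing it. As a rough sketch of one common formulation (not taken from these slides), a blocked matrix multiply works on BSIZE x BSIZE sub-matrices so that the data touched by the innermost loops can stay in the cache while it is reused. The sizes N = 8 and BSIZE = 4 and the helper blocked_matches_naive are assumptions for illustration; in practice the block size would be tuned to the cache.

```c
#include <assert.h>

enum { N = 8, BSIZE = 4 };   /* hypothetical sizes; N must be a multiple of BSIZE */

/* Blocked multiply: c += a * b, one BSIZE x BSIZE tile at a time.
   Expects c to be zeroed by the caller. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N])
{
    assert(N % BSIZE == 0);
    for (int ii = 0; ii < N; ii += BSIZE)
        for (int jj = 0; jj < N; jj += BSIZE)
            for (int kk = 0; kk < N; kk += BSIZE)
                /* multiply one tile of a by one tile of b */
                for (int i = ii; i < ii + BSIZE; i++)
                    for (int j = jj; j < jj + BSIZE; j++) {
                        double sum = c[i][j];
                        for (int k = kk; k < kk + BSIZE; k++)
                            sum += a[i][k] * b[k][j];
                        c[i][j] = sum;
                    }
}

/* Compares the blocked result against a plain ijk multiply on small
   integer-valued inputs; returns 1 if every cell matches, else 0. */
int blocked_matches_naive(void)
{
    static double a[N][N], b[N][N], c_ref[N][N], c_blk[N][N];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = i + j;
            b[i][j] = i - j;
            c_blk[i][j] = 0.0;
        }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i][k] * b[k][j];
            c_ref[i][j] = sum;
        }

    matmul_blocked(a, b, c_blk);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (c_ref[i][j] != c_blk[i][j])
                return 0;
    return 1;
}
```

The arithmetic is unchanged; blocking only reorders it so that each tile is reused while it is still cached, which is the temporal-locality counterpart to the stride-1 advice above.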


DYNAMIC STORAGE ALLOCATION. Hanan Samet ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu

More information

Computer Organization: A Programmer's Perspective

Computer Organization: A Programmer's Perspective Computer Architecture and The Memory Hierarchy Oren Kapah orenkapah.ac@gmail.com Typical Computer Architecture CPU chip PC (Program Counter) register file AL U Main Components CPU Main Memory Input/Output

More information

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies

Accelerating Storage with RDMA Max Gurtovoy Mellanox Technologies Acceleating Stoage with RDMA Max Gutovoy Mellanox Technologies 2018 Stoage Develope Confeence EMEA. Mellanox Technologies. All Rights Reseved. 1 What is RDMA? Remote Diect Memoy Access - povides the ability

More information

Introduction to Computer Systems

Introduction to Computer Systems CSCE 230J Computer Organization Introduction to Computer Systems Dr. Steve Goddard goddard@cse.unl.edu Giving credit where credit is due Most of slides for this lecture are based on slides created by Drs.

More information

Introduction to Computer Systems

Introduction to Computer Systems CSCE 230J Computer Organization Introduction to Computer Systems Dr. Steve Goddard goddard@cse.unl.edu http://cse.unl.edu/~goddard/courses/csce230j Giving credit where credit is due Most of slides for

More information

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc

dc - Linux Command Dc may be invoked with the following command-line options: -V --version Print out the version of dc - CentOS 5.2 - Linux Uses Guide - Linux Command SYNOPSIS [-V] [--vesion] [-h] [--help] [-e sciptexpession] [--expession=sciptexpession] [-f sciptfile] [--file=sciptfile] [file...] DESCRIPTION is a evese-polish

More information

Reader & ReaderT Monad (11A) Young Won Lim 8/20/18

Reader & ReaderT Monad (11A) Young Won Lim 8/20/18 Copyight (c) 2016-2018 Young W. Lim. Pemission is ganted to copy, distibute and/o modify this document unde the tems of the GNU Fee Documentation License, Vesion 1.2 o any late vesion published by the

More information

Computer Graphics and Animation 3-Viewing

Computer Graphics and Animation 3-Viewing Compute Gaphics and Animation 3-Viewing Pof. D. Chales A. Wüthich, Fakultät Medien, Medieninfomatik Bauhaus-Univesität Weima caw AT medien.uni-weima.de Ma 5 Chales A. Wüthich Viewing Hee: Viewing in 3D

More information

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines

COEN-4730 Computer Architecture Lecture 2 Review of Instruction Sets and Pipelines 1 COEN-4730 Compute Achitectue Lectue 2 Review of nstuction Sets and Pipelines Cistinel Ababei Dept. of Electical and Compute Engineeing Maquette Univesity Cedits: Slides adapted fom pesentations of Sudeep

More information

University of Waterloo CS240 Winter 2018 Assignment 4 Due Date: Wednesday, Mar. 14th, at 5pm

University of Waterloo CS240 Winter 2018 Assignment 4 Due Date: Wednesday, Mar. 14th, at 5pm Univesit of Wateloo CS Winte Assinment Due Date: Wednesda, Ma. th, at pm vesion: -- : Please ead the uidelines on sumissions: http://.student.cs.uateloo.ca/ ~cs//uidelines.pdf. This assinment contains

More information

Shape Matching / Object Recognition

Shape Matching / Object Recognition Image Pocessing - Lesson 4 Poduction Line object classification Object Recognition Shape Repesentation Coelation Methods Nomalized Coelation Local Methods Featue Matching Coespondence Poblem Alignment

More information

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1

CMCS Mohamed Younis CMCS 611, Advanced Computer Architecture 1 CMCS 611-101 Advanced Compute Achitectue Lectue 6 Intoduction to Pipelining Septembe 23, 2009 www.csee.umbc.edu/~younis/cmsc611/cmsc611.htm Mohamed Younis CMCS 611, Advanced Compute Achitectue 1 Pevious

More information

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4)

User Visible Registers. CPU Structure and Function Ch 11. General CPU Organization (4) Control and Status Registers (5) Register Organisation (4) PU Stuctue and Function h Geneal Oganisation Registes Instuction ycle Pipelining anch Pediction Inteupts Use Visible Registes Vaies fom one achitectue to anothe Geneal pupose egiste (GPR) ata, addess,

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication Nur Dean PhD Program in Computer Science The Graduate Center, CUNY 05/01/2017 Nur Dean (The Graduate Center) Matrix Multiplication 05/01/2017 1 / 36 Today, I will talk about matrix

More information

5. Geometric Transformations and Projections

5. Geometric Transformations and Projections 5. Geometic Tansfomations and ojections 5. Tanslations and Rotations a) Tanslation d d d d d d d d b) Scaling s s s s c) Reflection (about - - lane) d) Rotation about Ais ( ) ( ) CCW 5.. Homogeneous Repesentation

More information

4.2. Co-terminal and Related Angles. Investigate

4.2. Co-terminal and Related Angles. Investigate .2 Co-teminal and Related Angles Tigonometic atios can be used to model quantities such as

More information

CSE 230 Intermediate Programming in C and C++ Arrays and Pointers

CSE 230 Intermediate Programming in C and C++ Arrays and Pointers CSE 230 Intermediate Programming in C and C++ Arrays and Pointers Fall 2017 Stony Brook University Instructor: Shebuti Rayana http://www3.cs.stonybrook.edu/~cse230/ Definition: Arrays A collection of elements

More information

Complete Solution to Potential and E-Field of a sphere of radius R and a charge density ρ[r] = CC r 2 and r n

Complete Solution to Potential and E-Field of a sphere of radius R and a charge density ρ[r] = CC r 2 and r n Complete Solution to Potential and E-Field of a sphee of adius R and a chage density ρ[] = CC 2 and n Deive the electic field and electic potential both inside and outside of a sphee of adius R with a

More information

Query Language #1/3: Relational Algebra Pure, Procedural, and Set-oriented

Query Language #1/3: Relational Algebra Pure, Procedural, and Set-oriented Quey Language #1/3: Relational Algeba Pue, Pocedual, and Set-oiented To expess a quey, we use a set of opeations. Each opeation takes one o moe elations as input paamete (set-oiented). Since each opeation

More information

Topic 4 Root Finding

Topic 4 Root Finding Couse Instucto D. Ramond C. Rump Oice: A 337 Phone: (915) 747 6958 E Mail: cump@utep.edu Topic 4 EE 4386/531 Computational Methods in EE Outline Intoduction Backeting Methods The Bisection Method False

More information

The Memory Hierarchy. Computer Organization 2/12/2015. CSC252 - Spring Memory. Conventional DRAM Organization. Reading DRAM Supercell (2,1)

The Memory Hierarchy. Computer Organization 2/12/2015. CSC252 - Spring Memory. Conventional DRAM Organization. Reading DRAM Supercell (2,1) Computer Organization 115 The Hierarch Kai Shen Random access memor (RM) RM is traditionall packaged as a chip. Basic storage unit is normall a cell (one bit per cell). Multiple RM chips form a memor.

More information

Lecture overview. Visualisatie BMT. Visualization pipeline. Data representation. Discrete data. Sampling. Data Datasets Interpolation

Lecture overview. Visualisatie BMT. Visualization pipeline. Data representation. Discrete data. Sampling. Data Datasets Interpolation Viualiatie BMT Lectue oveview Data Dataet Intepolation Data epeentation Ajan Kok a.j.f.kok@tue.nl 2 Viualization pipeline Raw Data Data Enichment/Enhancement Deived Data Data epeentation Viualization data

More information

DYNAMIC STORAGE ALLOCATION. Hanan Samet

DYNAMIC STORAGE ALLOCATION. Hanan Samet ds0 DYNAMIC STORAGE ALLOCATION Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 074 e-mail: hjs@umiacs.umd.edu

More information

Lecture 17: Memory Hierarchy and Cache Coherence Concurrent and Mul7core Programming

Lecture 17: Memory Hierarchy and Cache Coherence Concurrent and Mul7core Programming Lecture 17: Memory Hierarchy and Cache Coherence Concurrent and Mul7core Programming Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu www.secs.oakland.edu/~yan 1 Parallelism

More information

Two-Dimensional Coding for Advanced Recording

Two-Dimensional Coding for Advanced Recording Two-Dimensional Coding fo Advanced Recoding N. Singla, J. A. O Sullivan, Y. Wu, and R. S. Indec Washington Univesity Saint Louis, Missoui s Motivation: Aeal Density Pefomance: match medium, senso, pocessing

More information

A Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality

A Crash Course in Compilers for Parallel Computing. Mary Hall Fall, L2: Transforms, Reuse, Locality A Crash Course in Compilers for Parallel Computing Mary Hall Fall, 2008 1 Overview of Crash Course L1: Data Dependence Analysis and Parallelization (Oct. 30) L2 & L3: Loop Reordering Transformations, Reuse

More information

GARBAGE COLLECTION METHODS. Hanan Samet

GARBAGE COLLECTION METHODS. Hanan Samet gc0 GARBAGE COLLECTION METHODS Hanan Samet Compute Science Depatment and Cente fo Automation Reseach and Institute fo Advanced Compute Studies Univesity of Mayland College Pak, Mayland 07 e-mail: hjs@umiacs.umd.edu

More information

CS 2461: Computer Architecture 1

CS 2461: Computer Architecture 1 Next.. : Computer Architecture 1 Performance Optimization CODE OPTIMIZATION Code optimization for performance A quick look at some techniques that can improve the performance of your code Rewrite code

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hadwae Oganization and Design Lectue 16: Pipelining Adapted fom Compute Oganization and Design, Patteson & Hennessy, UCB Last time: single cycle data path op System clock affects pimaily the Pogam

More information

Matrix Multiplication

Matrix Multiplication Matrix Multiplication CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Matrix Multiplication Spring 2018 1 / 32 Outline 1 Matrix operations Importance Dense and sparse

More information

MAQAO hands-on exercises

MAQAO hands-on exercises MAQAO hands-on exercises Perf: generic profiler Perf/MPI: Lightweight MPI oriented profiler CQA: code quality analyzer Setup Recompile NPB-MZ with dynamic if using cray compiler #---------------------------------------------------------------------------

More information

Monitors. Lecture 6. A Typical Monitor State. wait(c) Signal and Continue. Signal and What Happens Next?

Monitors. Lecture 6. A Typical Monitor State. wait(c) Signal and Continue. Signal and What Happens Next? Monitos Lectue 6 Monitos Summay: Last time A combination of data abstaction and mutual exclusion Automatic mutex Pogammed conditional synchonisation Widely used in concuent pogamming languages and libaies

More information

XFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers

XFVHDL: A Tool for the Synthesis of Fuzzy Logic Controllers XFVHDL: A Tool fo the Synthesis of Fuzzy Logic Contolles E. Lago, C. J. Jiménez, D. R. López, S. Sánchez-Solano and A. Baiga Instituto de Micoelectónica de Sevilla. Cento Nacional de Micoelectónica, Edificio

More information

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop

Parallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j

More information

Cache Performance II 1

Cache Performance II 1 Cache Performance II 1 cache operation (associative) 111001 index offset valid tag valid tag data data 1 10 1 00 00 11 AA BB tag 1 11 1 01 B4 B5 33 44 = data (B5) AND = AND OR is hit? (1) 2 cache operation

More information

# $!$ %&&' Thanks and enjoy! JFK/KWR. All material copyright J.F Kurose and K.W. Ross, All Rights Reserved

# $!$ %&&' Thanks and enjoy! JFK/KWR. All material copyright J.F Kurose and K.W. Ross, All Rights Reserved A note on the use of these ppt slides: We e making these slides feely available to all (faculty, students, eades). They e in PowePoint fom so you can add, modify, and delete slides (including this one)

More information

An Adaptive Multiphase Approach for Large Unconditional and Conditional p-median Problems

An Adaptive Multiphase Approach for Large Unconditional and Conditional p-median Problems An Adaptive Multiphase Appoach fo Lage Unconditional and Conditional p-median oblems Chanda Ade Iawan Said Salhi Maia aola Scapaa Cente fo Logistics & Heuistic Optimization (CLHO), Kent Buess School, Univesit

More information

Multidimensional Testing

Multidimensional Testing Multidimensional Testing QA appoach fo Stoage netwoking Yohay Lasi Visuality Systems 1 Intoduction Who I am Yohay Lasi, QA Manage at Visuality Systems Visuality Systems the leading commecial povide of

More information

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining

Lecture Topics ECE 341. Lecture # 12. Control Signals. Control Signals for Datapath. Basic Processing Unit. Pipelining EE 341 Lectue # 12 Instucto: Zeshan hishti zeshan@ece.pdx.edu Novembe 10, 2014 Potland State Univesity asic Pocessing Unit ontol Signals Hadwied ontol Datapath contol signals Dealing with memoy delay Pipelining

More information

How many times is the loop executed? middle = (left+right)/2; if (value == arr[middle]) return true;

How many times is the loop executed? middle = (left+right)/2; if (value == arr[middle]) return true; This lectue Complexity o binay seach Answes to inomal execise Abstact data types Stacks ueues ADTs, Stacks, ueues 1 binayseach(int[] a, int value) { while (ight >= let) { { i (value < a[middle]) ight =

More information

A New Finite Word-length Optimization Method Design for LDPC Decoder

A New Finite Word-length Optimization Method Design for LDPC Decoder A New Finite Wod-length Optimization Method Design fo LDPC Decode Jinlei Chen, Yan Zhang and Xu Wang Key Laboatoy of Netwok Oiented Intelligent Computation Shenzhen Gaduate School, Habin Institute of Technology

More information

A Novel Parallel Deadlock Detection Algorithm and Architecture

A Novel Parallel Deadlock Detection Algorithm and Architecture A Novel Paallel Deadlock Detection Aloithm and Achitectue Pun H. Shiu 2, Yudon Tan 2, Vincent J. Mooney III {ship, ydtan, mooney}@ece.atech.ed }@ece.atech.edu http://codesin codesin.ece.atech.eduedu,2

More information

Introduction To Pipelining. Chapter Pipelining1 1

Introduction To Pipelining. Chapter Pipelining1 1 Intoduction To Pipelining Chapte 6.1 - Pipelining1 1 Mooe s Law Mooe s Law says that the numbe of pocessos on a chip doubles about evey 18 months. Given the data on the following two slides, is this tue?

More information

Matrices. Jordi Cortadella Department of Computer Science

Matrices. Jordi Cortadella Department of Computer Science Matrices Jordi Cortadella Department of Computer Science Matrices A matrix can be considered a two-dimensional vector, i.e. a vector of vectors. my_matrix: 3 8 1 0 5 0 6 3 7 2 9 4 // Declaration of a matrix

More information

GCC-AVR Inline Assembler Cookbook Version 1.2

GCC-AVR Inline Assembler Cookbook Version 1.2 GCC-AVR Inline Assemble Cookbook Vesion 1.2 About this Document The GNU C compile fo Atmel AVR isk pocessos offes, to embed assembly language code into C pogams. This cool featue may be used fo manually

More information