Mode-Controlled Dataflow Modeling of Real-Time Memory Controllers

1 Mode-Controlled Dataflow Modeling of Real-time Memory Controllers
Yonghui Li (1), Hrishikesh Salunkhe (1), Joao Bastos (1), Orlando Moreira (2), Benny Akesson (3), and Kees Goossens (1)
(1) Eindhoven University of Technology, the Netherlands; (2) Intel Corporation; (3) CISTER/INESC TEC, ISEP, Portugal
yonghui.li@tue.nl

2 Introduction: DTV & STB
NXP Viper2 (PNX8550): 0.13 µm, ~50 M transistors, ~100 clock domains, more than 70 IP blocks.
Worst-case bandwidth (WCB)?
[Figure: block diagram of the PNX8550 SoC, in which the IP blocks (MIPS PR4450, two TriMedia TM32 cores, MPEG, VIP, QVCP, and others) share the memory controller.]

3 Outline
Background
DRAM
Dynamically scheduled memory controller
Dataflow modeling of command scheduling
Why dataflow modeling?
Mode-controlled dataflow (MCDF)
MCDF modeling of a memory controller
Experimental results
Conclusions

4 DRAM Memories
DRAM is accessed by scheduling commands (ACT, PRE, RD, WR, REF, NOP), subject to timing constraints.
[Figure: DRAM with command, address, and data ports and Bank 0 to Bank 7, each with a row buffer; Activate (ACT), Read (RD), Write (WR), and Precharge (PRE) operate on a bank. Example command sequence: ACT NOP RD NOP ACT NOP NOP PRE NOP PRE NOP.]

5 Dynamically Scheduled Memory Controller
A transaction is translated into a sequence of commands.
[Figure: transactions (logical address, data) pass through the memory map (physical address) and the command generator into the command queue; the scheduler, using timing counters, issues commands to the DRAM banks (Bank 0 to Bank 7, each with a row buffer).]
Scheduling algorithm:
First-Come First-Serve (FCFS) for transactions.
RD or WR commands have higher priority than ACT.

6 Scheduling Dependencies of a Transaction
A transaction T_i is executed by scheduling commands to successive banks.
[Figure: command timeline of transaction T_i across successive banks, starting at Bank 0, with the dependencies between commands labeled by the timing constraints tSwitch, tRRD, tFAW, tRCD, tCCD, tRAS, tRTP, and tRP.]
Worst-case analyses have been carried out by analyzing individual dependencies [4][10][11][12].

7 Dataflow Modeling of Command Scheduling
From command scheduling to actor firing:
Commands are represented by actors.
Timing constraints are captured by delay actors.
Scheduling dependencies are depicted by the edges between actors.
[Figure: actors ACT and RD connected through the delay actors RCD (tRCD) and CCD (tCCD).]
Max-plus algebra: $t_{RD}^{j} = \max\{t_{ACT}^{j} + t_{RCD},\ t_{RD}^{j-1} + t_{CCD}\}$
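The max-plus relation on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the timing values (tRCD = 5 and tCCD = 4 cycles) are assumptions chosen for the example, not values from the slides, and each read is assumed to depend on its own ACT and on the previous read.

```python
NEG_INF = float("-inf")  # max-plus "zero": no earlier firing constrains the result


def rd_issue_time(t_act, t_prev_rd, t_rcd, t_ccd):
    """Earliest issue cycle of a RD: both dependencies must be satisfied."""
    return max(t_act + t_rcd, t_prev_rd + t_ccd)


def rd_schedule(act_times, t_rcd, t_ccd):
    """Apply the recurrence to consecutive reads, one read per ACT."""
    rd_times = []
    t_prev_rd = NEG_INF  # the first read has no predecessor
    for t_act in act_times:
        t_rd = rd_issue_time(t_act, t_prev_rd, t_rcd, t_ccd)
        rd_times.append(t_rd)
        t_prev_rd = t_rd
    return rd_times


# Hypothetical timings in clock cycles (assumed for illustration).
print(rd_schedule([0, 4], t_rcd=5, t_ccd=4))  # [5, 9]
```

The first read is bounded only by its ACT (0 + 5), while the second is bounded equally by its ACT (4 + 5) and by the previous read (5 + 4), which is the kind of dependency an actor with two incoming edges expresses.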

8 Dataflow Modeling of Command Scheduling
Dataflow model of transactions (e.g., a 32-byte read), built from the command scheduling dependencies.
[Figure: ACT, RD, and PRE commands on Bank 0 and Bank 1 with their timing constraints (tRCD, tCCD, tRRD, tRAS, tRTP, tRP), and the corresponding single-rate dataflow (SRDF) graph with command actors ACT, RD, and PRE and delay actors for these constraints.]
Dynamism:
Unknown order of accessing different banks.
Variable transaction sizes need different numbers of banks.

10 Mode-Controlled Dataflow Model (MCDF)
MCDF is a restricted variant of Boolean dataflow that captures dynamism while remaining analyzable.
The structure includes actors that behave as switch and select, controlled by an actor named the mode controller (MC).
[Figure: MCDF graph with source actor Src, mode controller MC, switch, tunnel, and select actors, and actors A (mode M0) and B (mode M1).]
Dynamism is captured by defining mode sequences (MS), e.g., MS = M0 M1 M0 M0 M1 M1.

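11 Mode-Controlled Dataflow Model (MCDF)
MCDF & MS -> single-rate dataflow (SRDF) graph, for the single-mode sequences MS_0 = M0 and MS_1 = M1.
[Figure: the MCDF graph and, for each of the two sequences, the SRDF graph that retains only the actors active in that mode (Src, A or B, L, Tsw, a or b, Tsl, MC).]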

12 Mode-Controlled Dataflow Model (MCDF)
MCDF & MS -> single-rate dataflow (SRDF) graph, for MS = M0 M1.
[Figure: the MCDF graph unfolded over the two iterations of the mode sequence, with one instance of each active actor per iteration (e.g., Src_1, A_1, MC_1 for M0 and Src_2, B_1, MC_2 for M1).]
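Unfolding a mode sequence into actor instances, as this slide does for MS = M0 M1, can be sketched as follows. This is a simplified illustration, not the conversion used by the authors: the function name, the choice of Src and MC as the actors that fire in every iteration, and the omission of switch, select, and tunnel actors and of all edges are assumptions.

```python
from collections import Counter


def unfold(mode_sequence, mode_actors, always=("Src", "MC")):
    """List the actor instances that fire when a mode sequence is executed.

    mode_sequence: list of mode indices, e.g. [0, 1] for MS = M0 M1.
    mode_actors:   dict mapping a mode index to the actors active in that mode.
    always:        actors that fire once per iteration regardless of mode
                   (assumed here to be the source and the mode controller).
    """
    firing_count = Counter()
    instances = []
    for mode in mode_sequence:
        for actor in (*always, *mode_actors[mode]):
            firing_count[actor] += 1
            # Instances are numbered per actor, e.g. the first firing of B is B_1.
            instances.append(f"{actor}_{firing_count[actor]}")
    return instances


# Hypothetical two-mode graph: A is active in M0, B in M1.
print(unfold([0, 1], {0: ["A"], 1: ["B"]}))
# ['Src_1', 'MC_1', 'A_1', 'Src_2', 'MC_2', 'B_1']
```

Because the mode sequence is static, the set of instances is finite and known in advance, which is what allows the unfolded graph to be analyzed with single-rate dataflow techniques.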

13-15 Mode-Controlled Dataflow Model (MCDF)
MCDF & MS -> single-rate dataflow (SRDF) graph, for MS = M0 M1 (animation continued over three slides).
[Figure: the unfolded SRDF graph of the mode sequence is abstracted into mode instances, shown as M0_0 M1_0, M0_1 M1_1, and M0_2 M1_2 for successive repetitions of the sequence.]

17 MCDF Modeling of Dynamic Command Scheduling
Mode_0 to Mode_7: ACT for Bank 0 to Bank 7, each with the timing constraints (TC) of ACT.
Mode_8 to Mode_15: PRE for Bank 0 to Bank 7, each with the timing constraints of PRE.
Mode_16: RD.
Mode_17: WR.

18 [Figure: complete MCDF graph with Source, mode controller, mode switch, and mode select actors; the ACT modes (Mode_0 to Mode_7), PRE modes (Mode_8 to Mode_15), RD mode (Mode_16), and WR mode (Mode_17) are connected through mode tunnels that carry the timing constraints between commands, including tRRD, tFAW, tRCD, tRAS, tRP, tCCD, and tRTP.]

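19 MCDF Modeling of Dynamic Command Scheduling
From transactions to mode sequences:
Transaction (32-byte read)    Mode sequence
Bank 0, Bank 1                MS_0 = M0 M16 M8 M1 M16 M9
Bank 2, Bank 3                MS_1 = M2 M16 M10 M3 M16 M11
Bank 4, Bank 5                MS_2 = M4 M16 M12 M5 M16 M13
Bank 6, Bank 7                MS_3 = M6 M16 M14 M7 M16 M15
MS = MS_0 MS_1 MS_2 MS_3
[Figure: commands of one transaction: ACT, RD, and PRE on Bank 0 and Bank 1 with the timing constraints tRCD, tCCD, tRRD, and tRTP.]
From maximum cycle mean (MCM) to worst-case bandwidth (WCB):
[Equations: the MCM is a maximum over the cycles c of the graph G of a term in the cycle weight ω(c); the WCB is a minimum over the cycles of G of a term that includes f_mem and e_ref. The full expressions are not recoverable from this transcript.]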
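Since the MCM and WCB expressions are garbled in this transcript, the sketch below falls back on the textbook definition of the maximum cycle mean for a single-rate dataflow graph: the largest ratio, over all cycles, of total actor execution time to the number of initial tokens on the cycle. The graph, the function names, and in particular the conversion from MCM to bandwidth are assumptions for illustration; they are not the slide's formula. Cycles are enumerated by brute force, which is only practical for small graphs.

```python
from fractions import Fraction


def maximum_cycle_mean(exec_time, edges):
    """MCM of a small SRDF graph by enumerating its simple cycles.

    exec_time: dict actor -> execution time in cycles.
    edges:     list of (src, dst, initial_tokens).
    """
    actors = sorted(exec_time)
    adj = {a: [] for a in actors}
    for src, dst, tokens in edges:
        adj[src].append((dst, tokens))
    best = None

    def dfs(start, node, weight, tokens, visited):
        nonlocal best
        for nxt, tok in adj[node]:
            if nxt == start:  # closed a simple cycle
                total = tokens + tok
                if total == 0:
                    raise ValueError("cycle without initial tokens deadlocks")
                mean = Fraction(weight, total)
                if best is None or mean > best:
                    best = mean
            elif nxt not in visited and nxt > start:
                # Only visit actors larger than start so each cycle is found once.
                dfs(start, nxt, weight + exec_time[nxt], tokens + tok, visited | {nxt})

    for a in actors:
        dfs(a, a, exec_time[a], 0, {a})
    return best


def bandwidth_mb_per_s(bytes_per_iteration, mcm_cycles, f_mem_mhz, e_ref=1.0):
    """ASSUMED conversion: bytes per iteration over cycles per iteration,
    scaled by the clock frequency (MHz) and a refresh-related factor e_ref."""
    return bytes_per_iteration / float(mcm_cycles) * f_mem_mhz * e_ref


# Hypothetical graph: cycles A-B (5/1), B self-loop (3/1), A-C (9/2).
exec_time = {"A": 2, "B": 3, "C": 7}
edges = [("A", "B", 0), ("B", "A", 1), ("B", "B", 1), ("A", "C", 0), ("C", "A", 2)]
print(maximum_cycle_mean(exec_time, edges))  # 5
```

The cycle with the largest mean limits how often the graph can iterate, so a larger MCM corresponds to a lower guaranteed bandwidth.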

20 MCDF Modeling of Dynamic Command Scheduling
MCDF model:
Commands are captured by command actors.
Timing constraints are described by delay actors.
A mode is constructed from these actors.
Transactions are translated into static mode sequences (MS).
[Figure: commands (ACT, ...) and timing constraints (tRCD, ...) become command and delay actors that form the modes of the MCDF graph; the memory traffic, a transaction trace T_0, T_1, ..., T_N, becomes the mode sequences MS_0, MS_1, ..., MS_N that drive the graph to produce the command schedule.]
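The translation from a transaction to a static mode sequence can be sketched as below. The mode numbering follows the slides (ACT for bank b is mode b, PRE for bank b is mode 8 + b, RD is mode 16, WR is mode 17). The function name, the assumption of exactly one RD or WR per bank, and the wrap-around at Bank 7 are illustrative choices that match the 32-byte read example but may not hold for other transaction sizes.

```python
NUM_BANKS = 8
RD_MODE = 16
WR_MODE = 17


def act_mode(bank):
    """Mode index of the ACT command for a bank (Mode_0 to Mode_7)."""
    return bank


def pre_mode(bank):
    """Mode index of the PRE command for a bank (Mode_8 to Mode_15)."""
    return NUM_BANKS + bank


def transaction_modes(first_bank, num_banks, is_read=True):
    """Mode sequence of one transaction that accesses successive banks.

    Assumes one data command per bank, issued between the bank's ACT and PRE.
    """
    data_mode = RD_MODE if is_read else WR_MODE
    modes = []
    for i in range(num_banks):
        bank = (first_bank + i) % NUM_BANKS  # wrap-around is an assumption
        modes += [act_mode(bank), data_mode, pre_mode(bank)]
    return modes


# A 32-byte read over Bank 0 and Bank 1.
print(transaction_modes(0, 2))  # [0, 16, 8, 1, 16, 9]
```

Concatenating the results for a trace of transactions gives the overall mode sequence that the mode controller replays.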

21 Experiments
Goal:
Validate the MCDF model.
Obtain the worst-case bandwidth results.
Setup:
Heracles: a temporal analysis tool developed at Ericsson.
RTMemController: an open-source analysis tool for a real-time memory controller with dynamic command scheduling.
16-bit DDR3-800D/1600G/2133K DRAMs with a capacity of 2 Gb.
Transaction sizes: 16, 32, 64, 128, and 256 bytes.

22 Experiment 1: Validation of the MCDF model
Simulating the MCDF model with Heracles gives command schedules identical to those of the cycle-accurate RTMemController simulator.

23 Experiment 2: Fixed transaction sizes
[Figure: WCB (MB/s) versus transaction size (bytes) for MCDF, dynamic (scheduled [4]), and dynamic (analytical [4]).]
MCDF >= analytical.
MCDF < scheduled only for 16-byte transactions, due to command collisions.
[4] Y. Li et al. Architecture and analysis of a dynamically-scheduled real-time memory controller. Real-Time Systems Journal, pp. 1-55, Springer, 2015.

24 Experiment 3: Variable transaction sizes
64-byte transactions followed by 128-byte transactions (known order).
Random mix of 64-byte and 128-byte transactions (unknown order).
[Figure: WCB (MB/s) for known and unknown transaction order, comparing MCDF, scheduled [4], and analytical [4].]

25 Conclusions
A mode-controlled dataflow (MCDF) model of dynamic command scheduling for real-time memory controllers:
Supports simulation.
Provides worst-case bandwidth results.
Is easy to adapt to other memory controllers.
The MCDF model outperforms several existing analysis approaches.

26 Thank You.


More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

Composable Resource Sharing Based on Latency-Rate Servers

Composable Resource Sharing Based on Latency-Rate Servers Composable Resource Sharing Based on Latency-Rate Servers Benny Akesson 1, Andreas Hansson 1, Kees Goossens 2,3 1 Eindhoven University of Technology 2 NXP Semiconductors Research 3 Delft University of

More information

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by: Main Memory Main memory generally utilizes Dynamic RAM (DRAM), which use a single transistor to store a bit, but require a periodic data refresh by reading every row. Static RAM may be used for main memory

More information

CPU issues address (and data for write) Memory returns data (or acknowledgment for write)

CPU issues address (and data for write) Memory returns data (or acknowledgment for write) The Main Memory Unit CPU and memory unit interface Address Data Control CPU Memory CPU issues address (and data for write) Memory returns data (or acknowledgment for write) Memories: Design Objectives

More information

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor

Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Experiences with the Sparse Matrix-Vector Multiplication on a Many-core Processor Juan C. Pichel Centro de Investigación en Tecnoloxías da Información (CITIUS) Universidade de Santiago de Compostela, Spain

More information

A Comparative Study of Predictable DRAM Controllers

A Comparative Study of Predictable DRAM Controllers A A Comparative Study of Predictable DRAM Controllers Danlu Guo,Mohamed Hassan,Rodolfo Pellizzoni and Hiren Patel, {dlguo,mohamed.hassan,rpellizz,hiren.patel}@uwaterloo.ca, University of Waterloo Recently,

More information

KNX TinySerial 810. Communication Protocol. WEINZIERL ENGINEERING GmbH

KNX TinySerial 810. Communication Protocol. WEINZIERL ENGINEERING GmbH WEINZIERL ENGINEERING GmbH KNX TinySerial 810 Communication Protocol WEINZIERL ENGINEERING GmbH Bahnhofstr. 6 DE-84558 Tyrlaching GERMAY Tel. +49 8623 / 987 98-03 Fax +49 8623 / 987 98-09 E-Mail: info@weinzierl.de

More information

Comparative Analysis of Contemporary Cache Power Reduction Techniques

Comparative Analysis of Contemporary Cache Power Reduction Techniques Comparative Analysis of Contemporary Cache Power Reduction Techniques Ph.D. Dissertation Proposal Samuel V. Rodriguez Motivation Power dissipation is important across the board, not just portable devices!!

More information

Part IV: 3D WiNoC Architectures

Part IV: 3D WiNoC Architectures Wireless NoC as Interconnection Backbone for Multicore Chips: Promises, Challenges, and Recent Developments Part IV: 3D WiNoC Architectures Hiroki Matsutani Keio University, Japan 1 Outline: 3D WiNoC Architectures

More information

Snoop-Based Multiprocessor Design III: Case Studies

Snoop-Based Multiprocessor Design III: Case Studies Snoop-Based Multiprocessor Design III: Case Studies Todd C. Mowry CS 41 March, Case Studies of Bus-based Machines SGI Challenge, with Powerpath SUN Enterprise, with Gigaplane Take very different positions

More information

Section 8.3 Vector, Parametric, and Symmetric Equations of a Line in

Section 8.3 Vector, Parametric, and Symmetric Equations of a Line in Section 8.3 Vector, Parametric, and Symmetric Equations of a Line in R 3 In Section 8.1, we discussed vector and parametric equations of a line in. In this section, we will continue our discussion, but,

More information

Design and Implementation of High Performance Application Specific Memory

Design and Implementation of High Performance Application Specific Memory Design and Implementation of High Performance Application Specific Memory - 고성능 Application Specific Memory 의설계와구현 - M.S. Thesis Sungdae Choi Dec. 20th, 2002 Outline Introduction Memory for Mobile 3D Graphics

More information

11.1. Unit 11. Adders & Arithmetic Circuits

11.1. Unit 11. Adders & Arithmetic Circuits . Unit s & Arithmetic Circuits .2 Learning Outcomes I understand what gates are used to design half and full adders I can build larger arithmetic circuits from smaller building blocks ADDER.3 (+) Register.4

More information

EECS 322 Computer Architecture Superpipline and the Cache

EECS 322 Computer Architecture Superpipline and the Cache EECS 322 Computer Architecture Superpipline and the Cache Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow Summary:

More information

Computer Structure. The Uncore. Computer Structure 2013 Uncore

Computer Structure. The Uncore. Computer Structure 2013 Uncore Computer Structure The Uncore 1 2 nd Generation Intel Next Generation Intel Turbo Boost Technology High Bandwidth Last Level Cache Integrates CPU, Graphics, MC, PCI Express* on single chip PCH DMI PCI

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

F21 Microprocessor Preliminary specifications 9/98

F21 Microprocessor Preliminary specifications 9/98 F21 contains a CPU, a memory interface processor, two analog I/O coprocessors, an active message serial network coprocessor, and a parallel I/O port on a small custom VLSI CMOS chip. CPU 0 operand stack

More information

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review

Chapter 7 Large and Fast: Exploiting Memory Hierarchy. Memory Hierarchy. Locality. Memories: Review Memories: Review Chapter 7 Large and Fast: Exploiting Hierarchy DRAM (Dynamic Random Access ): value is stored as a charge on capacitor that must be periodically refreshed, which is why it is called dynamic

More information

Real-Time (Paradigms) (47)

Real-Time (Paradigms) (47) Real-Time (Paradigms) (47) Memory: Memory Access Protocols Tasks competing for exclusive memory access (critical sections, semaphores) become interdependent, a common phenomenon especially in distributed

More information

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S.

Memory Hierarchy. Reading. Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 (2) Lecture notes from MKP, H. H. Lee and S. Memory Hierarchy Lecture notes from MKP, H. H. Lee and S. Yalamanchili Sections 5.1, 5.2, 5.3, 5.4, 5.8 (some elements), 5.9 Reading (2) 1 SRAM: Value is stored on a pair of inerting gates Very fast but

More information

A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems

A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems Sven Goossens, Jasper Kuijsten, Benny Akesson, Kees Goossens Eindhoven University of Technology {s.l.m.goossens,k.b.akesson,k.g.w.goossens}@tue.nl

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. 13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third

More information

On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering

On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering On Generalized Processor Sharing with Regulated Traffic for MPLS Traffic Engineering Shivendra S. Panwar New York State Center for Advanced Technology in Telecommunications (CATT) Department of Electrical

More information

Embedded Systems: Hardware Components (part II) Todor Stefanov

Embedded Systems: Hardware Components (part II) Todor Stefanov Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information