Design Space Exploration for Memory Subsystems of VLIW Architectures

Size: px
Start display at page:

Download "Design Space Exploration for Memory Subsystems of VLIW Architectures"

Transcription

1 E University of Paderborn Dr.-Ing. Mario Porrmann Design Space Exploration for Memory Subsystems of VLIW Architectures Thorsten Jungeblut 1, Gregor Sievers, Mario Porrmann 1, Ulrich Rückert 2 1 System and Circuit Technology, University of Paderborn 2 Cognitive Interaction Technology Center of Excellence, Bielefeld University

2 Motivation(1) - Increasing complexity of mobile applications - More functionality - New algorithms (LTE; LTE Advanced) - Multimedia applications (Video, 3-D, ) - Nonflexible hardware Flexible software implementation (Software-Defined Radio - SDR) Powerful CPU necessary - High requirements to ressource efficiency! 2

3 Motivation(2) In embedded processors size of on-chip memories is limited External (SDRAM) memory Low costs per bit Slow/high latency Intermediate storage of accesses in the cache Loading of entire cache lines from the external memory Use of temporal and spatial locality Size of the caches is limited by the operating frequency of the processor core Cache hierachie Level-1 cache is matched to the core frequency additional levels with higher latency Register Level-1 cache Level-2 cache Main memory Hard disk 3

4 Outline Concurrent design flow for DSE VLIW architecture/cache architecture Prototyping Environment Performance results and resource requirements Conclusion/Outlook Specification FE Instruction Fetch / L1 Instruction Cache Instruction Memory DC Vice-UPSLA Instruction Decode Benchmarks Source Code Bypass RTL-Description RD UPSLA Register Read Compiler RTL-Code Assembler Code Register RTL-Simulator ALU Condition RTL-Code EX Register * / Synthesis-Tool LD/ST Netlist ME Emulator (Prototyp) Assembler ALU ALU ALU Object-Files * / * / * / Linker LD/ST LD/ST LD/ST Executables ASIC-Realization L1 Data-Cache Software Simulator Data Memory Profiling-Data Functional WR Verification Register Write Visualization Ressource Efficiency 4

5 Design Space Exploration Tool Flow Specification Goal: Highly automated design flow Vice-UPSLA Benchmarks Source Code RTL-Description UPSLA Compiler RTL-Code Assembler Code RTL-Simulator Assembler RTL-Code Object-Files Synthesis-Tool Linker Netlist Executables Emulator (Prototyp) ASIC-Realization Software Simulator Functional Verification Visualization Profiling-Data Ressource Efficiency 5

6 The CoreVA architecture Modular Design FE Instruction Fetch / L1 Instruction Cache Instruction Memory DC Instruction Decode Bypass RD Register Read Condition Register EX * ALU / ALU ALU ALU * / * / * / Register ME LD/ST LD/ST LD/ST LD/ST L1 Data-Cache Data Memory WR Register Write 6

7 Dynamically Reconfigurable Platform RAPTOR-X64 Prototypic Implementation of Microelectronic Circuits on FPGAs Up to 200 Million transistors emulated Flexible, modular concept: PCI-Busmotherboard with up to six modules Partial dynamic reconfiguration at high reconfiguration bandwidth USB Controller USB 2.0-High-Speed USB-OTG System Monitor Voltage, Tempature, Analog Inputs Clock Sythesis, Distribution TST-JTAG CFG-JTAG USB Logic Local-Bus Master Local-Bus Slave OTG-Control CTRL+Config Logic Arbiter, MMU Diagnostics, CLK, Configuration, etc. Xilinx SystemACE CF CF Access, JTAG Control PCI-X-Bus (64Bit Data / 32Bit Address) PCI-Bus- Bridge Master, Slave, DMA Dual-Port SRAM Module 6 CTRL, SMB 128 SelectMAP, CFG-JTAG Local-Bus (32Bit Data / 32Bit Address) 85 Module 1 75 CTRL, SMB 128 SelectMAP, CFG-JTAG 85 Module 2 75 CTRL, SMB 128 SelectMAP, CFG-JTAG 85 Module Module 4 Broadcast-Bus 7

8 System Environment Multi master system bus Generic I/D cache interfaces to external memory 4 GB SDRAM Penalty cycles on cache misses: Instr. cache: >73 clock cycles Data cache: >61 clock cycles Internal memories can be accessed from host system Generic interface for dedicated hardware extensions 9.1 Gbit/s external bandwidth SDRAM SDRAM Controller Systembus Controller Localbus Interface Arbiter Instr. Cache Data Cache Systembus MMIO CoreVA CPU Host PC FIFO CRC UART Xilinx FPGA ASIC RAPTOR2000 System 8

9 Cache Architecture Overview I-Cache: 32 bit per issue slot 4 slot configuration: 128 bit interface Direct mapped (low latency/power/area) 16kB cache size, 64 bytes line width (configurable) D-Cache: 1-/2-port configuration possible Direct mapped 16kB cache size, 32 bytes line width (configurable) Write-back policy, non-blocking Two programmable allocation modes: fetch-on-write-miss/allocate-on-write-miss I-/D-Caches can dynamically be configured as scratch pad memories Higher performance for timing critical parts of an application (cache misses are avoided) Energy improvements due to nonexistent external memory accesses 9

10 Cache Architecture Synthesis Results 10

11 Application Evaluation Different Cache Configurations Applications: synthetic benchmarks, baseband, cryptography, multimedia, LTE protocol stack 50% LD/ST-units per #FUs best trade-off Concurrent LD/ST Speedup! Speedup dependent on scheduling! 11

12 Results(1) Hit Rates High hit rates for all applications Allocate-on-write-miss Fetch-on-write-miss 12

13 Results(2) Portion of Stall Cycles to Execution Time Latencies of SDRAM accesses may vary dependent on the order, distribution and frequency of the accesses. 13

14 Results(3) Performance 14

15 Results(4) Energy 15

16 Results(5) Energy-Delay 16

17 Register File 1.66 mm The CoreVA VLIW architecture ASIC realization 4-issue VLIW processor, 2x MLA,DIV 1-Port I-Cache (16kByte,128 Bit), 2-Port D-Cache (16kByte, 32 Bit) 65nm ST Microelectronics, Low Power (Thick Oxide), 1.2V MixedVT, 1.8V I/Os (configurable pullups) Hardware extensions (incl. ECC) 1.66 mm Instruction Cache Comp. Cell Execute Frequency Area (32kB SRAM) Power Consumption 400 MHz 2.7 mm² 0.1 W 1.6 GOP/s in scalar mode 3.2 GOP/s in SIMD mode Data Cache ECC 17

18 Conclusion/Outlook Framework for the design-space exploration of processor architectures and memory subsystems Rapid prototyping environment RAPTOR Dynamic configurable cache architecture 2-slot configuration/allocate-on-write-miss shows best energy trade-off Performance/Energy gains up to 25% Future work: Include associativity Combination of caches/scratch-pad memories to enhance memory bandwidth 18

19 Questions? Specification Vice-UPSLA Benchmarks Source Code RTL-Description UPSLA Compiler RTL-Code Assembler Code RTL-Simulator Assembler RTL-Code Object-Files Synthesis-Tool Linker Netlist Executables Emulator (Prototyp) ASIC-Realization Software Simulator Functional Verification Visualization Profiling-Data Ressource Efficiency Design space exploration VLIW architecture SDRAM SDRAM Controller Systembus Controller Arbiter Instr. Cache Data Cache Systembus CoreVA CPU Localbus Interface MMIO Host PC FIFO CRC UART Xilinx FPGA ASIC Rapid prototyping RAPTOR2000 System System Architecture 19

20 E University of Paderborn Dr.-Ing. Mario Porrmann Thank you for your attention! Heinz Nixdorf Institute University of Paderborn System and Circuit Technology Dipl.-Ing. Thorsten Jungeblut Fürstenallee Paderborn Tel.: / Fax.: / tj@hni.upb.de

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Thorsten Jungeblut 1, Johannes Ax 2, Gregor Sievers 2, Boris Hübener 2, Mario Porrmann 2, Ulrich Rückert 1 1 Cognitive

More information

DRPM architecture overview

DRPM architecture overview DRPM architecture overview Jens Hagemeyer, Dirk Jungewelter, Dario Cozzi, Sebastian Korf, Mario Porrmann Center of Excellence Cognitive action Technology, Bielefeld University, Germany Project partners:

More information

RAPTOR A Scalable Platform for Rapid Prototyping and FPGA-based Cluster Computing

RAPTOR A Scalable Platform for Rapid Prototyping and FPGA-based Cluster Computing RAPTOR A Scalable Platform for Rapid Prototyping and FPGA-based Cluster Computing Mario PORRMANN, Jens HAGEMEYER, Johannes ROMOTH, Manuel STRUGHOLTZ, and Christopher POHL 1 Heinz Nixdorf Institute, University

More information

RAPTOR. Modular Rapid Prototyping HEINZ NIXDORF INSTITUTE

RAPTOR. Modular Rapid Prototyping HEINZ NIXDORF INSTITUTE HEINZ NIXDORF INSTITUTE University of Paderborn RAPTOR Modular Rapid Prototyping The modular FPGA-based rapid prototyping systems of the RAPTOR family integrate all key components to realize circuit and

More information

The Nios II Family of Configurable Soft-core Processors

The Nios II Family of Configurable Soft-core Processors The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture

More information

A Reconfigurable SOM Hardware Accelerator

A Reconfigurable SOM Hardware Accelerator A Reconfigurable SOM Hardware Accelerator M. Porrmann, M. Franzmeier, H. Kalte, U. Witkowski, U. Rückert Heinz Nixdorf Institute, System and Circuit Technology, University of Paderborn, Germany email:

More information

LEON4: Fourth Generation of the LEON Processor

LEON4: Fourth Generation of the LEON Processor LEON4: Fourth Generation of the LEON Processor Magnus Själander, Sandi Habinc, and Jiri Gaisler Aeroflex Gaisler, Kungsgatan 12, SE-411 19 Göteborg, Sweden Tel +46 31 775 8650, Email: {magnus, sandi, jiri}@gaisler.com

More information

CoreVA-MPSoC: A Many-core Architecture with Tightly Coupled Shared and Local Data Memories

CoreVA-MPSoC: A Many-core Architecture with Tightly Coupled Shared and Local Data Memories IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, POST-PRINT, DECEMBER 2017 1 CoreVA-MPSoC: A Many-core Architecture with Tightly Coupled and Local Data Memories Johannes Ax, Gregor Sievers, Julian

More information

Towards Optimal Custom Instruction Processors

Towards Optimal Custom Instruction Processors Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT CHIPS 18 Overview 1. background: extensible processors

More information

System-Level Analysis of Network Interfaces for Hierarchical MPSoCs

System-Level Analysis of Network Interfaces for Hierarchical MPSoCs System-Level Analysis of Network Interfaces for Hierarchical MPSoCs Johannes Ax *, Gregor Sievers *, Martin Flasskamp *, Wayne Kelly, Thorsten Jungeblut *, and Mario Porrmann * * Cognitronics and Sensor

More information

A Synchronization Method for Register Traces of Pipelined Processors

A Synchronization Method for Register Traces of Pipelined Processors Synchronization Method for Register Traces of Pipelined Processors Ralf Dreesen 1, Thorsten Jungeblut 2, Michael Thies 1, Mario Porrmann 2, Uwe Kastens 1, Ulrich Rückert 2 1 University of Paderborn, Department

More information

A 1-GHz Configurable Processor Core MeP-h1

A 1-GHz Configurable Processor Core MeP-h1 A 1-GHz Configurable Processor Core MeP-h1 Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

More information

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Organization Part II Ujjwal Guin, Assistant Professor Department of Electrical and Computer Engineering Auburn University, Auburn,

More information

COE758 Digital Systems Engineering

COE758 Digital Systems Engineering COE758 Digital Systems Engineering Project #1 Memory Hierarchy: Cache Controller Objectives To learn the functionality of a cache controller and its interaction with blockmemory (SRAM based) and SDRAM-controllers.

More information

Next Generation Multi-Purpose Microprocessor

Next Generation Multi-Purpose Microprocessor Next Generation Multi-Purpose Microprocessor Presentation at MPSA, 4 th of November 2009 www.aeroflex.com/gaisler OUTLINE NGMP key requirements Development schedule Architectural Overview LEON4FT features

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

Advanced Memory Organizations

Advanced Memory Organizations CSE 3421: Introduction to Computer Architecture Advanced Memory Organizations Study: 5.1, 5.2, 5.3, 5.4 (only parts) Gojko Babić 03-29-2018 1 Growth in Performance of DRAM & CPU Huge mismatch between CPU

More information

Embedded Systems: Hardware Components (part I) Todor Stefanov

Embedded Systems: Hardware Components (part I) Todor Stefanov Embedded Systems: Hardware Components (part I) Todor Stefanov Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

More information

A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS

A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS Joseph R. Marshall, Richard W. Berger, Glenn P. Rakow Conference Contents Standards & Topology ASIC Program History ASIC Features

More information

Summary of Computer Architecture

Summary of Computer Architecture Summary of Computer Architecture Summary CHAP 1: INTRODUCTION Structure Top Level Peripherals Computer Central Processing Unit Main Memory Computer Systems Interconnection Communication lines Input Output

More information

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation Mainstream Computer System Components CPU Core 2 GHz - 3.0 GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation One core or multi-core (2-4) per chip Multiple FP, integer

More information

Recap: Machine Organization

Recap: Machine Organization ECE232: Hardware Organization and Design Part 14: Hierarchy Chapter 5 (4 th edition), 7 (3 rd edition) http://www.ecs.umass.edu/ece/ece232/ Adapted from Computer Organization and Design, Patterson & Hennessy,

More information

53 GOPS Programmable Vision Processor for Processing, Coding-Decoding and Synthesizing of Images

53 GOPS Programmable Vision Processor for Processing, Coding-Decoding and Synthesizing of Images 53 GOPS Programmable Vision Processor for Processing, Coding-Decoding and Synthesizing of Images Ulrich Ramacher, Wolfgang Raab, Nico Bruels, Ulrich Hachmann, Christian Sauer, Alex Schackow, Jörg Gliese,

More information

MPSOC 2011 BEAUNE, FRANCE

MPSOC 2011 BEAUNE, FRANCE MPSOC 2011 BEAUNE, FRANCE BOADRES: A SCALABLE BASEBAND PROCESSOR TEMPLATE FOR Gbps RADIOS VICE PRESIDENT, CHAIRMAN OF THE TECHNOLOGY OFFICE PROFESSOR AT THE KATHOLIEKE UNIVERSITEIT LEUVEN STATUS SDR BASEBAND

More information

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info. A FPGA based development platform as part of an EDK is available to target intelop provided IPs or other standard IPs. The platform with Virtex-4 FX12 Evaluation Kit provides a complete hardware environment

More information

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS UNIT-I OVERVIEW & INSTRUCTIONS 1. What are the eight great ideas in computer architecture? The eight

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600) 200 MHz (internal base chip clock) 8-way interleaved

More information

S2C K7 Prodigy Logic Module Series

S2C K7 Prodigy Logic Module Series S2C K7 Prodigy Logic Module Series Low-Cost Fifth Generation Rapid FPGA-based Prototyping Hardware The S2C K7 Prodigy Logic Module is equipped with one Xilinx Kintex-7 XC7K410T or XC7K325T FPGA device

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

Creating a Scalable Microprocessor:

Creating a Scalable Microprocessor: Creating a Scalable Microprocessor: A 16-issue Multiple-Program-Counter Microprocessor With Point-to-Point Scalar Operand Network Michael Bedford Taylor J. Kim, J. Miller, D. Wentzlaff, F. Ghodrat, B.

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design

VLIW DSP Processor Design for Mobile Communication Applications. Contents crafted by Dr. Christian Panis Catena Radio Design VLIW DSP Processor Design for Mobile Communication Applications Contents crafted by Dr. Christian Panis Catena Radio Design Agenda Trends in mobile communication Architectural core features with significant

More information

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1)

Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) Department of Electr rical Eng ineering, Chapter 5 Large and Fast: Exploiting Memory Hierarchy (Part 1) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering,

More information

FPQ6 - MPC8313E implementation

FPQ6 - MPC8313E implementation Formation MPC8313E implementation: This course covers PowerQUICC II Pro MPC8313 - Processeurs PowerPC: NXP Power CPUs FPQ6 - MPC8313E implementation This course covers PowerQUICC II Pro MPC8313 Objectives

More information

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350):

The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): The Memory Hierarchy & Cache Review of Memory Hierarchy & Cache Basics (from 350): Motivation for The Memory Hierarchy: { CPU/Memory Performance Gap The Principle Of Locality Cache $$$$$ Cache Basics:

More information

Digital Systems Design. System on a Programmable Chip

Digital Systems Design. System on a Programmable Chip Digital Systems Design Introduction to System on a Programmable Chip Dr. D. J. Jackson Lecture 11-1 System on a Programmable Chip Generally involves utilization of a large FPGA Large number of logic elements

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. Cache associativity Cache and performance 12 1 CMPE110 Spring 2005 A. Di Blas 110 Spring 2005 CMPE Cache Direct-mapped cache Reads and writes Textbook Edition: 7.1 to 7.3 Second Third Edition: 7.1 to 7.3

More information

Building and Using the ATLAS Transactional Memory System

Building and Using the ATLAS Transactional Memory System Building and Using the ATLAS Transactional Memory System Njuguna Njoroge, Sewook Wee, Jared Casper, Justin Burdick, Yuriy Teslyar, Christos Kozyrakis, Kunle Olukotun Computer Systems Laboratory Stanford

More information

CPU Pipelining Issues

CPU Pipelining Issues CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput

More information

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies

Donn Morrison Department of Computer Science. TDT4255 Memory hierarchies TDT4255 Lecture 10: Memory hierarchies Donn Morrison Department of Computer Science 2 Outline Chapter 5 - Memory hierarchies (5.1-5.5) Temporal and spacial locality Hits and misses Direct-mapped, set associative,

More information

FPGA Based Digital Design Using Verilog HDL

FPGA Based Digital Design Using Verilog HDL FPGA Based Digital Design Using Course Designed by: IRFAN FAISAL MIR ( Verilog / FPGA Designer ) irfanfaisalmir@yahoo.com * Organized by Electronics Division Integrated Circuits Uses for digital IC technology

More information

Caches. Hiding Memory Access Times

Caches. Hiding Memory Access Times Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

The Memory Hierarchy & Cache

The Memory Hierarchy & Cache Removing The Ideal Memory Assumption: The Memory Hierarchy & Cache The impact of real memory on CPU Performance. Main memory basic properties: Memory Types: DRAM vs. SRAM The Motivation for The Memory

More information

Software Defined Modem A commercial platform for wireless handsets

Software Defined Modem A commercial platform for wireless handsets Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from

More information

Universal Serial Bus Host Interface on an FPGA

Universal Serial Bus Host Interface on an FPGA Universal Serial Bus Host Interface on an FPGA Application Note For many years, designers have yearned for a general-purpose, high-performance serial communication protocol. The RS-232 and its derivatives

More information

14:332:331. Week 13 Basics of Cache

14:332:331. Week 13 Basics of Cache 14:332:331 Computer Architecture and Assembly Language Fall 2003 Week 13 Basics of Cache [Adapted from Dave Patterson s UCB CS152 slides and Mary Jane Irwin s PSU CSE331 slides] 331 Lec20.1 Fall 2003 Head

More information

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform

Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform BL Standard IC s, PL Microcontrollers October 2007 Outline LPC3180 Description What makes this

More information

SoC Platforms and CPU Cores

SoC Platforms and CPU Cores SoC Platforms and CPU Cores COE838: Systems on Chip Design http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer Engineering Ryerson University

More information

IBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation

IBM PSSC Montpellier Customer Center. Blue Gene/P ASIC IBM Corporation Blue Gene/P ASIC Memory Overview/Considerations No virtual Paging only the physical memory (2-4 GBytes/node) In C, C++, and Fortran, the malloc routine returns a NULL pointer when users request more memory

More information

Overview of SOC Architecture design

Overview of SOC Architecture design Computer Architectures Overview of SOC Architecture design Tien-Fu Chen National Chung Cheng Univ. SOC - 0 SOC design Issues SOC architecture Reconfigurable System-level Programmable processors Low-level

More information

Memory. Lecture 22 CS301

Memory. Lecture 22 CS301 Memory Lecture 22 CS301 Administrative Daily Review of today s lecture w Due tomorrow (11/13) at 8am HW #8 due today at 5pm Program #2 due Friday, 11/16 at 11:59pm Test #2 Wednesday Pipelined Machine Fetch

More information

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory

Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory Cache Memories Cache memories are small, fast SRAM-based memories managed automatically in hardware. Hold frequently accessed blocks of main memory CPU looks first for data in caches (e.g., L1, L2, and

More information

Introduction to cache memories

Introduction to cache memories Course on: Advanced Computer Architectures Introduction to cache memories Prof. Cristina Silvano Politecnico di Milano email: cristina.silvano@polimi.it 1 Summary Summary Main goal Spatial and temporal

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality

Memory Hierarchy. Caching Chapter 7. Locality. Program Characteristics. What does that mean?!? Exploiting Spatial & Temporal Locality Caching Chapter 7 Basics (7.,7.2) Cache Writes (7.2 - p 483-485) configurations (7.2 p 487-49) Performance (7.3) Associative caches (7.3 p 496-54) Multilevel caches (7.3 p 55-5) Tech SRAM (logic) SRAM

More information

CpE 442. Memory System

CpE 442. Memory System CpE 442 Memory System CPE 442 memory.1 Outline of Today s Lecture Recap and Introduction (5 minutes) Memory System: the BIG Picture? (15 minutes) Memory Technology: SRAM and Register File (25 minutes)

More information

ECE232: Hardware Organization and Design

ECE232: Hardware Organization and Design ECE232: Hardware Organization and Design Lecture 21: Memory Hierarchy Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Ideally, computer memory would be large and fast

More information

2001 Altera Corporation (1)

2001 Altera Corporation (1) 2001 Altera Corporation (1) SOPC Design Using ARM-Based Excalibur Devices Outline! ARM-based Devices Overview! Embedded Stripe! Excalibur MegaWizard! Verification Tools Bus Functional Model Full Stripe

More information

Designing Embedded Processors in FPGAs

Designing Embedded Processors in FPGAs Designing Embedded Processors in FPGAs 2002 Agenda Industrial Control Systems Concept Implementation Summary & Conclusions Industrial Control Systems Typically Low Volume Many Variations Required High

More information

ATmega128. Introduction

ATmega128. Introduction ATmega128 Introduction AVR Microcontroller 8-bit microcontroller released in 1997 by Atmel which was founded in 1984. The AVR architecture was conceived by two students (Alf-Egil Bogen, Vergard-Wollen)

More information

The S6000 Family of Processors

The S6000 Family of Processors The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which

More information

Reconfigurable Computing. Introduction

Reconfigurable Computing. Introduction Reconfigurable Computing Tony Givargis and Nikil Dutt Introduction! Reconfigurable computing, a new paradigm for system design Post fabrication software personalization for hardware computation Traditionally

More information

PCI to SH-3 AN Hitachi SH3 to PCI bus

PCI to SH-3 AN Hitachi SH3 to PCI bus PCI to SH-3 AN Hitachi SH3 to PCI bus Version 1.0 Application Note FEATURES GENERAL DESCRIPTION Complete Application Note for designing a PCI adapter or embedded system based on the Hitachi SH-3 including:

More information

Transistor: Digital Building Blocks

Transistor: Digital Building Blocks Final Exam Review Transistor: Digital Building Blocks Logically, each transistor acts as a switch Combined to implement logic functions (gates) AND, OR, NOT Combined to build higher-level structures Multiplexer,

More information

Crusoe Processor Model TM5800

Crusoe Processor Model TM5800 Model TM5800 Crusoe TM Processor Model TM5800 Features VLIW processor and x86 Code Morphing TM software provide x86-compatible mobile platform solution Processors fabricated in latest 0.13µ process technology

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

ASIC Design of Shared Vector Accelerators for Multicore Processors

ASIC Design of Shared Vector Accelerators for Multicore Processors 26 th International Symposium on Computer Architecture and High Performance Computing 2014 ASIC Design of Shared Vector Accelerators for Multicore Processors Spiridon F. Beldianu & Sotirios G. Ziavras

More information

Embedded Systems: Hardware Components (part II) Todor Stefanov

Embedded Systems: Hardware Components (part II) Todor Stefanov Embedded Systems: Hardware Components (part II) Todor Stefanov Leiden Embedded Research Center, Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded

More information

ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi

ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in

More information

Platform-based Design

Platform-based Design Platform-based Design The New System Design Paradigm IEEE1394 Software Content CPU Core DSP Core Glue Logic Memory Hardware BlueTooth I/O Block-Based Design Memory Orthogonalization of concerns: the separation

More information

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki

An Ultra High Performance Scalable DSP Family for Multimedia. Hot Chips 17 August 2005 Stanford, CA Erik Machnicki An Ultra High Performance Scalable DSP Family for Multimedia Hot Chips 17 August 2005 Stanford, CA Erik Machnicki Media Processing Challenges Increasing performance requirements Need for flexibility &

More information

Intellectual Property Macrocell for. SpaceWire Interface. Compliant with AMBA-APB Bus

Intellectual Property Macrocell for. SpaceWire Interface. Compliant with AMBA-APB Bus Intellectual Property Macrocell for SpaceWire Interface Compliant with AMBA-APB Bus L. Fanucci, A. Renieri, P. Terreni Tel. +39 050 2217 668, Fax. +39 050 2217522 Email: luca.fanucci@iet.unipi.it - 1 -

More information

CS Computer Architecture

CS Computer Architecture CS 35101 Computer Architecture Section 600 Dr. Angela Guercio Fall 2010 An Example Implementation In principle, we could describe the control store in binary, 36 bits per word. We will use a simple symbolic

More information

Memory Hierarchy and Caches

Memory Hierarchy and Caches Memory Hierarchy and Caches COE 301 / ICS 233 Computer Organization Dr. Muhamed Mudawar College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals Presentation Outline

More information

100M Gate Designs in FPGAs

100M Gate Designs in FPGAs 100M Gate Designs in FPGAs Fact or Fiction? NMI FPGA Network 11 th October 2016 Jonathan Meadowcroft, Cadence Design Systems Why in the world, would I do that? ASIC replacement? Probably not! Cost prohibitive

More information

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications

Age nda. Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications Intel PXA27x Processor Family: An Applications Processor for Phone and PDA applications N.C. Paver PhD Architect Intel Corporation Hot Chips 16 August 2004 Age nda Overview of the Intel PXA27X processor

More information

Digital Blocks Semiconductor IP

Digital Blocks Semiconductor IP Digital Blocks Semiconductor IP TFT Controller General Description The Digital Blocks TFT Controller IP Core interfaces a microprocessor and frame buffer memory via the AMBA 2.0 to a TFT panel. In an FPGA,

More information

Rapidly Developing Embedded Systems Using Configurable Processors

Rapidly Developing Embedded Systems Using Configurable Processors Class 413 Rapidly Developing Embedded Systems Using Configurable Processors Steven Knapp (sknapp@triscend.com) (Booth 160) Triscend Corporation www.triscend.com Copyright 1998-99, Triscend Corporation.

More information

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY. Department of Computer science and engineering DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY Department of Computer science and engineering Year :II year CS6303 COMPUTER ARCHITECTURE Question Bank UNIT-1OVERVIEW AND INSTRUCTIONS PART-B

More information

08 - Address Generator Unit (AGU)

08 - Address Generator Unit (AGU) October 2, 2014 Todays lecture Memory subsystem Address Generator Unit (AGU) Schedule change A new lecture has been entered into the schedule (to compensate for the lost lecture last week) Memory subsystem

More information

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02 Fujitsu System Applications Support 1 Overview System Applications Support SOC Application Development Lab Multimedia VoIP Wireless Bluetooth Processors, DSP and Peripherals ARM Reference Platform 2 SOC

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

A HT3 Platform for Rapid Prototyping and High Performance Reconfigurable Computing

A HT3 Platform for Rapid Prototyping and High Performance Reconfigurable Computing A HT3 Platform for Rapid Prototyping and High Performance Reconfigurable Computing Second International Workshop on HyperTransport Research and Application (WHTRA 2011) University of Heidelberg Computer

More information

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India

Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder. Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Scalable Multi-DM642-based MPEG-2 to H.264 Transcoder Arvind Raman, Sriram Sethuraman Ittiam Systems (Pvt.) Ltd. Bangalore, India Outline of Presentation MPEG-2 to H.264 Transcoding Need for a multiprocessor

More information

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design

ECE 1160/2160 Embedded Systems Design. Midterm Review. Wei Gao. ECE 1160/2160 Embedded Systems Design ECE 1160/2160 Embedded Systems Design Midterm Review Wei Gao ECE 1160/2160 Embedded Systems Design 1 Midterm Exam When: next Monday (10/16) 4:30-5:45pm Where: Benedum G26 15% of your final grade What about:

More information

Assembly Language Programming

Assembly Language Programming Assembly Language Programming Ľudmila Jánošíková Department of Mathematical Methods and Operations Research Faculty of Management Science and Informatics University of Žilina tel.: 421 41 513 4200 Ludmila.Janosikova@fri.uniza.sk

More information

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp. 13 1 CMPE110 Computer Architecture, Winter 2009 Andrea Di Blas 110 Winter 2009 CMPE Cache Direct-mapped cache Reads and writes Cache associativity Cache and performance Textbook Edition: 7.1 to 7.3 Third

More information

Design and Implementation of High Performance DDR3 SDRAM controller

Design and Implementation of High Performance DDR3 SDRAM controller Design and Implementation of High Performance DDR3 SDRAM controller Mrs. Komala M 1 Suvarna D 2 Dr K. R. Nataraj 3 Research Scholar PG Student(M.Tech) HOD, Dept. of ECE Jain University, Bangalore SJBIT,Bangalore

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

Design of Embedded Hardware and Firmware

Design of Embedded Hardware and Firmware Design of Embedded Hardware and Firmware Introduction on "System On Programmable Chip" NIOS II Avalon Bus - DMA Andres Upegui Laboratoire de Systèmes Numériques hepia/hes-so Geneva, Switzerland Embedded

More information

ESA Contract 18533/04/NL/JD

ESA Contract 18533/04/NL/JD Date: 2006-05-15 Page: 1 EUROPEAN SPACE AGENCY CONTRACT REPORT The work described in this report was done under ESA contract. Responsibility for the contents resides in the author or organisation that

More information

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals

Cache Memory COE 403. Computer Architecture Prof. Muhamed Mudawar. Computer Engineering Department King Fahd University of Petroleum and Minerals Cache Memory COE 403 Computer Architecture Prof. Muhamed Mudawar Computer Engineering Department King Fahd University of Petroleum and Minerals Presentation Outline The Need for Cache Memory The Basics

More information

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Computer Architecture ECE 568 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Computer Architecture ECE 568 Final Exam - Review Israel Koren ECE568 Final_Exam.1 1. A computer system contains an IOP which may

More information

Chapter 7. Hardware Implementation Tools

Chapter 7. Hardware Implementation Tools Hardware Implementation Tools 137 The testing and embedding speech processing algorithm on general purpose PC and dedicated DSP platform require specific hardware implementation tools. Real time digital

More information

SONICS, INC. Sonics SOC Integration Architecture. Drew Wingard. (Systems-ON-ICS)

SONICS, INC. Sonics SOC Integration Architecture. Drew Wingard. (Systems-ON-ICS) Sonics SOC Integration Architecture Drew Wingard 2440 West El Camino Real, Suite 620 Mountain View, California 94040 650-938-2500 Fax 650-938-2577 http://www.sonicsinc.com (Systems-ON-ICS) Overview 10

More information

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego Advanced Digital Winter, 2009 ECE Department UC San Diego dey@ece.ucsd.edu http://esdat.ucsd.edu Winter 2009 Advanced Digital Objective: of a hardware-software embedded system using advanced design methodologies

More information

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle? CSE 2021: Computer Organization Single Cycle (Review) Lecture-10b CPU Design : Pipelining-1 Overview, Datapath and control Shakil M. Khan 2 Single Cycle with Jump Multi-Cycle Implementation Instruction:

More information

CEC 450 Real-Time Systems

CEC 450 Real-Time Systems CEC 450 Real-Time Systems Lecture 6 Accounting for I/O Latency September 28, 2015 Sam Siewert A Service Release and Response C i WCET Input/Output Latency Interference Time Response Time = Time Actuation

More information

Mainstream Computer System Components

Mainstream Computer System Components Mainstream Computer System Components Double Date Rate (DDR) SDRAM One channel = 8 bytes = 64 bits wide Current DDR3 SDRAM Example: PC3-2800 (DDR3-600) 200 MHz (internal base chip clock) 8-way interleaved

More information