The Nios II Family of Configurable Soft-core Processors

Similar documents
Embedded Computing Platform. Architecture and Instruction Set

Nios II Performance Benchmarks

ECE332, Week 2, Lecture 3. September 5, 2007

ECE332, Week 2, Lecture 3

Designing Embedded Processors in FPGAs

SoC Platforms and CPU Cores

ARM Processors for Embedded Applications

ORCA FPGA- Optimized VectorBlox Computing Inc.

Qsys and IP Core Integration

Digital Integrated Circuits

Practical Hardware Debugging: Quick Notes On How to Simulate Altera s Nios II Multiprocessor Systems Using Mentor Graphics ModelSim

Rapidly Developing Embedded Systems Using Configurable Processors

Nios Soft Core Embedded Processor

Chapter 5. Introduction ARM Cortex series

A 1-GHz Configurable Processor Core MeP-h1

Caches. Hiding Memory Access Times

ARM ARCHITECTURE. Contents at a glance:

ARM processor organization

Design of Embedded Hardware and Firmware

Chapter 4. The Processor

Embedded Systems. "System On Programmable Chip" NIOS II Avalon Bus. René Beuchat. Laboratoire d'architecture des Processeurs.

General Purpose Signal Processors

Copyright 2016 Xilinx

LEON4: Fourth Generation of the LEON Processor

Registers. Instruction Memory A L U. Data Memory C O N T R O L M U X A D D A D D. Sh L 2 M U X. Sign Ext M U X ALU CTL INSTRUCTION FETCH

Design and Implementation of a FPGA-based Pipelined Microcontroller

2. System Interconnect Fabric for Memory-Mapped Interfaces

ECE260: Fundamentals of Computer Engineering

Chapter Seven Morgan Kaufmann Publishers

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

CHAPTER 4 MARIE: An Introduction to a Simple Computer

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

COMPUTER ORGANIZATION AND ARCHITECTURE

Pipelining. CSC Friday, November 6, 2015

Designing with Nios II Processor for Hardware Engineers

15CS44: MICROPROCESSORS AND MICROCONTROLLERS. QUESTION BANK with SOLUTIONS MODULE-4

Buses. Disks PCI RDRAM RDRAM LAN. Some slides adapted from lecture by David Culler. Pentium 4 Processor. Memory Controller Hub.

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

Topics in computer architecture

MIPS R5000 Microprocessor. Technical Backgrounder. 32 kb I-cache and 32 kb D-cache, each 2-way set associative

Laboratory Exercise 5

Memory. Lecture 22 CS301

EITF20: Computer Architecture Part2.2.1: Pipeline-1

Advanced Memory Organizations

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

Today s Agenda Background/Experience Course Information Altera DE2B Board do Overview Introduction to Embedded Systems Design Abstraction Microprocess

Appendix C: Pipelining: Basic and Intermediate Concepts

CS6303 Computer Architecture Regulation 2013 BE-Computer Science and Engineering III semester 2 MARKS

4. Hardware Platform: Real-Time Requirements

Multimedia Decoder Using the Nios II Processor

Computer and Hardware Architecture I. Benny Thörnberg Associate Professor in Electronics

EE 8217 *Reconfigurable Computing Systems Engineering* Sample of Final Examination

Laboratory Exercise 3 Comparative Analysis of Hardware and Emulation Forms of Signed 32-Bit Multiplication

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

«Real Time Embedded systems» Multi Masters Systems

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Universität Dortmund. ARM Architecture

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Write only as much as necessary. Be brief!

Computer-System Organization (cont.)

Cycle Time for Non-pipelined & Pipelined processors

Chapter 4. The Processor

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

COSC 6385 Computer Architecture - Pipelining

Chapter Seven. Memories: Review. Exploiting Memory Hierarchy CACHE MEMORY AND VIRTUAL MEMORY

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Design Space Exploration for Memory Subsystems of VLIW Architectures

SEMICON Solutions. Bus Structure. Created by: Duong Dang Date: 20 th Oct,2010

The Memory Hierarchy & Cache

PowerPC 740 and 750

Programmable Logic Design Grzegorz Budzyń Lecture. 15: Advanced hardware in FPGA structures

Portland State University ECE 587/687. Superscalar Issue Logic

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

EITF20: Computer Architecture Part2.2.1: Pipeline-1

The Processor: Instruction-Level Parallelism

The ARM10 Family of Advanced Microprocessor Cores

footprint Ways to reduce RISC-V soft processor 万瑞罡 (Ruigang Wan) Chengdu University GitHub account: rgwan

System-on Solution from Altera and Xilinx

Designing with ALTERA SoC Hardware

ECE 341 Final Exam Solution

COMPUTER ORGANIZATION AND DESIGN

Final Lecture. A few minutes to wrap up and add some perspective

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Today. Comments about assignment Max 1/T (skew = 0) Max clock skew? Comments about assignment 3 ASICs and Programmable logic Others courses

Contents of this presentation: Some words about the ARM company

Hardware Design I Chap. 10 Design of microprocessor

Embedded Systems: Hardware Components (part I) Todor Stefanov

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

08 - Address Generator Unit (AGU)

MICROTRONIX AVALON MULTI-PORT FRONT END IP CORE

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

Embedded Systems Ch 15 ARM Organization and Implementation

1 Hazards COMP2611 Fall 2015 Pipelined Processor

registers data 1 registers MEMORY ADDRESS on-chip cache off-chip cache main memory: real address space part of virtual addr. sp.

CPU Pipelining Issues

Nios II Processor Reference Handbook

Chapter 2 Lecture 1 Computer Systems Organization

Transcription:

The Nios II Family of Configurable Soft-core Processors James Ball August 16, 2005 2005 Altera Corporation

Agenda Nios II Introduction Configuring your CPU FPGA vs. ASIC CPU Design Instruction Set Architecture CPU Micro-architecture Nios II/f CPU Description Pipeline details Nios II Embedded Systems Taking advantage of FPGA configurability 2 2005 Altera Corporation

2005 Altera Corporation Nios II Introduction

Nios II Overview Nios II is Altera s soft-core configurable CPU Introduced summer/2004 New 32-bit RISC Instruction Set Architecture (ISA) Replaces original 16-bit Nios Over 4500 active licenses Most licensed embedded CPU in the world Designed for embedded FPGA-based systems Strong performance (up to 225 Dhrystone MIPS) Support for many operating systems Available in all current Altera FPGAs 4 2005 Altera Corporation

Why a New Instruction Set? Primary Issue Existing instruction sets optimized for ASIC Inefficient in FPGA Secondary Issue Existing instruction sets have licensing restrictions 5 2005 Altera Corporation

Nios II Size Largest 90nm FPGA 180,000 LUTs Smallest 90nm FPGA 4600 LUTs Nios II Nios II Nios II FPGA Nios II FPGA Nios II 13% of FPGA Nios II/e economy 35 in lowest cost FPGA 6 2005 Altera Corporation Nios II 1% of FPGA Nios II/f fast

Nios II is Classic RISC 32-Bit Instruction Set 32-Bit Data path 32 General-Purpose Registers 3 Instruction Formats 82 Instructions Instruction set is not configurable Provides code compatibility for all implementations Up to 256 Custom Instructions 3 Operand Instructions (2 source, 1 destination) Optional Multiply and Divide 7 2005 Altera Corporation

Nios II Processor Block Diagram reset clock JTAG Debug Nios II Processor Core Pipeline Controller General Purpose Registers I-Cache Tightly-Coupled Instruction Mem Tightly-Coupled Instruction Mem irq[31..0] Exception Controller Interrupt Controller Control Registers Instruction Master Data Master Custom Instructions Arithmetic Logic Unit D-cache Tightly-Coupled Data Mem Tightly-Coupled Data Mem 8 2005 Altera Corporation

Configurable Tightly Coupled Memories Map on-chip RAMs into CPU address space Behave like caches that never miss One access every cycle without stalling FPGA RAMs are already dual-ported One port for Nios II connection Second port available for other uses CPU Tightly-Coupled Data Mem DMA Avalon Interconnect Off-chip SDRAM 9 2005 Altera Corporation

Configurable CPU Implementation Choose your pipeline Nios II/f Nios II/s Nios II/e Fast Standard Economy Pipeline 6-stage 5-stage none Max Frequency 1 200 MHz 180 MHz 210 MHz Max D-MIPS 1 225 130 30 Size (4-input LUTs) 1800 1200 600 Branch Prediction Dynamic Static no I-Cache Up to 64K Up to 64K no D-Cache Up to 64K no no 1. Characteristics in Stratix II 90nm FPGA 10 2005 Altera Corporation

Configurable Pipeline Options Cache options Size Line size Multiply instruction options Fully pipelined using built-in FPGA multipliers Un-pipelined using normal LUT logic Trap (software emulated) Divide instruction options Un-pipelined using normal LUT logic Trap (software emulated) 11 2005 Altera Corporation

Configurable Custom Instructions Users write Verilog/VHDL for custom instructions Added to CPU with automatic configuration tool Callable from C-code or assembly language Pipeline independent 2 source operands and 1 destination operand Access CPU register file Access custom instruction register file Combinatorial custom instructions Execute in parallel with ALU Multi-cycle custom instructions Stall CPU pipeline until complete 12 2005 Altera Corporation

Configuring for Higher Performance Add Custom Instructions Example: 64 Kbyte CRC I/O I/O I/O Custom Instruction FPGA CPU DSP CPU Flash DRAM Million Clock Cycles 25 20 15 10 5 0 Software Only 27 Times Faster Custom Instruction 13 2005 Altera Corporation

Configuring for Higher Performance Add Custom Accelerator Example: 64 Kbyte CRC I/O I/O I/O Custom Accelerator FPGA CPU DSP CPU Flash DRAM Million Clock Cycles 25 20 15 10 5 0 Software Only 27 Times Faster Custom Instruction 530 Times Faster Custom Accelerator 14 2005 Altera Corporation

FPGA vs. ASIC CPU Design 2005 Altera Corporation

Efficient FPGA Design Guidelines RAMs, adders, registers, and multipliers Relatively fast and plentiful RAMs are already dual-ported Muxing and control logic Relatively slow and expensive Wire delays Relatively long Take advantage of FPGA configurability Minimize run-time control registers Rely on configuration-time options 16 2005 Altera Corporation

Existing ISAs are Inefficient in FPGAs Variable-length instructions or 16-bit instructions Higher code density not worth extra control logic Register windows Lower memory bandwidth not worth extra control logic Can create difficult real-time requirements Barrel shifts combined with other arithmetic operations Barrel shifts are relatively slow on FPGAs due to muxing Delay slots Decreased branch penalty not worth extra control logic Unnatural for some pipelines 17 2005 Altera Corporation

Existing ISAs are Inefficient in FPGAs Condition code register Complicates pipeline control and increases muxing Multiply/divide 64-bit operand registers All 64-bits rarely used in C language and increases muxing Many run-time control registers Extra logic not required in a configurable FPGA CPU Complex cache management State machines to initialize on reset not worth extra logic Many instruction options for flushing not worth extra logic Vectored interrupts Not required for most designs Use custom instruction to reduce interrupt latency 18 2005 Altera Corporation

Getting Back to RISC Roots CPU is an engine to run C code Benchmarking shows Nios II has comparable performance to established embedded CPUs To increase CPU performance in an FPGA Increase the Nios II cache size Add Nios II custom instructions Add custom accelerators Add multiple Nios II CPUs Add tightly-coupled memories 19 2005 Altera Corporation

Nios II/f CPU Description Fast 2005 Altera Corporation

Nios II/f Pipeline Fetch Decode Execute Memory Align Writeback I-cache RAM Read Register RAM Read ALU D-cache RAM Read D-cache RAM Write Register RAM Write Result Bypass Logic Memory Address Calc D-cache Tag Compare Avalon Access Dynamic Branch Prediction Instruction Decode Branch Resolution Divide Multiply/Barrel Shift 21 2005 Altera Corporation

Direct-mapped Caches Set-associative caches inefficient in FPGA I-cache 32-byte line Critical word first D-cache 4/16/32-byte line Writeback with write allocate One entry writeback buffer 22 2005 Altera Corporation

Dynamic Branch Prediction 2-bit branch prediction (g-share algorithm) Branch History Table RAM (256x2 bits) No Branch Target Buffer Simple ISA allows fast branch target calculation Performance Taken branch is 2 cycles Not taken branch is 1 cycle Mispredicted branch penalty is 4 cycles 23 2005 Altera Corporation

Arithmetic Instructions 32-bit Multiply 1 cycle throughput (fully pipelined) 32-bit Divide 4-67 cycle throughput (not pipelined) Barrel shift/rotate Uses multiplier with 2 n calculation Better performance and lower cost than using LUTs 24 2005 Altera Corporation

Nios II Embedded Systems 2005 Altera Corporation 15

Board-based Embedded System FPGA-based Embedded System I/O I/O I/O CPU I/O I/O I/O FPGA DSP CPU CPU Flash DRAM DSP CPU Move board components into FPGA 26 2005 Altera Corporation

Nios II Evaluation Board FPGA DRAM 2 Flash Preconfigured with a web server running under µclinux 27 2005 Altera Corporation

FPGA-based Systems It s all configurable Configurable CPUs Configurable Memories (on-chip and off-chip) Configurable Peripherals Configurable I/O Configurable System Interconnect Custom Accelerators and we provide the tools to make it easy 28 2005 Altera Corporation

System Configuration Tool 29 2005 Altera Corporation

CPU Configuration Tool 30 2005 Altera Corporation

Avalon System Interconnect Automatically generated for your system Switches connect components not a bus Slave side arbitration Enables concurrent accesses Avalon Functions Arbitration Multiplexing Address Decoding Wait-State Generation Dynamic Bus Sizing 31 2005 Altera Corporation

Traditional Bus Interconnect System Bottleneck Master Arbiter Masters CPU DMA CPU Processor System Bus Slaves Program Memory I/O 1 Data Memory Data Memory I/O 1 Program Memory 32 2005 Altera Corporation

Avalon Switch Interconnect Masters CPU DMA CPU Avalon Switch Switch Switch Arbiter Arbiter Slaves Program Memory I/O 1 Data Memory Data Memory I/O 1 Program Memory 33 2005 Altera Corporation

Conclusions Efficient FPGA design takes advantage of configurable CPUs and systems Nios II is optimized for FPGA-based systems Established CPUs based on ISAs optimized for ASICs are less efficient in FPGAs 34 2005 Altera Corporation

The End Questions? 2005 Altera Corporation

2005 Altera Corporation Backup Slides

Why a Soft-Core FPGA CPU? 2005 Altera Corporation 20

FPGA Soft-Core CPU Advantages Flexibility Utilize existing silicon resources Scalability Number of CPUs, CPU types, cache sizes, etc. Configurability Generation-time configuration instead of run-time Eliminates logic required to control CPU options Ubiquity Available in all FPGA families 38 2005 Altera Corporation

FPGA Soft-Core CPU Advantages Relatively small compared to FPGA capacities Largest Altera FPGA fits 300 Nios II/e CPUs May have spare capacity so CPU is free Lifecycle No obsolescence New releases of CPU improve your design Improved efficiency with latest silicon technologies 39 2005 Altera Corporation

Altera s Latest FPGA Devices Stratix II Technology 90 nm 4-input LUTs 180,000 18-bit Multipliers 384 On-chip RAM 1.2 Mbytes Cyclone II 90 nm 70,000 180 144 Kbytes 40 2005 Altera Corporation