LEON4: Fourth Generation of the LEON Processor

Similar documents
Next Generation Multi-Purpose Microprocessor

GR712RC A MULTI-PROCESSOR DEVICE WITH SPACEWIRE INTERFACES

ESA Contract 18533/04/NL/JD

Introduction to LEON3, GRLIB

AN OPEN-SOURCE VHDL IP LIBRARY WITH PLUG&PLAY CONFIGURATION

COMPARISON BETWEEN GR740, LEON4-N2X AND NGMP

Processor and Peripheral IP Cores for Microcontrollers in Embedded Space Applications

Copyright 2016 Xilinx

Intelop. *As new IP blocks become available, please contact the factory for the latest updated info.

DEVELOPING RTEMS SMP FOR LEON3/LEON4 MULTI-PROCESSOR DEVICES. Flight Software Workshop /12/2013

The Nios II Family of Configurable Soft-core Processors

FPQ6 - MPC8313E implementation

Advanced Computing, Memory and Networking Solutions for Space

CCSDS Unsegmented Code Transfer Protocol (CUCTP)

V8uC: Sparc V8 micro-controller derived from LEON2-FT

Lab-2: Profiling m4v_dec on GR-XC3S National Chiao Tung University Chun-Jen Tsai 3/28/2011

SoC Platforms and CPU Cores

SpaceWire Remote Terminal Controller

CCSDS Time Distribution over SpaceWire

Technical Note on NGMP Verification. Next Generation Multipurpose Microprocessor. Contract: 22279/09/NL/JK

ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS

Zynq-7000 All Programmable SoC Product Overview

ARM Processors for Embedded Applications

LEON3-Fault Tolerant Design Against Radiation Effects ASIC

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

Chapter 5. Introduction ARM Cortex series

Fujitsu SOC Fujitsu Microelectronics America, Inc.

Real Time Trace Solution for LEON/GRLIB System-on-Chip. Master of Science Thesis in Embedded Electronics System Design ALEXANDER KARLSSON

Migrating from the UT699 to the UT699E

Multi-DSP/Micro-Processor Architecture (MDPA)

Product Series SoC Solutions Product Series 2016

Designing Embedded Processors in FPGAs

Design Space Exploration for Memory Subsystems of VLIW Architectures

S2C K7 Prodigy Logic Module Series

Hello, and welcome to this presentation of the STM32L4 System Configuration Controller.

Meltdown, Spectre, and Security Boundaries in LEON/GRLIB. Technical note Doc. No GRLIB-TN-0014 Issue 1.1

CoreTile Express for Cortex-A5

Interconnects, Memory, GPIO

Developing a LEON3 template design for the Altera Cyclone-II DE2 board Master of Science Thesis in Integrated Electronic System Design

Contents ASIC Logic Overview Background Information Rapid Prototyping with Logic Module Basic Platforms: AHB and

Gaisler Research LEON SPARC Processor The past, present and future Jiri Gaisler Gaisler Research

Rad-Hard Microcontroller For Space Applications

MYC-C7Z010/20 CPU Module

FPQ9 - MPC8360E implementation

New System Solutions for Laser Printer Applications by Oreste Emanuele Zagano STMicroelectronics

DRPM architecture overview

The ARM10 Family of Advanced Microprocessor Cores

Core Peripherals. Speaker: Tzu-Wei Tseng. Adopted from National Chiao-Tung University IP Core Design. SOC Consortium Course Material

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

EMC2. Prototyping and Benchmarking of PikeOS-based and XTRATUM-based systems on LEON4x4

Industrial use of open-source IP cores

Designing with ALTERA SoC Hardware

Organisasi Sistem Komputer

Agilent N2533A RMP 4.0 Remote Management Processor Data Sheet

Mainstream Computer System Components

Excalibur Device Overview

Next Generation Microprocessor Functional Prototype SpaceWire Router Validation Results

Mainstream Computer System Components CPU Core 2 GHz GHz 4-way Superscaler (RISC or RISC-core (x86): Dynamic scheduling, Hardware speculation

Bus AMBA. Advanced Microcontroller Bus Architecture (AMBA)

L2: FPGA HARDWARE : ADVANCED DIGITAL DESIGN PROJECT FALL 2015 BRANDON LUCIA

AT-501 Cortex-A5 System On Module Product Brief

MYD-C7Z010/20 Development Board

Memory latency: Affects cache miss penalty. Measured by:

Memory latency: Affects cache miss penalty. Measured by:

The Design Complexity of Program Undo Support in a General-Purpose Processor

Hardware Design. University of Pannonia Dept. Of Electrical Engineering and Information Systems. MicroBlaze v.8.10 / v.8.20

Computer System Components

Input/Output Problems. External Devices. Input/Output Module. I/O Steps. I/O Module Function Computer Architecture

Systems in Silicon. Converting Élan SC400/410 Design to Élan SC520

FCQ2 - P2020 QorIQ implementation

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

ARM ARCHITECTURE. Contents at a glance:

Microcontrollers Applications within Thales Alenia Space products

Zynq Architecture, PS (ARM) and PL

Spartan-6 and Virtex-6 FPGA Embedded Kit FAQ


Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

1. Microprocessor Architectures. 1.1 Intel 1.2 Motorola

5. On-chip Bus

Mainstream Computer System Components

Applying the Benefits of Network on a Chip Architecture to FPGA System Design

The CoreConnect Bus Architecture

Enabling success from the center of technology. A Practical Guide to Configuring the Spartan-3A Family

The PowerPC RISC Family Microprocessor

NIOS CPU Based Embedded Computer System on Programmable Chip

ARM-Based Embedded Processor Device Overview

CSP PROJECT VIRTUAL FPGA. Working with Microblaze on Alpha Data board

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 7: Memory Organization Part II

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

NXP Unveils Its First ARM Cortex -M4 Based Controller Family

ELE 375 / COS 471 Final Exam Fall, 2001 Prof. Martonosi

Product Technical Brief S3C2440X Series Rev 2.0, Oct. 2003

Full Linux on FPGA. Sven Gregori

Machine Vision Camera Interfaces. Korean Vision Show April 2012

Product Technical Brief S3C2412 Rev 2.2, Apr. 2006

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Avnet, Xilinx ATCA PICMG Design Kit Hardware Manual

Memory Hierarchies 2009 DAT105

Lecture 5: Computing Platforms. Asbjørn Djupdal ARM Norway, IDI NTNU 2013 TDT

Transcription:

LEON4: Fourth Generation of the LEON Processor Magnus Själander, Sandi Habinc, and Jiri Gaisler Aeroflex Gaisler, Kungsgatan 12, SE-411 19 Göteborg, Sweden Tel +46 31 775 8650, Email: {magnus, sandi, jiri}@gaisler.com Extended Abstract The LEON4 SPARC V8 processor is designed to meet the increasing performance demands and low power requirements of current and future embedded and aerospace applications. The LEON4 processor is an integral part of the GRLIB IP core library, which is a rich set of reusable IP cores, designed for System on Chip (SoC) development. The IP cores are centered around the common Advanced Microcontroller Bus Architecture (AMBA) on-chip bus, and are vendor independent, with support for different CAD tools and target technologies. A unique plug&play method is used to configure and connect the IP cores, which provides for rapid development of advanced SoC designs. The advances made by the LEON4 processor, which are presented in this paper, makes the GRLIB IP core library suitable for designing highperformance and low-power SoC designs of the future. LEON4 SPARC V8 Processor The LEON4 processor is the fourth generation of processors in the successful LEON processor family, which is targeting embedded and aerospace systems. LEON4 has all the features of a LEON3 processor, including a SPARC V8 compliant architecture with a seven-stage pipeline for fast and energyefficient execution, configurable Level 1 (L1) instruction and data caches, configurable number of register windows, and IEEE-754/IEEE-1754 compatible floating-point units. This makes LEON4 software compatible with all the code written for the LEON3 processor. For improved performance the LEON4 processor includes a branch prediction unit, a wide (64 or 128-bit) processor bus, support for configurable MMU page size, and a Level 2 (L2) cache. These microarchitectural improvements make a LEON4 processor on average 35% faster than a LEON3 processor operating at the same clock frequency.

Branch Prediction LEON4 includes a branch prediction unit that implements a branch taken policy. This allows the pipeline to continue executing instructions when a branch is encountered in the instruction stream. In previous generations of the LEON processor the pipeline is stalled for one or two clock cycles if the branch condition depends on the result of any of the two instructions executed before the branch. With the branch prediction unit it is not necessary to wait for the branch condition to be evaluated before fetching new instructions. There is no penalty for miss-speculations. The speculatively fetched instructions are simply flushed from the pipeline and new instructions are fetched from the correct branch target. MMU Page Size The LEON4 Memory Management Unit (MMU) can be configured to have a 4, 8, 16, or 32 kbyte page size. A larger page size makes it possible to increase the size of the L1 cache. This is because the L1 cache way size cannot be larger than the page size of the MMU without causing memory aliasing. The support for a configurable MMU page size makes it possible to increase the L1 data and instruction caches to a maximum of 4x32 kbyte for a 4 way cache configuration. An added advantage of a larger page size is that it increases the Translation Lookaside Buffer (TLB) hit rate. This is due to the fact that since each page is larger there will be fewer pages in total and the TLB entries will represent a larger portion of the memory. One disadvantage with a large page size is that the smallest allocatable memory size is a complete page, e.g. if many small files are to be opened each of these will require a whole page. Configurable Processor Bus The AMBA Advanced High-Speed Bus () of the LEON4 processor is configurable to either 64 or 128 data bits. The wide bus allows LEON4 to fetch a complete L1 cache line in a single data transfer (in the case of a 128-bit wide bus and a 128-bit cache line). This reduces the cache-miss penalty and the bus activity caused by the processor in a SoC design. Level 2 Cache An L2 cache is being developed in conjunction with the LEON4 processor. The L2 cache is highly configurable and supports different associativity (1/2/4), way size (1-256 kbyte), and bus width (64 or 128 data bits). An L2 cache reduces the L1 cache-miss penalty for the LEON4 processor and it reduces the load on off-chip memory. The added advantage of an L2 cache is therefore not only an improvement in performance but also a reduction in energy, since power-dissipating off-chip signaling is reduced.

FPGA Prototype System A SoC prototype, based on the LEON4 processor and GRLIB, has been developed and functionally verified on Xilinx ML505 and ML507 FPGA boards. The prototype design consists of the LEON4 processor and a set of IP cores connected through AMBA buses. The complete system can be seen in Figure 1 1. RS232 JTAG PHY System ACE DDR2 Virtex-5 FPGA LEON4 Processor DSU4 Serial Dbg link JTAG Dbg Ethernet System ACE I/F DDR2 Primary AMBA bus / Secondary AMBA bus PS2 PS2 UART Timers IRQ Memory SVGA SSRAM FLASH DVI PS/2 RS232 Figure 1: Block diagram of a LEON4 SoC for FPGA prototyping The prototype design has two AMBA buses, with the LEON4 processor, high-speed interfaces, and debug interfaces being connected to the primary bus and a SVGA controller, a memory controller for Flash PROM and SSRAM, and a local -to-apb bridge connected to the secondary bus. The SSRAM is used as frame buffer for the SVGA controller and the use of a dedicated bus reduces the bandwidth requirements of the main bus. The local -to-apb bridge on the secondary bus is used to access control registers of the SVGA and memory controller. The two buses are connected to each other through an -to- bridge. A number of low-speed interfaces and IP cores are connected to an APB bus that is connected to the main bus through an unidirectional to-apb bridge. 1 The block diagram is not a detailed description of the system. Some cores have more than one instance and several cores are connected both to an and an APB bus.

The primary bus is 128 bits wide but it is only the LEON4 processor and DDR2 memory controller that are connected to all 128 data bits. All other cores on the main bus are connected to the 32 least significant data bits of the bus. The on-chip peripheral devices include an Ethernet 10/100 Mbit MAC, JTAG debug interface, one UART, interrupt controller, timers, System ACE interface, PS/2 interfaces, and an I/O port. The System ACE interface provides a memory mapped interface to the development board's System ACE CompactFlash solution. This allows CompactFlash cards to be used as hard disk storage. The L2 cache is currently under development and is therefore not part of the current prototype design. The L2 cache development is expected to be completed in summer 2009 and the prototype design will then be updated. We have Linux running on the prototype design in our labs and all SPEC2000 benchmarks have successfully been executed. The preliminary results show 19.1 SPECint and 12.5 SPECfp at 70 MHz, giving a performance of 0.27 SPECint/MHz and 0.2 SPECfp/MHz. The LEON4 processor achieves a lower Clocks Per Instruction (CPI) count during heavy loads than a LEON3 processor operating at the same clock frequency. For the SPEC CPU2000 benchmarks the LEON4 processor is 30%-40% faster than the LEON3. System on Chip ASIC A customer evaluation SoC ASIC with dual LEON4 cores and numerous high-speed and low-speed interfaces is currently under development. The evaluation SoC has an AMBA bus to which the two LEON4 cores, highspeed interfaces, Debug Support Unit (DSU4), and system debug links are connected. A variety of low speed interfaces are connected to two separate APB buses that are connected to the bus through two to-apb bridges. In this design the SVGA controller is connected directly to the main bus, since the operating frequency of the bus is high enough to provide the required bandwidth. This simplifies the design compared to the prototype design described in the previous section. The complete system can be seen in Figure 2 2. 2 The block diagram is not a detailed description of the system. Several cores are connected both to the bus and an APB bus.

LEON4/GRLIB ASIC LEON4 Processor DSU4 Serial Dbg link JTAG Dbg Ethernet SVGA DDR2 Memory controller USB Host USB Device USB Dbg PCI CAN 0 UART IRQ Timers PS/2 I2C slave I2C master 1 SPI I/O Port Status register PCI Arbiter PCI Trace buffer Figure 2: Block diagram of the LEON4 based evaluation SoC ASIC The evaluation SoC ASIC implements all the main interfaces as expected from a desktop computer, including PCI, dual Ethernet 10/100 Mbit MACs, USB (both host and device), DVI, DDR2, and PS2. This makes the evaluation SoC readily available for development of Linux and PDA like applications. In addition, the evaluation SoC also has a general SPI controller, an SPI memory controller for Flash PROM, three I 2 C masters and one slave, two UARTs, and a CAN controller. The SoC is therefore a suitable platform for development of a wide range of embedded applications. The evaluation SoC will be manufactured in easic's 90 nm Nextreme technology and is expected to operate at a core frequency of 200 MHz with a 32-bit wide DDR2 400 MHz memory interface. The DDR2 memory interface is chosen to be 32 bit wide due to pin limitations. Operating the DDR2 interface at twice the frequency of the bus makes it possible to read/write two 32-bit words from/to memory for each bus cycle. It then becomes natural to have a 64-bit wide bus. The evaluation SoC ASIC and an accompanying development board is expected to be ready for delivery during summer 2009. Outlook A fault-tolerant version (LEON4-FT) of the LEON4 processor is being developed and is expected to be available in late 2009 or early 2010. The LEON4-FT will have all the fault-tolerant features of the LEON3-FT processor while incorporating the new features of the LEON4 processor described in this paper.