Current and Next Generation LEON SoC Architectures for Space

Similar documents
Technical Note on NGMP Verification. Next Generation Multipurpose Microprocessor. Contract: 22279/09/NL/JK

Next Generation Multipurpose Microprocessor. Activity Overview

Next Generation Multi-Purpose Microprocessor

DEVELOPING RTEMS SMP FOR LEON3/LEON4 MULTI-PROCESSOR DEVICES. Flight Software Workshop /12/2013

Processor and Peripheral IP Cores for Microcontrollers in Embedded Space Applications

COMPARISON BETWEEN GR740, LEON4-N2X AND NGMP

ESA Contract 18533/04/NL/JD

SCOC3 (Spacecraft Controller On Chip) ESTEC, Noordwijk, 7 th and 8 th March 2007

LEON4: Fourth Generation of the LEON Processor

GR712RC A MULTI-PROCESSOR DEVICE WITH SPACEWIRE INTERFACES

Building Blocks For System on a Chip Spacecraft Controller on a Chip

Introduction to LEON3, GRLIB

EMC2. Prototyping and Benchmarking of PikeOS-based and XTRATUM-based systems on LEON4x4

GR740 Technical Note on Benchmarking and Validation

Rad-Hard Microcontroller For Space Applications

Design of Next Generation CPU Card for State of the Art Satellite Control Application

Evaluation of Soft-Core Processors on a Xilinx Virtex-5 Field Programmable Gate Array

Multi-DSP/Micro-Processor Architecture (MDPA) Paul Rastetter Astrium GmbH

Next Generation Microprocessor Functional Prototype SpaceWire Router Validation Results

V8uC: Sparc V8 micro-controller derived from LEON2-FT

Executive Summary Next Generation Microprocessor (NGMP) Engineering Models Product code: GR740

Migrating from the UT699 to the UT699E

CCSDS Time Distribution over SpaceWire

Advanced Computing, Memory and Networking Solutions for Space

CCSDS Unsegmented Code Transfer Protocol (CUCTP)

Analyze system performance using IWB. Interconnect Workbench Dave Huang

Multi-DSP/Micro-Processor Architecture (MDPA)

Booting a LEON system over SpaceWire RMAP. Application note Doc. No GRLIB-AN-0002 Issue 2.1

Atmel AT697 validation report

GR740 Technical Note on Benchmarking and Validation

SoC Platforms and CPU Cores

GR740 Processor development

ADVANCED ELECTRONIC SOLUTIONS AVIATION SERVICES COMMUNICATIONS AND CONNECTIVITY MISSION SYSTEMS

LEON3-Fault Tolerant Design Against Radiation Effects ASIC

Introduction Electrical Considerations Data Transfer Synchronization Bus Arbitration VME Bus Local Buses PCI Bus PCI Bus Variants Serial Buses

Zynq-7000 All Programmable SoC Product Overview

Fujitsu SOC Fujitsu Microelectronics America, Inc.

FPQ6 - MPC8313E implementation

Chapter 6 Storage and Other I/O Topics

GR716 Single-Core LEON3FT Microcontroller. Cobham Gaisler AMICSA 2018

ESL-Based Full System Simulation Platform

SpaceWire Remote Terminal Controller

The Nios II Family of Configurable Soft-core Processors

ARM Processors for Embedded Applications

Hello, and welcome to this presentation of the STM32L4 System Configuration Controller.

AN OPEN-SOURCE VHDL IP LIBRARY WITH PLUG&PLAY CONFIGURATION

Hello and welcome to this Renesas Interactive module that provides an architectural overview of the RX Core.

Copyright 2016 Xilinx

FTMCTRL: BCH EDAC with multiple 8-bit wide PROM and SRAM banks. Application note Doc. No GRLIB-AN-0003 Issue 1.0

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

KeyStone C665x Multicore SoC

ESA IPs & SoCs developments

The CoreConnect Bus Architecture

System-On-Chip Design with the Leon CPU The SOCKS Hardware/Software Environment

DRPM architecture overview

A Portable and Fault-Tolerant Microprocessor Based on the SPARC V8 Architecture

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

DQ8051. Revolutionary Quad-Pipelined Ultra High performance 8051 Microcontroller Core

Interconnects, Memory, GPIO

CONTACT: ,

FPQ9 - MPC8360E implementation

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Fujitsu System Applications Support. Fujitsu Microelectronics America, Inc. 02/02

S2C K7 Prodigy Logic Module Series

The ARM10 Family of Advanced Microprocessor Cores

Jiri Gaisler. Gaisler Research.

SCS750. Super Computer for Space. Overview of Specifications

Designing Embedded Processors in FPGAs

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

GAISLER. IEEE-STD-754 Floating Point Unit GRFPU Lite / GRFPU-FT Lite CompanionCore Data Sheet

Page 1 SPACEWIRE SEMINAR 4/5 NOVEMBER 2003 JF COLDEFY / C HONVAULT

SoC Overview. Multicore Applications Team

Characterizing the Performance of SpaceWire on a LEON3FT. Ken Koontz, Andrew Harris, David Edell

HIGH PERFORMANCE PPC BASED DPU WITH SPACEWIRE RMAP PORT

FCQ2 - P2020 QorIQ implementation

CHAPTER 4 MARIE: An Introduction to a Simple Computer

William Stallings Computer Organization and Architecture 10 th Edition Pearson Education, Inc., Hoboken, NJ. All rights reserved.

A ONE CHIP HARDENED SOLUTION FOR HIGH SPEED SPACEWIRE SYSTEM IMPLEMENTATIONS

Development an update. Aeroflex Gaisler

(Advanced) Computer Organization & Architechture. Prof. Dr. Hasan Hüseyin BALIK (3 rd Week)

POWER7: IBM's Next Generation Server Processor

4. Hardware Platform: Real-Time Requirements

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

Chapter 3. Top Level View of Computer Function and Interconnection. Yonsei University

A 1-GHz Configurable Processor Core MeP-h1

Final Presentation. Network on Chip (NoC) for Many-Core System on Chip in Space Applications. December 13, 2017

Product Technical Brief S3C2413 Rev 2.2, Apr. 2006

ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series. Ian Johnson Senior Product Manager, ARM

Course II Parallel Computer Architecture. Week 2-3 by Dr. Putu Harry Gunawan

A Flexible SystemC Simulator for Multiprocessor Systemson-Chip

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

AT697E LEON2-FT Final Presentation

AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors

Product Technical Brief S3C2412 Rev 2.2, Apr. 2006

Place Your Logo Here. K. Charles Janac

CoreTile Express for Cortex-A5

The S6000 Family of Processors

Design Space Exploration for Memory Subsystems of VLIW Architectures

Memory System Design. Outline

Transcription:

Current and Next Generation LEON oc Architectures for pace Flight oftware Workshop 2012 November 7 th, 2012 www.aeroflex.com/gaisler Presentation does not contain U Export controlled information (aka ITAR)

Outline History of LEON Project LEON3FT Overview LEON4 Overview LEON3FT systems: UT699, AG RTAX, GR712RC LEON4 Next Generation icroprocessor ummary Presentation does not contain U Export controlled information (aka ITAR) 2

LEON Processor History Project started by European pace Agency in 1997 with the objectives: To Provide an open, portable and non-proprietary processor design Based on PARC architecture To be able to manufacture in EU sensitive semiconductor process and to maintain correct operation in presence of EUs LEON1 VHDL design (EA) LEONExpress test chip (2001, 0.35 um) LEON2(FT) VHDL design (EA / Gaisler Research) AT697 (2004) LEON3(FT) VHDL (Gaisler Research) UT699 (Aeroflex Co) LEON4(FT) (Aeroflex Gaisler) Next Generation icroprocessor Presentation does not contain U Export controlled information (aka ITAR) 3

LEON3FT IEEE-1754 PARC V8 compliant 32-bit processor 7-stage pipeline, multi-processor support eparate multi-set caches On-chip debug support unit with trace buffer Highly configurable Cache size 1 256 KiB, 1 4 ways, LRU/LRR/RND Hardware ultiply/divide/ac options PARC Reference emory anagement Unit (RU) Floating point unit (high-performance or small size) Fault-tolerant version available Register file protection Cache protection Certified PARC V8 by PARC international uitable for space and military applications Baseline processor for space projects in U, Europe, Asia Presentation does not contain U Export controlled information (aka ITAR) 4

LEON4 IEEE-1754 PARC V8 compliant 32-bit processor ame features and architecture as LEON3 (7-stage pipe, P support etc.) Offers higher performance compared to LEON3 64-bit single-clock load/store operation 64- or 128-bit AHB bus interface Branch prediction PARC V9 Compare-and-wap instruction Performance counters 1.7 Dhrystone IP/Hz 0.6 Whetstone FLOP/Hz 2.1 Coreark/Hz (comparable to AR11) Typically used with L2 cache to reduce effects of memory acc. Latency Currently in use for commercial projects and also for EA's Next Generation icroprocessor Presentation does not contain U Export controlled information (aka ITAR) 5

Traditional LEON oc Architecture Aeroflex Gaisler LEON3FT-RTAX, Aeroflex Co UT699, etc. ingle processor core ABA AHB 2.0 with one or several AHB/APB bridges DA capable peripheral units Presentation does not contain U Export controlled information (aka ITAR) 6

Current ulti-core LEON: GR712RC GR712RC: Dual core LEON3FT Traditional topology Two processor core connected to same bus Presentation does not contain U Export controlled information (aka ITAR) 7

GR712RC Features Two LEON3FT PARC V8 processors RU, IEEE-754 FPU, 16 KiB I-cache, 16 KiB D-cache On-chip 192 KiB RA with EDAC Off-chip memory types: DRA, DRA, Flash PRO / EEPRO I/O matrix connecting subset of: Four 200 bit/s pacewire ports IL-TD-1553B (BC/RT/B) Ethernet AC, 10/100 bit/s Two CAN 2.0B bus controllers and one atcan controller ix UART port, PI master, I 2 C master, AC16 and LINK serial ports CCD/EC 5-channel TC decoder, 10 bit/s input rate CCD Telemetry encoder, 50 bit/s output rate 26 input and 38 input/output GPIO ports Presentation does not contain U Export controlled information (aka ITAR) 8

Next Generation LEON oc: NGP Quad core connected to shared L2 cache FPU FPU IRQ(A)P emory crubber LEON4 TAT. UNIT IRQCTRL FPU Caches Caches U U IRQCTRL FPU Caches Caches U U AHB/AHB Bridge DU AHB/APB Bridge pw RAP DCL DDR2 AND DRA CTRL PRO & IO CTRL L2 Cache AHB/AHB Bridge AHB Bridge IOU AHB tatus AHBTRACE JTAG UB DCL AHB tatus Timers 1-4 AHB/APB Bridge PCI aster PCI DA PCI Target pw router HL HL HL HL IL-TD- 1553B Ethernet Ethernet CLKGATE Timers 0 watchdog GPIO UART UART PCI Arbiter PI controller Presentation does not contain U Export controlled information (aka ITAR) 9

NGP Project Overview NGP is an EA activity developing a multi-processor system with higher performance than earlier generations of European pace processors Part of the EA roadmap for standard microprocessor components Aeroflex Gaisler's assignment consists of specification, the architectural (VHDL) design, and verification by simulation and on FPGA. The goal of this work is to produce a verified gate-level netlist for a suitable technology. As an additional step in the development of the NGP, a functional prototype AIC NGFP has been developed, also under EA contract. Presentation does not contain U Export controlled information (aka ITAR) 10

NGP Architecture Overview (1/2) GRFPC GRFPU GRFPC GRFPC GRFPU GRFPC DA asters L1 Cache L1 Cache L1 Cache L1 Cache IOU CPU bus 128-bit Level 2 Cache to low-speed slaves... crubber emory bus 128-bit EDAC emory controller mem_ifsel DDR2 DRA to either DDR2 or PC133 DRA Presentation does not contain U Export controlled information (aka ITAR) 11

NGP Architecture Overview (2/2) - Quad-core Leon4 - GRFPU, pairwise shared GRFPC GRFPU GRFPC GRFPC GRFPU GRFPC DA asters L1 Cache L1 Cache L1 Cache L1 Cache IOU CPU bus 128-bit - L1 and L2 Caches Level 2 Cache to low-speed slaves... emory bus 128-bit - emory controller EDAC emory controller crubber mem_ifsel DDR2 DRA to either DDR2 or PC133 DRA Presentation does not contain U Export controlled information (aka ITAR) 12

NGP Overview - Features 4x with 128-bit AHB bus interface High-performance GRFPU, one FPU shared between two CPUs 16 KiB I-cache, 16 KiB D-cache, write-through Level-2 cache, bridge in the bus topology Configurable, copy-back operation, can be used as OC RA External memory: DDR2 or DRA, selected with bootstrap signal Powerful interleaved 16/32+8-bit ECC giving 32 or 16 checkbits (W selected, can be switched on the fly) emory error handling (memory controller, scrubber, CPU together) Hardware memory scrubber for initialization, background scrub, error reporting and statistics Rapid regeneration of contents after EFI Graceful degradation of failed byte lane, regaining EU tolerance Example code for RTE available Presentation does not contain U Export controlled information (aka ITAR) 13

NGP Overview I/O Interfaces Large number of I/O interfaces: 8-port pacewire router 32-bit 33/66 Hz PCI aster/target with DA 10/100/1000 bit Ethernet IL-TD-1553B 2xUART, PI master/slave, 16 GPIO Debug interfaces: Ethernet UB pacewire (RAP) JTAG Presentation does not contain U Export controlled information (aka ITAR) 14

NGP Overview Improvements Resource partitioning The architecture has been designed to support both P, AP and mixtures (examples: 3 CPU:s running Linux P and one running RTE, 4 separate RTE instances, 2x Linux/1x RTE/1x VxWorks, etc.) The L2 cache can be set to 1 way/cpu mode Each CPU can get one dedicated interrupt controller and timer unit, or share with other CPU Peripheral register interfaces are located at separate 4K pages to allow restricting (via U) user-level software from accessing the wrong IP in case of software malfunction. IOU Improved debugging Dedicated debug bus allows for non-intrusive debugging Performance counters, AHB and instruction trace buffers with filtering, interrupt time stamping Presentation does not contain U Export controlled information (aka ITAR) 15

NGP - PRO-less / pw applications Extended support for PRO-less boot PRO-less booting possible via pacewire Connect via RAP Configure main memory controller Use HW memory scrubber to initialize memory Enable L2 cache (optional) Upload software Assign processor start address(es) tart processor(s) pacewire router, with eight external ports, is fully functional without processor intervention Device can also act as a software/processor-free bridge between pw and PCI/PI/1553 etc. IOU can be used to restrict RAP access Presentation does not contain U Export controlled information (aka ITAR) 16

NGP Overview Block Diagram 96-bit PC100 DRA 96-bit DDR2-800 DRA PRO IO 8/16-bit DDR2 AND DRA CTRL PRO & IO CTRL CLKGATE 32-bit APB @ 400 Hz emory bus 128-bit AHB @ 400 Hz AHB tatus IRQ(A)P X emory crubber L2 Cache 32-bit AHB @ 400 Hz GPIO lave IO bus AHB/APB Bridge Timers 0 watchdog Interrupt bus Timers 1-4 tatistics LEON4 TAT. UNIT UART UART AHB/AHB Bridge PCI aster X FPU IRQCTRL FPU Caches Caches U U X PCI DA PCI Arbiter Processor bus 128-bit AHB @ 400 Hz PCI Target AHB Bridge IOU pw router AHB/AHB Bridge AHB tatus HL HL HL HL X IRQCTRL FPU Caches Caches U U PI controller FPU X X X 32-bit APB @ 400 Hz DU aster IO bus 32-bit AHB @ 400 Hz Debug bus 32-bit AHB @ 400 Hz AHBTRACE X IL-TD- 1553B Ethernet Ethernet = aster interface(s) = lave interface(s) X = noop interface AHB/APB Bridge JTAG pw RAP DCL UB DCL Presentation does not contain U Export controlled information (aka ITAR) 17

NGP Overview FT and target tech Fault-tolerance External DDR2/PC100 DRA: Reed-olomon. PRO: BCH L2 Cache General 4-bit parity on L1 cache Protected register files (both CPU and FPU) BCH protected memories, Built-in crubber Block RA contents in IP cores protected by ECC Rad-hard flip-flops and logic by process, library or TR on netlist Baseline target technology: T pace D (65 nm) Alternate target technologies also currently under evaluation Presentation does not contain U Export controlled information (aka ITAR) 18

NGFP Evaluation Board Evaluation board providing interfaces of the NGFP device 6U CPCI form factor Demo in Aeroflex booth here at FW! Presentation does not contain U Export controlled information (aka ITAR) 19

NGP ummary Quad core architecture with shared L2 cache. Baseline memory interfaces: DDR2-800 DRA, PC100 DRA Non-intrusive debugging, performance counters, trace buffer filters I/O interfaces; PCI 2.3, pacewire, 1553, Gigabit Ethernet, PI, UART, Debug interfaces: Ethernet, JTAG, UB, pacewire Part of EA roadmap for standard microprocessor components Functional prototype device on evaluation board on display in Aeroflex booth here at FW NGP website: http://microelectronics.esa.int/ngmp/ Presentation does not contain U Export controlled information (aka ITAR) 20

LEON oc Architecture ummary Current systems use single ABA AHB bus with one or several LEON3FT or LEON4 processor cores. Trend is toward multicore with more or less complicated bus topologies. New developments still started with traditional single core, single bus. NGP architecture introduces several new features for European space processors in terms of separation, FT Level-2 cache, error recovery and improved debugging support. Next incremental improvement: itigating effects of shared memory Presentation does not contain U Export controlled information (aka ITAR) 21

Thank you for listening! Questions? Presentation does not contain U Export controlled information (aka ITAR) 22

EXTRA LIDE Presentation does not contain U Export controlled information (aka ITAR) 23

NGP Architecture - Level-2 Cache 256 KiB baseline, 4-way, 256-bit internal cache line Replacement policy configurable between LRU, Pseudo-Random, master based. BCH ECC and internal scrubber Copy-back and write-through operation 0-waitstate pipelined write, 5-waitstates read hit (FT enabled) upport for locking one more more ways upport for separating cache so a processor cannot replace lines allocated by another processor Fence registers for backup software protection Essential for P performance scaling, reduces effects of slow, or high latency, memory. FPU 96-bit PC100 DRA 96-bit DDR2-800 DRA DDR2 AND DRA CTRL emory bus 128-bit AHB @ 400 Hz emory crubber L2 Cache IRQCTRLFPU Caches Caches U U X X 24

NGP Architecture - emory scrubber Can access external DDR2/DRA and on-chip DRA Performs the following operations: Initialization crubbing emory re-generation Configurable by software Counts correctable errors with option to alert CPU upports, together with DDR2 DRA and DRA memory controller, on-line code switch in case of permanent device failure 96-bit PC100 DRA 96-bit DDR2-800 DRA DDR2 AND DRA CTRL emory bus 128-bit AHB @ 400 Hz emory crubber L2 Cache 25

NGP Architecture - IOU Uni-directional AHB bridge with protection functionality Connects all DA capable I/O master through one interface onto the Processor bus or emory bus (configurable per master) Performs pre-fetching and read/write combining (connects 32-bit masters to 128-bit buses) Provides address translation and access restriction via page tables Provides access restriction via bit vector aster can be placed in groups where each group can have its own set of protection data structures Processor bus 128-bit AHB @ 400 Hz emory bus 128-bit AHB @ 400 Hz PCI DA PCI Target AHB Bridge IOU pw router aster IO bus 32-bit AHB @ 400 Hz HL HL HL HL IL-TD- 1553B Ethernet Ethernet 26

Benchmarks - Overview Benchmarks I/O traffic FPU sharing caling Comparison with AT697, UT699, GR712RC 27

Benchmarks I/O traffic routing Traffic simulations* on system at target frequency Tests benefited from L2 cache, however, decent transfer rates can be achieved without causing traffic on the Processor bus Configuration 1x Eth 2x Eth pw Combined Eth0 Eth1 Per port Total Eth0 Eth1 pw/port pw total L2 cache disabled 1.2 Gb/s 730 b/s 790 b/s 394 b/s 1.57 Gb/s 438 b/s 480 b/s 216 b/s 865 b/s L2 cache enabled 1.7 Gb/s 1.7 Gb/s 1.7 Gb/s 1.56 Gb/s 6.25 Gb/s 1.4 Gb/s 1.5 Gb/s 1 Gb/s 4 Gb/s L2 cache FT enabled 1.7 Gb/s 1.7 Gb/s 1.7 Gb/s 1.5 Gb/s 6.1 Gb/s 1.4 Gb/s 1.4 Gb/s 1.5 Gb/s 3.9 Gb/s Bypassing L2 cache using IOU connection directly to emory bus 1.4 Gb/s 1.2 Gb/s 1.2 Gb/s 697 b/s 2.8 Gb/s 746 b/s 850 b/s 338 b/s 1.4 Gb/s DDR2 AND DRA CTRL emory bus 128-bit AHB @ 400 Hz emory crubber L2 Cache * Exact figures no longer fully applicable due to updates of L2 cache and memory controller 28

Benchmarks FPU sharing (0) Quad instance runs of single/double precision Whetstone on quad CPU system show no measurable difference between having 1x FPU, 2xFPU or 4xFPU (one dedicated FPU per CPU) Runs of some of the benchmarks included in PEC CPU2000 on dual CPU system with 1x and 2x FPU: 29

Benchmarks FPU sharing (1) FPU sharing, for this particular system, more noticeable with applications using division (FDIV, FQRT): 30

Benchmarks caling (0) Running gcc, parser, eon, twolf, applu and art CPU2000 benchmarks first sequentially and then in parallel on one to four cores: 14000 peed-up is 2.91 when going to four cores 12000 10000 Execution time 8000 6000 4000 43% increase in exec time 4x the amount of work 2000 0 4 par inst / 1 core 4 par inst / 3 cores 1 inst / 1 core 4 seq inst / 1 core 4 par inst / 2 cores 4 par inst / 4 cores Benchmark 31

Benchmarks caling (1) Execution time (s) 5000 4000 3000 2000 1000 0 1 inst / 1 core 4 par inst / 4 cores Four parallel instances on four cores versus one instance on one core 32

Benchmarks caling (2) Execution time (s) 14000 12000 10000 8000 6000 4000 2000 0 4 par inst / 4 cores 4 par inst / 1 core Four parallel instances on four cores versus four parallel instances on one core 33

Benchmarks - Comparison Comparison with existing processors targeted for space Benchmark scores relative to AT697: Benchmark AT697 UT699 GR712RC NGP 164.gzip 1 0.94 (0.66) 1.1 (1.1) 1.31 (5.24) 176.gcc 1 0.79 (0.55) 0.97 (0.97) 1.3 (5.2) 256.bzip2 1 0.93 (0.65) 1.06 (1.06) 1.33 (5.32) AOC 1 1.2 (0.84) 1.52 (1.52) 1.79 (7.16) Basicmath 1 1.3 (0.91) 1.46 (1.46) 1.62 (6.48) Coremark, 1 thread 1 0.89 (0.62) 1.09 (1.09) 1.21 (4.84) Coremark, 4 threads 1 0.89 (0.62) 2.05 (2.05) 4.59 (18.36) Dhrystone 1 0.94 (0.66) 1.05 (1.05) 1.39 (5.56) Dhrystone, 4 instances 1 0.94 (0.66) 1.61 (1.61) 4.81 (19.24) Linpack 1 1.2 (0.84) 1.26 (1.26) 1.71 (6.84) Whetstone 1 1.94 (1.36) 2 (2) 2.22 (8.88) Whetstone, 4 instances 1 1.94 (1.36) 3.7 (3.7) 8.68 (34.72) All benchmarks were compiled with GCC-4.3.2 tuned for PARC V8. All systems were clocked at 50 Hz during the tests, using 32-bit DRA (LEON2/3) or 64-bit DDR2 (NGP) Note that the maximum operating frequency of the devices differ, here all tests are run at 50 Hz Values in parentheses are scaled for maximum frequency. 34