Tomasz Włostowski Beams Department Controls Group Hardware and Timing Section. Developing hard real-time systems using FPGAs and soft CPU cores


Developing hard real-time systems using FPGAs and soft CPU cores
Tomasz Włostowski
Beams Department, Controls Group, Hardware and Timing Section
Melbourne, 22 October 2015

Outline
- Hard Real Time control systems: background
- The Mock Turtle Core
- The μRV: a soft processor core designed at CERN
- Application examples
- Summary and outlook

Hard Real Time Controls
What do we mean by hard RT?

Hard RT: technologies
- FPGAs
  - Very low and deterministic latency
  - Easy to customize
  - Long development cycle
  - Difficult to program by the end user
- Embedded processors
  - Deterministic latency
  - Easy to program
  - Difficult to customize (often need a companion FPGA)
  - Portability issues
- PLCs
  - Standardized environment
  - Easy to program by the end users
  - Too slow for many accelerator applications
- Linux PCs
  - Very short development cycle
  - Easy to program by the end users
  - Can't guarantee determinism (at least on the x86 platform)

Example: CERN timing receiver
- Custom VHDL design, done from scratch
- Only a small part (counters) really needs dedicated VHDL
- Limited flexibility

Example: LHC Instability Trigger (a.k.a. the LIST)
- Instruments in Point 4 detect an onset of beam instability
- Generate a trigger
- Distribute the trigger to other instruments and acquire a massive amount of data for offline study

LIST: The Challenge
- Distributed hard real time system
- Exchange triggers between any pair of devices in the network
- Never exceed the design latency (max 270 μs)
- Never miss a trigger
- Deliver a feature-rich system in a reasonable time...


Mock Turtle: the idea
- Take an FPGA chip
- Take a CPU core that is deterministic
- Put as many CPUs as needed
- Let them communicate with each other... and with the external world
- Connect user cores and hardware


Mock Turtle: the idea
- Make it a service!
- Standard. Portable. Reusable.
- With or without White Rabbit as the means of communication and synchronization.

Inside Mock Turtle: the CPU cores
- Up to 8 LM32 cores, with private code/data memory
- Programmed in bare-metal C, using the standard GCC toolchain
- Each core runs a single task in a tight loop. No caches, no interrupts
- Program loading and flow control from the host system
- Cores run on the same time base if used with WR
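The "single task in a tight loop, no interrupts" model can be sketched in plain C as below. This is only an illustration that runs on a PC: `fake_reg_t`, `fake_reg_read`, `task_step` and `run_task` are hypothetical names, and the simulated register stands in for whatever memory-mapped peripheral a real core would poll. On a real core the loop would be an endless `for (;;)`; it is bounded here so the sketch terminates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped status register: each read
 * returns the next value from a prepared array (0 means "no event"). */
typedef struct {
    const uint32_t *reads;
    size_t len;
    size_t pos;
} fake_reg_t;

uint32_t fake_reg_read(fake_reg_t *r)
{
    assert(r != NULL);
    return (r->pos < r->len) ? r->reads[r->pos++] : 0;
}

/* One pass of the task body: poll the register and handle the event if one
 * is pending. Returns 1 if an event was handled, 0 otherwise. */
int task_step(fake_reg_t *status, uint32_t *last_event)
{
    uint32_t v = fake_reg_read(status);
    if (v == 0)
        return 0;
    *last_event = v; /* the "handling" here just records the event value */
    return 1;
}

/* The tight loop. Bounded to `passes` iterations only so that this sketch
 * can terminate; a bare-metal task would never return. */
unsigned run_task(fake_reg_t *status, unsigned passes, uint32_t *last_event)
{
    unsigned handled = 0;
    for (unsigned i = 0; i < passes; i++)
        handled += (unsigned)task_step(status, last_event);
    return handled;
}
```

Because nothing can preempt the loop, the worst-case reaction time is bounded by the longest path through one pass of `task_step`, which is the property the slide is after.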

Inside Mock Turtle: Shared Memory
- Foreseen for inter-core communication and task synchronization
- Small, but atomically accessed
  - Add/subtract
  - Bit operations
  - Test-and-set
- Task synchronization primitives
  - Mutexes
  - Semaphores
  - Queues
  - Flags
  - Events...
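To show how a mutex can be layered on a test-and-set primitive, here is a minimal sketch using the standard C11 `atomic_flag`. It is an assumption-laden PC-side illustration, not the Mock Turtle API: in Mock Turtle the atomic operation is provided by the shared memory hardware, whereas here `atomic_flag_test_and_set` plays that role. The names `tas_mutex_t`, `TAS_MUTEX_INIT` and the `tas_mutex_*` functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A mutex is a single flag: set means "locked". */
typedef struct {
    atomic_flag locked;
} tas_mutex_t;

#define TAS_MUTEX_INIT { ATOMIC_FLAG_INIT }

/* Test-and-set atomically sets the flag and returns its previous value.
 * A previous value of false means the caller has just taken the lock. */
bool tas_mutex_trylock(tas_mutex_t *m)
{
    assert(m != NULL);
    return !atomic_flag_test_and_set(&m->locked);
}

/* Spin until the lock is acquired. Busy-waiting fits a core that runs a
 * single task and has no scheduler to yield to. */
void tas_mutex_lock(tas_mutex_t *m)
{
    while (!tas_mutex_trylock(m))
        ; /* spin */
}

void tas_mutex_unlock(tas_mutex_t *m)
{
    assert(m != NULL);
    atomic_flag_clear(&m->locked);
}
```

Semaphores, flags and events can be built in a similar way from the add/subtract and bit operations listed above.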

Inside Mock Turtle: Communication System
- Simple FIFO queues holding multiple messages
  - Polling in a tight loop on the RT side
  - Interrupt-driven on the host side
- Each queue provides a configurable number of channels (slots)
- Host Message Queue for communication with the host software
- Optional Remote Message Queue for communication with other nodes over the WR network
  - Uses the Etherbone protocol
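A FIFO holding multiple messages, polled from the real-time side, could look roughly like the sketch below. It models one channel (slot) as a fixed-size ring of messages in ordinary memory. The depth, the message size and the names `mq_t`, `mq_push` and `mq_poll` are assumptions made for illustration and do not describe the actual Mock Turtle registers or library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MQ_DEPTH     4 /* messages the FIFO can hold (assumed value) */
#define MQ_MSG_WORDS 8 /* maximum message size in 32-bit words (assumed value) */

typedef struct {
    uint32_t data[MQ_DEPTH][MQ_MSG_WORDS];
    uint32_t words[MQ_DEPTH]; /* length of each stored message */
    unsigned head;            /* next message to read */
    unsigned tail;            /* next free entry to write */
    unsigned count;           /* messages currently stored */
} mq_t;

/* Producer side: returns false if the FIFO is full or the message is too long. */
bool mq_push(mq_t *q, const uint32_t *msg, uint32_t words)
{
    assert(q != NULL && msg != NULL);
    if (q->count == MQ_DEPTH || words > MQ_MSG_WORDS)
        return false;
    memcpy(q->data[q->tail], msg, words * sizeof(uint32_t));
    q->words[q->tail] = words;
    q->tail = (q->tail + 1) % MQ_DEPTH;
    q->count++;
    return true;
}

/* Consumer side, meant to be called from the tight loop: returns false
 * immediately when nothing is pending, so the loop never blocks. */
bool mq_poll(mq_t *q, uint32_t *out, uint32_t *words)
{
    assert(q != NULL && out != NULL && words != NULL);
    if (q->count == 0)
        return false;
    *words = q->words[q->head];
    memcpy(out, q->data[q->head], *words * sizeof(uint32_t));
    q->head = (q->head + 1) % MQ_DEPTH;
    q->count--;
    return true;
}
```

The non-blocking `mq_poll` matches the polling style on the RT side; a host-side driver would instead wait for an interrupt before reading.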

Inside Mock Turtle: Software Architecture
- Unified kernel driver + user space C library
- Host code written in user space: easier development
  - Message Queue and Shared Memory access
  - Loading CPU applications and controlling each core's execution flow
- Real-time library for the CPU cores
- Python bindings available

Outside Mock Turtle: interfaces & integration in user design
- Based on the Wishbone bus.
- Each CPU can have its exclusive Dedicated Peripheral.
- All CPUs can access Shared Peripheral(s).
- Etherbone master/slave port for communication with other nodes in a WR network (optional).
- Time interface from the WR PTP Core.
- Control port for the host system.


What is μRV?
- Micro RISC-V: a compact soft processor (CPU) core for use in FPGAs.
  - Written in Verilog, no external dependencies.
- Reduced Instruction Set (RISC) processor.
  - Small number of instructions, each doing one function at a time.
- Based on the RISC-V architecture developed at the University of California, Berkeley.
  - Well thought out and promising instruction set.
- Small, portable and deterministic.
  - Determinism of execution is favored over performance.

Why μRV?
- A number of our projects depend on soft CPUs
- Popular architectures either have license limitations (LM32) or are patented (ARM, MIPS)
- FPGA vendor-provided cores (MicroBlaze, Nios) are not portable (no sources available either)
- RISC-V: a very nice architecture, but with no small Verilog/VHDL implementation

The μRV design
- Simple, 4-stage pipeline
- All instructions except jumps (branches) execute in one clock cycle
- Code/data in a private memory
- Wishbone bus for peripheral access
- Runs at 100 MHz and takes 20% of a small FPGA (Spartan-6 SLX9)
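Since every instruction except jumps takes one clock cycle, an upper bound on the execution time of a code path follows from simple arithmetic. The sketch below shows that calculation. The extra cycles per jump are not given on the slide, so `jump_penalty` is an assumed parameter, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on cycles for a code path: one cycle per instruction plus
 * an assumed number of extra cycles for each jump or branch. */
uint64_t worst_case_cycles(uint64_t n_instr, uint64_t n_jumps,
                           uint32_t jump_penalty)
{
    return n_instr + n_jumps * jump_penalty;
}

/* Convert cycles to nanoseconds at a given clock, rounding up so the
 * result stays a safe upper bound. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t clock_hz)
{
    assert(clock_hz != 0);
    return (cycles * 1000000000ULL + clock_hz - 1) / clock_hz;
}
```

For example, a path of 1000 instructions containing 100 jumps, with an assumed penalty of 2 cycles per jump, is bounded by 1200 cycles, which is 12000 ns at the 100 MHz quoted above.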


Trigger Distribution Node
- Dual-core system: 32 kB private RAM per CPU
- CPU 1 is responsible for the inputs:
  - Poll the TDC for incoming pulses
  - Apply dead time
  - Assign trigger ID
  - Send to the WR network
  - Log the trigger
- CPU 0 takes care of the outputs:
  - Poll the Remote MQ for trigger messages
  - Search a hash table for matching triggers
  - Apply delay & dead time
  - Program the pulse generator
  - Log the trigger
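The output-side steps (hash table search, then delay and dead time) can be sketched as follows. This is a hypothetical illustration, not the node's firmware: the table size, the hash function, the `rule_t` layout and all function names are assumptions, and timestamps are plain nanosecond counters rather than WR time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16 /* assumed; must stay 16 to match the hash shift below */

typedef struct {
    bool     used;
    uint32_t trigger_id;
    uint64_t delay_ns;     /* added to the trigger timestamp */
    uint64_t dead_time_ns; /* minimum spacing between output pulses */
    uint64_t last_fire_ns;
    bool     fired_once;
} rule_t;

/* Multiplicative hash; the top 4 bits select one of the 16 entries. */
unsigned hash_id(uint32_t id)
{
    return (uint32_t)(id * 2654435761u) >> 28;
}

/* Insert a rule using linear probing. Returns false if the table is full. */
bool table_add(rule_t *t, uint32_t id, uint64_t delay_ns, uint64_t dead_ns)
{
    assert(t != NULL);
    unsigned h = hash_id(id);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        rule_t *r = &t[(h + i) % TABLE_SIZE];
        if (!r->used) {
            *r = (rule_t){ true, id, delay_ns, dead_ns, 0, false };
            return true;
        }
    }
    return false;
}

/* Look up a rule; probing stops at the first unused entry. */
rule_t *table_find(rule_t *t, uint32_t id)
{
    assert(t != NULL);
    unsigned h = hash_id(id);
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        rule_t *r = &t[(h + i) % TABLE_SIZE];
        if (!r->used)
            return NULL;
        if (r->trigger_id == id)
            return r;
    }
    return NULL;
}

/* Handle one trigger message. Returns true and sets *fire_at_ns when a
 * pulse should be programmed; returns false for unknown IDs or when the
 * pulse would fall inside the dead time after the previous one. */
bool handle_trigger(rule_t *t, uint32_t id, uint64_t ts_ns, uint64_t *fire_at_ns)
{
    rule_t *r = table_find(t, id);
    if (r == NULL)
        return false;
    uint64_t fire = ts_ns + r->delay_ns;
    if (r->fired_once && fire < r->last_fire_ns + r->dead_time_ns)
        return false;
    r->last_fire_ns = fire;
    r->fired_once = true;
    *fire_at_ns = fire;
    return true;
}
```

A fixed-size table with bounded probing keeps the lookup time bounded, which matters more here than average-case speed.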

Trigger Distribution Node
- Example of a real time task
- Purpose


Summary & outlook
- Proven real-time performance
  - Trigger distribution system working in the LHC
  - Other projects (FIP master & RF distribution) ongoing
- Savings in development time
  - Minimal amount of dedicated VHDL
  - Standardized software stack and interfaces
  - A service that anyone can use
- The μRV soft processor
  - Passes CoreMark & the official RISC-V test suite
  - Migration of MT Cores from LM32 in progress
- Sources available at the Open Hardware Repository: ohwr.org

Questions?

MasterFIP project
- Clean implementation of a WorldFIP bus master
  - Critical technology for LHC controls in radiation areas
- Two-CPU design
  - CPU0 plays the current FIP macrocycle
  - CPU1 exchanges the cycle configuration and variables with the host
- Dedicated VHDL only needed to serialize/deserialize FIP frames
- Shared Memory holds the FIP variables and macrocycle setup
- No White Rabbit