Introducing the Latest SiFive RISC-V Core IP Series

Similar documents
Jack Kang ( 剛至堅 ) VP Product June 2018

RISC-V Core IP Products

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

The ARM Cortex-M0 Processor Architecture Part-1

ARM Processors for Embedded Applications

Chapter 5. Introduction ARM Cortex series

Copyright 2016 Xilinx

Evaluating SiFive RISC- V Core IP

ARM Cortex core microcontrollers 3. Cortex-M0, M4, M7

The Next Steps in the Evolution of Embedded Processors

Kinetis Software Optimization

STM32 F0 Value Line. Entry-level MCUs

SiFive Freedom SoCs: Industry s First Open-Source RISC-V Chips

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

ELC4438: Embedded System Design ARM Embedded Processor

RM3 - Cortex-M4 / Cortex-M4F implementation

CPU Project in Western Digital: From Embedded Cores for Flash Controllers to Vision of Datacenter Processors with Open Interfaces

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

L2 - C language for Embedded MCUs

Hercules ARM Cortex -R4 System Architecture. Processor Overview

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

Chapter 15 ARM Architecture, Programming and Development Tools

Custom Silicon for all

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

Contents of this presentation: Some words about the ARM company

Embedded System Design

Domain-Specific Acceleration via AndeStar V5 Processors

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

Copyright 2014 Xilinx

Cortex-M3/M4 Software Development

The Next Steps in the Evolution of ARM Cortex-M

Interconnects, Memory, GPIO

ARM Ltd. ! Founded in November 1990! Spun out of Acorn Computers

Software Defined Modem A commercial platform for wireless handsets

Growth outside Cell Phone Applications

ARMv8-A Software Development

Cortex-R5 Software Development

The Nios II Family of Configurable Soft-core Processors

Cortex-A9 MPCore Software Development

SiFive E24 Core Complex Manual v1p0. SiFive, Inc.

Cycle Approximate Simulation of RISC-V Processors

DSP ISA Extensions for an Open-Source RISC-V Implementation

Embedded Computing Platform. Architecture and Instruction Set

Zynq-7000 All Programmable SoC Product Overview

ECE 471 Embedded Systems Lecture 2

ARM architecture road map. NuMicro Overview of Cortex M. Cortex M Processor Family (2/3) All binary upwards compatible

ELC4438: Embedded System Design ARM Cortex-M Architecture II

ARM ARCHITECTURE. Contents at a glance:

Cortex-A15 MPCore Software Development

How to Select Hardware forvolume IoT Deployment?

RM4 - Cortex-M7 implementation

ARM Cortex core microcontrollers

OUTLINE. STM32F0 Architecture Overview STM32F0 Core Motivation for RISC and Pipelining Cortex-M0 Programming Model Toolchain and Project Structure

15CS44: MICROPROCESSORS AND MICROCONTROLLERS. QUESTION BANK with SOLUTIONS MODULE-4

ARM Cortex -M for Beginners

Amber Baruffa Vincent Varouh

CMP Conference 20 th January Director of Business Development EMEA

ARM Processor Architecture

Embedded Systems. 7. System Components

Agile Hardware Design: Building Chips with Small Teams

Parallel Simulation Accelerates Embedded Software Development, Debug and Test

Mobile & IoT Market Trends and Memory Requirements

AN719 PRECISION32 IDE AND APPBUILDER DETAILED TUTORIAL AND WALKTHROUGH. 1. Introduction. Figure 1. Precision32 IDE and AppBuilder Walkthrough Overview

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

.org. IoT Development Platform

ARM Embedded Systems: ARM Design philosophy, Embedded System Hardware, Embedded System Software

RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

NXP Unveils Its First ARM Cortex -M4 Based Controller Family

Diploma in Embedded Systems

EEM870 Embedded System and Experiment Lecture 3: ARM Processor Architecture

Processor Trace in a Holistic World. DAC-2018 San Francisco RISC-V Foundation Booth

Course Introduction. Purpose: Objectives: Content: 27 pages 4 questions. Learning Time: 20 minutes

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

ARM Cortex-M and RTOSs Are Meant for Each Other

ARM Cortex -M7: Bringing High Performance to the Cortex-M Processor Series. Ian Johnson Senior Product Manager, ARM

SiFive E31 Coreplex Manual v1p0. c SiFive, Inc.

The Ultra-Low Power Open-source Core

New STM32 F7 Series. World s 1 st to market, ARM Cortex -M7 based 32-bit MCU

Modular ARM System Design

Getting the Most out of Advanced ARM IP. ARM Technology Symposia November 2013

ECE 471 Embedded Systems Lecture 2

Wed. Aug 23 Announcements

Bringing the benefits of Cortex-M processors to FPGA

The ARM Cortex-A9 Processors

PRU Hardware Overview. Building Blocks for PRU Development: Module 1

CISC RISC. Compiler. Compiler. Processor. Processor

Next Generation Multi-Purpose Microprocessor

Power, Performance and Area Implementation Analysis.

ECE 471 Embedded Systems Lecture 2

ECE254 Lab3 Tutorial. Introduction to MCB1700 Hardware Programming. Irene Huang

ARMv8-M Architecture Technical Overview

Mobile & IoT Market Trends and Memory Requirements

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

The Challenges of System Design. Raising Performance and Reducing Power Consumption

Each Milliwatt Matters

The ARM10 Family of Advanced Microprocessor Cores

SoC Platforms and CPU Cores

all: arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb main.c \ -T generic-hosted.ld -o main.out -g arm-none-eabi-objdump -S main.out > main.

ARM Cortex -M and Java in the Internet of Things. Asim Chaudhry Field Applications Engineer, ARM

Transcription:

Introducing the Latest SiFive RISC-V Core IP Series Drew Barbier DAC, June 2018 1

SiFive RISC-V Core IP Product Offering SiFive RISC-V Core IP Industry leading 32-bit and 64-bit Embedded Cores High performance 64-bit Application Cores High Performance Embedded Networking Industrial Modems Multi-Core RISC-V Linux Low Cost Linux Storage Gateways SiFive s Most Efficient Series Microcontrollers IoT Wearables Core Series offer unique design points which can then be customized to meet application specific requirements Standard Cores represent pre-configured implementations of a Core Series which are available for free RTL and FPGA evaluations 2

SiFive E3 and E5 Series RISC-V Core IP June 2018 Update

SiFive E3 and E5 Series Overview High Performance 32bit and 64bit RISC-V MCUs 4 New Pipelined Multiplication Unit Better performance: 3 Coremarks/MHz Multicore Support Pre-integrated and verified by SiFive Supports up to 8+ cores Flexible Memory Architecture I-Cache can be reconfigured into I-Cache + ITIM DTIM for fast on Core Complex Data Access (D-Cache option also available) ECC/Parity Protection on all memories Off Core Complex memory access through Memory, System and Peripheral Ports with configurable Fast Interrupts Supports fast vectored local interrupts and shared global interrupts E3/E5 with Interrupt Handlers in ITIM can enter a local interrupt ISR in as little as 10 cycles Does not require separate integration with a separate interrupt controller Memory Protection 8 region physical memory protection Region locking Evaluate Today E51 and E31 Standard Core RTL and FPGA evaluations are available now

Introducing SiFive E2 Series RISC-V Core IP

SiFive E2 Series RISC-V Core IP SiFive s Smallest, Lowest Power, Core Series Clean sheet design from the inventors of RISC-V New interrupt controller enabling fast interrupt handling Support for Heterogenous MP with other SiFive Cores SiFive E2 Series RISC-V Core IP E2 Series Configurability Core Tune the E2 for higher performance, or lower area Security Support for Memory Protection and RISC-V Machine and User Modes Memory Map Fully customizable memory map Ports Flexibility in the number and types of ports And more Interrupts, Debug and Trace, Memory Protection, Tightly Integrated Memory Industry leading 32-bit MCU efficiency: Embedded Microcontroller IoT Analog Mixed Signal 8-bit replacement Programmable Finite State Machine Full featured 32-bit RISC-V MCU : Sensor Fusion Wearables General Purpose Microcontroller Smart IoT Minion Core Smart Connected Toys E20 and E21 are Standard Cores within the E2 Series Standard Cores are benchmarked with published Power, Performance, and Area RTL and FPGA evaluations are available 6

E2 Series Features The Smallest, Most Efficient RISC-V MCU Family E2 Series core architectural overview RV32IMACF capable core 2-3 stage, optional, Harvard Pipeline Configurable to meet application specific needs Ability to add multiple outbound Ports of different specifications Optional Tightly Integrated Memory (TIM) Flexible address map Optional FPU, tunable Multiply performance, Memory Protection, and more First RISC-V core with support for the RISC-V Core Local Interrupt Controller (CLIC) Execute first instruction of a C handler in 6 cycles Hardware interrupt prioritization and nesting Drop In Cortex-M0+ and Cortex-M3/M4 replacement E21 has 12% more performance vs Cortex-M4 E20 has 28% more performance vs Cortex-M0+ 7

Better than the Competition SiFive Standard Core outperform ARM equivalent cores E21 is 12% higher performance per MHz vs Cortex-M4 in CoreMark When using equivalent GCC Compilers E20 is 28% higher performance per MHz vs Cortex-M0+ in CoreMark When using equivalent GCC Compilers E2 Series Cores are also Area and Power Efficient See details in the following slides E2 Series configurability allows for the core performance and area to be tuned to the exact application requirements The E2 Series can be configured smaller than the E20 Standard Core The E2 Series can be configured with more features than the E21 Standard Core DMIPS/MHz Coremarks/MHz DMIPS/MHz Coremarks/MHz E21 vs Cortex-M4 Absolute 1.25 1.38 2.76 3.1 0 0.5 1 1.5 2 2.5 3 3.5 0.95 1.1 Cortex-M4 1.87 2.4 E21 E20 vs Cortex-M0+ Absolute 0 0.5 1 1.5 2 2.5 3 Cortex-M0+ E20 8

E2 Series Memory Subsystem E2 Core can be configured with 1 or 2 bus interfaces 2 bus interfaces allows for simultaneous Instruction and Data accesses from the core giving higher performance S-Bus is a highly optimized crossbar allowing for fast access the TIM banks and System Port Can have multiple parallel access: Fetching instructions from one TIM, while writing data to the other Optional 2x TIM banks Guaranteed low latency access through the S-Bus Supports parallel access to both TIM banks for E2 cores with 2 bus interface configurations *Dotted lines represent optional interfaces/modules P-Bus supports On-Core Complex peripherals and the Peripheral Port Support for RISC-V Atomic instructions which can be used for single-cycle Read-Modify-Write operations E2 Core Instruction and Data accesses can target any Port or TIM No pre-defined address map! Place instructions and data where it makes sense for your application 9

Core Local Interrupt Controller (CLIC) Simplified interrupt scheme which allows for low latency interrupt servicing, hardware prioritization, and pre-emption Support for up to 1024 interrupts Global programable prioritization of all interrupts including software and timer Works with E Cores and U Cores Still supports PLIC for global interrupts shared with other cores in the SoC Extreme low latency with support for Vectoring directly to ISR 6 cycles into first instruction of ISR, 18 cycles total to complete a simple ISR in an E2 Series Pipeline Vector table contains function pointers (addresses) to ISR Interrupt pre-emption capabilities Up to 16 levels of nesting, and programmable priority levels within each level Easy to use programmers model GCC interrupt function attribute, no assembly necessary Multiple SW interrupts with programmable priority levels CLIC driver included in SiFive SDK for easy programmability 10

E21 Standard Core 11 Full Featured RISC-V MCU 0.037mm2 in TSMC 28HPC for entire Core Complex without TIM RAMs 1.38 DMIPS/MHz, 3.1 Coremarks/MHz 2 Core Interfaces Allows for simultaneous instruction and data access Efficient memory system Banked TIM for fast local memory Simultaneous Instruction and Data Accesses (Harvard Architecture) User Mode and Physical Memory Protection (PMP) 4 Region PMP 4 Hardware breakpoint/watchpoints Extreme low latency interrupt handling Execute first instruction of C handler in 6 cycles Execute entire ISR in 18 cycles E21 (AHB-Lite) Post-Route Physical Design TSMC 28HPC UMC 55LP Frequency @ worst setup corner 50MHz 585MHz 50MHz 210MHz Worst Setup Corner ssg_0p81v_m40c_cworst ss_1p08v_125c_cmax Implementation Details 9t; LVT, SVT, UHVT 12t; LVT, SVT, UHVT 7t; LVT, SVT, HVT 7t; LVT, SVT, HVT Core Complex Area (mm 2 )* 0.037 0.077 0.1 0.146 Core Only Area (mm 2 )** 0.015 0.042 0.043 0.074 Core Complex Power Dhrystone (mw) @ worst setup frequency 1.3 26 3.1 18 Power Characterization Corner tt_0p9v_25c_typical tt_1p2v_25c_typ Note: All area and power numbers do not include RAMs *Core Complex includes the Core plus CLIC w/127 irq and 4 priority bits, Debug w/ 4 hw breakpoints, 4 Region PMP, TIM Logic, internal bus and ports **Core only includes the core pipeline only

E20 Standard Core SiFive s Most Efficient Standard Core 0.023mm2 in TSMC 28HPC for entire Core Complex including CLIC, Debug, and System Port 1.1DMIPS/MHz, 2.4 Coremarks/MHz 12 E20 Standard Core is optimized for Area and Power Single Core Interface for all Instruction and Data accesses 4 cycle hardware multiply Single System Port Interface 4 Hardware breakpoint/watchpoints Extreme low latency interrupt handling Execute first instruction of C handler in 6 cycles Execute entire ISR in 18 cycles E20 (AHB-Lite) Post-Route Physical Design TSMC 28HPC UMC 55LP Frequency @ worst setup corner 50MHz 725MHz 50MHz 250MHz Worst Setup Corner ssg_0p81v_m40c_cworst ss_1p08v_125c_cmax Implementation Details 9t; LVT, SVT, UHVT 12t; LVT, SVT, UHVT 7t; LVT, SVT, HVT 7t; LVT, SVT, HVT Core Complex Area (mm 2 )* 0.023 0.046 0.064 0.083 Core Only Area (mm 2 )** 0.011 0.029 0.031 0.049 Core Complex Power Dhrystone (mw) @ worst setup frequency 0.58 18 1.3 8.8 Power Characterization Corner tt_0p9v_25c_typical tt_1p2v_25c_typ Note: All area and power numbers do not include RAMs *Core Complex includes the Core plus CLIC w/32 irq and 2 priority bits, Debug w/ 4 hw breakpoints, internal bus and ports **Core only includes the core pipeline only

E2 Series VS ARM Cortex-M The E2 Series can be configured to meet your application requirements E2 Series Options E2 Series VS ARM Cortex-M Comparison Table E20 Standard Core E21 Standard Core Cortex-M0+ Cortex-M3 Cortex-M4 Dhrystone Up to 1.38 DMIPS/MHz 1.1 DMIPS/MHz 1.38 DMIPS/MHz 0.95 DMIPS/MHz 1.25 DMIPS/MHz 1.25DMIPS/MHz CoreMark Up to 3.1 2.4 CoreMarks/MHz 3.1 CoreMarks/MHz 1.8 CoreMarks/MHz 2.76 Coremarks/MHz 2.76 CoreMarks/MHz Integer Registers 31 Useable 31 Useable 31 Useable 13 Useable 13 Useable 13 Useable FPU Optional FPU None None None None Optional Hardware Multiply and Divide Yes, Optional Yes Yes Hardware Multiply Only Yes Yes Memory Map Customizable SiFive Freedom Platform SiFive Freedom Platform Fixed ARMv6-M Fixed ARMv7-M Fixed ARMv7-M Atomics Optional: RISC-V standard AMO support via Peripheral Port No Peripheral Port RISC-V AMO standard support via Peripheral Port None Bit-band and Load/Store Exclusive Number of Interrupts Up to 1024 32 127 32 240 240 Interrupt Latency into C Handler 6 Cycles CLIC Vectored Mode 6 Cycles 6 Cycles 15 Cycles 12 Cycles 12 Cycles Bit-band and Load/Store Exclusive Memory Protection Optional up to 8 Regions N/A 4 Regions Optional, ARMv6m 0 or 8 Region 0 or 8 Region Tightly Integrated Memory Optional 2 Banks None 2 Banks No No No 13 Bus Interfaces Configurable: Up to 3 masters and 1 slave with support for TileLink, AXI, AHB-Lite, APB 1 Master 2 Master, 1 Slave 1 AHB-Lite 3 AHB-Lite 3 AHB-Lite

End

E21 is 12% Higher Performance vs Cortex-M4 Coremarks/MHz 3.5 3 2.5 2 1.5 1 0.5 0 DMIPS/MHz 3.1 E21 vs Cortex-M4 Relative 2.76 Coremarks/MHz 112% 110% 90% 95% 100% 105% 110% 115% Cortex-M4 E21 E21 vs Cortex-M4 Absolute 1.38 1.25 DMIPS/MHz E21 outperforms the Cortex-M4 when using GCC Compilers Versions SiFive v201712 ARM gcc-arm-none-eabi-7-2017-q4-major Dhrystone Compiler Flags SiFive cflags: -Os -fno-common -O2 -DTIME -DNOENUM -fno-inline -fno-builtin-printf -Wno-implicit -falign-functions=4 - DCRTS_SPLIT_TEXT_DATA ARM https://developer.arm.com/products/processors/cortex-m/cortex-m4 Coremark Compiler Flags SiFive cflags: -O2 -fno-common -funroll-loops -finline-functions -fno-builtin-printf --param max-inline-insns-auto=20 -falignfunctions=4 -falign-jumps=4 -falign-loops=4 -DCRTS_SPLIT_TEXT_DATA ARM cflags: -mcpu=cortex-m4 -mthumb -O2 -fno-common -funroll-loops -finline-functions -fno-builtin-printf --param max-inlineinsns-auto=20 -falign-functions=4 -falign-jumps=4 -falign-loops=4 -Wall -fmessage-length=0 -mno-sched-prolog -fno-builtin - ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=softfp 15 E21 Cortex-M4 Cortex-M4 Silicon Labs Pearl Gecko Dev board : https://www.silabs.com/products/mcu/32-bit/efm32-pearl-gecko/device.efm32pg1b200f256gm48 Cortex-M4 @19MHz w/fpu

E20 is 28% Higher Performance vs Cortex-M0+ Coremarks/MHz DMIPS/MHz E20 vs Cortex-M0+ Relative 128% 116% 0% 50% 100% 150% Cortex-M0+ E20 E20 outperforms the Cortex-M4 when using GCC Compilers Versions SiFive v201712 ARM gcc-arm-none-eabi-7.2.1 (MCUExpresso default) Dhrystone Compiler Flags SiFive cflags: -Os -fno-common -O2 -DTIME -DNOENUM -fno-inline -fno-builtin-printf -Wno-implicit -falign-functions=4 3 2.5 2 E20 vs Cortex-M0+ Absolute ARM https://developer.arm.com/products/processors/cortex-m/cortex-m0-plus Coremark Compiler Flags SiFive cflags: -O2 -fno-common -funroll-loops -finline-functions -fno-builtin-printf --param max-inline-insns-auto=20 -falignfunctions=4 -falign-jumps=4 -falign-loops=4 1.5 1 0.5 2.4 1.87 1.1 0.95 ARM cflags: -O2 -fno-common -g3 -Wall -c -fmessage-length=0 -fno-builtin -ffunction-sections -falign-functions=4 -falign-jumps=4 -falign-loops=4 -mno-sched-prolog -mcpu=cortex-m0plus -mthumb 0 16 Coremarks/MHz E20 Cortex-M0+ DMIPS/MHz Cortex-M0+ NXP FRDM-KL25Z running at 48MHz https://www.nxp.com/products/processors-and-microcontrollers/arm-based-processors-and-mcus/kinetis-cortex-m-mcus/lseriesultra-low-powerm0-plus/freedom-development-platform-for-kinetis-kl14-kl15-kl24-kl25-mcus:frdm-kl25z