Experiences from Andes Technology. Alan Kao, Zong Li Andes Technology LPC'18

Similar documents
Kernel perf tool user guide

OS lpr. www. nfsd gcc emacs ls 1/27/09. Process Management. CS 537 Lecture 3: Processes. Example OS in operation. Why Processes? Simplicity + Speed

ECE 498 Linux Assembly Language Lecture 1

KVM/ARM. Marc Zyngier LPC 12

LIVEPATCH MODULE CREATION. LPC 2016 Josh Poimboeuf

The Beast We Call A3. CS 161: Lecture 10 3/7/17

64 bit Bare Metal Programming on RPI-3. Tristan Gingold

This section covers the MIPS instruction set.

This training session will cover the Translation Look aside Buffer or TLB

UVM-based RISC-V processor verification platform. Tao Liu, Richard Ho, Udi Jonnalagadda

Breaking Kernel Address Space Layout Randomization (KASLR) with Intel TSX. Yeongjin Jang, Sangho Lee, and Taesoo Kim Georgia Institute of Technology

Running on the Bare Metal with GeekOS

www nfsd emacs lpr Process Management CS 537 Lecture 4: Processes Example OS in operation Why Processes? Simplicity + Speed

Linux Kernel on RISC-V: Where do we stand?

COS 318: Operating Systems. Virtual Memory and Address Translation

OVP Guide to Using Processor Models. Model specific information for riscv RV32I

Syscalls, exceptions, and interrupts, oh my!

Multi-level Translation. CS 537 Lecture 9 Paging. Example two-level page table. Multi-level Translation Analysis

Embedded Systems Programming

Address spaces and memory management

Application Fault Tolerance Using Continuous Checkpoint/Restart

Section 9: Cache, Clock Algorithm, Banker s Algorithm and Demand Paging

This section describes the Boot-MIPS code and walks you through the booting of a coherent processing system.

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

DEBUGGING: DYNAMIC PROGRAM ANALYSIS

Migrating RC3233x Software to the RC32434/5 Device

Virtual memory - Paging

The process. one problem

Porting FreeBSD to AArch64

Processes and Threads

Storage Systems. NPTEL Course Jan K. Gopinath Indian Institute of Science

Operating System Architecture. CS3026 Operating Systems Lecture 03

Operating Systems. 09. Memory Management Part 1. Paul Krzyzanowski. Rutgers University. Spring 2015

ARM CORTEX-R52. Target Audience: Engineers and technicians who develop SoCs and systems based on the ARM Cortex-R52 architecture.

Porting Linux to a new SoC

CS 152, Spring 2012 Section 11

Projects on the Intel Single-chip Cloud Computer (SCC)

Fosdem perf status on ARM and ARM64

Anne Bracy CS 3410 Computer Science Cornell University

1 Do not confuse the MPU with the Nios II memory management unit (MMU). The MPU does not provide memory mapping or management.

William Stallings Computer Organization and Architecture. Chapter 11 CPU Structure and Function

CLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level

ECE 598 Advanced Operating Systems Lecture 10

Operating Systems: Internals and Design Principles, 7/E William Stallings. Chapter 1 Computer System Overview

Technical Committee Update

Cross Compiling. Real Time Operating Systems and Middleware. Luca Abeni

Disco: Running Commodity Operating Systems on Scalable Multiprocessors

C6000 Compiler Roadmap

Anne Bracy CS 3410 Computer Science Cornell University

Process Environment. Pradipta De

CSC369 Lecture 2. Larry Zhang, September 21, 2015

ARM Trusted Firmware From Embedded to Enterprise. Dan Handley

Production-Run Software Failure Diagnosis via Hardware Performance Counters. Joy Arulraj, Po-Chun Chang, Guoliang Jin and Shan Lu

An Introduction to Balder An OpenMP Run-time Library for Clusters of SMPs

Bluespec s RISC-V Factory

BUD Navigating the ABI for the ARM Architecture. Peter Smith

An Overview of the BLITZ System

Prelinker Usage for MIPS Cores CELF Jamboree #11 Tokyo, Japan, Oct 27, 2006

Linux-Ready RV-GC AndesCore with Architecture Extensions Charlie Su, Ph.D. CTO and SVP 2018/05/09

Xen/paravirt_ops upstreaming Jeremy Fitzhardinge XenSource 04/18/07

CS5460/6460: Operating Systems. Lecture 21: Shared libraries. Anton Burtsev March, 2014

Experience with GNU, LINUX, and other Open Source on ARC Processors

COS 318: Operating Systems

QorIQ P4080 Software Development Kit

System Virtual Machines

Protection and System Calls. Otto J. Anshus

i.mx 7 - Hetereogenous Multiprocessing Architecture

[537] Virtual Machines. Tyler Harter

PROCESS VIRTUAL MEMORY PART 2. CS124 Operating Systems Winter , Lecture 19

Profiling: Understand Your Application

This section will cover the core initialization process.

COS 318: Operating Systems

Memory Management. Fundamentally two related, but distinct, issues. Management of logical address space resource

System Virtual Machines

The Challenges of X86 Hardware Virtualization. GCC- Virtualization: Rajeev Wankar 36

Parallel Processing with the SMP Framework in VTK. Berk Geveci Kitware, Inc.

Virtual Memory. ICS332 Operating Systems

ARMageddon: Cache Attacks on Mobile Devices

Porting Linux to a New Architecture

July 14, EPITA Systems/Security Laboratory (LSE) Code sandboxing. Alpha Abdoulaye - Pierre Marsais. Introduction. Solutions.

ENCM 501 Winter 2019 Assignment 9

Address Translation. Tore Larsen Material developed by: Kai Li, Princeton University

ECE 598 Advanced Operating Systems Lecture 14

ARM-KVM: Weather Report Korea Linux Forum

Porting Linux to a New Architecture

EXPLICIT SYNCHRONIZATION

Concurrent Server Design Multiple- vs. Single-Thread

Architectural Support for Operating Systems. Jinkyu Jeong ( Computer Systems Laboratory Sungkyunkwan University

Cache Coherence and Atomic Operations in Hardware

Processors, Performance, and Profiling

Hakim Weatherspoon CS 3410 Computer Science Cornell University

Virtual Memory 1. Virtual Memory

Virtual Memory 1. Virtual Memory

The Cache-Coherence Problem

Intel Transactional Synchronization Extensions (Intel TSX) Linux update. Andi Kleen Intel OTC. Linux Plumbers Sep 2013

Presented in Open Source Series Workshop Linux & Memory Maps. Presented by : Abdul Qadeer December, 2010 ICOSST 2010

Methods to protect proprietary components in device drivers

ECE 571 Advanced Microprocessor-Based Design Lecture 13

Kid Operating System

ARM Trusted Firmware: Changes for Axxia

Transcription:

Experiences from Andes Technology Alan Kao, Zong Li Andes Technology 2018/11/15 @ LPC'18

About us A company in Taiwan since 2005 (went public in 2017) A pure-play IP vendor with 140+ licenses >2.5B Andes-Embedded SoCs Diversified applications/needs Bare metal, RTOS, Linux Control, DSP/FP, security, and more AndeStar V5: RISC-V with Andes Extensions 2

Outline Performance Counter (Perf) ELF Attribute Fences after Page Table Entry Update [RFC 0/2] Proposal Infrastructure for Vendor-Specific Codes Non-Coherent Agent Address Space Identifier 3

Outline Performance Counter (Perf) ELF Attribute Fences after Page Table Entry Update [RFC 0/2] Proposal Infrastructure for Vendor-Specific Codes Non-Coherent Agent Address Space Identifier 4

Perf: What we have now Priliminary Perf Support Counting (perf stat) is OK, but Sampling (perf record) is not 5

perf record -c 1000 /bin/ls set $mcycle = MAX - 1000 start couter; /bin/ls interrupt due to counter overflow stop couter; record data 6

Limitation 1: Set counter (For single S-mode) We cannot write to counters without an SBI call. set $mcycle = MAX - 1000 start couter; /bin/ls stop couter; record data Possible Solution Add a SBI call Andes Solution New register: mcounterwen 7

Limitation 2: Pause/Resume set $mcycle = MAX - 1000 start couter; /bin/ls We cannot select privileged mode(s) to be counted. stop couter; record data Possible Solution (null) Andes Solution New register: scountermask[u s m] 8

Limitation 3: Pause/Resume set $mcycle = MAX - 1000 start couter; /bin/ls We cannot pause/resume counters. stop couter; record data Possible Solution (null) Andes Solution Some hack with scountermask[u s m] 9

Limitation 4: Interrupt set $mcycle = MAX - 1000 start couter; /bin/ls We have no interrupts from counter overflow. 10 stop couter; record data Possible Solution (null) Andes Solution A local interrupt, with new registers: scounterinen, scounterinov

Perf: Summary Status Quo Possible Solution now Andes Solution Limitation 1 Cannot write counter in S SBI mcounterwen Limitation 2,3 Cannot pause/resume/ mode selection (null) scountermask [u s m] Limitation 4 No interrupts for counter overflow (null) scounterinen, scounterinov 11

Outline Performance Counter (Perf) ELF Attribute Fences after Page Table Entry Update [RFC 0/2] Proposal Infrastructure for Vendor-Specific Codes Non-Coherent Agent Address Space Identifier 12

ELF Attribute: The Need # RISC-V Platforms increases # RISC-V ELF objects increases Disassembler: "Can I interpret the ELF correctly?" Linker: "Can I link the ELF objects?" Loader: "Can I load the ELF objects and make them run?" Linux: target executable and dynamic linker libc: libraries 13

ELF Attribute: Two Scenarios Executable rv64imafdc rv64imafdcxaaa Executable Library rv64imafdcxbbb Loader Loader rv32imafdc Platform rv64imafdcxaaa Platform 14

ELF Attribute: Brief Log Kito raises this proposal in March Add a PT_PROC section Full ISA string, privileged spec version, etc. Discussions Initial thread PR for ELF PSABI Document Components Binutils & gcc: nearly done Linux & glibc: need feedback 15

ELF Attribute: Implementation DTS: rv64i2p0m2p0a2p0f2p0d2p0c2p0xaaa0p0xbbb0p1 1.11 Kernel Target ELF Auxiliary Vector: rv64i2p0m2p0a2p0f2p0d2p0c2p0xaaa0p0xbbb0p1 1.11 check ELF pass attr ld.so Required Linraries 16

ELF Attribute: Any Thoughts? Does this "DTS-kernel-AUX-libc" mechanism seem good? Need new AT_XXX auxiliary vectors Should we use existing "riscv,isa" for the augmented ISA string? or invent one? Suggestion: adding restrictions to ISA format, easier life writing parser All lower cases Limited length for non-official extensions Force dictionary ordeer for non-official extensions 17

Outline Performance Counter (Perf) ELF Attribute Fences after Page Table Entry Update [RFC 0/2] Proposal Infrastructure for Vendor-Specific Codes Non-Coherent Agent Address Space Identifier 18

Infrastructures for Vendor-Specific Codes Clarifications for this patch set no custom instructions no stateful extensions custom CSRs Runtime probing for vendor features No custom CSRs in the kernel Extensions instead of vendors 19

Principle 1: Runtime Probing Custom Func. Is SX suitable here? A CPU feature but as an extension-like? +static int init nds_nocoh_ag_init(void) +{ + if(!strstr(isa_sx,"sxndsnocohag")) + return 0; + setup_maxpa(); + custom_ext_cache_op = nds_cache_op; + custom_ext_dma_alloc = nds_dma_alloc; + custom_ext_dma_free = nds_dma_free; +} +arch_initcall(nds_nocoh_ag_init); 20

Principle 2: No Custom CSRs in kernel How about this? +#define custom_csr_write(csr_enum,val) csr_write #defie csr_write(csr, val) \ unsigned long \ asm("csrw...") \... custom_csr_write(cctl_reg_ucctlbeginaddr_num, start); custom_csr_write(cctl_reg_ucctlcommand_num, CCTL_L1D_VA_INVAL); 21

Principle 3: Extensions instead of vendors Any comments? arch/riscv/kconfig 5 + arch/riscv/makefile 2 +- arch/riscv/custom-ext/nds_nocoh_ag/kconfig 10 ++ arch/riscv/custom-ext/nds_nocoh_ag/makefile 3 +... in arch/riscv/kconfig: +menu "RISC-V custom extension" + +source "arch/riscv/custom-ext/nds_nocoh_ag/kconfig 22

Non-Coherent Agents Requirement Lower hardware cost on embedded system DMA API Allocate a consistent memory (.alloc) How to configure PMAs to disable cacheability at run time by software Flush parts of cache (.sync_single_for_cpu/device) Only flush all at one time struct dma_map_ops {... void* (*alloc)(...); void (*sync_single_for_cpu)(...); void (*sync_single_for_device)(...);... } 23

Non-Coherent Agents Allocate a consistent memory in Andes A pseudo uncacheable memory space which alias to the real memory space Use the thirty-two bit to distinguish between them 24

Non-Coherent Agents Flush parts of cache in Andes CCTL command operations mcctlbeginaddr/ucctlbeginaddr CSRs The virtual address to access the cache mcctlcommand/ucctlcommand CSRs Trigger a CCTL command operation 25 /* CCTL command */ L1D_VA_INVAL L1D_VA_WB L1D_VA_WBINVAL L1D_VA_LOCK L1D_VA_UNLOCK L1D_WBINVAL_ALL L1D_WB_ALL L1I_VA_INVAL L1I_VA_LOCK L1I_VA_UNLOCK /* * writes the cache line back to memory * if the cache line state is valid and dirty. */ csr_write(ucctlbeginaddr, start); csr_write(ucctlcommand, L1D_VA_WB); /* * unlocks the cache line and sets its state to invalid. */ csr_write(ucctlbeginaddr, start); csr_write(ucctlcommand, L1D_VA_INVAL);

Outline Performance Counter (Perf) ELF Attribute Fences after Page Table Entry Update [RFC 0/2] Proposal Infrastructure for Vendor-Specific Codes Non-Coherent Agent Address Space Identifier 26

Address Space Identifier Support ASID Not used the ASID field of satp sv32 sv64 TLB with ASID TLB doesn t have to be invalidated on context switch Flush TLB when running out of ASID 27

Address Space Identifier Trigger module without ASID Target debugging have to set trigger on context switch 28

Address Space Identifier Trigger module with ASID Match ASID in trigger register and satp Need a ASID field in trigger register or new one tdata1 satp(sv32) satp(sv64) 29

Q & A zong@andestech.com alankao@andestech.com 30