A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software RTOS

Similar documents
A Novel Deadlock Avoidance Algorithm and Its Hardware Implementation

Hardware/Software Deadlock Avoidance for Multiprocessor Multiresource System-on-a-Chip

Interconnect Delay Aware RTL Verilog Bus Architecture Generation for an SoC

A Novel Deadlock Avoidance Algorithm and Its Hardware Implementation

Dynamic Memory Management for Real-Time Multiprocessor System-on-a-Chip

Hardware Support for Priority Inheritance

Hardware Support for Real-Time Embedded Multiprocessor System-on-a-Chip Memory Management

An Approach to Advance Real-Time Os Services with Soft-Error

A Dynamic Memory Management Unit for Embedded Real-Time System-on-a-Chip

A Comparison of Five Different Multiprocessor SoC Bus Architectures

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

DX-Gt: Memory Management and Crossbar Switch Generator for Multiprocessor System-on-a-Chip

Modeling Software with SystemC 3.0

Hardware Support for Priority Inheritance

Co-synthesis and Accelerator based Embedded System Design

System-on-a-Chip Processor Synchronization Support in Hardware

A Novel Parallel Deadlock Detection Algorithm and Architecture

Automated Bus Generation for Multiprocessor SoC Design

Cache Design and Timing Analysis for Preemptive Multi- tasking Real-Time Uniprocessor Systems

A Prioritized Cache for Multitasking Real-time Systems

Parallel Programming Multicore systems

CS 571 Operating Systems. Midterm Review. Angelos Stavrou, George Mason University

SoC Design Lecture 13: NoC (Network-on-Chip) Department of Computer Engineering Sharif University of Technology

Hardware-Software Codesign. 1. Introduction

Reference Model and Scheduling Policies for Real-Time Systems

Hardware/Software Co-design

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

EE382V: System-on-a-Chip (SoC) Design

Hardware-Software Codesign. 1. Introduction

A Configurable Hardware Scheduler (CHS) for Real-Time Systems

CS140 Operating Systems and Systems Programming Midterm Exam

REAL TIME OPERATING SYSTEM PROGRAMMING-I: VxWorks

Hardware Software Codesign of Embedded Systems

CSI3131 Final Exam Review

EE382V: System-on-a-Chip (SoC) Design

Design and Implementation of A Reconfigurable Arbiter

Embedded Software Streaming via Block Stream

ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing

Park Sung Chul. AE MentorGraphics Korea

Processor Architectures At A Glance: M.I.T. Raw vs. UC Davis AsAP

Observability in Multiprocessor Real-Time Systems with Hardware/Software Co-Simulation

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF INFORMATION TECHNOLOGY QUESTION BANK

SPACE: SystemC Partitioning of Architectures for Co-design of real-time Embedded systems

System Verification of Hardware Optimization Based on Edge Detection

Long Term Trends for Embedded System Design

System-on-Chip Architecture for Mobile Applications. Sabyasachi Dey

CACHE DESIGN AND TIMING ANALYSIS FOR PREEMPTIVE MULTI-TASKING REAL-TIME UNIPROCESSOR SYSTEMS. Yudong Tan

Outline Background Jaluna-1 Presentation Jaluna-2 Presentation Overview Use Cases Architecture Features Copyright Jaluna SA. All rights reserved

Commercial Real-time Operating Systems An Introduction. Swaminathan Sivasubramanian Dependable Computing & Networking Laboratory

UNIT I. Introduction to OS& System Structures

VALLIAMMAI ENGINEERING COLLEGE

Short Term Courses (Including Project Work)

INTEGRATING CACHE COHERENCE PROTOCOLS FOR HETEROGENEOUS MULTIPROCESSOR SYSTEMS, PART 1

Hardware Software Co-design and SoC. Neeraj Goel IIT Delhi

HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK

Overview of SOC Architecture design

Part B Questions. Unit I

Multi-core microcontroller design with Cortex-M processors and CoreSight SoC

Programming in the Brave New World of Systems-on-a-chip

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

MULTIPROCESSORS. Characteristics of Multiprocessors. Interconnection Structures. Interprocessor Arbitration

Lecture 7: Introduction to Co-synthesis Algorithms

A VARIETY OF ICS ARE POSSIBLE DESIGNING FPGAS & ASICS. APPLICATIONS MAY USE STANDARD ICs or FPGAs/ASICs FAB FOUNDRIES COST BILLIONS

Cycle accurate transaction-driven simulation with multiple processor simulators

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING UNIT I

SCope: Efficient HdS simulation for MpSoC with NoC

Hardware Design and Simulation for Verification

Performance Verification for ESL Design Methodology from AADL Models

The RM9150 and the Fast Device Bus High Speed Interconnect

A Generic RTOS Model for Real-time Systems Simulation with SystemC

Hardware Accelerators

ReconOS: Multithreaded Programming and Execution Models for Reconfigurable Hardware

Embedded System Design and Modeling EE382V, Fall 2008

Verification of Multiprocessor system using Hardware/Software Co-simulation

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web.

SONICS, INC. Sonics SOC Integration Architecture. Drew Wingard. (Systems-ON-ICS)

Introduction to MLM. SoC FPGA. Embedded HW/SW Systems

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

asoc: : A Scalable On-Chip Communication Architecture

VALLIAMMAI ENGINEERING COLLEGE

A VERIOG-HDL IMPLEMENTATION OF VIRTUAL CHANNELS IN A NETWORK-ON-CHIP ROUTER. A Thesis SUNGHO PARK

«Real Time Embedded systems» Multi Masters Systems

Design Issues in Hardware/Software Co-Design

SimXMD: Simulation-based HW/SW Co-Debugging for FPGA Embedded Systems

SimXMD Co-Debugging Software and Hardware in FPGA Embedded Systems

OpenSMART: Single-cycle Multi-hop NoC Generator in BSV and Chisel

Veloce2 the Enterprise Verification Platform. Simon Chen Emulation Business Development Director Mentor Graphics

VLSI Design of Multichannel AMBA AHB

Part 2: Principles for a System-Level Design Methodology

Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors

SEOS: Hardware Implementation of Real-Time Operating System for Adaptability

Embedded Systems: Hardware Components (part II) Todor Stefanov

A Prioritized Cache for Multi-tasking Real-Time Systems

Cosimulation of ITRON-Based Embedded Software with SystemC

MPSoC Design Space Exploration Framework

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

High Performance Computing on GPUs using NVIDIA CUDA

Assertion and Model Checking of SystemC

Sri Vidya College of Engineering and Technology. EC6703 Embedded and Real Time Systems Unit IV Page 1.

Green Hills Software, Inc.

Transcription:

A Framework for Automatic Generation of Configuration Files for a Custom Hardware/Software RTOS Jaehwan Lee* Kyeong Keol Ryu* Vincent J. Mooney III + {jaehwan, kkryu, mooney}@ece.gatech.edu http://codesign.ece.gatech.edu + Assistant Professor, *School of Electrical and Computer Engineering ing + Adjunct Assistant Professor, College of Computing Adjunct Assistant Professor, College of Computing Georgia Institute of Technology 26 of the HW/SW Codesign Group at GT

Outline Introduction Goals Motivation Methodology Experimental Results Conclusion 2

Introduction Specify custom HW/SW RTOS in a graphical user interface (GUI) Generate configuration files used to make a custom RTOS A custom RTOS may contain HW (as well as SW) components User input Software RTOS library GUI tool Makefile User.h Compile both hardware and software with an application Hardware RTOS library Base Architecture library Verilog File(s) Simulate the system to evaluate the result 3

Goals To help the user examine which configuration is most suitable for the user s specific applications To help the user explore the RTOS design space after chip fabrication as well as before chip fabrication To help the user examine different system-on-achip (SoC) architectures subject to a custom RTOS 4

Motivation (1/5) HW/SW RTOS partitioning approach Three previous innovations in HW/SW RTOS components SoCLC: System-on-a-Chip Lock Cache SoCDMMU: System-on-a-Chip Dynamic Memory Management Unit SoCDDU: System-on-a-Chip Deadlock Detection Unit 5

Motivation (2/5) System-on-a-Chip Lock Cache A hardware mechanism that resolves the critical section (CS) interactions among PEs Lock variables are moved into a separate lock cache outside of the memory Improving the performance criteria in terms of lock latency, lock delay and bandwidth consumption 6

Motivation (3/5) SoCDMMU: System-on-a-Chip Dynamic Memory Management Unit Provides fast, deterministic and yet dynamic memory management of a global on-chip memory Achieves flexible, efficient memory utilization Provides APIs for applications 7

Motivation (4/5) SoCDDU: System-on-a-Chip Deadlock Detection Unit Performs a novel parallel hardware deadlock detection based on implementing deadlock searches on the resource allocation matrix in hardware Provides a very fast deadlock detection at run-time with dedicated hardware performing simple bit-wise boolean operations Reduces deadlock detection time by 99% as compared to software Requires at most O(2*min(m,n)) iterations as opposed to O(m*n) required by all previously reported (sequential) software algorithms 8

Motivation (5/5) Constraints about using three previous innovations Perhaps not enough chip space for all three of them All of them may not be necessary Our framework Enables automatic generation of different mixes of the three previous innovations for different versions of a HW/SW RTOS Can be generalized to instantiate additional HW or SW RTOS components 9

Methodology (1/2) Translates the user choices into a custom RTOS Given the IP library of processors and HW/SW RTOS components Generates configuration files to glue together a custom RTOS executable in the Seamless Co- Verification Environment from Mentor Graphics Makefile and User.h for SW link Verilog header file for HW glue User input Hardware RTOS library Software RTOS library GUI tool Base Architecture library Makefile User.h Verilog File 10

Methodology (2/2) Explores the HW/SW RTOS design space defined by the available HW/SW RTOS components easily User input Software RTOS library GUI tool Application Makefile User.h SW Compile and Link HW/SW Co- Simulation Result and Feedback Hardware RTOS library Base Architecture library Verilog File HW Compile 11

Our RTOS and Possible Target SoC H/W S/W RTOS A multiprocessor System-on-a-Chip (Base architecture) A multiprocessor RTOS Application(s) running on the SoC using the RTOS APIs 12

Our RTOS in Detail Atalanta software RTOS A multiprocessor SoC RTOS The RTOS and device drivers are loaded into the L2 cache memory All Processing Elements (PEs) share the kernel code and data structures H/W S/W RTOS Hardware RTOS components are downloaded into the reconfigurable logic 13

Selectable RTOS IP components Software (Atalanta RTOS) Inter-Process Communication (IPC) components (semaphore, queue, event, mailbox, etc) CPU schedulers (priority, round-robin) Memory management module (gmm) Deadlock detection module (ddm) Hardware SoC Lock Cache for fast IPC (SoCLC) Dynamic Memory Management Unit (SoCDMMU) Deadlock Detection Unit (SoCDDU) 14

Implementation (1/8) SW may override task size SW module linking method HW integration method IPC module linking method 15

Implementation (2/8) with Example Use of GUI Tool The user Selects Deadlock detection SW module Semaphores for synchronization SoCLC for critical section Clicks Generate button The tool Generates Makefile & User.h Verilog header file 16

Implementation (3/8) 1) Software module linking method Command-line object inclusion method (well-known) Used for the same function but different implementations Implemented in Makefile Used for linking the deadlock detection SW module in the example 17

Implementation (4/8) 1) Software module linking method (cont d) $(LD) o $@ $(OTHER_OBJS) $(OPT_OBJ1) $(OPT_OBJ2) $(LIBRARY) Making to Stage gcc o ddutest.x [all other objects including ddutest.o] ddm.o atalanta.a GUI Tool Makefile OTHER_OBJS = ddutest.o OPT_OBJ1 = ddm.o OPT_OBJ2 = (blank) Software component selection Linking Stage ddm.o gmm.o X ddutest.x 18

Implementation (5/8) 2) Inter-process communication module linking method Library function linking method (common) Implemented in User.h Used for the semaphore function in the example 19

Implementation (6/8) 2) IPC module linking method (cont d) GUI Tool Generated Configuration user.h #define semaphores TRUE included application user.c user.i Pre-processing user.o Library (Atalanta.a) Semaphore functions Queue functions ddutest.x Making Stage Mailbox functions Event functions Linking Stage ddutest.c ddutest.o Other functions Selection flow of IPC methods 20

Implementation (7/8) 3) HW RTOS component integration method Novel HW integration method MPC750-1 MPC750-2 MPC750-3 MPC750-4 Construct a Verilog header file L1 L1 L1 L1 Integrate user-selected HW RTOS components into the Base architecture Start with an SoCLC architecture description (an example with SoCLC) SoCLC Reconfig. Logic Arbiter, Intr. controller, Clock Memory controller and memory 21

Implementation (8/8) Verilog header file generation example Start with SoCLC description Extract modules IP Library module PE ~~~ endmodule (i) Extract module PE ~~~ endmodule module clock ~~~ endmodule module soclc ~~~ endmodule Add wires Insert the instantiation code for each module PEs 1,2,3, Memory 1,2, SoCLC Clock Arbiter (iii) Instantiation code module PE ~~~ endmodule module clock ~~~ endmodule (ii) Add wires module soclc ~~~ endmodule wires and signals 22

Experimental Setup Five custom RTOSes With semaphores and spin-locks, no HW components in the RTOS With SoCLC, no SW IPCs Software RTOS library Hardware RTOS library Base Architecture library With deadlock detection software, no HW RTOS components User Input GUI tool With SoCDDU, no SW IPCs With SoCLC and SoCDDU Each with the Base architecture SW RTOS w/ sem RTOS1 SW RTOS + SoCLC SW RTOS w/ ddm SW RTOS + SoCDDU SW RTOS + SoCLC, SoCDDU RTOS2 RTOS3 RTOS4 RTOS5 Each with application(s) Each executable in Seamless CVE 4 MPC750 processors Reconfigurable logic Single bus 23 Application VCS Executable HW file for each Compile Stage for each system Simulation in Seamless CVE Executable SW file for each XRAY

Experimental Results (1/4) Example 1: Database transaction application [1] short_req3 Client transaction1 O 2 long_req1 Access of Object O 2 by transaction1 Server O 2 transaction2 short_req4 transaction3 long_req3 O 4 O 3 O 4 Access of Object O 4 by transaction3 transaction4 [1] M. A. Olson, Selecting and implementing an embedded database system, IEEE Computer, pp.27-34, September 2000. 24

Experimental Results (2/4) Comparison with database application example [2] RTOS1 with semaphores and spin-locks RTOS2 with SoCLC, no SW semaphores or spin-locks (clock cycles) * Without SoCLC With SoCLC Speedup Lock Latency 1200 908 1.32x Lock Delay 47264 23590 2.00x Execution Time 36.9M 29M 1.27x * Semaphores for long critical sections (CSes) and spin-locks for short CSes are used instead of SoCLC. [2] B. S. Akgul, J. Lee and V. Mooney, System-on-a-chip processor synchronization hardware unit with task preemption support, CASES 01, pp.149-157, November 2001. 25

Experimental Results (3/4) Example 2: Interactions between multiple processors and resources [3] MPC750-1 MPC750-2 MPC750-3 MPC750-4 Arbiter, Intr. controller, Clock FFT R1 MPEG R2 PCI R3 WI R4 DDU Reconfig. Logic Memory controller and memory [3] S. Morgan, Jini to the rescue, IEEE Spectrum, 37(4), pp 44-49, April 2000. 26

Experimental Results (4/4) Comparison with deadlock detection example [4] RTOS3 with a software deadlock detection module, no HW RTOS RTOS4 with SoCDDU Method of Deadlock Detection Software Algorithm SoCDDU Speedup Detection Time (clock cycles) Execution time up to deadlock detection 16928 2 61131 44205 8463x 1.38x [4] P. H. Shiu, Y. Tan and V. Mooney, A novel parallel deadlock detection algorithm and architecture, CODES 01, pp.30-36, April 2001. 27

Hardware Area Total area in SoCLC (For 64 short CS locks + 64 long CS locks) SoCDDU (For 5 Processors x 5 Resources) Semi-custom VLSI 7435 gates using TSMC 0.25µm standard cell library 364 gates using AMI 0.3µm standard cell library Xilinx XC4000E 4003EPC84 # Seq. logic 532 10 # Other gates 9036 559 28

Conclusion A framework for automatic generation of configuration files for a custom HW/SW RTOS A novel HW header file generation methodology Experimental results showing the configured systems are correct A framework used to explore the RTOS design space. Future work support for heterogeneous processors support for multiple bus systems/structures 29