signature i-1 signature i instruction j j+1 branch adjustment value "if - path" initial value signature i signature j instruction exit signature j+1

Similar documents
Chapter 4. MARIE: An Introduction to a Simple Computer. Chapter 4 Objectives. 4.1 Introduction. 4.2 CPU Basics

Module 4c: Pipelining

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

What is Pipelining? RISC remainder (our assumptions)

What is Pipelining? Time per instruction on unpipelined machine Number of pipe stages

structural RTL for mov ra, rb Answer:- (Page 164) Virtualians Social Network Prepared by: Irfan Khan

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

2 MARKS Q&A 1 KNREDDY UNIT-I

Predicting the Worst-Case Execution Time of the Concurrent Execution. of Instructions and Cycle-Stealing DMA I/O Operations

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Chapter 4. The Processor

instruction fetch memory interface signal unit priority manager instruction decode stack register sets address PC2 PC3 PC4 instructions extern signals

MARIE: An Introduction to a Simple Computer

DC57 COMPUTER ORGANIZATION JUNE 2013

Module 5 - CPU Design

Advanced Computer Architecture

Chapter 14 - Processor Structure and Function

Chapter 4. MARIE: An Introduction to a Simple Computer

There are different characteristics for exceptions. They are as follows:

CAD for VLSI 2 Pro ject - Superscalar Processor Implementation

A Tool for the Computation of Worst Case Task Execution Times

VLIW Digital Signal Processor. Michael Chang. Alison Chen. Candace Hobson. Bill Hodges

MARIE: An Introduction to a Simple Computer

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Blog -

Computer Architecture

The CPU Design Kit: An Instructional Prototyping Platform. for Teaching Processor Design. Anujan Varma, Lampros Kalampoukas

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

Instruction Pipelining Review

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

Class Notes. Dr.C.N.Zhang. Department of Computer Science. University of Regina. Regina, SK, Canada, S4S 0A2

CC312: Computer Organization

Chapter 9 Pipelining. Jin-Fu Li Department of Electrical Engineering National Central University Jungli, Taiwan

1. Fundamental Concepts

Programming Level A.R. Hurson Department of Computer Science Missouri University of Science & Technology Rolla, Missouri

Design of a Pipelined 32 Bit MIPS Processor with Floating Point Unit

Chapter 4. The Processor

ECE 486/586. Computer Architecture. Lecture # 12

Computer organization by G. Naveen kumar, Asst Prof, C.S.E Department 1

PIPELINE AND VECTOR PROCESSING

Chapter 4. The Processor

Appendix C: Pipelining: Basic and Intermediate Concepts

Chapter 12. CPU Structure and Function. Yonsei University

COMPUTER ORGANIZATION AND DESIGN

UNIT- 5. Chapter 12 Processor Structure and Function

MICROPROCESSOR AND MICROCONTROLLER BASED SYSTEMS

ECEC 355: Pipelining

Preventing Stalls: 1

Advanced Parallel Architecture Lessons 5 and 6. Annalisa Massini /2017

ARM ARCHITECTURE. Contents at a glance:

What is Pipelining. work is done at each stage. The work is not finished until it has passed through all stages.

Chapter 4. Chapter 4 Objectives. MARIE: An Introduction to a Simple Computer

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS

EE 2700 Project 2 Microprocessor Design

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

Pipelining and Vector Processing

FlexRay International Workshop. Protocol Overview

Department of Computer Science and Engineering

General purpose registers These are memory units within the CPU designed to hold temporary data.

CSE Lecture 13/14 In Class Handout For all of these problems: HAS NOT CANNOT Add Add Add must wait until $5 written by previous add;

Chapter 3 : Control Unit

Suggested Readings! Recap: Pipelining improves throughput! Processor comparison! Lecture 17" Short Pipelining Review! ! Readings!

COMPUTER STRUCTURE AND ORGANIZATION

THE OPTIUM MICROPROCESSOR AN FPGA-BASED IMPLEMENTATION

Chapter 4. Chapter 4 Objectives

Design and Implementation of 5 Stages Pipelined Architecture in 32 Bit RISC Processor

Chapter 8. Pipelining

MICROPROGRAMMED CONTROL

Introduction to Computers - Chapter 4

1 MALP ( ) Unit-1. (1) Draw and explain the internal architecture of 8085.

UNIT I (Two Marks Questions & Answers)

COMPUTER ORGANIZATION AND DESIGN

Department of Computer and Mathematical Sciences. Lab 4: Introduction to MARIE

Instruction Pipelining

William Stallings Computer Organization and Architecture 8 th Edition. Chapter 12 Processor Structure and Function

Chapter 4. The Processor

1 Hazards COMP2611 Fall 2015 Pipelined Processor

The Nios II Family of Configurable Soft-core Processors

Improving the Interleaved Signature Instruction Stream Technique

ENGG3380: Computer Organization and Design Lab5: Microprogrammed Control

COMPUTER ORGANIZATION AND DESI

CPE300: Digital System Architecture and Design


William Stallings Computer Organization and Architecture

UNIT-II. Part-2: CENTRAL PROCESSING UNIT

Processor design - MIPS

Basic concepts UNIT III PIPELINING. Data hazards. Instruction hazards. Influence on instruction sets. Data path and control considerations

5008: Computer Architecture HW#2

Pipelining and Exploiting Instruction-Level Parallelism (ILP)

6x86 PROCESSOR Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

Computer Architecture 2/26/01 Lecture #

CS150 Fall 2012 Solutions to Homework 6

ECEN 651: Microprogrammed Control of Digital Systems Department of Electrical and Computer Engineering Texas A&M University

PROBLEMS. 7.1 Why is the Wait-for-Memory-Function-Completed step needed when reading from or writing to the main memory?

Pipelining concepts The DLX architecture A simple DLX pipeline Pipeline Hazards and Solution to overcome

Real-Time Component Software. slide credits: H. Kopetz, P. Puschner

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

Transcription:

CONTROL FLOW MONITORING FOR A TIME-TRIGGERED COMMUNICATION CONTROLLER Thomas M. Galla 1, Michael Sprachmann 2, Andreas Steininger 1 and Christopher Temple 1 Abstract A novel control ow monitoring scheme is presented that has been tailored to the architecture of the protocol control unit of a time-triggered communication system. Within the approach a signature derived on-line for a sequence of s (\block") is checked against an embedded reference signature. The execution of the signature checks is enforced by a timeout mechanism that is based on the count between two successive checks. The approach provides a derivable Hamming distance for the signatures and a bounded error detection latency while being transparent for the protocol execution. The overhead in terms of performance penalty, memory overhead and extra hardware is low. 1. Introduction Ecient error detection is of fundamental importance in dependable computing systems. Among the numerous approaches for error detection control ow checking is a well established costecient technique. Various types of control ow monitors based on watchdog processors have been devised to ensure the integrity of control ow. [1] provides a survey of the basic approaches. Within control ow monitoring the stream is partitioned into basic blocks. A basic block consists of a sequence of consecutive s with an unique entry point and an unique exit point. In most approaches each block is assigned an identier that is determined by some of its properties, like start address and block size as in [2], or a signature of the sequence within the block [8]. A watchdog processor not only observes the correct signature (or size, respectively) of the block, but most importantly it checks the sequence in which dierent blocks are executed. For this purpose information describing allowed control ow paths must be provided. In most approaches this information is embedded in the stream, there are, however, various strategies to minimize the resulting memory and performance overhead [5, 7]. 2. System Model In the course of the Esprit OMI Project \Time-Triggered Architecture TTA" [4] a single chip communication controller has been developed for TTA. The time-triggered architecture is de- 1 Institut fur Technische Informatik, Technische Universitat Wien, Treitlstr. 3/182/1, A-1040 Wien, Austria 2 Dependable Computer Systems KEG, Mariahilferstr. 117/1/14, A-1060 Wien, Austria

signed to support a wide range of fault-tolerant distributed real-time systems for use in safety critical dependable applications, especially within the elds of automotive, aviation and railway electronics. Thus the demand for ecient error detection capabilities arises. Within the timetriggered architecture each node of a distributed system is composed of a host computer that executes the assigned share of a distributed application and of a communication controller that is responsible for the autonomous exchange of messages among the dierent nodes. In order to accomplish this task the communication controller executes a round based communication protocol. The communication controller is a key component of the time-triggered architecture. It interacts with the host computer via a shared memory holding messages and status/control information. Apart from the shared memory the communication controller contains the protocol control unit (PCU) and a set of low-level functional units. The protocol control unit executes the communication protocol while the functional units provide means for executing performance critical protocol services, like frame transmission and reception, CRC calculation, and clock synchronization. The protocol control unit is implemented as a -bit application-specic set processor. For the proposed technique we assume a set of faults, both transient and permanent, consisting of storage faults within the memory, faults in the address path from the decoder to the memory and faults in the data path from the memory to the decoder. The fault model imposes two basic requirements on the control ow monitoring. On the one hand the integrity of the basic blocks must be established and on the other hand the integrity of the execution ow must be checked. Apart from these two basic requirements an additional set of application specic requirements must be considered. The execution time penalty caused by control ow checking must be minimal and known at design time since the system is intended to serve hard real-time applications. Due to the limited size of the memory the code size overhead must be kept low. To meet the requirements of fault-tolerance it is important to have a low bound on the error detection latency, in particular the case that a fault masks all embedded check s must be covered. To meet the demands of safety critical applications the code used to ensure the integrity of the blocks should have a derivable hamming distance. 3. Principle of Operation The presented approach is based on integrating a signature calculation circuit into the processor pipeline of the protocol control unit to test the integrity of the basic blocks and to monitor the control ow between basic blocks.

3.1. Integrity Checking and Control Flow Monitoring The integrity of the basic blocks is checked by assigning signatures to each. In general the signature for an is generated by feeding the through the signature calculation circuit where it is processed along with the signature value from the previous to obtain the new signature. In this way the signature of a block is an i signature i-1 signature i i j-1 j j+1 check reference sig. initial value signature i signature j-1 signature j initial value j j+1 j+2 k k+1 k+2 branch "if - path" jump "else - path" signature j exit signature j+1 signature j signature j+2 signature k signature k+1 signature k (a) m (b) signature m-1 signature m Figure 1. Signature Checking and Adjustment incrementally generated value that protects all s within the block. The signatures are calculated by the assembler at compile time and selected signatures are then stored within the code. At run-time the the signatures are recalculated as the s are fed through the protocol control unit. A special check- triggers checking the signature. The code word subsequent to the check- contains the reference signature calculated by the assembler for the preceding block (Figure 1a). The check- resets the value of the signature to its initial value. In order to handle branches it is necessary to modify the signature according to the branch destination. This modication is performed based on an additional adjustment-value included in the program code immediately after the branch. This value is used to modify the signature so that it matches the entry signature at the destination. The adjustment value is only considered if the branch istaken, if the branch is not taken the is skipped. Figure 1b depicts the signature adjustment for an if-then-else construct. The at address j+1 is set to adjust the signature to equal signature k. This procedure is repeated for the at address k+1. In this case the signature is adjusted to equal signature m-1. For subroutines the signature of the rst is set to a specic entry signature of the subroutine and at the end of a subroutine the signature is adjusted to a specic exit signature. Each subroutine is assigned an unique entry signature in order to establish whether the program ow has continued with the correct subroutine. Similar to the assignment of entry signatures it makes sense to chose a unique exit signature for each subroutine in order to verify the correct return from the subroutine to the calling code. Figure 2 illustrates the use of the entry and

i-1 i i+1 jump subroutine signature i-1 signature i exit signature i+1 m m+1 Subroutine entry signature m signature m signature m+1 i+2 i+3 entry signature i+2 signature i+2 signature i+3 n-1 n n+1 jump return signature n-1 signature n exit signature n+1 Figure 2. Subroutine Call with Entry and Exit Signatures exit signatures for subroutines. As in the previous example the at address i+1 causes the exit signature i+1 to equal the entry signature m of the subroutine. The same procedure is used at the end of the subroutine to match the exit signature n+1 to the entry signature i+2. 3.2. Protocol Control Unit The protocol control unit is a pipelined set processor with an accumulator based arithmetic logic unit. Instructions are bit wide and are executed in a 3-stage pipeline. The protocol software is located in the memory from where the s are fetched by the \Instruction Fetch"-stage. The \Decode/Move/Branch and Check"-stage decodes the Instruction Fetch Stage Decode/Move/Branch Stage Execute Stage ProgramCounter ADD MUX +1 jump_offset jump halt read_selector write_selector register_bus alu_status INSTRUCTION MEMORY address data Instruction Register DECODE EX opcode ALU check test CRC CIRCUIT CRC ZERO? WATCHDOG COUNTER crc_error watchdog_error Signature Monitor Figure 3. Protocol Control Unit with Signature Monitor s, issues register transfers on the internal register bus, and resolves branches. According to the opcode the \Execute"-stage then executes ALU s. Due to the architecture of the pipeline, no data hazards can occur. The PCU is master of the synchronous register bus, which interconnects the PCU with the low-level functional units. The processor supports three

types of s ALU s issue ALU operations either on registers or immediate values, move s issue register transfers on the register bus, and branch and jump s control the program ow. To provide subroutine calls, the program counter can be stored and reloaded by issuing appropriate move s. 3.2.1. Signature Calculation Control ow monitoring is facilitated by adding a signature calculation circuit to the PCU (Figure 3). In contrast to MISR based approaches a CRC polynomial is used here to perform signature calculation, which enables easier determination of the code properties [3]. The size of each block is limited by the hamming distance of the CRC polynomial and the desired error detection latency. The hardware implementation supports both, signature adjustment and signature check (implying signature reset). The block signature detects errors in the memory and the related address and data paths from and to the register. Using signature adjustment for branch and jump s extends the eectiveness of the signatures beyond the block borders and thus allows to detect errors in the address calculation logic (MUX and ADD in Figure 3) as well. To support transparent and continuous operation of the pipeline the CRC calculation has to be performed within one cycle which equals one clock cycle. Traditionally CRC calculators are based on the linear feedback shift register (LFSR) approach which is well suited for sequential bit-stream processing. Within the PCU a parallel CRC calculation technique is employed that is capable of processing bits per clock cycle ([6]). Apart from the CRC calculation circuit the signature monitor contains a register for storing the intermediate signature values and a circuit for comparing the signature to the initial value, which is chosen as zero in the current implementation. Every code word that passes to the decoder is fed into the signature monitor, i.e. the CRC calculation circuit. If the decoder encounters a check it raises a check signal to the signature monitor. In addition the decoder ignores the next code word in the register, which contains the reference signature, and generates a stall cycle. The signature monitor processes the reference signature and tests for zero. A similar procedure is followed for jump and branch s. The decoder always generates a stall cycle following a branch or jump. Again this prevents processing of the. For branches taken and for jumps the stall cycle is necessary, since the target address for these s is computed in stage two of the pipeline. Thus no run-time overhead is incurred as the adjustment value is fed to the signature monitor. If the branch is not taken then the stall cycle introduces a delay of one cycle (to skip the ). In this case the signature monitor ignores the code word. Due to the nature of the time-triggered architecture the communication controller does not process any interrupts. Thus saving the signature within the signature monitor is not necessary, which simplies the design of the watchdog processor. Adding the signature calculation circuit

to the PCU increases the silicon area on the chip die by approximately 5%. Signature calculation is performed within 5ns. Since the signature calculation circuit is not on the critical path, the self-checking hardware does not inuence the processor cycle time. 3.2.2. Watchdog Counter The CRC signatures provide a means for checking the code integrity and for monitoring the control ow. However a failure could mask the opcode of the check-s and thus prevent their execution. Hence a mechanism must ensure that the check-s are executed by the protocol control unit at run-time. For this purpose a watchdog counter is added to the protocol control unit. It is set to a predened value 3 every time a check- is executed and decremented by one whenever an other than the check- is carried out. When this counter reaches zero, it is assumed that a failure has occurred, and the processor pipeline is halted. Through this mechanism a bounded error detection latency is achieved even if the execution of the check-s is prevented. 4. Summary The presented control ow monitoring scheme provides several vital properties for our faulttolerant real-time application The simple structure requires only low hardware overhead (only 5% of a small processor core) and results in an ecient implementation. An additional watchdog counter guarantees bounded error detection latency even if no more check s are executed. The use of a parallel CRC allows comparatively easy mathematical determination of the hamming distance. Runtime penalty is minimized by the placement of s into the idle slots after branch and jump s. References [1] A. Mahmood and E. J. McCluskey. Concurrent Error Detection Using Watchdog Processors { A Survey. IEEE Transactions on Computers, 37(2)0{174, Feb. 1988. [2] G. Miremadi, J. Ohlsson, M. Rimen, and J. Karlsson. Use of Time and Address Signatures for Control Flow Checking. In Dependable Computing and Fault-Tolerant Systems, volume 10, Urbana Champaign, Illinois, USA, Sep. 1995. [3] N. Saxena and E. J. McCluskey. Parallel Signature Analysis Design with Bounds on Aliasing. IEEE Transactions on Computers, 46(4)425{438, Apr. 1997. [4] C. Scheidler, G. Heiner, R. Sasse, E. Fuchs, H. Kopetz, and C. Temple. Time-Triggered Architecture (TTA). In J.-Y. Roger, B. Stanford Smith, and P. T. Kidd, editors, Proceedings of EMMSEC'97, Advances in Information Technologies The Business Challenge, pages 758{765. IOS Press, Nov. 1997. [5] M. Schuette and J. P. Shen. Processor Control Flow Monitoring Using Signatured Instruction Streams. IEEE Transactions on Computers, 36(3)264{276, Mar. 1987. [6] M. Sprachmann. Automatic Generation of Parallel CRC Circuits. IEEE Design and Test, submitted 1999. [7] K. D. Wilken and T.Kong. Concurrent Detection of Software and Hardware Data-Access Faults. IEEE Transactions on Computers, 46(4)412{424, Apr. 1997. [8] K. D. Wilken and J. P. Shen. Concurrent Error Detection using Signature Monitoring and Encryption. In Dependable Computing and Fault-Tolerant Systems, volume 4, pages 365{384, 1991. 3 Similar to the insertion of the check- this value depends on the hamming distance of the used CRC and on the desired error detection latency.