A Portable Worst-Case Execution Time Analysis Framework for Real-Time Java Architectures

Doctoral Thesis
By Yu-Shing Hu
Real-Time Systems Research Group
Department of Computer Science
University of York

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
OCTOBER 2004

© Copyright by Yu-Shing Hu, 2004

Abstract

Real-time and embedded systems are systems that react continuously to their environment within time constraints and at a speed imposed by the environment. The success of such systems relies upon their capability of producing functionally correct results within defined timing constraints. In such systems, the role of Worst-Case Execution Time (WCET) analysis is fundamental since the WCET estimations of hard real-time threads have to be known prior to performing the schedulability analysis.

This thesis is mainly concerned with describing a portable WCET analysis for real-time Java programs, where dynamic dispatching issues are addressed to be able to offer greater flexibility but without resulting in unpredictable timing behaviour. Dynamic dispatching is addressed by introducing a minimal set of annotations in the platform-independent analysis phase, which allows dynamic dispatching to be used in hard real-time applications without necessarily making the timing analysis unpredictable. Two measurement-based analysis techniques that can perform platform-dependent analysis are introduced to accommodate a diverse set of implementations on the underlying platforms and virtual machines for embedded systems.

To improve the utilisation of resources of the overall system, a framework is also proposed to reclaim gain time, that is, unused processor resources allocated for hard real-time threads. Gain time is produced when the hard real-time threads execute in less than their WCET estimations. This framework is integrated with the portable WCET analysis so that real-time Java applications can be developed without loss of predictability and performance. Hence, the Real-Time Specification for Java (RTSJ) augmented with the portable WCET analysis taking account of dynamic dispatching and a gain time reclaiming framework provides a flexible real-time Java environment to develop mission-critical real-time and embedded systems.


Table of Contents

Abstract
Table of Contents
Acknowledgements
Declaration

1 Introduction
    Motivation
    Thesis Aims
    Organisation of The Thesis

2 Background and Related Work
    Worst-Case Execution Time Analysis
    High-Level Timing Analysis
    Low-Level Timing Analysis
    Calculation
    Combining WCET Analysis with Measurement Techniques
    Summary
    Real-Time Java
    Real-Time Specification for Java
    Core Real-Time Extensions for the Java Platform
    Portable WCET Analysis
    High-Level Analysis
    Low-Level Analysis
    Calculation
    Gain Time Reclaiming

3 A Computational Model & The WCET Analysis Framework
    XRTJ Environment Overview
    Ravenscar-Java Profile
    Extensible Annotation Class (XAC) Format
    XRTJ-Compiler
    Static Analysis Environment
    Timing Analysis
    The Portable WCET Analysis Framework
    WCET Annotations
    Summary

4 Addressing Dynamic Dispatching Issues
    Dynamic Dispatching Issues
    Issues connected with Restricting Dynamic Dispatching Features
    Issues involved in Using Dynamic Dispatching
    Dynamic Dispatching with Annotations
    Annotation for Dynamic Dispatching Methods
    Annotations for Nested Scopes in Object-Oriented Programs
    Annotation for Class Hierarchy
    Correctness of Annotations
    Evaluation
    Summary

5 Virtual Machine Timing Models
    Deriving Java Virtual Machine Models
    Profiling-Based Approach
    Benchmark-Based Approach
    Estimating WCET bounds with VMTMs
    Summary

6 A Gain Time Reclaiming Analysis
    Gain Time Reclaiming
    Structural Constraint Reclaiming
    Object Constraint Reclaiming
    Functional Constraint Reclaiming
    Implementation Guideline
    Summary

7 Prototype Implementation
    XRTJ-Compiler
    Extracting XAC files
    Deriving WCEF Vectors and CFG
    Identifying Gain Time Reclaiming Points
    Virtual Machine Timing Model
    Profiling-based Approach
    Benchmark-based Approach
    Summary

8 Case Studies
    Case Study One: Attitude and Orbit Control Systems
    Background of AOCS
    Real-Time Java Project for AOCS
    Estimating WCET bounds of TelemetryManager
    Analysis with the Gain Time Reclaiming Approach
    Case Study Two: FIDO Rover
    Analysis with the Gain Time Reclaiming Approach
    Summary

9 Conclusions and Future Work
    Summary of Contributions
    Possible Directions for Future Research

A WCET Annotations
    A.1 Lexical Structure
    A.2 Control Flow Annotations
    A.2.1 Labels
    A.2.2 Modes
    A.3 Upper Bound Annotation
    A.4 Method Invocation Annotation
    A.5 The use of annotations in use pow Class

Bibliography


List of Tables

2.1 Timing formulas of the ETS [75]
Measurements of the WCET of the instrumenting code
Measurements of the WCET of iload with the instrumenting code
A VMTM derived with the benchmark-based analysis
Comparing the final WCET bounds
Objects overriding the writetotelemetry() method in the AOCS framework
A.1 Definition of nonterminals
A.2 WCET annotation statements


List of Figures

2.1 Running multi-tasks in a system
Iteration constructs in Real-Time Euclid [64]
An example program and flow analysis for abstract interpretation [46]
Components of a WCET tool [34]
Overview of Static Cache Simulation [84]
Structure of atomic object [76]
Structure of atomic object [74]
An example of a basic-path graph [15]
An example of a syntax tree [75]
An example of if-then-else statement and its CFG [71]
An Example of the IPET-Based approach [71]
Tool chain of a measurement-based WCET analysis [94]
The context of a measurement WCET method for timing program sub-paths [78]
WCETAn Class
A basic block model of the XRTJ environment
Two execution phases of Ravenscar Virtual Machine [67]
The Format of the XAC file
Static Analysis Environment
The Portable WCET Analysis Framework
A fragment of the annotated Call ForLoop method
A fragment of the XAC file for Call ForLoop method
Classes for Example
4.2 Example
Example 1 with annotations
Example
maxwcet(...) annotation
maxwcet(...) annotation used for interfaces
maxwcet(...) annotation in an application
An abstract Sensor class
A class hierarchy for Sensor Control Systems
A fragment of the Temperature Sen Java program
A fragment of the SensorController Java program
An illustration of the PeriodicThread class [67]
Instrumenting profiling code into an interpreter engine
A block diagram of the benchmark-based approach
Measuring WCET of a particular set of Java bytecode with RDTSC library
Measuring the execution time of the instrumenting code
Measuring the execution time of the iload with the instrumenting code
Probability Distribution of the measurements with the benchmark-based analysis
Cumulative Distribution of the measurements of the iload bytecode with the benchmark-based analysis
The Bubble Sort Algorithm in Java
Individual basic blocks with their offset numbers
WCEF Vectors of the bubble sort algorithm in text mode
Comparing the profiling-based and benchmark-based analyses
Structural Constraint Reclaiming
An example of gain time reclaiming [71]
An example of object constraints
Producing OGTRG from Figure
Analysing gain time reclaiming during compilation
Gain time reclaiming at run-time
7.1 Tool chain of the XRTJ environment
XRTJ-Compiler
An example of CFG in an XAC file
Instrumenting a specific set of Java bytecode
Measuring WCET of the specific set of bytecodes on the target VM and generating VMTMs
A typical structure of Satellite System [92]
A block diagram of Satellite System [92]
AOCS System [92]
TelemetryManager
The class hierarchy of the Telemeterable Object
Telemetry Manager Class
Object gain time reclaiming in the TelemetryManager Object
Comparing the simulations of the estimated WCET bounds and actual WCET bounds
FIDO Rover prototype and its software implementation layers [59]
A class hierarchy of the Instructions Classes with WCET estimations
An example of periodic thread
A.1 An example of using label annotations
A.2 An example of using Begin Label & End Label annotations
A.3 An example of using mode annotations
A.4 An example of using Use Mode() annotation
A.5 An example of using Loopcount annotation


Acknowledgements

There are a number of people and organisations who have contributed directly and indirectly to this thesis. I would like to express my sincere gratitude to them all. In particular, I am very grateful to my supervisor Professor Andy Wellings for providing me with the opportunity to work at the University of York, and for his continuing support and interest in my work. I would like to thank Dr. Guillem Bernat for his enthusiastic discussions and invaluable advice on this work, and also for his role as the internal assessor for this doctorate programme.

I would also like to thank my colleagues in the Real-Time Systems Group at the University of York for their support and informal discussions, which form a vital part of the research environment, in particular Dr. Iain Bate, Dr. Stefan Petters, Dr. Antoine Colin, Dr. George de Lima, Dr. Tse Lin and Mr. Jagun Kwon. I also thank Professor Peter Puschner of the Vienna University of Technology and Dr. Jan Gustafsson of Mälardalen University for their advice and informal discussions on this research work in its early stages. Many thanks to Dr. Malcolm Wren, who helped me improve my grasp of the English language over five years, and took time from his busy schedule to proofread this thesis.

I wish to acknowledge the Real-Time Systems Group at the University of York and the U.K. EPSRC for providing the financial support under Grant GR/M. Without this funding, this work would not have been possible.

My deepest gratitude goes to my grandmother, my parents, my brother, my sisters and the rest of my family, for always supporting and believing in me despite our physical distance. Special thanks to Denise, who supported me with love and patience during the stressful months of writing this thesis.

Above all, I thank God for giving me such a great opportunity to complete this work, with the help of the people mentioned above, to glorify Him.


Declaration

Certain parts of this report have appeared in previously published papers in which the major contributions were made by the author, specifically the following references:

Erik Yu-Shing Hu, Andy Wellings and Guillem Bernat. Deriving Java Virtual Machine Timing Models for Portable Worst-Case Execution Time Analysis. On the Move to Meaningful Internet Systems 2003: Workshop on Java Technologies for Real-Time and Embedded Systems, LNCS 2889, pages , Springer, November

Erik Yu-Shing Hu, Andy Wellings and Guillem Bernat. Gain Time Reclaiming in High Performance Real-Time Java Systems. Proceedings of the 6th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing ISORC-2003, pages , Hakodate, Hokkaido, Japan, May

Erik Yu-Shing Hu, Jagun Kwon and Andy Wellings. XRTJ: An Extensible Distributed High-Integrity Real-Time Java Environment. Proceedings of the 9th International Conference on Real-Time and Embedded Computing Systems and Applications RTCSA-2003, pages , Tainan, Taiwan, February

Erik Yu-Shing Hu, Andy Wellings and Guillem Bernat. A Novel Gain Time Reclaiming Framework Integrating WCET Analysis for Object-Oriented Real-Time Systems. Proceedings of the 2nd International Workshop on Worst-Case Execution Time Analysis WCET-2002, Vienna, Austria, June

Erik Yu-Shing Hu, Guillem Bernat and Andy Wellings. Addressing Dynamic Dispatching Issues in WCET Analysis for Object-Oriented Hard Real-Time Systems. Proceedings of the 5th IEEE International Symposium on Object-Oriented Real-Time Distributed Computing ISORC-2002, pages , Washington D.C., USA, April

Erik Yu-Shing Hu, Guillem Bernat and Andy Wellings. A Static Timing Analysis Environment Using Java Architecture for Safety Critical Real-Time Systems. Proceedings of the 7th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems WORDS-2002, pages 77-84, San Diego, California, January 2002.

Chapter 1 Introduction

Real-time and embedded systems are timely reactive systems, which means that they react continuously to their environment within time constraints and at a speed imposed by the environment. The use of such systems in our daily life is rapidly growing and these systems are widely applied in various environments, such as consumer electronics and embedded devices, industrial automation, space shuttles, nuclear power plants and medical instruments. The number of embedded systems used in a product ranges from one to tens in consumer products and to hundreds in large professional systems. It has been estimated that the market size of embedded systems is 100 times as large as the desktop market and this will grow at least in the same order of magnitude in this decade [30].

In such systems, the design objectives and the run-time architecture are strongly influenced by the requirements of their non-functional constraints, such as timing constraints and memory constraints. For example, high-integrity applications, where failure can cause loss of life, environmental harm, or significant financial penalties, have high development and maintenance costs due to the customised nature of their components. In addition, as these systems are applied in a wide variety of applications for a diverse range of functions, not only does their design become more and more complex, but also requirements for compatibility, reusability and portability of these applications have been raised.

There is a trend towards using object-oriented programming languages, for instance Java and C++, in real-time and embedded systems because using such languages has several advantages, for example reusability, data accessibility and maintainability. Through these advantages, the use of object-oriented programming may also offer a number of additional benefits including increased flexibility in design and implementation, reduced production cost, and enhanced management of complexity.

Among object-oriented programming languages, Java technology has many features suited to developing embedded systems, including a cost-effective platform-independent environment, relatively familiar linguistic semantics, and support for concurrency. It also provides well-defined Remote Method Invocation (RMI) features which support distributed applications on the Java architecture. Therefore, Java has become one of the most promising programming languages and architectures for real-time and embedded systems. However, the non-deterministic behaviour of memory management, poor performance of most Java implementations, and the lack of real-time facilities have hindered the acceptance of Java in real-time and embedded applications.

In order to support a predictable and expressive real-time Java environment, two major international efforts have attempted to provide real-time extensions to Java: the Real-Time Specification for Java (RTSJ) [10] and the Real-Time Core extensions to Java [23]. These specifications have addressed the issues related to using Java in a real-time context, including scheduling support, memory management issues, interaction between non-real-time and real-time Java programs, and device handling, among others. Since the RTSJ specification has been approved by the Sun Java community, it has become one of the most promising object-oriented programming languages for real-time systems in recent years.
The RTSJ has attempted to accommodate the variety of underlying systems, techniques, algorithms, and mechanisms for Real-Time Java. The specification has defined mandatory and optional requirements to be able to offer flexibility in implementation. It also addresses building larger scale real-time systems and providing dynamic real-time resource management technologies. In the RTSJ, seven enhanced areas of extended semantics for the modifications to the Java language specification and the JVM specification are given, including scheduling, memory management, synchronisation, asynchronous event handling, asynchronous transfer of control, asynchronous thread termination, and physical memory access.

1.1 Motivation

The success of real-time embedded systems undoubtedly relies upon their capability of producing functionally correct results within defined timing constraints. Therefore, timing analysis is crucial to be able to guarantee that all hard real-time threads will meet their deadlines in accordance with the design. To ensure this, appropriate scheduling algorithms and schedulability analysis are required. Most scheduling algorithms assume that the Worst-Case Execution Time (WCET) estimation of each thread is known prior to conducting the schedulability analysis [96]. In addition, to allocate resources more precisely to the system during the design phase, safe and tight WCET bounds are needed. Therefore conducting WCET analysis of real-time threads is of vital importance in real-time systems.

The purpose of static WCET analysis is to determine the maximum possible execution time of a piece of code. The goal of WCET analysis is to get a safe and tight value for the program. This means that the predicted WCET must not be less than the real WCET, whilst the difference between the predicted WCET and the real WCET should be small. Unsafe predictions may cause catastrophic results through unexpected deadline misses of tasks, and large overestimations lead to a pessimistic schedulability analysis that results in underutilisation of system resources [74].
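The two requirements on a WCET bound described above can be written down as a pair of simple checks. The following Java fragment is a minimal sketch for illustration only: the class `WcetEstimate`, its method names, and the figures in the usage note are assumptions made for this example and are not part of the thesis tooling. In practice the real WCET is usually unknown, so such a check is only possible for small programs whose worst case can be established independently.

```java
/** Illustrative helper; the class and method names are hypothetical. */
public final class WcetEstimate {
    private WcetEstimate() {}

    /** A bound is safe when the predicted WCET is not less than the real WCET. */
    public static boolean isSafe(long predicted, long realWcet) {
        return predicted >= realWcet;
    }

    /**
     * Overestimation relative to the real WCET.
     * A value close to zero indicates a tight bound; a negative value indicates an unsafe one.
     */
    public static double overestimation(long predicted, long realWcet) {
        if (realWcet <= 0) {
            throw new IllegalArgumentException("real WCET must be positive");
        }
        return (double) (predicted - realWcet) / realWcet;
    }
}
```

For instance, with an assumed real WCET of 1000 time units, a prediction of 1200 is safe with an overestimation of 0.2, whereas a prediction of 900 is unsafe however close it appears.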

In order to achieve a tight estimate, both the program flow, such as loop iterations and infeasible paths, and the execution characteristics of the object code on the target system, such as instruction caches and pipelining, must be taken into account. On the whole, the WCET analysis technique may be divided into two levels: high-level analysis and low-level analysis. The role of the high-level analysis is to analyse possible program flows of the source program, without regard to the execution time for each atomic unit of flow, whereas the role of the low-level analysis is to determine the timing aspects of the hardware features. Given the high-level analysis and low-level analysis, the final WCET estimation can be calculated.

A number of research approaches [18, 34, 73, 96, 103] have demonstrated how to derive WCET estimations in various languages and architectures. Sophisticated techniques [49, 71, 84] have been used in these approaches, for instance to model caches and pipelines, to achieve safe and tight estimation. However, most approaches [74, 77, 78, 94, 111] are tied to either a particular language or target architecture. It should also be stressed that most WCET analysis approaches are only considered in relation to procedural programming languages. Performing WCET analysis on programs written in object-oriented programming languages needs to take into account additional dynamic features, such as dynamic dispatching¹, dynamic loading and memory management. Arguably, these dynamic features may result in object-oriented applications being unanalysable and/or unpredictable. Therefore, in order to use object-oriented languages in hard real-time systems, most research approaches have prohibited using these dynamic features. This thesis mainly focuses on dynamic dispatching issues.

¹ Dynamic dispatching, also called dynamic binding, is a mechanism in which the actual method code for a method invocation of an object is selected by the actual type of the object at run-time. This cannot be determined during compilation.

The issues related to dynamic dispatching have been considered in compiler techniques for a number of years [1, 4, 27, 28, 107]. Unfortunately, these approaches cannot be directly applied to WCET analysis since they are solely for optimising dynamic binding and do not guarantee that all dynamic binding will be resolved before run-time. However, in WCET analysis for hard real-time applications, the execution time of every single method has to be known prior to executing it. Therefore, most approaches in the WCET analysis field have simply assumed that dynamic dispatching features should be prohibited. It is possible that these restrictions could make applications very limited and unrealistic because they might eliminate the major advantages of object-oriented programming. Notable exceptions include [8, 46, 93]. However, these do not give a satisfactory solution to the issues connected with dynamic dispatching in the Java programming paradigm. For example, Persson and Hedin [93] have proposed providing a maximum time-bound for dynamic dispatching methods, but they do not mention how these maximum time-bounds can be estimated.

In addition, the run-time characteristics of Java, such as high frequency of method invocation, dynamic dispatching and dynamic loading, make WCET analysis much more difficult for Java than for other object-oriented programming languages, such as C++. Furthermore, the RTSJ has also kept silent on how WCET estimation can be carried out on the highly portable Java architecture. In consequence, it is unlikely that Java's promise of "write once, run anywhere", or perhaps more appropriately for real-time "write once carefully, run anywhere conditionally" [10], can be achieved.

Moreover, in order to guarantee the deadlines of hard real-time threads, the processor and resource requirements of the hard real-time tasks have to be reserved. However, this may result in underutilisation and lead to very poor performance for aperiodic tasks. Arguably, object-oriented programming languages support more dynamic behaviour than procedural programming languages, and some of these features may result in object-oriented applications having a more pessimistic worst-case behaviour. As a result, object-oriented real-time systems may suffer from significantly lower utilisation and poorer overall performance of the whole system than procedural real-time systems.
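The pessimism introduced by dynamic dispatching can be illustrated with a short Java sketch. When the run-time type of a receiver is not known statically, a safe bound for the call has to cover every method that might be selected, so the analysis takes the maximum over all candidate implementations. Everything in the fragment below is hypothetical and chosen only for illustration: the `Sensor` interface, the two implementing classes, and the cycle counts are assumptions, and the per-class bounds are simply declared rather than derived by any analysis.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; all class names and cycle counts are hypothetical. */
public final class DispatchBound {
    private DispatchBound() {}

    /** Base type whose read() call is resolved by the receiver's run-time type. */
    public interface Sensor {
        int read();
        long assumedWcetCycles(); // assumed bound for this implementation of read()
    }

    public static final class TemperatureSensor implements Sensor {
        @Override public int read() { return 21; }
        @Override public long assumedWcetCycles() { return 120; }
    }

    public static final class PressureSensor implements Sensor {
        @Override public int read() { return 1013; }
        @Override public long assumedWcetCycles() { return 450; }
    }

    /** Safe bound for a call whose receiver type is unknown: the maximum over all candidates. */
    public static long boundForCall(List<? extends Sensor> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate receivers");
        }
        long bound = 0;
        for (Sensor s : candidates) {
            bound = Math.max(bound, s.assumedWcetCycles());
        }
        return bound;
    }

    public static void main(String[] args) {
        List<Sensor> candidates = Arrays.asList(new TemperatureSensor(), new PressureSensor());
        Sensor actual = candidates.get(0); // run-time type happens to be TemperatureSensor
        long bound = boundForCall(candidates);
        System.out.println("bound=" + bound + " slack=" + (bound - actual.assumedWcetCycles()));
    }
}
```

With these assumed figures the call is budgeted at 450 cycles, yet an invocation on a `TemperatureSensor` needs at most 120, leaving 330 cycles reserved but unused. This is the kind of over-reservation that lowers utilisation in object-oriented real-time systems.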

Therefore, performing WCET analysis on the highly portable real-time Java architectures without resulting in underutilisation of the overall system poses several challenges. To be able to offer a predictable and reliable environment for embedded real-time Java applications in practice, a number of issues need to be addressed:

- what kind of computational model should be used in mission-critical real-time Java applications,
- how to estimate safe and tight WCET bounds on the highly portable real-time Java architecture,
- how the run-time characteristics of Java, such as high frequency of method invocation and dynamic dispatching, can be addressed,
- how schedulability analysis can be conducted statically for different target platforms,
- how to improve the utilisation and overall performance of real-time Java systems given the potential pessimism in WCET analysis of object-oriented programs, and
- how to construct a bridge between WCET analysis and scheduling algorithms to provide greater flexibility without loss of predictability and efficiency.

1.2 Thesis Aims

This thesis mainly concerns the WCET analysis of real-time Java programs in mission-critical real-time systems, where dynamic dispatching issues are addressed to be able to offer greater flexibility but without resulting in unpredictable timing behaviour. The major aim of the research is twofold: how to estimate safe and tight WCET bounds in the real-time Java architecture while bearing portability and reusability in mind; and how to improve the utilisation and performance of real-time Java systems without rendering real-time threads unsafe. The following statement synthesises the central research proposition:

The RTSJ augmented with:
- portable WCET analysis taking account of dynamic dispatching; and
- a framework for reclaiming unused resources at run-time;
provides a flexible real-time Java environment to develop mission-critical real-time and embedded systems.

The goal is to demonstrate that a resource reclaiming framework integrated with WCET analysis can improve the utilisation and performance of the whole system without rendering hard real-time threads unsafe. To support the proposition, the following objectives are considered:

1. to define a computational model which can be supported in a reliable real-time Java environment,
2. to address or relax the restrictions of using dynamic behaviour in the real-time Java environment,
3. to investigate how to estimate tight and safe WCET bounds in the highly portable real-time Java environment,
4. to demonstrate how static timing analysis using the Java architecture can be carried out from portable Java class files, and
5. to improve the utilisation and performance of the whole system without rendering hard real-time threads unsafe.

Achieving these goals would facilitate the development of portable and predictable real-time applications in the real-time Java architecture. The rationale behind the research is mainly based on strengthening the predictability of hard real-time tasks during the design phase and reinforcing the performance of the whole system during run-time.

1.3 Organisation of The Thesis

In accordance with the motivations and objectives of the research, the thesis is organised into nine chapters. Brief descriptions of the remaining chapters are given below:

Chapter 2. Background and Related Work: This chapter explores the related work in the worst-case execution time analysis and real-time Java fields. In particular, this chapter gives an overall view of the areas, including classical approaches related to the WCET analysis, portable WCET analysis, real-time Java specifications and techniques to reclaim allocated but unused computation time.

Chapter 3. A Computational Model and The WCET Analysis Framework: The chapter defines a computational model for achieving safe and tight timing estimations in the real-time Java architecture. In line with the computational model, the portable WCET analysis framework is proposed. An introduction to major components used in the framework is also given. Additionally, this chapter illustrates how a static timing analysis can be carried out on the real-time Java architecture.

Chapter 4. Addressing Dynamic Dispatching Issues: This chapter discusses major issues connected with using and restricting dynamic dispatching in object-oriented programming languages, particularly in Java. In the light of the discussion, this chapter demonstrates how dynamic dispatching issues can be addressed.

Chapter 5. Virtual Machine Timing Models: This chapter explores how virtual machine timing models can be derived from target platforms. Two approaches are discussed further and evaluated.

Chapter 6. A Gain Time Reclaiming Analysis: The chapter demonstrates how to improve the utilisation and performance of the whole system by reclaiming gain time at run-time. A gain time reclaiming framework integrated with WCET analysis is developed in this chapter.

Chapter 7. Prototype Implementation: Implementation of the tools necessary to support the framework is discussed in this chapter.

Chapter 8. Case Studies: The chapter mainly presents two case studies that have been carried out with the aim of evaluating the research. This chapter also discusses the advantages and disadvantages of the approach advocated by the thesis.

Chapter 9. Conclusions and Future Work: The final comments about the research results are given in this chapter. Additionally, some directions for further research are presented.


Chapter 2 Background and Related Work

To be able to guarantee that all hard real-time tasks will meet their deadlines in line with the design, scheduling algorithms and schedulability analyses are used in real-time systems. On the whole, most scheduling algorithms assume that the WCET estimation of each task is known prior to performing the schedulability analysis. Therefore, WCET analysis is crucial in hard real-time systems. However, performing timing analysis with most general purpose programming languages may present some difficulties. These issues can be summarised as follows:

- Most programming languages themselves do not provide enough information in terms of execution time of the program.
- Optimisation during translation time makes flow analysis more difficult to predict in low-level codes.
- Advanced processors using more caches and pipelines have hindered the acceptance of hard real-time systems because the analysis of such architecture becomes very complex, making an exact analysis infeasible.

Hence, the complexity of conducting WCET analysis may depend on the programming language, the operating system and the hardware architecture that are selected to be used in the systems. To offer a predictable and analysable computational model for real-time applications, a number of research groups have made efforts to introduce new programming languages that are purportedly designed with real-time systems in mind, such as Ada [12], PEARL [113] and Real-Time Euclid [64], or to define a subset of existing programming languages, such as MARS-C [65], Real-Time Concurrent C (RTCC) [43] and Flex [60]. Most approaches support timing constructs or annotations to express the temporal constraints to address the inadequacy of existing programming languages for real-time applications. For example, Puschner and Koza [97] have proposed a number of language constructs, such as scope and marker, to calculate the maximum execution time of real-time MARS-C programs. These timing constructs or annotations can also facilitate analysing WCET estimations of real-time applications.

Over the last fifteen years, much research has been performed on timing analysis. A survey of WCET analysis techniques used for estimating the computation time of a program in the literature is presented in Section 2.1.

Due to a trend towards using object-oriented programming languages to develop hard real-time applications, Java [44, 79] has become one of the most promising programming languages for embedded and real-time applications. Two recent efforts have attempted to provide real-time extensions to Java, the Real-Time Specification for Java (RTSJ) [10] (§2.2.1) and the Real-Time Core extensions to Java [23] (§2.2.2), in order to address non-determinism issues and provide real-time facilities. These specifications have addressed the issues related to using Java in a real-time context, including scheduling support, memory management issues, interaction between non-real-time Java and real-time Java programs, and device management.
However, none of the specifications provides a satisfactory solution to compute WCET estimations of Java applications. In general, most WCET approaches are tied to either a particular language or target architecture. Only portable WCET analysis [5, 8, 95], which presents how WCET analysis can be performed on Java class files and how portable timing annotations can be provided with Java bytecodes, is concerned with portability for WCET analysis. This is achieved by providing platform independent analysis and platform dependent analysis. Platform independence is achieved by analysing an intermediate representation (i.e. Java class files) rich enough to capture control flow and data flow information. Platform dependent analysis is achieved by parameterising the different target platforms. There is some additional pessimism in performing the portable WCET process. For example, the pessimism of each bytecode will have a cumulative effect on the WCET of the application. However, the benefits that portability brings [5, 8, 95] outweigh the pessimism resulting from such analysis. A summary of the portable WCET analysis is given in Section 2.3.

In accordance with the estimations of WCET analysis, the processor and resource requirements of the hard real-time tasks have to be reserved. However, it is not always the case that hard real-time tasks execute along their worst-case execution time paths. Therefore, pessimistic WCET estimations are not desirable because they may result in underutilisation and lead to very poor performance for aperiodic tasks. A summary of techniques that demonstrate how to reclaim these unused resources is given in Section 2.4.

2.1 Worst-Case Execution Time Analysis

The purpose of WCET analysis is to determine the maximum possible execution time of a piece of code without actually executing it. To achieve a safe and tight estimate, both the program flow (such as loop iterations, infeasible paths, etc.) and the execution characteristics of the object code on the target system (for instance instruction caches, pipelining, etc.) must be taken into account. A good review of WCET analysis, which explores the achievements in the field and reports its recent advances, is presented by Puschner and Burns [96].
They also give the following informal definition of WCET analysis:

"WCET analysis computes upper bounds for the execution time of pieces of code for a given application, where the execution time of a piece of code is defined as the time it takes the processor to execute that piece of code" [96].

The estimated WCET must not be smaller than the real WCET. However, this does not mean that the estimated WCET always exceeds the time that elapses between the release of a task and its completion. In general, more than one task runs in a real-time system, so the estimated WCET of a task can be smaller than its response time. For example, suppose that two tasks run in a system. The estimated WCET of task A is 4 ms and its priority is 5; the estimated WCET of task B is 3 ms and its priority is 7. As shown in Figure 2.1, the actual response time of task A can be greater than its estimated WCET because task A can be preempted by task B: in the figure, the response time of task A is 7 ms. An assumption about the WCET is therefore needed. The assumption is given by Engblom et al. [36] as follows: when performing WCET analysis, it is assumed that the program execution is uninterrupted (no pre-emption or interrupts) and that there are no interfering background activities, such as direct memory access (DMA) and refresh of dynamic random access memory (DRAM).

For the most part, there are two principal ways of obtaining the WCET of a program: measuring the execution time for certain inputs, or performing static analysis on the program. Most systems in industry have relied on measurements of execution time when designing real-time systems [77]. In general, end-to-end measurement of execution time can be unsafe, since one cannot know whether the worst case has been captured in the measurements. A number of approaches that use measurements to derive the WCET can be found in the literature [9, 77, 78, 94].
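
The two-task example of Figure 2.1 can be replayed with a small simulation. The sketch below is a minimal fixed-priority preemptive scheduler that advances in 1 ms steps. The release times (task A at 0 ms, task B at 1 ms) and the convention that a larger number means a higher priority are assumptions made for illustration; the text gives only the WCETs and the priorities.

```python
def response_times(tasks):
    """Simulate fixed-priority preemptive scheduling in 1 ms steps.

    tasks maps a task name to (release_ms, wcet_ms, priority).
    Assumption: a larger priority value means a more urgent task.
    Returns a dict mapping each task name to its response time in ms.
    """
    remaining = {name: wcet for name, (_, wcet, _) in tasks.items()}
    finish = {}
    t = 0
    while len(finish) < len(tasks):
        # Tasks that have been released and still have work to do.
        ready = [name for name, (release, _, _) in tasks.items()
                 if release <= t and remaining[name] > 0]
        if ready:
            # The highest-priority ready task gets this 1 ms slot.
            running = max(ready, key=lambda name: tasks[name][2])
            remaining[running] -= 1
            if remaining[running] == 0:
                finish[running] = t + 1
        t += 1
    return {name: finish[name] - tasks[name][0] for name in tasks}


# Hypothetical release times; WCETs and priorities are those of the example.
example = {"A": (0, 4, 5), "B": (1, 3, 7)}
print(response_times(example))  # {'A': 7, 'B': 3}
```

Under these assumptions task A needs only 4 ms of processor time but completes 7 ms after its release, because task B occupies the processor for 3 ms in between. This is why the WCET is defined for uninterrupted execution and preemption is left to the schedulability analysis.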
In contrast, static analysis can give relatively safer results for the WCET analysis [96]. Typically, WCET analysis is divided into two levels: high-level timing analysis and low-level timing analysis. High-level analysis examines the control flow of the application without considering the hardware architecture, whilst low-level analysis focuses on the execution time of basic blocks on the hardware architecture. Approaches related to these two levels are examined in further detail in sections 2.1.1 and 2.1.2. Given the results of the high-level and low-level analyses, the final WCET estimate can be calculated. Three different calculation methods are discussed in further detail in section 2.1.3.

[Figure 2.1: Running multi-tasks in a system. Timeline of processes A and B, showing the intervals in which each process is executing or preempted.]

Since WCET analysis is still an active research area, the WCET research community has produced a wide range of state-of-the-art techniques. It is impossible to cover all of this work in a report of limited length. Therefore, only a range of selected approaches is presented here.

2.1.1 High-Level Timing Analysis

The major aim of high-level analysis is to analyse the possible program flows of the source program, without regard to the time taken by each atomic unit of flow, which is also known as a basic block (see footnote 1). This level is concerned only with programming language issues rather than low-level issues, such as hardware architectures and operating systems. A number of techniques that address particular issues in high-level analysis are given below.

Bounding loop iterations, language restrictions and annotations

Several researchers [64, 97, 18] have noticed that general-purpose programming languages not only fail to provide timing information, such as loop bounds, but also allow constructs, such as mutually recursive function calls, that can make programs impossible to analyse. Kligerman and Stoyenko [64, 105] have argued that language restrictions are needed in order to derive the execution time of real-time tasks. Furthermore, Puschner and Koza [97] have pointed out that bounded loop iterations are also required, because most general-purpose programming languages, such as Ada, C and Java, do not provide minimum or maximum loop iteration counts. A number of approaches in the literature [64, 97, 18] have attempted to provide annotations or constructs and to gather flow information in order to estimate the WCET. Three selected approaches are given below.

Kligerman and Stoyenko [64] have proposed a real-time programming language, Real-Time Euclid, which imposes language restrictions, such as prohibiting recursion, dynamic data structures and goto statements, and which requires bounded loop structures, such as time-bounded loops and simple for-loops. In this approach, loop bounds are part of the language's semantics. The iteration constructs of Real-Time Euclid are given in Figure 2.2. Using these restrictions and bounded loops, the maximum time spent in each loop can easily be derived. An algorithm for calculating an upper bound of the WCET of Real-Time Euclid programs is also introduced in this approach.
Footnote 1: An atomic unit or a basic block is a continuous section of code in the sense that control flow only enters at the first instruction and leaves through the last one.

    loop nolongerthan compiletimeexpn : timeoutreason
        [invariant booleanexpn]
        declarationsandstatements
    end loop

    for [decreasing][id]: compiletimeexpn .. compiletimeexpn TimeExpn
        declarationsandstatements
        [invariant booleanexpn]
    end for

Figure 2.2: Iteration constructs in Real-Time Euclid [64]

Puschner and Koza [97] have also proposed an approach that analyses high-level language code to compute bounds for the execution time of tasks, extending the bounded-loop concept of Kligerman and Stoyenko's approach [64]. In Puschner and Koza's approach [97], language constructs are introduced so that programmers can express knowledge about the actual behaviour of algorithms that cannot be expressed using standard programming language features. The constructs introduced in this approach are scopes, markers, and loop sequences. Markers are used to define the number of loop iterations if this number cannot be estimated from the program automatically, e.g. if a general loop is used. Nevertheless, all loops are forced to have an explicitly stated upper bound [97].

Chapman et al. [18] have presented another approach to WCET analysis, which is combined with program proof in a single programming environment for the SPARK Ada subset, the so-called SPATS (SPARK Proof and Timing System). This approach provides timing annotations including loopcount, budget, milestone, mode, dead, live and label. All annotations in this approach are introduced as comments (i.e. they use Ada's comment syntax). By introducing mode timing annotations, the SPATS method allows WCET predictions that depend on the program's input data.

Automatic Deriving Loop Iterations

Manual annotations have been widely used in the WCET analysis field to add timing information and flow information at the source-code level. However, the correctness of these manual annotations has been questioned by several researchers. Park [90] has argued that manual annotations need to be validated prior to their use in WCET analysis. Also, Healy et al. [50] have shown that manual annotations have several disadvantages: the user still needs to add the assertions, and there is no guarantee that the user will specify the correct maximum and minimum loop iterations. Manual annotations may therefore cause problems for WCET analysis, and how to validate them is a crucial issue in the WCET field. A number of researchers [90, 16] have tried to use assertional program logic to validate the correctness of manual annotations. In addition, the annotations can be verified at run-time by assertions, such as pre- and post-conditions.

Alternatively, some researchers have attempted to obtain loop iteration bounds from the source code automatically. Ermedahl and Gustafsson [39] have argued that manual annotations are error-prone and should be avoided. They have also suggested that the ability to derive annotations automatically from the source code is highly desirable. Essentially, most techniques for automatically deriving loop iterations are based on either symbolic execution [19] or abstract interpretation [24]. The approach of Liu and Gomez [81] is based on symbolic execution, and both the Healy et al. [50] approach and the Ermedahl and Gustafsson [39] approach are based on abstract interpretation. A brief summary of each approach is given below.

Liu and Gomez [81] have demonstrated an automatic iteration-bound analysis based on the symbolic execution flow analysis technique. This approach derives iteration bounds automatically at the source-language level and is therefore called a language-based approach. First, the original program is transformed to construct a timing function, which takes the original input and primitive parameters as arguments and returns the running time. Then, using partially known input structures [101], the timing function is automatically transformed into an iteration-bound function. Following this, symbolic evaluation and optimisations are carried out. Finally, measurements of the primitive parameters are conducted.

Liu and Gomez [81] have also implemented this approach and performed a number of experiments analysing Scheme programs. They show that the measured worst-case time values are closely bounded by the calculated bounds. However, they have not given any concrete example of their approach.

The Healy et al. [50, 51] approach is based on abstract interpretation techniques and describes three complementary methods to support timing analysis by bounding the number of loop iterations. In this approach, an algorithm is presented that determines the minimum and maximum number of iterations of loops with multiple exits and also detects infeasible paths. Where the number of loop iterations depends on loop-invariant variables, the user provides the minimum and maximum values of those variables. Furthermore, a method is given to predict tightly the execution time of loops whose number of iterations depends on the counter variables of outer-level loops [50]. The authors state that the method has been successfully integrated into an existing timing analyser that predicts the performance of optimised code on a machine that exploits caching and pipelining.

An example program that illustrates the structure of the analysis for this approach is given in Figure 2.3. In the figure, the analysis starts at the top with the initial value of x, which is the interval [0..3]. As the while-loop cannot terminate at this point for any possible input value, the analysis continues with iteration #1. The infeasible loop termination path is indicated with a dashed line. The flow analysis extracts useful information, including safe bounds on the possible values of x at each program point, and infeasible paths. The analysis shows that the loop may iterate one to three times.
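
The interval reasoning behind Figure 2.3 can be sketched in a few lines. The code below is an illustrative reimplementation, not the analyser of any cited approach: it hard-codes the loop of Figure 2.3, represents the possible values of x as one interval, and unrolls the loop until no value can re-enter it. The helper names (`meet`, `join`, `shift`, `double`, `loop_bounds`) are hypothetical, and `double` assumes non-negative intervals, which holds for this example.

```python
INF = float("inf")


def meet(iv, lo, hi):
    """Intersect interval iv with [lo, hi]; None means 'no value possible'."""
    if iv is None:
        return None
    a, b = max(iv[0], lo), min(iv[1], hi)
    return (a, b) if a <= b else None


def join(*ivs):
    """Smallest interval that covers all non-empty arguments."""
    ivs = [iv for iv in ivs if iv is not None]
    if not ivs:
        return None
    return (min(a for a, _ in ivs), max(b for _, b in ivs))


def shift(iv, k):
    """Abstract effect of x = x + k."""
    return None if iv is None else (iv[0] + k, iv[1] + k)


def double(iv):
    """Abstract effect of x = x * 2 (assumes a non-negative interval)."""
    return None if iv is None else (2 * iv[0], 2 * iv[1])


def loop_bounds(lo, hi, max_unroll=100):
    """Return (min, max) iteration counts of the Figure 2.3 loop for x in [lo..hi]."""
    x = (lo, hi)
    min_iter = 0 if meet(x, 4, INF) is not None else None
    iterations = 0
    inside = meet(x, -INF, 3)                    # values satisfying x < 4
    while inside is not None:
        if iterations == max_unroll:
            raise RuntimeError("no bound found within max_unroll iterations")
        iterations += 1
        s1 = double(meet(inside, -INF, 2))       # S1: if (x < 3) x = x * 2
        s2 = shift(meet(inside, 3, INF), 1)      # S2: else x = x + 1
        results = []
        for branch in (s1, s2):
            results.append(shift(meet(branch, 1, 1), 2))  # S3: if (x == 1) x = x + 2
            results.append(shift(branch, 1))     # S4: else x = x + 1 (x != 1 is not
                                                 # representable, so keep the interval)
        x = join(*results)                       # merge at the end of the iteration
        if min_iter is None and meet(x, 4, INF) is not None:
            min_iter = iterations                # first iteration after which exit is possible
        inside = meet(x, -INF, 3)
    return min_iter, iterations


print(loop_bounds(0, 3))  # (1, 3)
```

For the input interval [0..3] the sketch passes through the intervals [1..5], [3..5] and [5..5] at the three merge points and reports one to three iterations, in agreement with the figure. Because every interval over-approximates the concrete values, the reported bounds are safe but not necessarily tight for other programs.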
Even though several researchers [39, 81, 32] have argued that manual annotations are error-prone and have proposed various approaches to bound loop iterations automatically, these automatic methods are limited in the types of programs they can handle, and manual annotations probably still have to be used in some cases [36]. For example, the analysis method employed may not calculate the kind of information needed, or the automatically deduced information may not be tight enough. Since these automatic approaches currently address only part of the problem with manual annotations, manual annotations are still needed for WCET analysis.

    x = [0..3]    /* Input limits for 0 <= X <= 3 */
    while (x < 4) {
        if (x < 3) x = x * 2;     /* S1 */
        else       x = x + 1;     /* S2 */
        if (x == 1) x = x + 2;    /* S3 */
        else        x = x + 1;    /* S4 */
    }

    Iteration #1: S1 gives [0..4], S2 gives [4..4]; S3 gives [3..3], S4 gives [1..5] and [5..5]; merge [1..5]
    Iteration #2: entered with [1..3] (exit with [4..5]); S1 gives [2..4], S2 gives [4..4]; S4 gives [3..5] and [5..5]; merge [3..5]
    Iteration #3: entered with [3..3] (exit with [4..5]); S2 gives [4..4]; S4 gives [5..5]; merge [5..5]

Figure 2.3: An example program and flow analysis for abstract interpretation [46]

Compiler Integration

In order to estimate the WCET as accurately as possible, the mapping from the source-language level to the machine-language level is crucial. Compilers, especially optimising compilers, make the machine-language level more difficult to analyse, since they can change the flow information during compilation. Therefore, some approaches [16, 41, 84, 34, 111] have attempted to integrate compilation with WCET analysis, mapping the time constraints and flow information between the high-level and low-level timing analyses to obtain the most accurate estimates. Moreover, Engblom et al. [34] state that compiler integration has several benefits: the analysis can take advantage of the compiler's information about the program, and it can handle the problems introduced by optimising compilers.

Vrchoticky [111] discusses the problem of calculating accurate source-level execution time bounds for real-time programs in the presence of code-optimisation transformations. His approach is integrated with the compiler and produces timing trees, which are high-level language descriptions of the syntactic program structures, the execution constraints specified by the user, and the execution times of the basic code constituents. However, the compiler [111] is designed for the Modula/R programming language and is limited to transformations that keep the program well-structured.

Another compiler integration approach, called co-transformation, has been carried out by Engblom et al. [34]. This technique maps execution information from the source code of a program to its object code for the purpose of WCET analysis. In order to cope with structural changes in the program due to code optimisation, the compiler generates a transformation trace and transformation definitions at compilation time. This transformation information is coded in a special-purpose optimisation description language (ODL), which is interpreted by the WCET analyser to map the information from the source code to the restructured object code. A block diagram that illustrates how WCET estimates are computed using a co-transformer is given in Figure 2.4.
[Figure 2.4: Components of a WCET tool [34]. Components shown: source code, compiler, program, high-level analysis, co-transformer, low-level analysis and calculator, connected by the data items exec info, transformed exec info, transformation trace, transformation definitions, low-level info and WCET.]

Kirner and Puschner [63] have presented a transformation of meta-information in parallel with code optimisation for WCET analysis, using the abstract co-interpretation approach. The major aim of the approach is to formulate a correct update of the flow facts of a program following a transformation of the program. To do this, the program transformation is first denoted as a function. The abstract and concrete semantics of the function are then gathered to compute the meta-information transformation function.

Flow Analysis

So far, we have shown how to provide timing information via techniques such as language restrictions, bounded loop iterations, and the handling of compiler optimisations. In fact, these techniques are also used for gathering flow information from the program. On the whole, flow analysis is a tool for discovering properties of the run-time behaviour of a program without actually running it [83]. Control-flow analysis and data-flow analysis are the techniques most commonly used to analyse the program flow in the WCET field. Typically, control-flow analysis is carried out on a combination of source code and annotations, or on annotated source code. These flow analyses are used in several approaches to analyse the flow of applications at both the source-code level and the object-code level, that is, from high-level analysis (source code) to low-level analysis (machine code).

In general, the information extracted by flow analysis is closely related to the final WCET calculation method (section 2.1.3). The flow analysis applied for a path-based calculation method differs from the analysis applied for a tree-based calculation. In the path-based calculation method, the flow analysis aims to find the path through the program with the longest execution time. In the tree-based calculation method, by contrast, the analysis aims to compute the final WCET by a bottom-up traversal of the tree that represents the program. Further details of the flow analyses combined with the calculation methods are discussed in section 2.1.3.

2.1.2 Low-Level Timing Analysis

The goal of low-level analysis is to determine the execution time of each basic block in the program. It is therefore mainly concerned with processor architectural issues, such as instruction caches, data caches, multi-level caches, pipelining and branch prediction. Improvements in computer architecture have led to more advanced caching and pipelining techniques being applied in most modern processors. As a result, the timing of applications and systems running on these advanced processors has become less predictable. To address these issues, the WCET community has put more and more effort into low-level timing analysis in recent years. For the most part, low-level timing analysis may be classified into two categories: global effect analysis and local effect analysis [31, 38]. These analyses are detailed below.

Global Effect Analysis

Global effect analysis considers the execution time effects of machine features that may reach across the entire program, e.g. instruction caches [73, 85, 86], data caches [61, 115] and branch predictors [21, 82]. Cache memories have often been disabled in hard real-time systems to provide sufficient predictability for scheduling analysis [84]. To reduce the resulting WCET pessimism, a number of approaches [84, 76, 71, 89] have been proposed.

Mueller [84] has proposed a technique, called Static Cache Simulation (SCS), to predict statically which instructions will be in the instruction cache during execution. In this approach, an algorithm is presented to calculate the information required for instruction categorisation within each function. The instructions are classified as always-hit, always-miss, first-miss and first-hit by analysing the call graph and the control flow of each function. Based on the information provided by the compiler, the static cache simulator constructs the call graph of the program and the control-flow graph of each function. Then, based on a given cache configuration, the cache behaviour is simulated. Mueller states that the aim of SCS is to predict statically the behaviour of a large number of instruction cache references for a given program or task with a specific cache configuration. This approach mainly addresses instruction caches. An overview of Static Cache Simulation is given in Figure 2.5.

[Figure 2.5: Overview of Static Cache Simulation [84]. Components shown: source files, compiler, assembly files, assembler, object files, linker, executable program, source-level debugger, control flow information, cache configuration, static cache simulator, code instrumentation, cache analysis library routines, cache prediction, user requests, timing analyzer and timing prediction.]

Mueller has extended his original approach [84] to address set-associative caches and multi-level caches [85, 86]. The approach [84] has also been integrated with a pipelining technique by Healy et al. [49].

Alternatively, Lim et al. [76] have proposed extensions to Shaw's Timing Schema [103, 91]. This approach introduces a WCTA (Worst-Case Timing Abstraction), which contains detailed timing information about every execution path that might be the worst-case execution path of the program construct. This extension leads to a revised timing schema in which newly defined concatenation and pruning operations on WCTAs replace the add and max operations on time bounds of the original Timing Schema [103]. To provide cache information in the WCTA, they have proposed a structure for an atomic object that consists of two sets of instruction references, as shown in Figure 2.6. They also extend the proposed technique to address data caches.

    struct timing_information {
        block_address first_reference[n_block];
        block_address last_reference[n_block];
        time          t_execution;
    };

Figure 2.6: Structure of atomic object [76]

Other approaches [41, 72, 89, 61, 114, 115] that also address cache memory issues, including data caches and branch predictors, may be found in the literature.

Local Effect Analysis

Pipelined processors are now used in most systems. However, pipelining may make the timing of a system unpredictable. Some researchers have suggested that developers should not use pipelined processors in hard real-time systems; others simply ignore the effect of the pipeline. Ignoring the pipeline is likely to make the WCET analysis pessimistic. Therefore, several researchers have tried to improve WCET analysis by taking pipelining into account.

On the whole, local effect analysis handles machine timing effects that depend on a single instruction and its immediate neighbours [36]. Pipeline analysis is performed in two stages: each block is analysed once to determine how its instructions exercise the pipeline, and then, for a particular path, the details gathered are concatenated to give an overall result for the path's processing time with the pipeline.

Zhang et al. [117] have proposed a pipeline timing analysis technique that can mechanically calculate the WCETs of programs. Their analysis technique is based on a mathematical model of the pipelined Intel 80C188 processor. This model takes into account the overlap between instruction execution and opcode prefetching. In their approach, the WCET of each basic block in a program is calculated individually from the mathematical model.

Healy et al. [49] have demonstrated another approach for bounding the worst-case performance of large code segments on machines that exploit both pipelining and instruction caching. First, the cache analysis is carried out, based on the Static Cache Simulation approach [84]. Then, the cache categorisations are used in the pipeline analysis of sequences of instructions representing paths through the program. A timing analyser uses the pipeline path analysis to estimate the worst-case execution performance of each loop and function in the program. This approach also has a graphical user interface that allows a user to request timing predictions for portions of the program [49].

In order to analyse the timing effects of the pipelined execution and cache memory of RISC processors, Lim et al. [74] have extended their previous approach [76]. The extended timing schema (ETS) accurately accounts for the timing effects of pipelined execution and cache memory not only within but also across program constructs [74].
In this approach, the timing interactions among instructions within a basic block are analysed by building a reservation table for the block. The reservation table records not only the conflicts in the use of pipeline stages but also the data dependencies among instructions [74]. This approach extends the WCTA (Worst-Case Timing Abstraction) so that each element of a WCTA also carries pipeline information (Figure 2.7).

    struct timing_information {
        time              t_max;
        reservation_table head[d_head];
        reservation_table tail[d_tails];
        block_address     first_reference[n_block];
        block_address     last_reference[n_block];
    };

Figure 2.7: Structure of atomic object [74]

2.1.3 Calculation

The aim of the calculation phase of WCET analysis is to calculate the final WCET estimate for the program, given the results of the high-level and low-level analyses [35]. The calculation methods proposed in the literature may be classified into three approaches: the path-based method, the tree-based method, and the IPET-based method (Implicit Path Enumeration Technique). Each method is discussed below.

Path-Based Approach

The path-based approach calculates the final WCET estimate by calculating the times of different paths in a program and searching for the path with the longest execution time. A number of WCET analysis projects [16, 114, 104] have used this method to calculate the WCET bound. One of these projects is discussed below.

Based on the path-based method, Chapman et al. [17, 18] have demonstrated SPATS (SPARK Proof and Timing System), which integrates both classical program proof and WCET analysis through a program's basic-path control-flow graph [15]. They have also introduced a Graph-To-WCET (GTW) algorithm that finds the longest path through the whole program and then calculates the WCET from that path. The algorithm transforms a cyclic basic-path graph into a path expression that can be evaluated for worst-case timing, assuming that suitable bounds on each loop are available. An example of a basic-path graph is given in Figure 2.8, and the corresponding WCET calculation is given below.

[Figure 2.8: An example of a basic-path graph [15], with basic paths labelled a, b, c, d, e and f.]

The graph is transformed to the regular expression

    (a.f) | (e) | (b.c.d)

from which the WCET is calculated by substituting operators as follows:

\[ WCET = \max\bigl( wcet(a) + wcet(f),\; wcet(e),\; wcet(b) + wcet(c) + wcet(d) \bigr) \]

In general, this approach can take cache and branch-prediction issues into account in a straightforward way and can report the worst-case path. However, the method suffers when programs have a large number of possible paths, since each path has to be considered explicitly [36].

Tree-Based Approach

The tree-based method calculates the final WCET estimate by a bottom-up traversal of a tree representing the program [75]. A number of approaches [103, 91, 75, 97, 20, 9] have used this method to calculate the final WCET estimate. This section presents further details of the approach of Lim et al. [75]. The high-level and low-level analyses applied by Lim et al. [76, 74] have been introduced in the previous sections. In their approach, the program is transformed into a syntax tree, as illustrated in Figure 2.9.

    S1
    while (exp1) {
        S2
        if (exp2) S3
        else S4
    }

[Figure 2.9: An example of a syntax tree [75]. The tree for the program above runs from start to end through S1 and a while node; the while node has the children exp1, S2 and an if node, and the if node has the children exp2 and a branch between S3 and S4.]

Then, a set of timing formulas of the ETS (Extended Timing Schema), listed in Table 2.1, is applied to the syntax tree representation in a bottom-up fashion [75]. The timing formulas are based on those of the original Timing Schema [103], but are revised to accommodate the timing variations due to advanced architectural features such as pipelining and caching (section 2.1.2).

    Statement                           Timing formula
    S: S_1; S_2                         $W(S) = W(S_1) \oplus W(S_2)$
    S: if (exp) then S_1 else S_2       $W(S) = (W(exp) \oplus W(S_1)) \cup (W(exp) \oplus W(S_2))$
    S: while (exp) S_1                  $W(S) = \bigl(\bigoplus_{i=1}^{N} (W(exp) \oplus W(S_1))\bigr) \oplus W(exp)$
    S: f(exp_1, ..., exp_n)             $W(S) = W(exp_1) \oplus \cdots \oplus W(exp_n) \oplus W(f())$

Table 2.1: Timing formulas of the ETS [75]
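
The bottom-up evaluation can be illustrated with a deliberately simplified sketch. The code below applies the add and max operations of the original Timing Schema to plain integer time bounds; it does not implement the WCTA concatenation and pruning operations of the ETS. The class names, the loop bound and all cost values are hypothetical and chosen only for illustration, and the tree encodes the program of Figure 2.9.

```python
from dataclasses import dataclass


@dataclass
class Basic:
    """A statement or expression whose WCET is already known."""
    wcet: int


@dataclass
class Seq:
    first: object
    second: object


@dataclass
class If:
    cond: object
    then: object
    orelse: object


@dataclass
class While:
    cond: object
    body: object
    bound: int  # maximum number of loop iterations N


def wcet(node):
    """Evaluate the timing formulas bottom-up over the syntax tree."""
    if isinstance(node, Basic):
        return node.wcet
    if isinstance(node, Seq):
        return wcet(node.first) + wcet(node.second)
    if isinstance(node, If):
        cond = wcet(node.cond)
        return max(cond + wcet(node.then), cond + wcet(node.orelse))
    if isinstance(node, While):
        cond = wcet(node.cond)
        # N executions of test plus body, then the final failing test.
        return node.bound * (cond + wcet(node.body)) + cond
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Program of Figure 2.9 with hypothetical costs (in cycles) and loop bound.
S1, S2, S3, S4 = Basic(5), Basic(4), Basic(10), Basic(3)
exp1, exp2 = Basic(2), Basic(1)
program = Seq(S1, While(exp1, Seq(S2, If(exp2, S3, S4)), bound=8))

print(wcet(program))  # 143
```

With these assumed costs the loop body is bounded by 4 + 1 + max(10, 3) = 15 cycles, the loop by 8 × (2 + 15) + 2 = 138 cycles, and the whole program by 143 cycles. The ETS replaces each integer by a set of timing abstractions so that pipeline and cache state can be carried across the boundaries between constructs.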

48 30 CHAPTER 2. BACKGROUND AND RELATED WORK The tree-based approach has been extended by Colin and Bernat to consider the possibility of giving symbolic information on the relative execution frequency of sub-branches [20]. This approach is conceptually simple and computationally inexpensive. However, it suffers from handling unstructured and optimised code since it makes the syntax-tree and transformation rules hard to construct [38]. IPET-Based Approach The IPET-based (Implicit Path Enumeration Technique) approach uses arithmetic constraints to model the program flow and iterations of basic blocks. In the IPET-based approach, the final WCET estimate is computed by maximising a goal function that ties two constraints together [37]: program structural constraints and program functionality constraints. Program structural constraints represent the constraints related to the control flow of a program, whereas program functionality constraints express the constraints involved in loop bounds, path information and function calls. A number of WCET research groups [71, 89, 98, 62] have carried out WCET analysis based on the IPET-based approach. There are two dominant solution methods used for the IPET approach: Integer Linear Programming (ILP) as in [71, 98] and Constraint Satisfaction Problems (CSP) [89]. Solving these constraints, the final WCET estimation can be calculated. For example, a program contains N basic blocks (B i ). Each basic block B i of the program takes a constant time C i to execute. Let X i be the execution count of the basic block. Then, the WCET estimate is generated by maximising the sum of the products of the execution counts and execution time: W CET = N i=1 C i X i One of the IPET approaches using ILP-solver, which is proposed by Li and Malik [71], is given below. In this approach [71], the WCET analysis may be divided into three stages:

49 2.1. WORST-CASE EXECUTION TIME ANALYSIS 31 program structural constraints, program functionality constraints, and solving constraints. Program Structural Constraints In general, the program structural constraints can be automatically extracted from the control-flow graph (CFG) [71]. Therefore, the constraints can be deduced from the CFG as follows: at each node, the execution count of the basic block must be equal to both the sum of the control flow going into it, and the sum of the control flow going out from it. As shown in Figure 2.10 the following constraints can be produced from the CFG for a simple if then else statement. d 1 B 1 if (p) ; X 1 d 2 d 3 B 2 q = 1 ; B 3 X 2 q = 2 ; X 3 d 4 d 5 if (p) q=1; else q=2; r=q; X 1 = d 1 = d 2 + d 3 X 2 = d 2 = d 4 X 3 = d 3 = d 5 X 4 = d 4 + d 5 = d 6 B 4 r = q ; X 4 d 6 Figure 2.10: An example of if-then-else statement and its CFG [71] Program Functionality Constraints The functionality constraints are used to denote loop bounds and other path information that depend on the functionality of the program, such as estimating execution iterations, function calls and inclusive paths. In the example program for the Check data() function

50 32 CHAPTER 2. BACKGROUND AND RELATED WORK x 1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 x 9 Check_data() { int i,morecheck,wronggone; morecheck=1;i=0;wrongone= 1; while (morecheck) { if (data[i] < 0) { wrongone=i; morecheck=0; } else if (++i >= DATASIZE) morecheck =0; } if (wrongone >=0) return 0; else return 1; } Figure 2.11: An Example of the IPET-Based approach [71] (Figure 2.11), the following functionality constraints can be produced. In this example, the maximum bound for the while-loop is needed. We can see that the maximum loop count for the while-loop is indirectly related with the DAT ASIZE. Therefore, given the maximum value for the DAT ASIZE is 10, we can produce the following two constraints. For this example, these constraints state that each time the while-loop is entered, the loop body will be executed any number of counts between 1 to 10. These semantics are valid over an arbitrary number of instances of the function. 1X 1 X 2 X 2 10X 1 Following this, the user can provide additional information to tighten the bound of the estimated program execution time. From the source code, we see that the X 3 and X 5 are mutually exclusive and each of them is executed at most once. Besides, we can see that

$X_3$ and $X_8$ are always executed together. Thus, we can get two further functionality constraints from the source code as follows.

$(X_3 = 0 \;\&\; X_5 = 1) \;|\; (X_3 = 1 \;\&\; X_5 = 0)$
$X_3 = X_8$

The symbol & represents conjunction whilst | represents disjunction. The first of the two constraints is a set of constraint sets, and the conditions of at least one of the constraint set members must be met. These constraints are valid for exactly one instance of the function.

Solving Constraints

In this stage, each set of the functionality constraint sets is combined with the set of structural constraints. Based on the example (Figure 2.11), two sets of constraints are produced.

Set1: $X_1 - X_2 \le 0$; $X_2 - 10 X_1 \le 0$; $X_3 = 0$; $X_5 = 1$; $X_3 - X_8 = 0$
Set2: $X_1 - X_2 \le 0$; $X_2 - 10 X_1 \le 0$; $X_3 = 1$; $X_5 = 0$; $X_3 - X_8 = 0$

Each combined constraint set is then passed to the ILP solver, which maximises the estimated execution time and produces the basic block counts ($X_i$ values) that result in this maximum value.

The analysis methods presented so far address only high-level analysis. To take low-level issues into account, the IPET-based approach needs to be extended. In [72], Li et al. extend their previous approach [71] to provide low-level analysis. The equation for the WCET is modified as below.

$$WCET = \sum_{i}^{N} \sum_{j}^{n_i} \left( C^{hit}_{i.j} X^{hit}_{i.j} + C^{miss}_{i.j} X^{miss}_{i.j} \right)$$
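To make the IPET formulation concrete, the following minimal sketch applies it to the if-then-else example of Figure 2.10. It is an illustration only: the per-block costs are hypothetical, the class and method names do not come from [71], and an exhaustive search over the edge counts stands in for the ILP solver, which is feasible here only because the CFG is tiny.

```java
import java.util.Arrays;

/** Toy IPET for the if-then-else CFG of Figure 2.10 (costs are hypothetical). */
class IpetSketch {
    // Assumed worst-case costs, in cycles, of basic blocks B1..B4.
    static final int[] COST = {2, 5, 8, 3};

    /** Returns {maxTime, X1, X2, X3, X4} for one entry into the statement (d1 = 1). */
    static int[] solve() {
        int[] best = null;
        int d1 = 1;
        // Enumerate the free edge counts; an ILP solver explores this space implicitly.
        for (int d2 = 0; d2 <= d1; d2++) {
            for (int d3 = 0; d3 <= d1; d3++) {
                if (d1 != d2 + d3) continue;      // X1 = d1 = d2 + d3
                int d4 = d2;                      // X2 = d2 = d4
                int d5 = d3;                      // X3 = d3 = d5
                int x1 = d1, x2 = d2, x3 = d3;
                int x4 = d4 + d5;                 // X4 = d4 + d5 = d6
                // Objective: sum of cost times execution count over all blocks.
                int time = COST[0] * x1 + COST[1] * x2 + COST[2] * x3 + COST[3] * x4;
                if (best == null || time > best[0]) {
                    best = new int[] {time, x1, x2, x3, x4};
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(solve())); // prints [13, 1, 0, 1, 1]
    }
}
```

With these assumed costs the maximum is reached by the else branch. The result contains only block counts, not an explicit path, which reflects the way IPET avoids enumerating paths.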

The major contribution of the IPET method is that the analysis does not explicitly enumerate program paths. The IPET approach can handle any program, which means that it does not require a restricted program structure [71, 36]. The main disadvantage of the IPET approach is that it cannot report which execution paths are involved in the maximum (or minimum) execution time of the program. What is more, Engblom et al. [36] argue that a possible disadvantage of the IPET approach is that, since the problem has been abstracted, it is rather hard to develop algorithm heuristics that use knowledge of the problem domain. The IPET method is also very computationally expensive, and its computation cost increases in direct relation to the complexity of the program analysed. As a result, it is more difficult to include low-level analysis, such as cache or branch prediction.

Combining WCET Analysis with Measurement Techniques

Modern processors are becoming more and more complex and use sophisticated pipelines and caches to accelerate the execution of applications. As a result, modelling the timing properties of these architectures becomes increasingly difficult. The advanced features offered by modern architectures may also cause the results of static analysis to become very pessimistic. To address these issues, a number of research approaches [9, 77, 94] have attempted to integrate static analysis with measurement techniques.

Petters and Farber [94] have proposed a measurement-based WCET analysis which is integrated with compiler techniques to analyse the WCET bounds of C programs without modelling the target hardware architecture. The tool chain of this approach is shown in Figure 2.12. An optimised control flow graph is extracted during compilation. 
Based on the control flow graph, infeasible paths are eliminated on the assembler code, which is translated from C during compilation. From the figure, we can observe that based on the predefined instrumentation information, the instrumentation is performed in several

stages.

[Figure 2.12: Tool chain of a measurement-based WCET analysis [94]. Elements shown: C code and processor information; compilation; control flow graph and assembler code; analysis and CFG reduction; first-stage instrumentation using the instrumentation information; instrumented assembler; assembly and linking; object code; second-stage instrumentation; measurement executable; deinstrumentation; final executable.]

The first step is to insert calls to the procedure sttrace(id) into the assembler code. The sttrace(id) procedure, which includes a cache flushing instruction to guarantee a WCET bound, is tagged with a unique id to correlate the time stamps with the code. Following this, object code is generated by assembling and linking the instrumented assembler code. Then, the object code is manipulated to execute the path prescribed in the instrumentation database. After this, the code is ready to be measured to derive its execution time. After all the measurements are completed, the instrumented object code is deinstrumented by substituting the instrumentation code with nops to ensure that the memory and data are the same

as the measurement of the executable code. Although this approach can achieve relatively tight WCET measurements, it needs to be provided with the instrumentation database. It could also be difficult to apply this approach to a large program, since the number of paths to be measured grows with the number of possible paths through the program.

[Figure 2.13: The context of a measurement WCET method for timing program sub-paths [78]. Elements shown: software to analyse; high-level analysis; control flow graph; flow information (iteration counts, infeasible paths, etc.); timing of program sub-paths; WCET calculation; WCET estimate.]

Lindgren et al. [78] have proposed a measurement-based low-level analysis approach to derive the execution time of each basic block using control flow information. In the paper, the control flow information is assumed to have been produced by a separate flow analysis. To perform this method, a set of linearly independent paths has to be identified. Based on this information, the target software is instrumented in order to determine the independent paths visited during the execution of the program. The instrumented code is then executed and the execution time is measured, while checking the sub-paths that have been visited. This is repeated with different input data until a sufficient number of paths have been traversed. A system of linearly independent equations is then established, with the execution times of the visited sub-paths as unknowns and the measured execution times of the target program for these paths as known values. Solving this system provides estimates of execution time that can be used in the calculation of the final WCET value.
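The equation-solving step can be sketched as follows, under simplifying assumptions that are not taken from [78]: each measured path is described by how often it executes each basic block, measurements are exact, and there are exactly as many linearly independent paths as unknowns. The class name and the sample data are hypothetical.

```java
/** Recovers per-block execution times from measured path times (hypothetical data). */
class SubPathTiming {
    /**
     * Solves A * x = t by Gaussian elimination with partial pivoting.
     * Row r of a counts how often each basic block runs on measured path r;
     * t[r] is the measured execution time of that path.
     */
    static double[] solve(double[][] a, double[] t) {
        int n = t.length;
        double[][] m = new double[n][n + 1];      // augmented matrix [A | t]
        for (int r = 0; r < n; r++) {
            System.arraycopy(a[r], 0, m[r], 0, n);
            m[r][n] = t[r];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;                      // pick the largest pivot in this column
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("paths are not linearly independent");
            }
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {   // eliminate entries below the pivot
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {        // back substitution
            double s = m[r][n];
            for (int c = r + 1; c < n; c++) s -= m[r][c] * x[c];
            x[r] = s / m[r][r];
        }
        return x;
    }

    public static void main(String[] args) {
        // Three measured paths over blocks B1..B3: B1-B2, B1-B3, B1-B2-B3.
        double[][] paths = { {1, 1, 0}, {1, 0, 1}, {1, 1, 1} };
        double[] measured = { 7, 9, 12 };
        double[] blocks = solve(paths, measured);
        System.out.println("B1=" + blocks[0] + " B2=" + blocks[1] + " B3=" + blocks[2]);
    }
}
```

For the sample data the recovered block times are 4, 3 and 5 time units. A practical tool would also have to cope with measurement noise and with timing effects between neighbouring blocks, which this sketch ignores.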

They have also demonstrated how pipeline effects can be taken into account while measuring sub-paths of the program. However, they assumed that the WCET bound of each basic block is independent of the input data. As a result, cache issues cannot be considered in this method.

Bernat et al. [9] have proposed the concept of a probabilistic hard real-time system: a system which has to meet all its deadlines but for which a high probabilistic guarantee suffices. In the light of this notion, a combination of measurement and analytical WCET approaches is introduced to compute probabilistic bounds on the execution time of the worst-case path of a section of code. The approach has also introduced the concept of execution time profiles (ETPs) to capture the variability of the execution time of a basic block. Based on the ETPs of basic blocks, three mathematical operators for calculating the WCET of the program are proposed. These operators apply to three different situations: ETPs are independent, ETPs are dependent, and dependencies between ETPs are unknown. Although this approach is very promising, the analysis is carried out on applications written in procedural languages, and the use of this approach for object-oriented programs is still open.

Summary

This section has surveyed a number of static WCET analysis techniques, including high-level timing analysis, low-level timing analysis, and calculation methods. It can be observed that in order to achieve a safe and tight WCET estimation for a real-time program, both the source-code level and the machine-code level of the program need to be considered. As mentioned before, high-level timing analysis is mainly concerned with analysing possible program flows from the source program. Due to the lack of timing information in general-purpose programming languages, most approaches [64, 97, 18] have proposed bounding loop iterations and language restrictions. 
These timing annotations or constructs

make real-time applications more predictable and analysable. On the other hand, some researchers have argued that these manual annotations or constructs are error-prone and should be avoided. Therefore, a number of research groups [81, 39, 50] have attempted to obtain the maximum number of loop iterations from the source code automatically. Unfortunately, these automatic methods are still limited in the types of program they can handle, and manual annotations probably have to be used in some cases [36].

A number of research groups [117, 84, 76, 74, 49, 36] have proposed various low-level timing analysis techniques in order to take caches and pipelines into account. In general, the purpose of low-level analysis is to estimate the execution time of either each basic block or a particular path, that is, a sequence of concatenated basic blocks in the program. By integrating the high-level and low-level timing analyses, a safer and tighter WCET value can be estimated. Due to the increasing complexity of modern architectures, many researchers are now looking into using a measurement-based approach in low-level analysis. Three such techniques [9, 78, 94] have been reviewed in this chapter.

Moreover, in order to fill the gap between high-level and low-level timing analyses, compiler integration techniques have been proposed [16, 41, 84, 34, 111]. In general, these approaches show how to map the time constraints and flow information between the high-level and low-level timing analyses to achieve the most accurate estimations.

Given the high-level and low-level timing analysis results, the final WCET value for the program can be calculated. In general, three major calculation methods, path-based methods [16, 114, 104], tree-based methods [103, 91, 75, 97], and IPET-based methods [71, 89, 98], can be found in the literature.

2.2 Real-Time Java

Java, developed at Sun Microsystems under the guidance of Gosling and Joy [44], is designed to be a machine-independent programming language that is both safe enough to traverse networks and powerful enough to replace native executable code [88]. It was originally called Oak, and was designed for use in embedded consumer-electronic devices such as cellular phones and Personal Digital Assistants (PDAs). In fact, Java is much more than just a programming language. It is built from four distinct components: the Java programming language, the Java Application Programming Interface (API), the Java bytecodes and the Java Virtual Machine (JVM).

Java technology, with its significant features, including object orientation, platform independence, dynamic linking and dynamic loading, strong notions of safety and security, and a simplified object model, has many advantages for developing real-time and embedded systems. However, the non-deterministic behaviour of memory management, the poor performance of most Java implementations, and the lack of real-time facilities have hindered the acceptance of Java in real-time and embedded applications.

There have been two recent proposals for Real-Time Java: the Real-Time Specification for Java (RTSJ) [10], and the Real-Time Core Extensions [23]. The former was proposed by the Real-Time for Java Expert Group (RTJEG) led by Greg Bollella, and the latter was proposed by the J-Consortium led by Kelvin Nilsen [23]. An overview of both specifications is given in the following subsections.

Real-Time Specification for Java

The Real-Time Specification for Java (RTSJ) proposed by the Real-Time for Java Expert Group (RTJEG) has been approved by the Sun Java community for the implementation of Real-Time Java. For the most part, the RTSJ is an almost complete specification of a Java environment for real-time or embedded developers. The RTSJ

accommodates the variation of underlying systems, techniques, algorithms, and mechanisms for Real-Time Java. The RTSJ includes mandatory and optional requirements for accommodating the real-time technologies and the Java technologies. It also addresses building larger-scale real-time systems and providing dynamic real-time resource management technologies. The RTSJ defines seven areas of extended semantics that modify the Java language specification and the JVM specification. A summary of each area is given as follows.

Scheduling

The RTSJ introduces the concept of a schedulable object: any instance of any class implementing the interface Schedulable. Its scheduling and dispatching are managed by the instance of Scheduler to which it holds a reference. The RTSJ requires four classes that are schedulable objects: RealtimeThread, NoHeapRealtimeThread, AsyncEventHandler, and BoundAsyncEventHandler.

A schedulable object has a scheduling parameters object (SchedulingParameters) that contains the priority of the thread. The object may also provide other parameters, such as an importance value, for a particular scheduling algorithm. Each schedulable object is also associated with a dispatching parameters object (ReleaseParameters) that includes cost, deadline, and two asynchronous event handlers. In the RTSJ, periodic threads, aperiodic threads, and sporadic threads are classified by the characteristics of their dispatching parameters. These parameters are represented by classes that extend ReleaseParameters, namely PeriodicParameters, AperiodicParameters, and SporadicParameters.

The RTSJ allows a feasibility analysis that determines whether a schedule has an acceptable value for the metric that measures how well the system is meeting its temporal constraints. The assignment of priorities in the RTSJ's base scheduler is under programmer control. The

initial default and required scheduling algorithm is fixed-priority preemptive with at least 28 unique priority levels and is represented in all implementations by the PriorityScheduler subclass of Scheduler.

Memory Management

The RTSJ provides three extensions to the memory model to support memory management in a manner that does not interfere with the ability of real-time code to provide deterministic temporal behaviour. This goal is achieved by allowing the allocation of objects outside of the garbage-collected heap for both short-lived and long-lived objects. The RTSJ introduces four basic types of memory area:

Scoped Memory: This provides a mechanism for dealing with a class of objects that have a lifetime defined by a syntactic scope.
Physical and Raw Memory: This allows objects to be created within specific physical memory regions that have particular important characteristics, such as memory that has substantially faster access.
Immortal Memory: This represents an area of memory containing objects that, once allocated, exist until the end of the application.
Heap Memory: This represents an area of memory that is the same as the heap of the current Java language.

To expose information about garbage collection, the RTSJ offers a GarbageCollector class, whose methods report on the behaviour of the garbage collector, such as getoverhead(), getreclamationrate(), and getpreemptionlatency(). However, the memory model is relatively complex and it mainly relies on dynamic analysis approaches.

Synchronisation

The specification provides the ability to set the monitor control policy and three wait-free queues for synchronisation. The monitor control policy is extensible, so new mechanisms can be added by future implementations. The RTSJ also provides two policies, PriorityCeilingEmulation and PriorityInheritance, to avoid priority inversion among real-time threads during Java synchronisation. The priority inheritance protocol must be implemented by default.

The RTSJ provides three wait-free queue classes to provide protected, non-blocking, shared access to objects accessed by both regular Java threads and NoHeapRealtimeThreads. These classes are provided explicitly to enable communication between the real-time execution of NoHeapRealtimeThreads and regular Java threads.

Asynchronous Event Handling

The asynchronous event facility comprises two classes: AsyncEvent and AsyncEventHandler. The RTSJ also provides a Clock class and a Timer class to handle time-related asynchronous events. An instance of AsyncEventHandler is very similar to a thread and it is also a Runnable object. The difference between an AsyncEventHandler and a simple Runnable is that an AsyncEventHandler has associated instances of ReleaseParameters, SchedulingParameters and MemoryParameters. These parameter objects control the actual execution of the handler once the associated AsyncEvent is fired.

A specialised form of an AsyncEvent is the Timer class, which represents an event whose occurrence is driven by time. Timers are driven by Clock objects. There is a special Clock object, returned by Clock.getRealtimeClock(), that represents the real-time clock. The Clock class may be extended to represent other clocks that the underlying system might make available.

Asynchronous Transfer of Control

The Asynchronous Transfer of Control (ATC) model is provided through the AsynchronouslyInterruptedException class and the Interruptible interface. These facilities build on thread interruption and on the mechanisms for throwing and handling asynchronous exceptions.

Asynchronous Thread Termination

The RTSJ accommodates safe asynchronous thread termination through a combination of the asynchronous event handling and the asynchronous transfer of control mechanisms [10].

Physical Memory Access

The RawMemoryAccess class allows the programmer to construct an object that represents a range of physical addresses and then access the physical memory with byte, short, int, long, float, and double granularity. Device drivers, memory-mapped I/O, flash memory, battery-backed RAM, and similar low-level software can be implemented with RawMemoryAccess. In addition, the RTSJ provides the ScopedPhysicalMemory and ImmortalPhysicalMemory classes to create objects that represent a range of physical memory addresses and in which Java objects can be located.

Core Real-Time Extensions for the Java Platform

The Real-Time Core Extension (RTCore) [23] proposed by the J-Consortium consists of two separate APIs: the Baseline Java API for traditional Java threads, and the Real-Time Core API for real-time tasks. The RTCore introduces a Real-Time Core domain, including a stylised Core source file, Core class file format, Core verifier, Core static and dynamic execution environment, and Core libraries, to develop real-time Java tasks. The RTCore specification also proposes the concept of Execution-Time Analysable Code, an analysable

and predictable code structure. In addition, the Core Verifier has the capability of determining particular bodies of Core code to estimate their worst-case execution times (WCET). A summary of the major features connected with real-time applications is given below.

Scheduling

The CoreTask class, which extends CoreObject, is provided to create real-time tasks in the RTCore specification. The work() method, which has to be given an implementation in the Core task, is similar to the run() method of a traditional Java thread. The scheduling is priority-based and preemptive with 128 priority levels. The highest priority levels are reserved for interrupt handlers.

Memory Management

A separate heap for real-time tasks is supported in the RTCore. The AllocationContext class extends CoreTask and supports the release() method, which makes all objects allocated within a context eligible for reclamation by the garbage collector. Every task has an associated AllocationContext object. The RTCore assumes that memory sharing among real-time tasks and traditional threads is supported by the underlying RTOS.

Synchronisation

The RTCore specification provides the ability to set the monitor control policy to support synchronisation between tasks, threads and the native domain. This is achieved by introducing three main classes: SignalingSemaphore, CountingSemaphore, and Mutex. The main difference between CountingSemaphore and SignalingSemaphore is that SignalingSemaphore is not buffered when the semaphore's count is increased. The priority inheritance protocol is provided by default. However, objects can use the priority ceiling protocol by

implementing the PCP interface.

Asynchronous Event Handling

In the RTCore specification, to support asynchronous events, the body of each interrupt handler must be execution-time analysable, and blocking inside an interrupt handler is not allowed. To prevent interrupt handlers from being aborted, the body of an interrupt handler must form an atomic-synchronized context. The ISR_Task class is provided to implement interrupt service routines. The associated work() logic of an ISR_Task can be triggered by physical or software interrupts. ATCEventHandler and ATCEvent are also introduced to handle asynchronous transfer of control events.

Physical Memory Access

To transfer data into and out of physical device ports, the IOPort and ISR_Task classes are provided. They give access to memory, I/O ports and memory-mapped I/O channels. An ISR_Task object can be associated with a hardware interrupt.

2.3 Portable WCET Analysis

In general, most WCET approaches are tied to either a particular language or a particular target architecture. Unfortunately, these approaches are not appropriate for the Java architecture, since Java programs are "write once, run everywhere". A portable WCET analysis approach based on the Java architecture has been proposed by Bernat et al. [8]. It has been extended by Bate et al. [5] to address low-level analysis issues. The portable WCET analysis attempts to maintain the portability of Java in WCET analysis by proposing a timing analysis scheme based on Java bytecode. The analysis uses a three-step approach: high-level analysis, low-level analysis and calculation. The analysis assumes that the programmer

has already compiled the Java source into a class file with a Java compiler.

High-Level Analysis

The first step is the high-level analysis, in which annotated Java class files are analysed. Annotations are expressed by calls to a predefined static class WCETAn [8], and portable WCET information is computed in the form of so-called Worst-Case Execution Frequency (WCEF) vectors. WCEF vectors [5] represent execution-frequency information about basic blocks and about more complex code structures that have been collapsed during the first part of the analysis. The WCEF vectors returned by the first analysis step are stored back into the class files as code attributes. The class files are then ready for distribution to the target Java virtual machines on which the real-time code is to run.

Portable WCET Annotations

WCET annotations are introduced as a predefined class (the WCETAn class). The class consists of a set of procedure calls that encapsulate the annotations of the code. The information required for annotation is mainly code block boundaries, maximum loop iterations, modes, and bounds on the execution time of method invocations. The annotations are classified into three groups: creating tags, naming blocks of code, and asserting properties. The annotation class defined in [8] is given in Figure 2.14. Although these annotations are expressive enough for procedural programming languages, they cannot cope with the dynamic behaviour provided by object-oriented programming languages, such as dynamic dispatching. How dynamic dispatching issues can be addressed in the WCET analysis is presented in Chapter 4.
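As an illustration of how such annotations might appear in application code, the sketch below bounds a loop with a Loopcount call and delimits the analysed section with Begin_WCET and End_WCET. The WCETAn stub repeats only the members used here so that the example is self-contained. The placement of the calls (the loop bound as the first statement of the loop body) and the SensorFilter class are assumptions made for illustration, not the convention defined in [8].

```java
/** Stub of the annotation class: the calls have no run-time effect and exist for the analyser. */
class WCETAn {
    static class Label { }
    static void Loopcount(int n) { }
    static void Begin_WCET(Label l) { }
    static void End_WCET(Label l) { }
}

/** Hypothetical application class annotated for WCET analysis. */
class SensorFilter {
    static final int WINDOW = 8;
    static final WCETAn.Label SUM_SECTION = new WCETAn.Label();

    /** Sums at most WINDOW samples; the annotation states the same bound for the analyser. */
    static int sum(int[] samples) {
        WCETAn.Begin_WCET(SUM_SECTION);
        int total = 0;
        for (int i = 0; i < samples.length && i < WINDOW; i++) {
            WCETAn.Loopcount(WINDOW);   // assumed placement: the loop runs at most WINDOW times
            total += samples[i];
        }
        WCETAn.End_WCET(SUM_SECTION);
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4})); // prints 10
    }
}
```

Because the annotations are ordinary static method calls with empty bodies, they survive compilation as bytecode invocations that an analyser can locate in the class file.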

    public class WCETAn {
        // WCET tags
        static class Mode { }    // method modes
        static class Label { }   // section of code

        // Naming tags
        static void Define_Mode(Mode m) { }
        static void Use_Mode(Mode m) { }
        static void Identify_Code(Label l) { }

        // Assertions
        static void Loopcount(int n) { }
        static void Loopcount(int n, Mode m) { }
        static void Dead_Path(Mode m, Label l) { }
        static void Begin_WCET(Label l) { }
        static void End_WCET(Label l) { }
    }

Figure 2.14: WCETAn Class

Low-Level Analysis

The low-level analysis mainly focuses on extracting the execution time of the target platform. This takes the form of the definition of a timing model of a virtual machine. In particular, this stage performs platform-dependent analysis (i.e. in the context of specific hardware and a specific virtual machine) of the implementation of each Java bytecode instruction and of native methods. It can also cover common sequences of Java bytecode instructions. A Virtual Machine Timing Model (VMTM) is therefore a timing model that contains a list of the worst-case execution times of native methods and Java bytecode instructions, extracted from the target virtual machine [5]. During this stage, information about the potential effects of pipelines and caches may be captured. It should be noted that the final estimation of WCET bounds depends closely on the WCET values provided in the target VMTM. However, no appropriate approach to deriving a VMTM is given in [5]. How a target VMTM can be derived is discussed in Chapter 5.
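One way to picture how the platform-independent and platform-dependent results fit together is as a table lookup followed by a weighted sum, which is the essence of the calculation step described next. The sketch below models a VMTM as a map from bytecode mnemonic to a worst-case cost and a WCEF vector as a map from mnemonic to worst-case execution frequency. The class name, the cycle values and the frequencies are hypothetical, and the actual WCEF and VMTM formats of [5] are richer than this.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Combines a WCEF vector with a VMTM (both simplified, with hypothetical values). */
class VmtmSketch {
    /** WCET bound = sum over bytecodes of worst-case frequency times worst-case cost. */
    static long wcetBound(Map<String, Integer> wcef, Map<String, Integer> vmtm) {
        long cycles = 0;
        for (Map.Entry<String, Integer> e : wcef.entrySet()) {
            Integer cost = vmtm.get(e.getKey());
            if (cost == null) {
                // A missing entry would make the bound unsafe, so fail loudly.
                throw new IllegalArgumentException("no timing entry for " + e.getKey());
            }
            cycles += (long) e.getValue() * cost;
        }
        return cycles;
    }

    /** Example with assumed data: platform-independent WCEF, platform-specific VMTM. */
    static long demo() {
        Map<String, Integer> vmtm = new LinkedHashMap<>();
        vmtm.put("iload", 2);
        vmtm.put("iadd", 1);
        vmtm.put("if_icmplt", 4);

        Map<String, Integer> wcef = new LinkedHashMap<>();
        wcef.put("iload", 20);
        wcef.put("iadd", 10);
        wcef.put("if_icmplt", 11);

        return wcetBound(wcef, vmtm);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 94
    }
}
```

The same frequency data can be evaluated against the timing table of a different virtual machine without repeating the high-level analysis, which is what makes the scheme portable.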

Calculation

Finally, a real-time enabled target virtual machine combines the results of the high-level analysis with the VMTM to compute the actual WCET bound of the analysed code sections. The WCEF vectors can be built in a bottom-up fashion (i.e. a tree-based approach) with a mechanism similar to the timing schema [103]. As the resources used by this stage are manageable, the calculations can easily be performed even on small virtual machines [5].

2.4 Gain Time Reclaiming

Typically, WCET analysis and schedulability analysis are carried out separately. As has been shown in this chapter, sophisticated techniques are used in WCET analysis, for instance to model caches and pipelines, to achieve safe and tight estimations. However, most WCET analysis approaches have only been considered in relation to procedural programming languages. Performing analysis on object-oriented programs must take into account additional dynamic features, such as dynamic dispatching and memory management. Arguably, object-oriented programming languages support more dynamic behaviour than procedural programming languages, and some of these features may result in object-oriented applications having a more pessimistic worst-case behaviour. In addition, given the need to provide 100% deadline predictability for hard real-time tasks, it is inevitable that the processor and other resources will be under-utilised at run-time [2]. As a result, object-oriented real-time systems may suffer from significantly lower utilisation and poorer overall performance than procedural real-time systems.

In contrast with the WCET analysis, a number of research groups have proposed various flexible scheduling algorithms, for instance priority server algorithms [13] and a slack stealing algorithm [70], to provide a more flexible real-time execution environment with greater

performance of the whole system. In general, these flexible scheduling algorithms are mainly focused on improving the performance of aperiodic tasks at run-time. They have, however, paid insufficient attention to the fact that, for the most part, hard real-time tasks do not execute along the worst-case execution time path. Therefore, even though very complex scheduling algorithms have been demonstrated to improve the average performance of the whole system, the improvements are still limited, the overhead of the implementation is extremely high, and it is sometimes not even possible to implement such algorithms in practice [13].

Generally, the spare capacity of a real-time system may be divided into three groups [26]: extra capacity, gain time, and spare time. Extra capacity is the capacity which is not allocated to hard real-time tasks during the design phase. This can be identified off-line. Gain time is produced when hard real-time tasks execute in less than their worst-case execution time estimations. This may only be reclaimed at run-time since it depends on the actual executions of tasks [26]. Spare time arises when sporadic tasks do not arrive at their maximum rate.

Most flexible scheduling algorithms are mainly focused on reclaiming the extra capacity of the system. Only a few research approaches [2, 29, 48] have discussed how to reclaim gain time. Even here, they have tended to focus on procedural programming languages rather than on object-oriented programming languages. A brief survey of the related work on gain time analysis is given below.

Haban and Shin have proposed an approach [48] that places software triggers at the end of basic blocks in task code to measure actual execution time. In [48], the gain time of a specific basic block is calculated by comparing the actual execution time, measured at the software trigger point, with the pre-determined WCET value. 
In a similar way, Dix et al. have proposed an approach [29] that adds milestones to task code to calculate the maximum remaining execution time of a particular task. However, both approaches reclaim gain time only after it has been generated and do not integrate with WCET analysis.
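The idea shared by both approaches, comparing the observed execution time at an instrumentation point with the precomputed worst case, can be sketched as follows. The class and method names and the per-block WCET values are hypothetical, the task is assumed to be a straight-line sequence of blocks, and elapsed times are passed in explicitly rather than read from a hardware timer so that the example is deterministic. It illustrates the principle rather than the mechanisms of [48] or [29].

```java
/** Accumulates gain time at software triggers placed at the end of basic blocks. */
class GainTimeTracker {
    private final long[] blockWcet;   // precomputed WCET of each basic block, in microseconds
    private long gainTime = 0;        // gain time accumulated so far

    GainTimeTracker(long[] blockWcet) {
        this.blockWcet = blockWcet.clone();
    }

    /** Software trigger at the end of a block: WCET minus actual time is recorded as gain. */
    void blockFinished(int block, long actualMicros) {
        if (actualMicros > blockWcet[block]) {
            throw new IllegalStateException("WCET estimate violated for block " + block);
        }
        gainTime += blockWcet[block] - actualMicros;
    }

    long gainTime() {
        return gainTime;
    }

    /** Milestone-style query: worst-case time still needed from block 'next' onwards. */
    long maxRemaining(int next) {
        long remaining = 0;
        for (int b = next; b < blockWcet.length; b++) {
            remaining += blockWcet[b];
        }
        return remaining;
    }

    /** Example run with assumed block WCETs of 100, 250 and 80 microseconds. */
    static long[] demo() {
        GainTimeTracker t = new GainTimeTracker(new long[] {100, 250, 80});
        t.blockFinished(0, 60);    // 40 us gained
        t.blockFinished(1, 250);   // worst case taken, nothing gained
        return new long[] { t.gainTime(), t.maxRemaining(2) };
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("gain=" + r[0] + "us, remaining<=" + r[1] + "us");
    }
}
```

In this run 40 microseconds of gain time become available after the first block, while at most 80 microseconds of worst-case work remain before the last block.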

Audsley et al. [2] have introduced a gain point mechanism to reclaim the gain time of the basic blocks of task code as early as possible. In [2], gain points are grouped into four separate forms: static gain points for static code, dynamic gain points for loop constructs, efficiency gain points for detecting hardware speed-ups, and resource usage gain points for identifying spare resources. Yet Audsley et al.'s approach and the previous two approaches do not take into account object-oriented programming features or gain time resulting from functionality constraints that affect the program's execution. A gain-time reclaiming approach taking account of object-oriented features is proposed in Chapter 6.

Chapter 3

A Computational Model & The WCET Analysis Framework

Despite Java's initial promise, such as platform independence and support for concurrency, the language appears to be unfavourable in the area of high-integrity systems [68] and real-time systems [14]. Its combination of object-oriented programming features, its automatic garbage collection, and its poor support for real-time multi-threading are all seen as particular impediments.

The success of high-integrity real-time systems undoubtedly relies upon their capability of producing functionally correct results within defined timing constraints. To support a predictable and expressive real-time Java environment, two major international efforts have attempted to provide real-time extensions to Java: the Real-Time Specification for Java (RTSJ) [10] and the Real-Time Core extensions to Java [23]. These specifications have addressed the issues related to using Java in a real-time context, including scheduling support, memory management issues, interaction between non-real-time and real-time Java programs, and device handling, among others.

However, the expressive power of all these features, along with the regular Java semantics, means that very complex programming models can be created, necessitating complexity in supporting real-time virtual machines and tools. Consequently, Java, with the real-time

extensions as they stand, seems too complex for confident use in high-integrity systems [68]. Furthermore, in addition to the difficulties with analysing applications developed in these frameworks with all the complex features, there is no satisfactory static analysis approach that can evaluate whether the system will produce both functionally and temporally correct results in line with the design at run-time. For the above reasons, to encourage the use of Java in the development of high-integrity real-time systems, the following issues need to be addressed:
- Unpredictable run-time behaviour must be restricted,
- Unanalysable syntax and semantics must be restricted or must be annotated with bounded annotations, and
- Dependable analysis techniques, in line with the restricted or predictable computational model, should be provided.
Bearing these requirements in mind, a restricted programming model that removes language features with high overheads and complex semantics, on which it is hard to perform timing and functional analyses, is introduced in [55]. Timing and functional analysis techniques, based on the restricted model, are also proposed. The proposed high-integrity real-time Java environment, called XRTJ, supports the following attributes:
- Predictable programming model
- Dependable static analysis environment
- Reliable distributed run-time environment
Although the language environment has been developed with the whole software development process in mind, from the design phase to the run-time phase, only the first two

attributes are relevant to this thesis. The reliable distributed run-time environment is therefore out of the scope of this thesis and is mentioned here only to give a complete overview of the language environment. The major aim of this chapter is twofold: presenting an overview of the proposed programming environment (§3.1) with a restricted computational model (§3.1.1) in which our WCET analysis framework is proposed (§3.3); and illustrating the framework of the portable WCET analysis on which the major contributions of this thesis are made.

3.1 XRTJ Environment Overview

The extensible high-integrity Real-Time Java (XRTJ) environment, which we have proposed in [55], is targeted at cluster-based distributed high-integrity real-time Java systems, such as consumer electronics and embedded devices, industrial automation, space shuttles, nuclear power plants and medical instruments. As shown in Figure 3.1, this programming environment includes: a restricted programming model proposed for high-integrity Java applications, called the Ravenscar-Java profile [67]; annotation (see footnote 1) class files storing additional information that cannot be expressed in Java bytecodes, called the Extensible Annotations Class (XAC) format [54]; an annotation-aware compiler that extracts annotations from the source code during compilation, called XRTJ-Compiler [54]; a static analysis environment integrated with static analysis techniques, called XRTJ-Analyser [54]; and a modified real-time Java virtual machine, called XRTJ-Virtual Machine [55], that supports a run-time environment.

Footnote 1: The term annotations, in this thesis, means both manual annotations and annotations generated by the XRTJ-Compiler automatically.

The XRTJ environment can be divided into two main parts:
- a Static Analysis Environment, which offers a number of tools that conduct various static analyses, such as timing analysis;
- a Run-Time Environment, in which highly predictable and dependable distributed

capabilities are provided.

[Figure 3.1: A basic block model of the XRTJ environment. Diagram labels: Java Program (+ Annotations); Static Analysis Environment; XRTJ-Analyser; System Config.; XRTJ-Compiler; Static models; Extensible Annotation Class; Ravenscar-Java Profile; Java APIs + Real-Time APIs; XRTJ Virtual Machine; Java Class File; Run-Time Architecture.]

The static analysis environment supports various analysis techniques by means of the XRTJ-Analyser, where timing analysis and program safety analysis (see footnote 2) can be statically carried out. To facilitate the various static analysis approaches and to provide information, such as the maximum number of loop iterations, that cannot be expressed in either Java source programs or Java bytecode, an annotation class format called the Extensible Annotations Class (XAC) file (§3.1.2), which stores non-functional information, is proposed [54]. To generate XAC files, an annotation-aware compiler, named XRTJ-Compiler (§3.1.3), which can derive additional information from either manual annotations or source programs, or both, is also introduced. Taking advantage of the knowledge accumulated by the compiler, different analysis tools may be integrated into the XRTJ-Compiler to carry out various analyses either on source programs or Java bytecode.

Footnote 2: The timing analysis addresses timing issues in terms of temporal correctness, whilst the program safety analysis aims to ensure program safety in terms of functional correctness and concurrency issues.

Footnote 3: Model checkers, such as JPF2 [11], which require special annotations, may be employed in our architecture to facilitate safety checks of concurrent programs.

Java programs extended with specific annotations, such as timing annotations or model checking annotations (see footnote 3), are compiled into Java class files and XAC files by either a simple

XAC translator and a traditional Java compiler or the XRTJ-Compiler. A conformance test that verifies whether the applications obey the rules defined in the Ravenscar-Java profile or whether the manual annotations are correct is also conducted during compilation. The XAC files, together with the Java class files, are used by the XRTJ-Analyser to perform various static analyses. As shown in Figure 3.1, various static models, such as the hardware model of a target platform, can be provided to perform different static analysis approaches on the XRTJ-Analyser. In the XRTJ-Analyser, system information about the target machines needs to be provided. Based on the system configuration information, the XRTJ-Analyser loads related static models to conduct specific analyses.

The distributed run-time environment provides mechanisms for underlying systems to facilitate both functionally and temporally correct execution of applications. This infrastructure is targeted at a cluster-based distributed infrastructure where remote objects are statically allocated during the design phase. In order to accommodate a diverse set of implementations on the underlying platforms or virtual machines, two run-time environments with different levels of distribution are supported in the XRTJ run-time environment. One is called the Initialisation Distributed Environment, in which RMI is only allowed for use in the initialisation phase of an application, and the other is called the Mission Distributed Environment, where a restricted real-time RMI model [112] can be used during the mission phase. The run-time environment is beyond the scope of this thesis; see [55].

3.1.1 Ravenscar-Java Profile

This section gives a summary of the Ravenscar-Java profile [67], which is a Java (augmented by the RTSJ [10]) profile for the development of software-intensive high-integrity real-time systems.
This profile is based on the approach proposed by Puschner and Wellings [100] and has been extended with Java's sequential language constructs [6] and object-oriented features. In addition, it has been updated to the current version of the RTSJ [10]. The

profile also fits within the J2ME framework [106], fulfils the NIST Real-Time Java profile requirements [14] and is consistent with well-known guidelines for high-integrity software development, such as those defined by the U.S. Nuclear Regulatory Commission [52].

[Figure 3.2: Two execution phases of Ravenscar Virtual Machine [67]. Diagram labels: Initialisation Phase; main() invoked; Create Initialiser thread; main() terminates; Initialise all necessary objects and real-time threads; Start all threads; Mission Phase; New Thread (three times); Heap Memory; Immortal Memory; Scoped Memory; Allocatable Memory.]

Its computational model defines two execution phases [100], i.e. initialisation and mission, as shown in Figure 3.2. In the initialisation phase of an application, all necessary threads and memory objects are created by an Initializer thread, whereas in the mission phase the application is executed and multithreading is allowed based on the imposed scheduling policy. There are several new classes that should ultimately enable safer construction of Java programs (for example, Initializer, PeriodicThread, and SporadicEventHandler), and the use of some existing classes is restricted or simplified because their features are problematic for static analysis. For instance, the use of any class loader is not permitted in the mission phase, and the size of a scoped memory area, once set, cannot be changed. Further restrictions include the following (see [67] for a full list):

- No nested scoped memory areas are allowed,
- Priority Ceiling Emulation must be used for all shared objects between real-time threads,
- Processing groups, overrun and deadline-miss handlers are not supported,
- Asynchronous Transfer of Control is not allowed, and
- Object queues are not allowed (i.e. no wait, notify, and notifyAll operations).
Restrictions are also imposed on the use of the Java language itself, for example:
- continue and break statements in loops are not permitted, and
- Expressions with possible side effects must be eliminated.
Most subsets of Java or the RTSJ (e.g. [6, 100]) overlook some important elements of the language, for example, multithreading and the object-oriented programming model. Thus many of the advantages of Java are lost. However, the Ravenscar-Java profile attempts to cover all the language issues, as well as the run-time model.

3.1.2 Extensible Annotation Class (XAC) Format

One of the key components in the XRTJ architecture is the Extensible Annotation Class (XAC) format, which provides information for various analysis tools that cannot be stored as code attributes in Java class files without making them incompatible with the traditional Java architecture [54]. The information provided in the XAC file is extracted from the source code or manual annotations for particular purposes. Although we sometimes use the term XAC file, it is not necessarily a physical file. XAC is an annotation structure that can be stored in separate files or as additional attributes in Java class files [80].

The XAC format has been designed with two main goals in mind: portability, to support platform independence, and extensibility, to hold extra information needed for other analysis tools. Therefore, the XAC files are easy to extend for various purposes or apply in annotation-aware tools or JVMs. In addition, using separate XAC files has benefits for distributed systems as XAC files do not increase the size of traditional Java class files. Additionally, if the XAC files are not required at run-time, they do not need to be either loaded into the target JVM or transferred among distributed machines. This can avoid manipulation between the XAC files and the Java class files since they are treated as independent modules.

    <@XAC>
    <@Class_Name, Checksum>
    TAG_Count = n
    <BODY> ... </BODY>

    TAG = TAG_ID
    <TAG_Name>
      <SPEC>
        <SUBTAG_COUNT = n>
        <METHOD = Constant_Pool_index>
        <METHOD_BODY> ... </METHOD_BODY>
      </SPEC>
      <TAG_BODY> ... </TAG_BODY>
    </TAG_Name>

    <METHOD = #Ref>
    <METHOD_BODY> ... </METHOD_BODY>

Figure 3.3: The Format of the XAC file

Each XAC file is generated for a specific Java class file, and so the relationship between a Java class file and an XAC file is one to one. Since the XAC files are not defined for any particular operating system, these files are easy to apply in annotation-aware tools or JVMs. As the XAC file is designed with extensibility in mind, the annotation formats of the XAC file may vary. All data structures need to be declared in the specification

area. The format of the XAC file is given in Figure 3.3. In order to speed up the reading of these files, the XAC file can be expressed in binary format, which is similar to the way Java class files are treated. The XAC file includes the class namespace, a checksum, the total number of tags, the data format specification and the contents of the annotations. The checksum can help JVMs to verify the consistency between the Java class file and the XAC file. The outermost layer of the XAC file defines how many TAGs are contained in this XAC file. Each TAG encompasses a specification and its body. The specification declares the format of the contents. The body includes the annotations whose formats are defined in the specification. The offset numbers of bytecode in a method are stored with the associated annotations in the XAC file. Therefore, the corresponding bytecode and annotation may easily be reconstructed in analysis tools. An example of the content of an XAC file is given in Figure 3.7.

We have noted that JSR-175 (Java Specification Request) has proposed a standard annotation format that can be applied in various general-purpose analysis tools. It has been incorporated into Java 1.5. The specification offers well-defined annotation formats that could provide additional information for particular methods. Unfortunately, this does not address the need for the annotation of statements in a method. Therefore, at the current stage, the XAC format still needs to be used to provide expressive annotations in the XRTJ environment.

3.1.3 XRTJ-Compiler

Compiler techniques have been applied to analysis approaches, such as worst-case execution time analysis and program safety analysis, in order to achieve more accurate results. For example, Vrchoticky [111] has suggested compilation support for fine-grained execution time analysis, and Engblom et al. [34] have proposed a WCET tool called Co-transformation,

integrated with compilation support, to achieve safer and tighter timing estimates. These approaches show that compilation support can not only address the optimisation issues introduced by compilers, but also provide additional information gathered at the source-code level for particular analysis tools. In the XRTJ environment, an annotation-aware compiler, called XRTJ-Compiler, is introduced in order both to manipulate annotations and to check that the source code obeys the rules defined in the Ravenscar-Java profile. The XRTJ-Compiler extracts both manual annotations introduced for timing analysis and specific annotations that can be derived at the source-code level for various purposes. In particular, information required by other static analysis tools, such as model checkers and other safety analysis tools, may be produced by the XRTJ-Compiler and stored in the associated XAC files. The prototype implementation of the XRTJ-Compiler is discussed in a later chapter.

3.2 Static Analysis Environment

The static analysis environment offers two levels of analysis to accommodate various static analysis tools. The level at which source code files that may contain manual annotations are accessible to analysis tools is called source-level analysis, whereas the level at which only Java class files with XAC files are accessible to analysis is called bytecode-level analysis. As shown in Figure 3.4, the static analysis environment consists of two components: XRTJ-Compiler and XRTJ-Analyser. The XRTJ-Compiler is introduced to conduct various analysis approaches at source level. In this stage, an XAC file for each Java class file is generated. The XRTJ-Analyser is targeted at applications that combine various static analysis techniques to carry out different static analysis approaches on Java class files

together with XAC files. As the environment is targeted at high-integrity real-time applications, the static analysis environment can be classified into two main groups by the attributes of the analysis tools: timing analysis and program safety analysis. The former emphasises the analysis of timing issues in terms of temporal correctness, whereas the latter highlights program safety in terms of functional correctness and concurrency issues, such as safety and liveness. These static analysis approaches may be carried out individually or in combination. A block diagram of the XRTJ architecture for the static analysis environment is given in Figure 3.4.

[Figure 3.4: Static Analysis Environment. Diagram labels: Java Program (+ Annotations); XRTJ-Compiler; Source Level; XAC Translator; Traditional Java Compiler; Conformance Test; Extensible Annotation Class (*.xac); Java Class File (*.class); Bytecode Level; Safety Policies; VMTMs; Program Safety Analysis; XRTJ-Analyser; Timing Analysis; Scheduling algorithms; System Configuration; Analytical Results.]

3.2.1 Timing Analysis

Timing analysis is crucial in real-time systems to guarantee that all hard real-time threads will meet their deadlines in line with the design. In order to ensure this, appropriate scheduling algorithms and schedulability analysis are required; estimating WCET bounds of real-time threads is of vital importance (as shown in Chapter 2). In order to offer a predictable and reliable environment for high-integrity real-time applications, a number of timing analysis issues need to be addressed, for example:
- How the WCET analysis can be carried out on a highly portable real-time Java architecture,
- How the run-time characteristics of Java, such as the high frequency of method invocation and dynamic dispatching, can be addressed,
- How schedulability analysis can be conducted statically, and
- What techniques need to be provided to take account of the supporting distributed run-time environment.
This thesis mainly discusses how WCET analysis can be carried out in the static analysis environment of the XRTJ environment. The WCET analysis of distributed applications and the schedulability analysis approaches are beyond the scope of this thesis.

3.3 The Portable WCET Analysis Framework

As mentioned in Section 2.3, only portable WCET analysis [8, 5] takes account of the portability supported in the Java architecture. Following the philosophy of that analysis, our framework (Figure 3.5) also uses the three-step approach in order to offer a comprehensive WCET analysis bearing portability and dynamic dispatching issues in mind.

This section is mainly concerned with an overview of our framework proposed in the XRTJ environment.

[Figure 3.5: The Portable WCET Analysis Framework. Diagram labels: Java Source Files; Java Class Files; Target Virtual Machine; Native Method; Platform-Independent Analysis; Platform-Dependent Analysis; Java Class Files; XAC Files; VMTM; Estimating WCET; WCET. Legend: Static Analysis; Dynamic Analysis (Measurement).]

The first step of the framework is the platform-independent analysis. At this stage, the technique analyses annotated Java programs or Java class files to produce portable WCET information. Manual annotations in our approach are written as Java comments [54], so that annotated Java programs can also be compiled with traditional Java compilers to generate Java class files. Here, one should note that the control flow of the program must be unchanged to avoid erroneous mapping between annotations and bytecodes during optimisation or transformation of code. Taking advantage of the knowledge derived by the compiler, portable WCET information can be extracted from either source programs or

Java bytecode statically. Portable WCET information is computed in the form of so-called Worst-Case Execution Frequency (WCEF) vectors by the XRTJ-Compiler. WCEF vectors [5] represent execution-frequency information about basic blocks (see footnote 4) and more complex code structures that have been collapsed during the first part of the analysis. The portable WCET information can then be stored in the XAC files [54]. Note that static analysis is used during this stage.

As shown in Figure 3.5, the platform-independent analysis extracts WCEF vectors from the source code or Java class files. The analysis technique is integrated with the XRTJ-Compiler in our environment. Therefore, the XRTJ-Compiler derives abstract syntax trees, WCET annotations [54, 53], and gain time reclaiming graphs [57] during compilation. Then, portable information, including WCEF vectors, WCET annotations and gain time reclaiming graphs, is stored as XAC files by the XRTJ-Compiler automatically. Therefore, after compilation, the class files and XAC files are ready for WCET analysis tools. In this thesis, two major contributions are made at this analysis level by introducing a solution to how dynamic dispatching issues can be addressed and how the utilisation and performance of the whole system can be improved at run-time.

In parallel, the so-called platform-dependent analysis of the target platform is performed. This analysis forms the Virtual Machine Timing Model (VMTM), which is a timing model for the target virtual machine including a list of the worst-case execution times of native methods and Java bytecode instructions. During this stage, information about the potential effects of pipelines [5] on common sequences of bytecodes, and models of the execution time of cache hits and misses, may be captured.
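To make the relationship between the two analyses concrete, the following is a minimal sketch of how a WCEF vector and a VMTM might be combined. It assumes the simplest possible reading, in which the bound is the sum over all bytecodes of worst-case frequency times worst-case cost; pipeline and cache effects are ignored. The class name `WcetEstimator`, the map-based representation, and all frequencies and cycle costs are hypothetical and are not taken from the XRTJ implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: combine a WCEF vector with a VMTM to bound the WCET. */
final class WcetEstimator {

    /**
     * Returns the sum over all bytecodes of (worst-case frequency x worst-case cost).
     * Throws if the timing model lacks an entry for an executed bytecode, because
     * silently assuming a zero cost would make the bound unsafe.
     */
    static long estimate(Map<String, Long> wcef, Map<String, Long> vmtm) {
        long total = 0;
        for (Map.Entry<String, Long> entry : wcef.entrySet()) {
            Long cost = vmtm.get(entry.getKey());
            if (cost == null) {
                throw new IllegalArgumentException("No timing for " + entry.getKey());
            }
            // Exact arithmetic: an overflow raises an exception instead of wrapping.
            total = Math.addExact(total, Math.multiplyExact(entry.getValue(), cost));
        }
        return total;
    }

    public static void main(String[] args) {
        // Platform-independent part: assumed execution frequencies per bytecode.
        Map<String, Long> wcef = new LinkedHashMap<>();
        wcef.put("iload", 101L);
        wcef.put("if_icmplt", 51L);
        wcef.put("iinc", 50L);

        // Platform-dependent part: assumed worst-case cycles per bytecode.
        Map<String, Long> vmtm = new LinkedHashMap<>();
        vmtm.put("iload", 2L);
        vmtm.put("if_icmplt", 4L);
        vmtm.put("iinc", 3L);

        System.out.println("WCET bound (cycles): " + estimate(wcef, vmtm));
    }
}
```

Keeping the two maps separate mirrors the portability argument: the frequency vector can be shipped with the class file, while a different cost table can be substituted for each target virtual machine.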
Footnote 4: See Section 2.3.

Although the platform-independent analysis can be carried out by a static analysis approach, using a static analysis technique to perform the platform-dependent analysis poses a number of challenges. It should be noted that when deriving a VMTM it is necessary to

take into account the implementation aspects of not only the Java virtual machine, but also the operating system [22]. In addition, the analysis should also bear in mind the hardware architecture of the various embedded systems. Deriving a VMTM with a static analysis technique seems very complicated since modern processors have tended to become more and more complex. In order to accommodate a diverse set of implementations on the underlying platforms and virtual machines for embedded systems, a measurement-based analysis technique is used in our approach. This will be explored further in Chapter 5.

The final stage is the estimation of the WCET bounds of each thread. In the XRTJ environment, a WCET analysis tool in the XRTJ-Analyser combines the results of the platform-independent analysis with the target VMTM to compute the actual WCET bound of the analysed code sections.

3.3.1 WCET Annotations

All annotations in our approach are introduced with the characters //@ for a single line, and with a corresponding delimiter for multiple lines. These formats are treated as comments in Java. Similar annotation structures are applied in the JML (Java Modelling Language) approach [69] and in ESC/Java (Compaq Extended Static Checker for Java) [42]. However, these projects mainly focus on recording detailed design decisions for a software module [69] or on checking for potential run-time errors with a modular program checker [42]. As all annotations are provided as comments in Java programs, the code, including annotations, can be compiled by a traditional compiler to generate traditional Java class files and run on a traditional Java virtual machine.

In our framework, the WCET annotations introduced in [8] are assumed to be available. Bernat et al. [8] have presented a high-level WCET analysis technique using Java bytecode (JBC) to provide WCET annotations by means of a WCETAn class. This has sufficient capabilities to provide

WCET annotations from the source level to the bytecode level. The approach has shown how to extract data flow and control flow information from the Java class files without relying on the source code. The WCETAn class allows scope or code-block boundaries, the maximum number of iterations of loops, execution modes and arbitrary path execution frequencies to be described [99]. We modified the syntax of the annotations introduced by Bernat et al. [8] into our annotation format to accumulate timing information from the source level. These modified annotations are then translated into XAC files to provide high-level WCET information, instead of using method calls of the WCETAn class in the class files. The syntax and description of the annotations in our approach are given in Appendix A.

        public void Call_ForLoop(int ndoloop) {
     4      //@Mode(Quick_Mode)
     5      //@Mode(Normal_Mode)
     7      //@Loopcount(50)
     8      //@Loopcount(10, Quick_Mode)
     9      //@Loopcount(30, Normal_Mode)
    10      for (i = 0; i < ndoloop; i++) {
            }
        }

Figure 3.6: A fragment of the annotated Call_ForLoop method

An example of how two types of annotation (//@Mode, see footnote 5, and //@Loopcount, see footnote 6) are used is shown in Figure 3.6. The code example demonstrates how three different loop bounds are provided. In Line 7, the loop bound is specified to be 50 iterations in the worst case.

Footnote 5: Mode annotations are used to specify the state in which a piece of code is to be executed (A.2.2).

Footnote 6: The Loopcount annotation can be used to denote the maximum number of iterations of loops (A.3).

However, in the particular modes of operation called Quick_Mode and Normal_Mode, which

are declared at Lines 4 and 5, the loop can only be executed for 10 and 30 iterations respectively.

    <!-- WCET information -->
    <TAG=1>
    <WCET>
      <SPEC>
        <SUBTAG_COUNT=4>
        <Method=Constant_Pool_index>
        <Method_Body>
          <00 PC_Offset=int, Mode=name>
          <01 PC_Offset=int, Use_Mode=name>
          <02 PC_Offset=int, LoopCount=int>
          <03 PC_Offset=int, LoopCount=int, Mode=name>
        </Method_Body>
      </SPEC>
      <TAG_BODY>
        <!-- Constant_Pool_index #36 = SimpleIO$ControllerThread.Call_ForLoop(I)V -->
        <Method=#36>
        <Method_Body>
          <00 #1, Quick_Mode>
          <00 #1, Normal_Mode>
          <02 #10, 50>
          <03 #10, 10, Quick_Mode>
          <03 #10, 30, Normal_Mode>
        </Method_Body>
      </TAG_BODY>
    </WCET>

Figure 3.7: A fragment of the XAC file for Call_ForLoop method

The context in which this fragment of code is used determines the mode. This allows the analysis tool to use tighter loop bounds for different calls and therefore reduce the pessimism. Without the annotations, extra pessimism would be incurred in the analysis by assuming that all calls use the maximum number of iterations. Using an XAC translator or compiler, the XAC file can be produced from the annotated Java program. The text format of the XAC file is given in Figure 3.7. This format is similar to the format used by XML. In the figure, we can see that four types of WCET annotation format are defined

in the specifications. Each annotation has a unique identification number. For instance, 03 is defined for the Loopcount(int, Mode name) annotation. Based on the bytecode offset within the method in the Java class file, the Loopcount(int, Mode name) annotations are given as <03 #10, 10, Quick_Mode> and <03 #10, 30, Normal_Mode> in the body area.

3.4 Summary

In this chapter, we have presented an overview of the XRTJ environment, which is expected to facilitate the development of distributed high-integrity real-time systems based on Java technology. The three main aims of the XRTJ environment are to develop a predictable programming model, a sophisticated static analysis environment, and a reliable distributed run-time architecture. Based on the computational model, a portable WCET analysis framework is proposed. This chapter has offered an overview of the framework and related components that are introduced for the XRTJ environment. The research work discussed in the rest of the thesis is based on the framework illustrated in this chapter.
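Returning to the mode-dependent loop bounds of Figures 3.6 and 3.7, the following sketch shows one way an analysis tool might select a bound for a given mode of operation. The values 50, 10 and 30 come from the figures; the class `LoopBounds`, its methods, and the spelling of the mode names as strings are hypothetical and do not describe the XRTJ-Analyser itself.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the Loopcount annotations attached to one loop. */
final class LoopBounds {
    private final int defaultBound;                        // e.g. //@Loopcount(50)
    private final Map<String, Integer> boundsByMode = new HashMap<>();

    LoopBounds(int defaultBound) {
        this.defaultBound = defaultBound;
    }

    /** Records a mode-specific bound, e.g. //@Loopcount(10, Quick_Mode). */
    LoopBounds withMode(String mode, int bound) {
        boundsByMode.put(mode, bound);
        return this;
    }

    /** Uses the mode-specific bound when one exists, else the general bound. */
    int boundFor(String mode) {
        return boundsByMode.getOrDefault(mode, defaultBound);
    }

    public static void main(String[] args) {
        LoopBounds loop = new LoopBounds(50)
                .withMode("Quick_Mode", 10)
                .withMode("Normal_Mode", 30);

        System.out.println(loop.boundFor("Quick_Mode"));   // 10
        System.out.println(loop.boundFor("Normal_Mode"));  // 30
        System.out.println(loop.boundFor("Other_Mode"));   // 50, the general bound
    }
}
```

Falling back to the general bound when no mode is known keeps the estimate safe, at the price of the extra pessimism the annotations are meant to remove.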

Chapter 4 Addressing Dynamic Dispatching Issues

Even though the use of an object-oriented programming environment in software applications has several benefits, such as reusability, maintainability, and scalability, some dynamic features come at the cost of extra overhead in the analysis stage or the run-time stage. The overhead of dynamic dispatching has been noted by compilation and optimisation research groups in recent years. Unfortunately, insufficient attention has been paid to the issues related to dynamic dispatching in real-time systems research. Addressing dynamic dispatching is crucial in real-time systems since its use may result in the system being unanalysable or unpredictable. In hard real-time systems, each real-time thread has to be analysable and predictable. In this thesis, we introduce an approach that keeps dynamic dispatching determinable, based on knowledge gathered in the design phase. This approach introduces annotations to define the scope of the dynamically dispatched methods for hard real-time threads. Based on the design knowledge, developers can use the annotations to denote the methods that may possibly be invoked. Using the annotations, a tighter worst-case execution time of a particular method can be calculated.
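The effect of narrowing the set of possible targets can be illustrated with a small sketch. It assumes that the WCET of a dynamically dispatched call is bounded by the maximum over the implementations that might be reached, so that a smaller candidate set can only tighten the bound. The class `DispatchBound`, the sensor class names and the cycle counts are hypothetical, and the thesis's own annotation syntax is not shown here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: bound a dispatched call by its worst candidate target. */
final class DispatchBound {

    /** Returns the largest WCET among the classes the call site may reach. */
    static long worstCase(Map<String, Long> wcetByClass, Collection<String> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("At least one candidate class is required");
        }
        long worst = Long.MIN_VALUE;
        for (String cls : candidates) {
            Long wcet = wcetByClass.get(cls);
            if (wcet == null) {
                throw new IllegalArgumentException("No WCET known for " + cls);
            }
            worst = Math.max(worst, wcet);
        }
        return worst;
    }

    public static void main(String[] args) {
        // Assumed WCETs (in cycles) of read() in three subclasses of a Sensor class.
        Map<String, Long> wcet = new HashMap<>();
        wcet.put("TemperatureSensor", 120L);
        wcet.put("PressureSensor", 300L);
        wcet.put("DiagnosticSensor", 2500L);

        // Without design knowledge, every subclass must be considered.
        List<String> wholeHierarchy =
                Arrays.asList("TemperatureSensor", "PressureSensor", "DiagnosticSensor");
        // With design knowledge, only two classes can occur at this call site.
        List<String> annotatedScope = Arrays.asList("TemperatureSensor", "PressureSensor");

        System.out.println(worstCase(wcet, wholeHierarchy)); // 2500
        System.out.println(worstCase(wcet, annotatedScope)); // 300
    }
}
```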

The major aim of this chapter is to explore annotations that are expressive enough to cope with the issues involved in the use of dynamic dispatching. Although the annotations are mainly assumed to be provided from design knowledge, they need not be provided manually. To a certain extent, annotations could be extracted automatically by a tool, through data flow analysis and recognition of the scope of class hierarchies from the source code. However, this is beyond the scope of this thesis.

The rest of this chapter is organised as follows. Section 4.1 discusses the major issues connected with using and restricting dynamic binding features in object-oriented hard real-time systems. Section 4.2 introduces annotations and shows how these annotations can be used in real-time applications in order to estimate tighter WCET values. Following this, Section 4.3 discusses how the correctness of annotations can be validated, while an example that demonstrates the approach and how tight WCET estimations can be calculated is given in a subsequent section.

4.1 Dynamic Dispatching Issues

This section is mainly concerned with the major issues connected with using and restricting dynamic dispatching features. Although these issues will be mainly discussed in connection with the Java architecture, the approach can apply to other object-oriented programming languages, such as C++, in which dynamic dispatching is available [116]. Java distinguishes itself from other general-purpose object-oriented languages by its portability, networking, memory management, concurrent programming, and security features. However, the run-time characteristics of Java make it more difficult to optimise during compilation [116], compared to other object-oriented programming languages, such as C++. One such characteristic is that Java applications rely heavily on method invocations and default dynamic dispatching.

Therefore, addressing dynamic dispatching on the Java architecture is a big challenge in WCET analysis. The following subsections discuss the major issues connected with restricting and using the dynamic dispatching feature in Java for WCET analysis.

4.1.1 Issues connected with Restricting Dynamic Dispatching Features

From the Java virtual machine point of view, methods can be mainly divided into Java methods and native methods. Invoking a Java method in the Java virtual machine (JVM) creates a new stack frame, whereas invoking a native method does not push a new stack frame [110]. Native methods are inherently implementation dependent since they can be implemented in any other programming language that supports an interface to and from the JVM. Native methods are not discussed here since the WCET analysis of native methods is language-dependent. Our approach assumes that the WCET estimations of the calls of each native method are known.

For the most part, Java methods may be classified into two main groups: class (static) methods, and instance methods. A class (static) method is a method which does not need an instance to be invoked, whereas an instance method requires an instance before it can be invoked. In other words, class (static) methods are selected based on the declared type of the object reference, whereas instance methods are selected based on the run-time class of the actual object [110]. Calls to class methods are translated into invokestatic Java bytecode instructions, which use static binding. By definition, a static method cannot be overridden and therefore no dynamic dispatching is required. In contrast, calls to instance methods are translated into invokevirtual Java bytecode instructions, which use dynamic binding. Note that, although instance methods are normally invoked with invokevirtual, in specific situations two other Java bytecode instructions may be used: invokespecial and invokeinterface.
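The difference between the two kinds of binding can be seen in a few lines of Java. The sketch below is illustrative only; the classes `Shape`, `Circle` and `BindingDemo` are hypothetical. The static method is chosen from the declared type of the reference, while the instance method is chosen from the run-time class of the object.

```java
/** Base class with one class (static) method and one overridable instance method. */
class Shape {
    static String kind() { return "Shape.kind"; }   // call sites use invokestatic
    String name() { return "Shape.name"; }          // call sites use invokevirtual
}

class Circle extends Shape {
    static String kind() { return "Circle.kind"; }  // hides Shape.kind, no overriding
    @Override
    String name() { return "Circle.name"; }         // overrides Shape.name
}

final class BindingDemo {

    /**
     * s.kind() is bound at compile time to Shape.kind because s is declared as Shape.
     * s.name() is dispatched at run-time according to the class of the object.
     */
    static String describe(Shape s) {
        return s.kind() + " / " + s.name();
    }

    public static void main(String[] args) {
        System.out.println(describe(new Shape()));   // Shape.kind / Shape.name
        System.out.println(describe(new Circle()));  // Shape.kind / Circle.name
    }
}
```

For WCET analysis, the call to kind() has exactly one possible target, whereas the call to name() has as many candidate targets as there are overriding subclasses.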
In the Java language semantics, the binding of a method is determined by five modifiers: public, private, protected, static, and final. By definition, private, static,

and final methods cannot be overridden by any other class, and only public and protected methods can be overridden by child classes. This means that the target of a call to a private, static, or final method is known statically: static methods are translated into invokestatic and private methods into invokespecial, whereas public and protected methods are translated into invokevirtual. (A call to a non-private final method is also compiled to invokevirtual, but since the method cannot be overridden, the call has only one possible target.) The invokespecial instruction is applied for instance initialisation, private methods, and methods invoked with the super keyword. It differs from invokevirtual in that it uses static binding. The invokeinterface instruction performs the same function as invokevirtual, but it is used solely when the type of the reference is an interface. In the following sections, we use instance method to mean those methods that are translated into either invokevirtual or invokeinterface instructions and which may be overridden by child classes.

Hence, the dynamic features, such as inheritance and overriding, are offered by the instance methods, which are defined as public and protected in Java. Therefore, if one prohibited the use of dynamic binding features in object-oriented real-time applications, only static, private, and final methods could be used. Obviously, these restrictions could eliminate the major advantages of object-oriented programming. Arguably, such applications no longer appear object-oriented and have even less expressive power than procedural languages in terms of reusability and extensibility.

An alternative approach is to force the programmer to use only static binding. In Java, this can be achieved partially by disallowing assignment and parameter association between objects in the same class hierarchy. However, this could be too limiting for the development of real-time applications in practice. For the above reasons, we argue that dynamic dispatching should be allowed in hard real-time systems in an appropriate way.
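The distinction drawn above between class (static) methods and instance methods can be illustrated with a small sketch. The class names A and B follow the examples used later in this chapter; the method kind() and the class BindingDemo are hypothetical names introduced only for this illustration.

```java
class A {
    // Class (static) method: bound statically to the declared type.
    static String kind() { return "A.kind"; }

    // Instance method: bound dynamically to the run-time type of the object.
    public String m1() { return "A.m1"; }
}

class B extends A {
    // Hides A.kind(); static methods cannot be overridden.
    static String kind() { return "B.kind"; }

    @Override
    public String m1() { return "B.m1"; }
}

class BindingDemo {
    // The target of ax.m1() depends on the actual object passed in.
    static String callM1(A ax) { return ax.m1(); }

    // The target of ax.kind() depends only on the declared type A.
    // Calling a static method through a reference is legal Java, although discouraged.
    static String callKind(A ax) { return ax.kind(); }

    public static void main(String[] args) {
        A a = new B();
        System.out.println(callM1(a));   // prints "B.m1"
        System.out.println(callKind(a)); // prints "A.kind"
    }
}
```

For WCET analysis, the second call has a single statically known target, whereas the first call may reach any overriding method in the class hierarchy of A.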

    class A {
        // WCET: 100ms
        public void m1() { ... }
    }

    class B extends A {
        // WCET: 25ms
        public void m1() { ... }
    }

    class C extends A {
        // WCET: 200ms
        public void m1() { ... }
    }

    class D extends A {
        // WCET: 300ms
        public void m1() { ... }
    }

Figure 4.1: Classes for Example 1

4.1.2 Issues involved in Using Dynamic Dispatching

Unlike the analysis of procedural programs, the WCET analysis of object-oriented programs needs to consider more dynamic characteristics. For example, in Figure 4.1, class A is a parent class and has a public method called m1(). Classes B, C and D extend class A and override the method m1(). Considering the Java program in Figure 4.2, we can observe that the target of the call ax.m1() (Line 5) in the Call_m1() method is unknown until run-time since it uses dynamic binding. As a result, all possible calls

of ax.m1() need to be taken into account for the WCET estimation of the Call_m1() method. Assume that the WCET values for A.m1(), B.m1(), C.m1() and D.m1() are 100ms, 25ms, 200ms and 300ms respectively. In this situation, if we estimate the WCET value of ax.m1() with that of the A.m1() method, the estimate is very pessimistic if the instance type is B, and it is even unsafe if the instance type is C or D. From the source code, it can also be observed that the WCETs of the first call a.m1() (Line 18) and the second call a.m1() (Line 21) are different, since a reference to a parent class can denote an instance of any descendant of that class (Line 17 and Line 20).

     1
     2  class App {
     3    static void Call_m1(A ax) {
     4      // dynamic dispatching occurs
     5      ax.m1();
     6    }
     7    public static void main(String[] args) {
     8      A a = new A();
     9      B b = new B();
    10      C c = new C();
    11
    12      Call_m1(a);
    13      Call_m1(b);
    14      Call_m1(c);
    15
    16      if (x > 5) {
    17        a = c;
    18        a.m1();
    19      } else {
    20        a = b;
    21        a.m1();
    22      }
    23
    24      Call_m1(a);
    25    }
    26  }

Figure 4.2: Example 1

In view of the above discussion, using dynamic dispatching in a hard real-time system may result in the whole system being not only unpredictable and unanalysable, but also

either unacceptably pessimistic or unsafe. Therefore, every single instance method must be analysed carefully if dynamic dispatching features are allowed.

4.2 Dynamic Dispatching with Annotations

The use of annotations to facilitate static analysis techniques, such as WCET analysis [96] and model checking analysis [42], is not novel. In general, annotations allow the addition of information which is not provided directly in the source code. The major aim of using annotations is to provide information about the program on which static analysis needs to be carried out. For example, the maximum bound of a loop may not be derivable directly from the semantics of the code section.

Annotations may be inserted manually or automatically. Manual annotations are mainly used in WCET analysis to provide information that cannot easily be extracted from the source code, such as loop bounds [97], exclusive paths [71], and modes [16]. In addition, they can also be applied as specifications to validate the source code or the input data range [16].

In object-oriented programming languages, the execution time of a particular instance method may differ in its descendant classes if child classes override the method. Therefore, without annotations, the WCET analysis is bound to be pessimistic in the presence of dynamic binding. We aim to deal with these issues by providing annotations which are expressive enough to identify the target method which is going to be invoked. When the target method cannot be identified, the annotation provided must also be able to denote the scope of the methods that may be dispatched. To achieve this aim, the requirements which need to be met are:

• Annotation to identify exactly which method is going to be invoked in a dynamic dispatching method, based on the design knowledge.

• Annotation to indicate exactly which methods are going to be invoked in a nested scope, based on the design knowledge.
• Annotation to identify methods which might be invoked when several methods are possible, based on the design knowledge.

In our approach, we assume that the execution time of looking up the dispatch table is bounded and that this execution time is included in the WCET of the dynamic dispatching method.

This section introduces annotations which not only address the dynamic dispatching feature, but also allow WCET analysis to achieve tighter WCET estimations. The syntax of the annotations is described in Appendix A. Based on this syntax, the annotations that meet the requirements above are proposed and discussed in the following subsections.

4.2.1 Annotation for Dynamic Dispatching Methods

As shown in Figure 4.2, the type of the object denoted by a, which originally has the type of class A, changes to the type of a B object or a C object in the if statement (Lines 16-22). Most of these dynamic type changes can be analysed by current compiler optimisation approaches for dynamic compilation, such as Class Hierarchy Analysis (CHA) [27] and Rapid Type Analysis (RTA) [4]. However, the dynamic compilation approaches are intended solely for optimising dynamic binding and they cannot guarantee that all dynamic binding will be resolved before run-time. In WCET analysis for hard real-time systems, by contrast, the execution time of every single instance method has to be known prior to executing it.

Moreover, in procedural programming languages, the relationships between functions or procedures are relatively simple. They have one call hierarchy and neither inheritance nor polymorphism is supported. Unfortunately, in object-oriented languages, the naming of methods is relatively difficult to understand and analyse if several objects are using the same

method name, for example through overriding and overloading. As a result, well-structured annotations need to be considered in relation to the class hierarchy information.

    class B extends A {
        //@WCET_Label(B.m1(v));
        public void m1() { .... }
    }

    class C extends A {
        //@WCET_Label(C.m1(v));
        public void m1() { .... }
    }

    class App {
        public static void main(String[] args) {
            A a = new A();
            ....
            if (x > 5) {
                a = c;
                //@UseWCET(C.m1(v));
                a.m1();
            } else {
                a = b;
                //@UseWCET(B.m1(v));
                a.m1();
            }
            ....
        }
    }

Figure 4.3: Example 1 with annotations

In this section, two major WCET annotations, WCET_Label() and UseWCET(), which offer expressive power to cope with object-oriented features, are introduced in order to address dynamic dispatching problems. The WCET_Label() annotation defines a label for a specific mode, with particular execution characteristics, of either a class or an instance method, whereas the UseWCET() annotation is offered to denote a specific mode

or method in the applications. In both annotations, we can also state a mode to define the execution context of the method. The syntax of the annotations is given below.

    WCET_Label_Annotation ::= WCET_Label( Class_Method_Name {, mode_name } )
    UseWCET_Annotation ::= UseWCET( Class_Method_Name {, mode_name } )

In addition, in order to take polymorphism into account, argument types need to be considered. By combining the full class name, the method name and the argument types, the full name of each method can be defined in the annotations. An illustration of how these annotations are used is given in Figure 4.3. This illustration combines Figure 4.1 and Figure 4.2 from the previous section, this time with annotations added.

4.2.2 Annotations for Nested Scopes in Object-Oriented Programs

Consider the program in Figure 4.4. Here, the execution time of the call a.m1() at Line 21 varies between iterations, because the type of object a is changed at Line 19 depending on the condition of the if-statement. It can be observed that the annotations introduced in the previous section are not expressive enough to represent complicated structures or nested scopes. The example shows that the estimation could be pessimistic or unsafe if the calls of a method are data dependent.

To represent the dynamic behaviour of a program, two additional annotations, DefineScope() and ScopeWCET(), are used. Not only can these two annotations capture the WCET estimation for a specific scope, they can also be used for nested scopes. The syntax of the annotations is given as follows.

     1  class A {
     2    // given WCET is 100ms
     3    public void m1() { ... }
     4  }
     5
     6  class B extends A {
     7    // given WCET is 25ms
     8    public void m1() { ... }
     9  }
    10
    11  class App {
    12    public static void main(String[] args) {
    13      A a = new A();
    14      B b = new B();
    15
    16      //@ DefineScope(ForScope)
    17      for (int i = 0; i < 5; i++) {
    18        if (i == 2) // type changing
    19          a = b;
    20        //@ ScopeWCET(ForScope, 2*UseWCET(A.m1(v))+3*UseWCET(B.m1(v)))
    21        a.m1();
    22      }
    23    }
    24  }

Figure 4.4: Example 2

    DefineScope_Annotation ::= DefineScope( scope_name )
    ScopeWCET_Annotation ::= ScopeWCET( scope_name, ncount UseWCET_Annotation { asoperator ncount UseWCET_Annotation } )

The DefineScope() annotation defines a simple or nested scope for which the WCET is to be established, whereas the ScopeWCET() annotation denotes the WCET estimation for the whole of that scope. As shown in Figure 4.4, the DefineScope() annotation denotes the for-loop at Line 17 and ScopeWCET() is used at Line 20 to denote the a.m1() method

invocation. In the example, the loop body calls A.m1() in the first two iterations and B.m1() in the remaining three. A similar concept to the scope annotation, which has been used in procedural programming languages [33], is introduced in order to represent a particular scope or nested loops in WCET analysis. Using these scope annotations, we can achieve a relatively tight WCET for a specific scope or for nested loops. An example of the use of these annotations is given in Figure 4.4.

4.2.3 Annotation for Class Hierarchy

Using the annotations presented in the previous sections (Sections 4.2.1 and 4.2.2), we can address most of the issues involved in using dynamic dispatching features. However, the drawback is that these annotations require us to know exactly which methods are going to be invoked at run-time. Unfortunately, it is possible that more than one method may be invoked.

In this section, the maxwcet(...) annotation, which can denote a set of class hierarchies, is introduced. This annotation states that the WCET of a dispatching method should be taken to be the maximum WCET over the class family¹ containing that method. Subsets of the class family can also be specified. The syntax of the annotation is given as follows.

    maxWCET_annotation ::= maxwcet( &Class_Method_Name {, asoperator {&}Class_Method_Name {, mode_name }} )
                         | maxwcet( &Interface_Method_Name {, asoperator {&}Full_Method_Name {, mode_name }} )

¹ A class family of a class is the set of classes that includes the class itself and all the child classes inherited from it.

In the maxwcet(...) annotation, & denotes the whole class family of the class stated by Class_Method_Name. In these annotations, + and - can be used to express the union or subtraction of a single class or a class family for the method. The mode_name in the annotation is a type of mode which is described in Appendix A.

[Figure 4.5: maxwcet(...) annotation. Class hierarchy diagram: A.mx() with descendant classes B.mx() and C.mx(), which in turn have the descendant classes D, E, F, H, I and J; the call is annotated with //@ maxwcet(&A.mx() - &C).]

As shown in Figure 4.5, suppose class A has two descendant classes B and C, and B and C each have a number of descendant classes. If we would like to denote a method (A.mx()) of class A for which we do not want to consider the class C family, we can use maxwcet(...) as illustrated in Figure 4.5.

The concept of these annotations is similar to CHA [27], which is used for optimisation in compilation techniques, whereas here the annotation may provide a tighter specification for a specific method. If we apply the CHA approach, the WCET value could be very pessimistic. This is because the CHA approach is not aimed at WCET analysis and, since it does not use design knowledge, it has to consider all the possible class hierarchies. For example, in Figure 4.2, we only need to consider the execution time of ax.m1() (Line 5), which takes its WCET value from either A.m1(), B.m1() or C.m1(). If we use the CHA approach or other WCET approaches to calculate the WCET value for ax.m1() as max(A.m1(), B.m1(), C.m1(), D.m1()), the estimation will be very pessimistic (300ms instead of 200ms with the values assumed in Section 4.1.2). It is clear

that using the maxwcet(...) annotation can not only address these issues, but can also achieve tighter WCET estimation.

Similarly, dynamic dispatching is also provided for interface objects. Therefore, the maxwcet(...) annotation needs to be able to denote interface methods. As shown in Figure 4.6, the I.mx() method can be implemented and overridden in the classes that implement the interface I. Note that maxwcet(...) makes it possible to add or subtract the methods of subclasses or sub-interfaces if the annotation is attached to a call of the parent interface method.

[Figure 4.6: maxwcet(...) annotation used for interfaces. Hierarchy diagram rooted at I.mx() containing A.mx(), X.mx(), B.mx(), C.mx(), Y.mx() and Z.mx(), with the classes B1, B2, B3, C1, C2, C3, Y1, Y2, Y3 and Z1 below them; the call is annotated with //@ maxwcet(&I.mx() - &C).]

4.3 Correctness of Annotations

It is an open question for most annotation-based approaches how to check whether the provided annotations are correct. In our approach, the dynamic dispatching WCET annotations can be validated in two phases. A certain degree of correctness of the annotations can be verified during the static analysis stage by integrating it with optimisation techniques [27, 4, 28]. For example, the boundary of the class hierarchy can be identified by CHA [27]

or RTA [4] during compilation. Using the information accumulated by the compiler, the Class_Method_Name entries that a maxwcet(...) annotation adds to or subtracts from a parent class can be checked for correctness, and eliminated if they are illegal.

[Figure 4.7: maxwcet(...) annotation in an application. Hierarchy diagram containing X.mx() (with X1, X2, X3), R.mx(), P1.mx(), P2.mx(), C1.mx() (with C11, C12, C13), C2.mx() (with C21, C22, C23) and Y.mx() (with Y1, Y2, Y3, Y4); the call of P2.mx() is annotated with //@ maxwcet(&P2.mx() - &C2), and the classes that may legally appear in that annotation are shaded grey.]

As shown in Figure 4.7, the annotation of a method call of P2.mx() in an application can only refer to the grey-coloured objects in the figure, according to the inheritance rules. Any other objects denoted in the maxwcet(...) annotation of a method call of P2.mx() are incorrect and can be eliminated easily. Formally, the validation of maxwcet(...) can be performed as follows:

Let P be a particular object that has a parent class R and a number of child classes Cs. Let P.m be a method call on the P object. Then, the classes added to or subtracted from the class hierarchy of P.m in the Class_Method_Name of the maxwcet(...) annotation can only be taken from Cs.

However, there is a limitation to performing static checks. A situation could arise where

user input obeys the inheritance rules, but conflicts with the design knowledge. For example, referring to Figure 4.7, suppose the design knowledge specifies that subclass C2.mx() should be eliminated from P2.mx(). If the user input indicates the elimination of subclass C1.mx() from P2.mx() instead, this should not be allowed. However, as the user input obeys the inheritance rules, this illegal input cannot be detected statically. In this situation, the annotations can be added as assertions to the application during compilation and used as guards at run-time to check whether the object types are correct.

4.4 Evaluation

We use an example, which is part of the sensor control system of an aircraft control system, to discuss how we can provide tight WCET annotations in the parent classes and how to use them in the child classes. The semantics and real-time APIs (Application Program Interfaces) used in the example are in line with the RTSJ specification [10].

    abstract class Sensor {
        ....
        public abstract int CollectData();
        public abstract int AccessSensor();
        ....
    }

Figure 4.8: An abstract Sensor class

In Figure 4.9, it can be observed that the abstract Sensor class (Figure 4.8) has three subclasses: the Temperature_Sen class, the Pressure_Sen class and the Speed_Sen class. The Temperature_Sen class has three child classes: the AirTempSen class, the JetEngineTempSen class and the LandingDeviceTempSen class. The purpose of these classes is to sense the surrounding environment, such as the air temperature, the jet-engine temperature, and the landing-device temperature, and then to report temperature information to related device objects or systems. Similarly, the AirSpeedSen

class and VelocitySen class, which are inherited from the Speed_Sen class, detect the air speed and the velocity of the aircraft respectively.

[Figure 4.9: A class hierarchy for Sensor Control Systems. Sensor is the root, with the subclasses Temperature_Sen, Pressure_Sen and Speed_Sen. Temperature_Sen declares CollectData() and AccessSensor() and has the child classes AirTempSen, JetEngineTempSen and LandingDeviceTempSen, each of which overrides both methods. Speed_Sen has the child classes AirSpeedSen and VelocitySen.]

Consider the Temperature_Sen class. To keep the example easy to follow, we only consider the high-level timing analysis of the program and produce the WCET estimation from it. Of course, in reality, we have to consider the low-level timing analysis as well in order to calculate tight and safe WCET estimations.

As shown in Figure 4.10, the Temperature_Sen class has CollectData(...) and AccessSensor(...) methods. The WCET annotations can be added to analyse and validate the design and WCET behaviour of the Temperature_Sen class as follows. In this class, we add two annotations for the CollectData(...) method: one is the default //@WCET_Label(...) and the other is for the TakeOff_Mode. We assume that the WCET values for the default //@WCET_Label(...) and the TakeOff_Mode are 100ms and 200ms respectively.

    class Temperature_Sen extends Sensor {
        ....
        //@WCET_Label(Temperature_Sen.CollectData(V))                //WCET: 100ms
        //@WCET_Label(Temperature_Sen.CollectData(V), TakeOff_Mode)  //WCET: 200ms
        public int CollectData() {   // Overridden by child classes
            ....
            return result;
        }
        //@WCET_Label(Temperature_Sen.AccessSensor(V))               //WCET: 200ms
        public int AccessSensor() {  // Overridden by child classes
            ....
            return result;
        }
        ....
    }

Figure 4.10: A fragment of the Temperature_Sen Java program

In a similar way, we can analyse other methods in either the parent or the child classes. In the child classes, we assume that the WCET values, denoted with the default //@WCET_Label(...), for the CollectData(...) method in AirTempSen, JetEngineTempSen, and LandingDeviceTempSen are 110ms, 120ms and 130ms respectively. In each child class, the WCET value for the TakeOff_Mode is twice the WCET value of the default //@WCET_Label(...) in the same class. Moreover, the WCET values for the AccessSensor(...) method in AirTempSen, JetEngineTempSen, and LandingDeviceTempSen are 210ms, 220ms and 230ms respectively. The source code of these child classes is omitted here.

Then, as shown in Figure 4.11, we can analyse the real-time thread (SensorController), which is defined as a periodic thread in accordance with the RTSJ. In the SensorController object, assume that we only need to consider the AirTempSen object and the JetEngineTempSen object. Therefore, with design knowledge, we can add an //@maxwcet(&Temperature_Sen.AccessSensor - &LandingDeviceTempSen) annotation to the calltempsen.AccessSensor() call in the Call_Sensor() method to achieve a tighter WCET estimation. For this call, the WCET value is then max(200ms, 210ms, 220ms) = 220ms. In the

    import javax.realtime.*;

    public class SensorController extends RealtimeThread {
        static AirTempSen ATTempSen;
        static JetEngineTempSen JETempSen;
        Temperature_Sen TempSen;

        SensorController(PriorityParameters schp, PeriodicParameters relP) {
            super(schp, relP);
            TempSen = JETempSen;  // default with JetEngineTempSen
        }

        private void Call_Sensor(Temperature_Sen calltempsen) {
            // With design knowledge, we can denote the maxwcet(...) as below.
            //@ maxwcet(&Temperature_Sen.AccessSensor - &LandingDeviceTempSen)
            // the WCET is max(200, 210, 220) => WCET: 220ms
            calltempsen.AccessSensor();
            // given the rest of the WCET is 100ms
        }

        public void run() {
            do {
                //@ UseWCET(JetEngineTempSen.AccessSensor(V))  //WCET: 220ms
                TempSen.AccessSensor();

                Call_Sensor(TempSen);  // 220ms+100ms
                TempSen = ATTempSen;   // given 7ms // type changing occurs

                //@ UseWCET(AirTempSen.AccessSensor(V))  //WCET: 210ms
                TempSen.AccessSensor();

                //@ UseWCET(AirTempSen.CollectData(V), TakeOff_Mode)  //WCET: 220ms
                TempSen.CollectData();

                TempSen = JETempSen;   // given 7ms // type changing occurs
                // given the rest of the WCET is 200ms
            } while (waitForNextPeriod());
        }
    }

Figure 4.11: A fragment of the SensorController Java program

run() method, analysing the source code, we can denote two annotations. Finally, we can calculate a tight WCET value for the SensorController periodic real-time thread. The WCET value for the run() method in the SensorController thread can be calculated as follows:

    WCET(SensorController.run()) = 220ms + (220ms + 100ms) + 7ms + 210ms + 220ms + 7ms + 200ms
                                 = 1184ms

4.5 Summary

This chapter has explored the ways in which dynamic dispatching can be addressed with annotations in object-oriented hard real-time applications. Firstly, the issues related to restricting dynamic dispatching features, and those connected with the use of such features, were discussed. Based on that discussion, annotations were introduced. The chapter has illustrated how these WCET annotations can be provided manually to facilitate the static WCET analysis of object-oriented applications.

Three categories of annotations, all based on design knowledge, have been introduced in Sections 4.2.1 to 4.2.3. In Section 4.2.1, WCET_Label() and UseWCET() are used to identify exactly which method is going to be invoked in a dynamic dispatching method. DefineScope() and ScopeWCET() are introduced in Section 4.2.2 to denote exactly which methods will be invoked in a nested scope, while in Section 4.2.3, maxwcet() is provided to identify the methods which might be invoked when several methods are possible. Following this, the use of these annotations in a sensor control system was presented. This example shows that the requirements have been met by the annotations introduced.

Note that in a situation where we do not have the design knowledge but the program is

predictable and analysable, all the possible method invocations can be taken into consideration. In such cases, annotations cannot be provided. On the other hand, where we have both predictable and analysable code and design knowledge, the annotations introduced in this chapter are expressive enough to provide the information needed for WCET analysis. In this sense, the annotations provided are complete.

Our approach shows that allowing the use of dynamic dispatching not only provides a more flexible way to develop object-oriented hard real-time applications, but also does not necessarily result in unpredictable timing analysis. In addition, the correctness of the annotations can be verified with optimisation techniques during the static analysis stage, and the limitations of static checks have been discussed. Where static checks fail, the annotations can be added as assertions to check at run-time whether the object types are correct.

However, one should note that the annotations have their limitations. The major limitation is that they have to be attached, for the type of each object, at the level of individual method calls. To minimise the number of annotations required, annotations that are expressive enough to be attached at the object level could be developed in the future. It should also be noted that, with limited design knowledge, the result of the WCET analysis with these annotations could still be pessimistic. To address this issue, we introduce a gain time reclaiming framework, presented in Chapter 6.
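The arithmetic behind the maxwcet(...) annotation and the final WCET sum of Section 4.4 can be checked with a minimal sketch. It is not the analysis tool described in this thesis: the class MaxWcetSketch, its maps and its method names are hypothetical, and only the class names and millisecond values are taken from the example in Section 4.4 and Figure 4.4.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MaxWcetSketch {
    // Parent -> direct children (hypothetical encoding of part of Figure 4.9).
    static final Map<String, List<String>> CHILDREN = new HashMap<>();
    // WCET labels in ms for AccessSensor(), as assumed in Section 4.4.
    static final Map<String, Integer> ACCESS_SENSOR_WCET = new HashMap<>();

    static {
        CHILDREN.put("Temperature_Sen",
                List.of("AirTempSen", "JetEngineTempSen", "LandingDeviceTempSen"));
        ACCESS_SENSOR_WCET.put("Temperature_Sen", 200);
        ACCESS_SENSOR_WCET.put("AirTempSen", 210);
        ACCESS_SENSOR_WCET.put("JetEngineTempSen", 220);
        ACCESS_SENSOR_WCET.put("LandingDeviceTempSen", 230);
    }

    /** Class family: the class itself plus all classes inherited from it. */
    static Set<String> family(String root) {
        Set<String> result = new HashSet<>();
        result.add(root);
        for (String child : CHILDREN.getOrDefault(root, List.of())) {
            result.addAll(family(child));
        }
        return result;
    }

    /** Models maxwcet(&include - &exclude ...): maximum label over the remaining classes. */
    static int maxWcet(Map<String, Integer> labels, String include, String... excludes) {
        Set<String> candidates = family(include);
        for (String excluded : excludes) {
            candidates.removeAll(family(excluded));
        }
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("annotation excludes every class");
        }
        int max = 0;
        for (String cls : candidates) {
            Integer wcet = labels.get(cls);
            if (wcet == null) {
                throw new IllegalStateException("no WCET label for " + cls);
            }
            max = Math.max(max, wcet);
        }
        return max;
    }

    public static void main(String[] args) {
        // Without design knowledge the whole family must be considered.
        int whole = maxWcet(ACCESS_SENSOR_WCET, "Temperature_Sen");
        // With the annotation of Figure 4.11 the LandingDeviceTempSen family is removed.
        int tight = maxWcet(ACCESS_SENSOR_WCET, "Temperature_Sen", "LandingDeviceTempSen");
        // Sum for SensorController.run(), following the comments in Figure 4.11.
        int run = 220 + (tight + 100) + 7 + 210 + 220 + 7 + 200;
        // ScopeWCET of Figure 4.4: two calls of A.m1() and three calls of B.m1().
        int scope = 2 * 100 + 3 * 25;
        System.out.println(whole + " " + tight + " " + run + " " + scope); // 230 220 1184 275
    }
}
```

The difference between the first two values (230ms against 220ms) is the tightening gained from design knowledge in this example.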


Chapter 5

Virtual Machine Timing Models

As mentioned previously in Section 3.3, portable WCET analysis uses a three-step approach: platform-independent analysis (i.e. analysing the Java programs), platform-dependent analysis (i.e. producing a virtual machine timing model (VMTM)¹ for a target platform), and the combination of the platform-independent analysis with the platform-dependent analysis to compute the actual WCET bound of the analysed code sections. The VMTM is built when the target virtual machine is available. Only one VMTM needs to be derived for the same hardware configuration.

¹ See Section

In this framework, it should be noted that portable WCET analysis is highly dependent on the VMTM of the target platform. This means that the final estimation for any real-time thread on a specific platform is closely related to the worst-case execution time model of that platform. Therefore, it is of vital importance to derive a safe and tight VMTM for the target platform. Unfortunately, there is currently no recognised approach that shows how a VMTM for a particular platform can be built efficiently. Therefore, from a practical standpoint, a number of issues still have to be addressed before this approach can be brought into engineering practice.

Deriving a VMTM with static analysis techniques is very demanding, since modern processors have become more and more complex. Some research approaches

[9, 77, 94] have integrated measurement techniques with static analysis to address the complexity of modern processors. However, these approaches have attempted to estimate WCET bounds of the applications on a particular target platform without considering portability. As a result, these techniques cannot take advantage of the platform-independent feature supported in Java.

Typically, a VMTM consists of the WCET estimations of the bytecode instructions, and the WCET estimations of the Java and native methods that are provided in the target profile. Common sequences of bytecode instructions can also be provided in the VMTM in order to account for pipeline effects [5]. However, it should be noted that in a virtual machine interpreter, the pipeline effect arises from the machine instructions between the end of the first bytecode and the beginning of the second bytecode.

Other timing properties that may be incurred by latencies in the target profile also need to be considered while analysing a virtual machine to extract a VMTM. For example, as shown in Figure 5.1, the WCET estimation of a periodic thread in the Ravenscar-Java profile needs to take into account the WCET bounds of releasing and waiting for each period in the while-loop (Lines 14-17). Here, apart from applicationLogic.run() at Line 15, the WCET estimations of the other parts need to be included in the estimation of each periodic thread.

This chapter is mainly concerned with two measurement approaches that demonstrate how to extract Java VMTMs for portable WCET analysis from interpreting virtual machines. The following sections discuss how a VMTM can be derived by a profiling-based approach, and present how to build a portable benchmark model to extract VMTMs from various target platforms.
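To make the combination step of the portable analysis concrete, the following minimal sketch multiplies platform-independent worst-case bytecode counts by the per-bytecode costs of a VMTM. It is only an illustration under simplifying assumptions: the class PortableWcetSketch, the bytecode counts and all cycle values are hypothetical, and pipeline effects, method costs and thread release overheads are ignored.

```java
import java.util.Map;

class PortableWcetSketch {
    /**
     * Combines the worst-case execution count of each bytecode (platform-independent)
     * with the worst-case cost of that bytecode in machine cycles (platform-dependent).
     */
    static long wcetCycles(Map<String, Integer> worstCaseCounts, Map<String, Long> vmtm) {
        long total = 0;
        for (Map.Entry<String, Integer> entry : worstCaseCounts.entrySet()) {
            Long cost = vmtm.get(entry.getKey());
            if (cost == null) {
                // A safe bound cannot be computed if the model is incomplete.
                throw new IllegalArgumentException("no VMTM entry for " + entry.getKey());
            }
            total += cost * entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Result of the platform-independent analysis of one code section (hypothetical).
        Map<String, Integer> counts = Map.of("iload", 4, "iadd", 2, "invokevirtual", 1);

        // Two hypothetical VMTMs, one per target platform, in machine cycles.
        Map<String, Long> platformA = Map.of("iload", 40L, "iadd", 35L, "invokevirtual", 900L);
        Map<String, Long> platformB = Map.of("iload", 55L, "iadd", 50L, "invokevirtual", 1200L);

        System.out.println(wcetCycles(counts, platformA)); // 1130
        System.out.println(wcetCycles(counts, platformB)); // 1520
    }
}
```

The same bytecode counts are reused for both platforms; only the VMTM changes, which is what makes the analysis portable.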

     1  package ravenscar;
     2  public class PeriodicThread extends NoHeapRealtimeThread {
     3
     4    public PeriodicThread(PriorityParameters pp, PeriodicParameters p,
     5                          java.lang.Runnable logic) {
     6      super(pp, p, ImmortalMemory.instance());
     7      applicationLogic = logic;
     8    }
     9
    10    private java.lang.Runnable applicationLogic;
    11
    12    public void run() {
    13      boolean noProblems = true;
    14      while (noProblems) {
    15        applicationLogic.run();
    16        noProblems = waitForNextPeriod();
    17      }
    18      // A deadline has been missed,
    19      // If allowed, a recovery routine would be placed here
    20    }
    21
    22    public void start() {
    23      super.start();
    24    }
    25  }

Figure 5.1: An illustration of the PeriodicThread class [67]

5.1 Deriving Java Virtual Machine Models

Deriving the VMTM of a target platform is crucial in portable WCET analysis since the results of the analysis are highly dependent on the VMTM. Arguably, in the real-time and embedded fields, efficient analysis of the virtual machine to produce the VMTM of the target platform is highly desirable. This is because the development life-cycle of software built for embedded systems is short and the applications are required to be reusable and compatible across various architectures. Therefore, how to derive VMTMs efficiently for different platforms is the key issue for the portable WCET analysis approach. In this

chapter, we propose two measurement approaches, profiling-based analysis and benchmark-based analysis, which demonstrate how the VMTM can be extracted from a target platform [56].

Although the platform-independent analysis (i.e. high-level analysis) can be carried out by a static analysis approach, the use of a static analysis technique to perform the platform-dependent analysis poses a number of challenges. When deriving the VMTM it is necessary to take into account the implementation aspects of not only the Java virtual machine, but also the operating system. The analysis should also bear in mind the hardware architecture of the various embedded systems. In order to accommodate a diverse set of implementations of the underlying platforms and virtual machines for embedded systems, a measurement-based analysis technique is used in our approach.

Note that there are several possible ways in which the execution time can be measured, such as using clock cycle counters and using timers. In our approach, we use the rdtsc instruction provided in the x86 architecture [58], which has high resolution and very low run-time overhead, to read the time-stamp counter of the processor. Although we only show the use of a software approach on the x86 architecture under the Linux platform here, our approach can also be applied to other CPU architectures and operating systems if they support instructions or libraries that can read the time-stamp counter of the processor. For example, on the PowerPC architecture, the clock cycle counter can be read from two 32-bit time base registers with the assembler instructions mftb and mftbu. In addition, the gethrtime() library routine can be used on the SPARC V9 architecture under the Solaris 8 operating system, and hardware data acquisition interfaces can be used under the Windows, Linux and Solaris operating systems.
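The general shape of such a measurement can be sketched at the Java level as follows. This is not the instrumentation used in this thesis, which reads the processor's time-stamp counter inside the virtual machine: here System.nanoTime() is used as a portable stand-in for the counter, the class TimingHarness and its method are hypothetical, and the largest observed value is only a measurement, not a guaranteed WCET bound.

```java
class TimingHarness {
    /** Returns the largest elapsed time, in nanoseconds, observed over the given number of runs. */
    static long maxObservedNanos(Runnable codeSection, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long worst = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();   // stand-in for reading the time-stamp counter
            codeSection.run();
            long elapsed = System.nanoTime() - start;
            worst = Math.max(worst, elapsed); // keep the high-water mark
        }
        return worst;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        long worst = maxObservedNanos(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] += i;
            }
        }, 1000);
        System.out.println("max observed: " + worst + " ns");
    }
}
```

On a just-in-time compiling virtual machine the observed times vary considerably between runs, which is one reason why the approach in this chapter targets interpreting virtual machines and controls the measurement environment.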
One should note that using machine-level instructions in an analysis approach could make the approach non-portable. We will address this issue in Section 5.3 when we describe one of our approaches which aims to achieve

portability.

To avoid out-of-order execution during profiling, a serializing instruction (e.g. cpuid) is invoked before extracting the time-stamp counter of the processor. This instruction forces the processor to complete the execution of all preceding instructions before executing the next instruction. This ensures that the measurement of a section of code is not interfered with by other instructions. Although this can lead to pessimistic measurement results, it delivers predictability. Additionally, while deriving a VMTM it is necessary to minimise the run-time overhead and the influence of background activity in the operating system, including background tasks and interrupts. We addressed these issues by running the test-bed in single-user mode on Linux. In addition, other background processes are killed manually to reduce their influence as much as possible.

Our approach assumes that the measurements are carried out under cache misses and also takes into account the time incurred in flushing caches. Note that to achieve tighter estimations, both cache hits and cache misses need to be taken into account. Therefore, the execution time of the code under cache hits should be included in the VMTM. With the flow information of the applications, the final calculation can then consider the WCET under both cache hits and cache misses. However, the issues related to cache effects are out of the scope of this thesis. The measurements of the execution time are expressed in machine cycles in the rest of this chapter.

5.2 Profiling-Based Approach

Observing the behaviour of a system to analyse specific aspects of applications executing on the system is not novel. An automatic tracing analysis [108] has been proposed to extract temporal properties of applications and operating systems. The approach shows that the

empirical analysis can reduce the over-estimation of real-time applications. Accordingly, a profiling-based analysis technique can be applied to derive a VMTM for a particular platform by instrumenting additional code into the virtual machine. Even though the idea is relatively straightforward, a number of issues need to be addressed to ensure the reliability of the derived VMTM, for example: where to insert the instrumenting code, how to minimise the side effects of the instrumenting code at run-time, and how to avoid out-of-order execution during the measurement of the specific code section.

Similar to the automatic tracing analysis approach [108], profiling the execution time of each bytecode can be divided into two steps: extracting run-time information and analysing it. The former step involves investigating the context of the virtual machine where temporal information can be derived and instrumenting code to extract the time-stamp counter of the processor with very low run-time overhead. For instance, the instrumenting code has to accumulate the instruction mnemonics and the time-stamp counter every time the interpreter fetches a bytecode. The latter step analyses the accumulated data and builds up a VMTM for the target platform.

To be able to trace the run-time information, instrumenting code needs to be inserted into the Java virtual machine. The instrumentation mainly depends on the specific implementation of the JVM. However, on the whole, Java virtual machines interpret Java bytecode on a method-by-method basis. Therefore, to reduce the memory and run-time overhead needed for collecting the run-time information, the implementation of the profiling-based analysis can follow the suggested implementation given in Figure 5.2. Note that the major aim of collecting run-time data by the method-based

approach in the interpreter engine is to reduce the memory and run-time overhead of the instrumenting code. Furthermore, the content of the running applications is irrelevant to the analysis: the execution time of each bytecode is collected and the analysis is conducted independently of the application.

[Figure 5.2: Instrumenting profiling code into an interpreter engine. Flow inside the interpreter engine: entering a method (allocate the memory, code attributes); bytecode interpretation, with the CPU time-stamp counter extracted before and after; leaving the method (dump into storage). The marked positions are the locations of the instrumenting code.]

As shown in Figure 5.2, a small amount of memory, which can be allocated when invoking a method, is needed to store the information collected at run-time. The accumulated data can be dumped to storage when returning from the method (i.e. on finishing the interpretation of the method). Dumping the accumulated data at this point reduces the noise or side effects of the instrumenting code on the measurement results. The data can then be analysed against the requirements of the target platform and the VMTM can be built from the results. Note that to avoid pessimistic estimations, out-of-order execution issues need to be taken into account while extracting the execution time of each bytecode.

In addition to deriving the WCET bounds of Java

bytecodes, tracing run-time information from the virtual machine to extract the WCET bounds of native method calls provided in the target profile is also involved. Here, we can use a method similar to the one in which run-time information is traced in the automatic tracing analysis [108]. Instrumented code that records the time-stamp counter together with events can therefore be added into the Java virtual machine to trace method calls, system events and timing properties of the specific profile. The profiling information contains the thread ID, time-stamp counter, memory addresses, full namespace of methods, etc. Note that, so that the actual execution time remains consistent with the extracted execution time, we keep the instrumented code in the Java virtual machine.

5.3 Benchmark-Based Approach

The portable WCET analysis approach depends heavily on the VMTM of a target platform, and the technique described in the previous section requires enormous effort, including modifications to the execution engine of the target Java virtual machine to derive the execution time of each bytecode. The profiling-based approach clearly requires the source code of the virtual machine. Although the execution time of a single bytecode can be derived with the previous mechanism, deriving the execution time of specific sets of bytecodes is unlikely to be achievable. Furthermore, an implementation of the previous approach cannot be reused for building the VMTM of a new virtual machine, which means that creating a VMTM for a new virtual machine has to start from scratch. Therefore, to apply portable WCET analysis to real-time and embedded systems effectively, two major issues need to be addressed: how the instrumenting code can be reused effectively on various platforms without modifying it, and

how the execution time of a specific set of bytecodes can be measured.

[Figure 5.3: A block diagram of the benchmark-based approach. Sequence: pre complementary code (corresponding complementary bytecode(s)); extracting the CPU time-stamp counter; code to be measured (specific bytecode(s) or method to be measured); extracting the CPU time-stamp counter; post complementary code (corresponding complementary bytecode(s)). The marked positions are the locations of the instrumenting code.]

To address these issues, the benchmark-based analysis approach (Figure 5.3) is introduced. The aim of this approach is to provide a Java-based benchmark (that is, a collection of Java programs instrumented with particular bytecodes or methods to be measured) that can produce a VMTM automatically when executed on the target virtual machine. The principle behind this mechanism is to insert bytecodes into the instrumenting code at the bytecode level. The bytecodes inserted can be any of the following types: a single bytecode, a set of bytecodes, a Java method call or a native method call. The instrumenting code is developed as a native method using Java Native Interface (JNI) features, so that a Java program can access the time-stamp counter of the processor. Therefore, the native method in the

benchmark can be ported easily to different platforms without modifying the benchmark. The benchmark thus offers portability and reusability. To achieve these goals, the following issues need to be addressed: where and how specific bytecodes can be inserted into the Java program to measure their execution time, and how to maintain the integrity of the Java stack after the insertion of additional bytecodes.

To prove the feasibility of this approach and reduce the time needed to develop the whole mechanism, a number of tools have been investigated. The benchmark-based analysis approach can be carried out by taking advantage of the time-stamp counter instruction (rdtsc) [58] supported in the x86 architecture, the bytecode disassembler and assembler tools provided in the Kopi Compiler Suite [66], and the Java Native Interface feature. The procedure for establishing the benchmark is given in Chapter 7.

Note that the integrity of the Java stack of the JVM needs to be borne in mind when inserting additional bytecodes. For instance, after executing the iload instruction, the virtual machine will have loaded an integer onto the Java stack. Therefore, we need to add complementary bytecodes to remove the integer from the stack in order to maintain the data integrity of the Java stack for the whole program. Some bytecodes, such as iadd and iaload, also need to be provided with values or references before they are executed. As a result, to ensure the data integrity of the Java stack, corresponding complementary bytecodes need to be added at the pre- or post-locations of the measured bytecodes (Figure 5.3). One should note that the major purpose of the benchmark is to produce a VMTM that contains a collection of the WCET bounds of the bytecodes and native methods available on the particular profile.
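The stack-integrity requirement above can be illustrated with a minimal sketch. The class and method names (StackBalance, slotsToPushBefore, slotsToRemoveAfter) are hypothetical and are not part of the thesis tooling; only the pop and push counts are taken from the JVM instruction set. The sketch assumes the operand stack is balanced before the measured bytecode and simply reports how many stack slots the pre- and post-complementary code would have to supply or remove.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: operand-stack effect of a few JVM bytecodes, counted in stack slots. */
final class StackBalance {
    // opcode -> {slots popped, slots pushed}, as defined by the JVM instruction set
    private static final Map<String, int[]> EFFECT = new HashMap<>();
    static {
        EFFECT.put("iload",  new int[] {0, 1}); // pushes one int
        EFFECT.put("istore", new int[] {1, 0}); // pops one int
        EFFECT.put("iadd",   new int[] {2, 1}); // pops two ints, pushes their sum
        EFFECT.put("iaload", new int[] {2, 1}); // pops arrayref and index, pushes the element
    }

    /** Slots the pre-complementary code must push before the measured bytecode runs. */
    static int slotsToPushBefore(String opcode) {
        return EFFECT.get(opcode)[0];
    }

    /** Slots the post-complementary code must remove after the measured bytecode. */
    static int slotsToRemoveAfter(String opcode) {
        return EFFECT.get(opcode)[1];
    }

    public static void main(String[] args) {
        for (String op : new String[] {"iload", "iadd", "iaload"}) {
            System.out.println(op + ": push " + slotsToPushBefore(op)
                    + " before, remove " + slotsToRemoveAfter(op) + " after");
        }
    }
}
```

For iload this gives nothing to push beforehand and one slot to remove afterwards, which matches the use of istore as the post-complementary bytecode in the experiments of this section.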
It could be possible that a specific system or profile only needs to provide a concise VMTM with particular bytecodes and native methods. Therefore, a

compact benchmark that comprises only the bytecodes and native methods used on such systems can be produced. Then, the benchmark can be executed on any particular target platform with a native method that can access the time-stamp counter of the target platform (Figure 5.4). This approach can also be used to generate the execution time of a specific set of common bytecode sequences, since this mechanism makes it possible to insert any combination of bytecodes. The generation of instrumented Java programs can be automated by a simple program implementing the above procedure.

[Figure 5.4: Measuring WCET of a particular set of Java bytecode with RDTSC library. Components: Java class file plus additional bytecode(s) to be measured; RDTSC JNI library for the target platform; Java APIs (plus real-time APIs); target virtual machine.]

To prove the concept of this approach, a Java program with a native method that can access the time-stamp counter of the processor has been developed. The instrumenting code is written in C with assembler instructions (i.e. cpuid and rdtsc) to avoid out-of-order execution and to extract the time-stamp counter. The experiments have been carried out on the RTSJ-RI and the preliminary results of the analysis are given below. As shown in Figure 5.5, the cost of the instrumenting code can be measured from two consecutive calls of the instrumenting code. In the experiment, the measurement of the instrumenting code is performed in iterations and this is repeated several times.
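The idea of measuring the instrumenting code from two consecutive calls can be sketched as follows. This is only an illustration under stated assumptions: the class OverheadProbe and its methods are hypothetical, and System.nanoTime() stands in for the JNI native method that reads the time-stamp counter with cpuid and rdtsc, so the values are nanoseconds rather than machine cycles.

```java
/** Sketch: cost of the instrumenting code, taken from two back-to-back counter reads. */
final class OverheadProbe {
    /** Stand-in for the native time-stamp reader (assumption: the real one uses rdtsc via JNI). */
    static long readCounter() {
        return System.nanoTime();
    }

    /** Returns one delta per iteration; each delta is the cost of the probe itself. */
    static long[] measureOverhead(int iterations) {
        long[] deltas = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = readCounter();
            long end = readCounter();
            deltas[i] = end - start;
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] deltas = measureOverhead(1000);
        long max = 0;
        for (long d : deltas) {
            max = Math.max(max, d);
        }
        System.out.println("largest observed probe cost (ns): " + max);
    }
}
```

Inserting the bytecode under test between the two reads, together with its complementary bytecodes, would give the second kind of measurement described in this section.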

[Figure 5.5: Measuring the execution time of the instrumenting code. Two consecutive calls of the instrumenting native method, each extracting the CPU time-stamp counter; the interval between them is the execution time of the instrumenting code.]

Two of the results from the experiment are shown in Table 5.1. The second column shows the maximum WCET value over the measurements. The third and fourth columns contain the WCET value below the th percentile and th percentile respectively, whilst the last column is the average WCET for the results obtained from the respective experiments.

[Table 5.1: Measurements of the WCET of the instrumenting code. Columns: Experiment; 100th percentile; th percentile; th percentile; Average.]

Using the same methodology, other bytecodes or method calls can be measured. For example, iload bytecode instructions are added into the instrumenting code and the corresponding complementary bytecode (i.e. istore) is inserted at the post instrumenting code

(Figure 5.6). The experiment has been carried out by performing the measurement of the iload bytecode in iterations, and this was repeated several times.

[Figure 5.6: Measuring the execution time of the iload with the instrumenting code. The iload bytecode is placed between two calls of the instrumenting native method, each extracting the CPU time-stamp counter, and istore follows the second call; the interval is the execution time of iload plus the execution time of the instrumenting code.]

[Table 5.2: Measurements of the WCET of iload with the instrumenting code. Columns: Experiment; 100th percentile; th percentile; th percentile; Average.]

Two of the results from the experiment are also given in Table 5.2. Similar to the previous table, the second, third and fourth columns present the WCET value below the 100th percentile, th percentile and th percentile respectively, while the last column is the average WCET for the results obtained from the respective experiments.
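The statistics reported in Tables 5.1 and 5.2 (a maximum, two lower percentiles and an average) can be computed from a set of cycle measurements as in the sketch below. The class CycleStats is hypothetical, and the nearest-rank percentile definition is an assumption, since the thesis does not state which definition was used. Percentiles are passed in basis points (9990 means the 99.9th percentile) so that the rank is computed with exact integer arithmetic.

```java
import java.util.Arrays;

/** Sketch: summary statistics over a set of cycle measurements. */
final class CycleStats {
    /**
     * Nearest-rank percentile: the smallest sample such that at least the
     * requested share of all samples is at or below it.
     * basisPoints is the percentile times 100, e.g. 9990 for the 99.9th.
     */
    static long percentile(long[] samples, int basisPoints) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        long rank = ((long) basisPoints * sorted.length + 9999) / 10000; // ceiling division
        return sorted[(int) Math.max(rank, 1) - 1];
    }

    /** Arithmetic mean of the samples. */
    static double average(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    public static void main(String[] args) {
        // Made-up measurements with one outlier, for illustration only.
        long[] cycles = {10, 20, 30, 40, 50, 60, 70, 80, 90, 1000};
        System.out.println("100th percentile: " + percentile(cycles, 10000));
        System.out.println("90th percentile:  " + percentile(cycles, 9000));
        System.out.println("average:          " + average(cycles));
    }
}
```

With the made-up data in main, a single outlier dominates both the 100th percentile and the average, while the 90th percentile stays with the bulk of the samples.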

[Figure 5.7: Probability Distribution of the measurements with the benchmark-based analysis. Series: empty, iload; x-axis: Machine Cycles; y-axis: 1-Cumulative Probability.]

One of each of the results from the instrumenting code and from the iload with the instrumenting code is plotted as a 1-cumulative probability graph in Figure 5.7 to show the major distributions in the whole data set of the experiment. In this graph, empty represents the distribution of the measurements of the instrumenting code, whereas iload represents the distribution of the iload bytecode with the instrumenting code. The x-axis represents the machine cycles whilst the y-axis represents the 1-cumulative probability of the machine cycle measurements.

From the probability distribution graph, we can observe that most of the values obtained are quite consistent. However, some pessimistic measurements are present and they seem to appear at certain intervals or frequencies of iterations. Interestingly, one should note

that the probability distribution of these pessimistic values is very similar for both the instrumenting code and the iload with the instrumenting code. This observation strongly suggests that these measurements are caused by background kernel processes, since we have ensured that apart from kernel processes no application is running on the test-bed. Therefore, these values are considered to be response times rather than worst-case execution times. Note that although the interference from background kernel processes is not considered in the WCET analysis, it should still be considered in the schedulability analysis.

[Figure 5.8: Cumulative Distribution of the measurements of the iload bytecode with the benchmark-based analysis. Panel title: Cumulative Probability Graph for "iload"; x-axis: Machine Cycles; y-axis: Cumulative Probability.]

One of the results presented in Table 5.2 is also shown as a cumulative probability graph in Figure 5.8. In this graph, the x-axis represents the machine cycles whilst the

y-axis represents the cumulative probability of the machine cycle measurements. The inset graph magnifies the results between the th and the th percentiles. Based on the experimental results presented, the evaluation in the next section focuses on specific stated percentiles of the measurements of the WCET values.

5.4 Estimating WCET bounds with VMTMs

    import javax.realtime.*;

    public class rth extends RealtimeThread {
        static int Data[];

        public void run() {
            //...
            bb_Sort(Data);
            waitForNextPeriod();
        }

        public static void bb_Sort(int a[]) {
            int i, j, t;
            int size = 10;
            //@Loopcount(9)
            for (i = size - 1; i > 0; i--) {
                //@Loopcount(45)
                for (j = 1; j <= i; j++) {
                    if (a[j - 1] > a[j]) {
                        t = a[j - 1];
                        a[j - 1] = a[j];
                        a[j] = t;
                    }
                }
            }
        }
    }

Figure 5.9: The Bubble Sort Algorithm in Java

The evaluation of our analysis is illustrated with example code for the Bubble Sort algorithm, presented in Figure 5.9. Figure 5.10 shows the individual basic blocks of the algorithm with their offset numbers. The maximum numbers of iterations of the outer and inner loops can be assumed to be 10 - 1 = 9 and 10(10 - 1)/2 = 45 respectively when the size is equal to 10.
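The two loop bounds can be checked by counting the iterations of the same loop nest directly. The sketch below uses a hypothetical helper class, LoopBounds, that mirrors the loop structure of the bubble sort with the swap removed, so only the iteration counts remain.

```java
/** Sketch: counts how often the two loop bodies of the bubble sort execute. */
final class LoopBounds {
    /** Returns {outer iterations, total inner iterations} for the given array size. */
    static int[] countIterations(int size) {
        int outer = 0;
        int inner = 0;
        for (int i = size - 1; i > 0; i--) {
            outer++;
            for (int j = 1; j <= i; j++) {
                inner++;
            }
        }
        return new int[] {outer, inner};
    }

    public static void main(String[] args) {
        int[] counts = countIterations(10);
        System.out.println("outer = " + counts[0] + ", inner = " + counts[1]);
    }
}
```

For a size of 10 the counts are 9 and 45. The inner count is the total over all outer iterations, which is how the inner loop annotation is expressed in the listing.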

    B1: 0~9
    B2: 12~14
    B3: 17~25
    B4: 28~45
    B5: 46~46
    B6: 49~51
    B7: 54~54
    B8: 57~58
    B9: 61~61

Figure 5.10: Individual basic blocks with their offset numbers

The WCEF vectors of the bubble sort algorithm, generated by our prototype compiler during compilation, are given in Figure 5.11 in text mode. In this example, only 14 different Java bytecodes are generated by the Java compiler. In the figure, BB denotes a basic block and the bracketed range of integers which follows denotes the offset numbers of the basic block. The WCEF vectors for each basic block are shown in the statements which follow. For example, in Line 5, the offset of the first basic block is stated as 0-9. Lines 6 to 12 state the WCEF vectors associated with this first basic block.

A summary of the VMTM for the Bubble Sort example is shown in Table 5.3. This table shows the different statistical analysis results of the VMTM carried out with benchmark-based analysis. Each bytecode is measured times continuously. As shown in Figure 5.12, although the VMTM derived with the benchmark-based approach shows rather

constant outcomes, the VMTM produced with the profiling-based approach presents relatively pessimistic results if the 99.0th percentile of the measurements is taken as the WCET bound.

     1  <WCEFVectors>
        <Method=bb_Sort:([I)V>
     4  <SubTAG BODY>
     5  BB: (0-9)
     6  bipush : 1
     7  istore : 1
     8  iload : 1
     9  iconst_1 : 1
    10  isub : 1
    11  istore_1 : 1
    12  goto : 1
    13  BB: (12-14)
    14  iconst_1 : 1
    15  istore_2 : 1
    16  goto : 1
    17  BB: (17-25)
    18  aload_0 : 2
    19  iload_2 : 2
    20  iconst_1 : 1
    21  isub : 1
    22  iaload : 2
    23  if_icmple : 1
    24  BB: (28-45)
    25  aload_0 : 4
    26  iload_2 : 4
    27  iconst_1 : 2
    28  isub : 2
    29  iaload : 2
    30  istore_3 : 1
    31  iastore : 2
    32  iload_3 : 1
    33  BB: (46-46)
    34  iinc : 1
    35  BB: (49-51)
    36  iload_2 : 1
    37  iload_1 : 1
    38  if_icmple : 1
    39  BB: (54-54)
    40  iinc : 1
    41  BB: (57-58)
    42  iload_1 : 1
    43  ifgt : 1
    44  BB: (61-61)
    45  return : 1
    46  </SubTAG BODY>
        </WCEFVectors>

Figure 5.11: WCEF Vectors of the bubble sort algorithm in text mode

This can be explained by the ad-hoc measurements of the profiling-based analysis, which can produce pessimism because it derives the execution time of each bytecode from the various methods invoked on the VM, and most methods are invoked during the initialisation phase of the VM. As a result, some measurements could be the worst-case response time of the bytecode instead of its WCET bound. However, it can be observed that the 90.0th percentile of the measurements of the profiling-based analysis is very close to the results derived with the benchmark-based analysis. The experiment also shows that the profiling-based analysis has some difficulties in controlling which particular

[Figure 5.12: Comparing the profiling-based and benchmark-based analyses. Two panels: measurements of WCET below the 99.0th percentile, and measurements of WCET below the 90.0th percentile. Series: profiling-based, benchmark-based; x-axis: names of bytecode; y-axis: machine cycles.]

bytecodes are to be measured and the number of measurements of each bytecode. Therefore, in order to obtain reliable measurements with the profiling approach, it also needs to be provided with the specific number of measurements required for each particular bytecode.

[Table 5.3: A VMTM derived with the benchmark-based analysis. Columns: Bytecode; th percentile; th percentile; Average. Rows: aload, bipush, iaload, iastore, ifgt, if_icmple, iinc, iload, istore, isub, goto, iconst_0, iconst_1.]

Using Table 5.3, three different WCET bounds (i.e. th percentile, th percentile, and average) can be estimated. The WCET of the bubble sort algorithm is obtained with the tree-based calculation method as follows:

\[ \mathrm{WCET}(\mathtt{bb\_Sort}()) = B_1 + 10\,B_8 + 9\,(B_2 + B_7) + 46\,B_6 + 45\,(B_3 + B_4 + B_5) + B_9 \]

The final WCET bounds of the algorithm with different approaches (i.e. end-to-end

measurement, benchmark-based analysis and profiling-based analysis) have been computed. The estimations in Table 5.4 take account of the 99.9th percentile and the 90.0th percentile of the measurements. Note that the method of estimating the pipeline effects is beyond the scope of this chapter, and the technique proposed in [5] can be integrated into our approach with benchmark-based analysis.

[Table 5.4: Comparing the final WCET bounds. Columns: Approach; 99.9th percentile; 90.0th percentile. Rows: End-to-end measurement, Benchmark-based, Profiling-based.]

5.5 Summary

Since the aim of portable code is to support hardware interchangeability, the WCET analysis for such portable applications needs to bear portability in mind. The portable WCET analysis has been proposed as a three-stage approach to analyse highly portable and reusable Java applications for real-time and embedded systems. In this chapter, we have proposed two approaches (i.e. profiling-based and benchmark-based [56]) that can derive VMTMs efficiently. This may assist the use of portable WCET analysis [8, 5] in practice. The major advantage of the profiling-based approach is that it can be extended to integrate with other tracing or profiling techniques, such as POSIX-trace [108], whereas its disadvantages are that it needs the source code and knowledge of the target virtual machine and that it takes time to instrument the additional code into the virtual machine. In contrast, the benchmark-based analysis is highly portable and only needs to be provided with a native method to access the time-stamp counter of the target processor. However, the benchmark-based

analysis is less convenient to integrate with other profiling techniques. Therefore, these techniques can be applied to various applications depending on the requirements of the systems.

It is possible to develop a suite of benchmark tests to allow for the derivation of the VMTM. These benchmarks would have to extract and measure certain features which are important for the WCET analysis of the VM. For example, in a basic Ravenscar-Java VM, certain timing properties of the Ravenscar-Java profile have to be taken into account. These timing properties are listed as follows:

- Memory allocation time for objects and the time for entering and leaving a scope memory area.
- Latencies of the loading of schedulable objects in a scheduling algorithm.
- Access time on clocks and time parameter objects.
- Access time of monitors when entering a synchronization method.
- Time incurred in various object creations.
- Latency of method calls, which include both Java and native method calls.
- Execution time of common sequences of bytecodes.

Based on the experimental results, the outcomes of the benchmark-based analysis approach (Figure 5.12) encourage us to carry on future work on the use of portable WCET analysis in real-time and embedded Java-based systems, whereas the results of the profiling-based analysis approach remind us that taking account of other run-time issues, such as cache effects and branch prediction issues, can achieve relatively tighter WCET estimations.

This experiment has been performed using an Intel processor. It has been argued that a measurement approach is not suitable for use in hard real-time systems. However,

according to the probabilistic hard real-time theory proposed in [9], the measurement approach is still valid on architectures which have less than certain target probabilities of missing the deadline. Therefore, although the Intel processor is shown not to possess such a property in our experiment, this does not mean that a measurement approach cannot be performed on hardware. Based on the probabilistic real-time systems concept [9], if an architecture can be proven to have an accuracy rate above the stated threshold, our proposed benchmark-based analysis can still be applied. Because modern hardware architectures are becoming increasingly complex and unpredictable, many research groups have been looking into implementing the Java VM in hardware [102, 87]. This approach is indeed very promising as it can aid the derivation of more precise VMTMs.

There are still a number of issues that need to be addressed in our approach as future work, such as taking into account the timing properties of the RTSJ, cache issues and branch prediction issues, and extending the approach to just-in-time compiler techniques.
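The calculation described in Section 5.4 can be made concrete with a short sketch. The class WcetSketch and its methods are hypothetical and are not the thesis prototype. The bytecode frequencies are transcribed from the WCEF vectors of Figure 5.11 and the block multipliers from the tree-based formula for bb_Sort. The VMTM is passed in as a function from bytecode name to cycle bound, because the measured values of Table 5.3 are not reproduced here; the uniform one-cycle model in main is a placeholder, not a measurement.

```java
import java.util.function.ToLongFunction;

/** Sketch: combining WCEF vectors with a VMTM to bound the WCET of bb_Sort. */
final class WcetSketch {
    // WCEF vectors of basic blocks B1..B9 as "bytecode:frequency" pairs (Figure 5.11).
    static final String[] WCEF = {
        "bipush:1,istore:1,iload:1,iconst_1:1,isub:1,istore_1:1,goto:1",
        "iconst_1:1,istore_2:1,goto:1",
        "aload_0:2,iload_2:2,iconst_1:1,isub:1,iaload:2,if_icmple:1",
        "aload_0:4,iload_2:4,iconst_1:2,isub:2,iaload:2,istore_3:1,iastore:2,iload_3:1",
        "iinc:1",
        "iload_2:1,iload_1:1,if_icmple:1",
        "iinc:1",
        "iload_1:1,ifgt:1",
        "return:1"
    };

    /** Cost of one basic block: sum of frequency times cycle bound over its bytecodes. */
    static long blockCost(String wcef, ToLongFunction<String> vmtm) {
        long cost = 0;
        for (String entry : wcef.split(",")) {
            String[] pair = entry.split(":");
            cost += Long.parseLong(pair[1]) * vmtm.applyAsLong(pair[0]);
        }
        return cost;
    }

    /** Tree-based bound for bb_Sort, using the block multipliers given in Section 5.4. */
    static long bbSortWcet(ToLongFunction<String> vmtm) {
        long[] b = new long[10]; // b[1]..b[9] hold the costs of B1..B9
        for (int k = 1; k <= 9; k++) {
            b[k] = blockCost(WCEF[k - 1], vmtm);
        }
        return b[1] + 10 * b[8] + 9 * (b[2] + b[7]) + 46 * b[6]
                + 45 * (b[3] + b[4] + b[5]) + b[9];
    }

    public static void main(String[] args) {
        // Placeholder VMTM: every bytecode is assumed to cost one cycle.
        System.out.println(bbSortWcet(op -> 1L));
    }
}
```

With the placeholder model the bound equals the number of bytecodes on the worst-case path, 1462. Supplying the cycle bounds of a measured VMTM at a chosen percentile in place of the lambda gives the corresponding WCET estimate for that platform.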


Chapter 6

A Gain Time Reclaiming Analysis

As mentioned in Chapter 2, gain time is unused processor resources reserved for hard real-time threads during the design phase, and it is produced when the hard real-time tasks execute in less than their worst-case execution time estimations. It can only be reclaimed at run-time since it depends on the actual executions of the hard real-time threads [26]. Given the need to provide 100% deadline predictability for hard real-time tasks, it is inevitable that the processor and other resources will be under-utilised at run-time [2].

Based on the results of the experiments from the previous chapter, it can be observed that there are many challenges when performing WCET analysis on modern processors. The use of such hardware architectures may result in large over-estimations and bring about the penalty of under-utilisation of system resources. Furthermore, to avoid unpredictable behaviour in hard real-time applications, some dynamic features of object-oriented programming are often prohibited [47]. As a result, it is likely that the design of object-oriented real-time systems could become inflexible and the performance and utilisation relatively poor.

Performing WCET analysis on object-oriented programming languages is not an easy task since it needs to balance predictability, flexibility and reusability. In addition, the run-time characteristics of Java, such as high frequency of method invocation, dynamic

dispatching and dynamic loading, make WCET analysis more difficult for Java than for other object-oriented programming languages, such as C++. To relax the prohibition on the dynamic dispatching feature, annotations are introduced in Chapter 4. It should be noted that without proper or sufficient design knowledge, the result of the WCET analysis could still be pessimistic. To offer a more flexible way to develop object-oriented real-time applications in the real-time Java environment without loss of predictability and performance, a gain time reclaiming framework that takes account of object-oriented features is necessary.

This chapter presents a gain time reclaiming framework [57] integrated with WCET analysis to balance the trade-off among flexibility, efficiency and predictability. In our approach, the predictability of hard real-time tasks is strengthened during the design phase and the performance of the whole system is reinforced with gain time reclaiming at run-time.

The rest of this chapter demonstrates how gain time can be reclaimed in object-oriented real-time systems. An overview of implementation issues in the real-time Java environment and an evaluation of our approach with a practical example are also given in this chapter.

6.1 Gain Time Reclaiming

Typically, gain time can be reclaimed at run-time as soon as it can be identified. Reclaiming can be conducted by analysing or tracing the run-time behaviour of applications, or by measuring the actual execution time. The former can be performed by analysing the data-flow control of the applications, whereas the latter can be carried out by profiling the actual execution time at run-time. This chapter mainly discusses the former analysis approach on a Java architecture and gives suggestions about integrating with the automatic tracing analysis [108] for the latter technique as future work.
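The measurement-based alternative mentioned above can be sketched in a few lines: gain time is the part of the WCET budget that a job did not use. The class GainTimeProbe is hypothetical, System.nanoTime() stands in for a cycle counter, and clamping the result at zero when a job overruns its budget is an assumption of this sketch rather than something the thesis specifies.

```java
/** Sketch: gain time obtained by measuring the actual execution time of a job. */
final class GainTimeProbe {
    /** Unused part of the WCET budget; zero if the job used the whole budget or more. */
    static long gainTime(long wcetBudget, long actual) {
        return Math.max(0L, wcetBudget - actual);
    }

    /** Runs the job once and returns the gain time against a budget in nanoseconds. */
    static long runAndReclaim(Runnable job, long wcetBudgetNanos) {
        long start = System.nanoTime();
        job.run();
        long actual = System.nanoTime() - start;
        return gainTime(wcetBudgetNanos, actual);
    }

    public static void main(String[] args) {
        // The one-second budget is an arbitrary illustrative value.
        long gain = runAndReclaim(() -> { }, 1_000_000_000L);
        System.out.println("reclaimable gain time (ns): " + gain);
    }
}
```

Gain time measured this way only becomes known when the job completes, whereas the flow-of-control mechanisms discussed in this chapter aim to identify it earlier.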

From the point of view of the syntax of programming languages, the levels of flow of control can be classified as follows [45]:

- local flow of control, which identifies statements within a routine or method to be executed;
- method invocations and routine calls, performing the parameter transfer and flow-of-control manipulation needed to activate a new routine;
- non-local jumps, which divert the control flow from the currently running routine into an ancestor routine; and
- exceptions, which divert the control flow to an exception handler when a user or system error is detected by the system.

In general, real-time threads are not allowed to use non-local jumps since this may result in unpredictable and unanalysable behaviour. However, one should note that the execution time of exception flows needs to be considered for reclaiming if they are taken into account in WCET analysis, such as in [17]. Here, we assume that exceptions are analysable. Since exceptions are analysable, they can be treated as a combination of the first and second levels of control flow. Therefore, gain time reclaiming in this thesis focuses on the first two levels of flow of control in object-oriented programming languages. Integrating these levels with WCET analysis techniques, early gain time reclaiming in object-oriented programming languages may be classified into three mechanisms: structural constraint reclaiming (Section 6.2), object constraint reclaiming (Section 6.3), and functional constraint reclaiming (Section 6.4).

Gain time can be represented in machine cycles of the target machine if the source code of the application is translated into machine code directly. However, it could be difficult to estimate the exact machine cycles of Java applications because of the portability of Java

architecture. In this case, the concept of WCEF vectors [5] of basic blocks can be used instead of pre-calculated units. These WCEF vectors may be used to calculate the exact gain time when the information about the target machine is available. However, machine cycle units are used in the rest of the chapter for the sake of clarity.

6.2 Structural Constraint Reclaiming

[Figure 6.1: Structural Constraint Reclaiming. Panel (a), Selection Code: a selection S with expression Z and cases 0 to n leading to Path 0 to Path n, one of which is the WCET path; annotated with Gain Time(Path n) = WCET(S) - (WCET(Z) + WCET(Path n)). Panel (b), Repetition Code: a repetition R with expression Z (true or false) and loop X executed for n' iterations; annotated with Gain Time(R) = (n - n') × (WCET(Z) + WCET(X)).]

Structural Constraint Reclaiming is a reclaiming technique for the gain time of local flows of control. A local flow of control is made up of a number of basic blocks in the form of sequences, selections (i.e. pieces of code to be selected for execution based on the value of some expressions) and repetitions (i.e. pieces of code to be executed zero or more times based on the value of some expressions). Therefore, as shown in Figure 6.1,

the overestimated WCET bounds caused by the structure of the program can be reclaimed as soon as the exact execution path of selection code or the exact number of iterations of repetition code is determined. In the figure, the formulas highlighted in grey indicate where gain time can be reclaimed and how the amount of gain time is calculated. Here, the final WCET calculation is carried out with tree-based analysis (see Section 2.1.3). The formulas can also be applied to the path-based approach and the IPET-based approach, provided they do not use restrictions on the possible directions of the execution paths (i.e. exclusive or inclusive path restrictions). The use of these restrictions is discussed in Section 6.4.

Formally, based on the WCET analysis rules defined in Timing Schema [103], the gain time of the structural constraints can be defined as follows:

- Let S be a selection code with the expression Z, and let P be the path of S actually executed in a particular execution. Then the gain time of P is calculated by subtracting the sum of the WCET of Z and the WCET of P from the WCET of S, that is, $\mathrm{GainTime}(P) = \mathrm{WCET}(S) - (\mathrm{WCET}(Z) + \mathrm{WCET}(P))$. This schema can be used for any type of selection code, such as if-then-else and switch-case.
- Consider a repetition code R with the expression Z and loop body X. Assume that n is the maximum loop bound of R used in the static WCET analysis and n' is the actual number of iterations executed in a specific execution. Then the gain time for n' iterations of R is computed by multiplying the difference between n and n' by the sum of the WCET of Z and the WCET of X, that is, $\mathrm{GainTime}(R_{n'}) = (n - n') \times (\mathrm{WCET}(Z) + \mathrm{WCET}(X))$. This schema is valid for any type of repetition code with a bounded number of iterations, such as for-loop, while-loop and do-while.

The structural constraint reclaiming of a specific thread may be represented with a

Structural Gain Time Reclaiming Graph (SGTRG), which illustrates the exact places (i.e. offset numbers in the machine code or Java bytecode) and amounts (i.e. machine cycles or WCEF vectors) of gain time that may be reclaimed. Formally, the SGTRG is annotated with gain time reclaiming nodes (l, g), where l is the offset number in the Java bytecode and g is the amount of gain time that can be reclaimed. Compiler techniques [45] for analysing the local flow of control are used to identify the exact places of the basic blocks of the selection and repetition code, and to derive an SGTRG for each routine or method.

Based on the SGTRG, the gain time of structural constraints is reclaimed at run-time by determining the actual execution path of selection code or the exact number of iterations of repetition code. As shown in Figure 6.2, the if-then-else statement can reclaim 30 cycles at Line 10 if the condition expression is true (i.e. data[i] < 0) and the while-loop is part of the worst-case path. Note that gain time reclaiming of repetition code needs to know the maximum loop bound used in the WCET analysis. For example, the while-loop (Lines 5-26) in Figure 6.2 needs to be provided with this information to be able to reclaim the gain time of the repetition code at run-time. Given the maximum loop bound and the actual number of iterations, the gain time of repetition code can be reclaimed.

6.3 Object Constraint Reclaiming

Object Constraint Reclaiming is a reclaiming technique for the gain time of dynamically dispatched methods arising from object constraints. In order to guarantee all hard real-time tasks in object-oriented real-time systems, most WCET researchers suggest prohibiting the use of dynamic dispatching, dynamic loading and garbage collection [47].
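Before object constraints are treated in detail, the two structural rules of Section 6.2 can be made concrete with a minimal Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical and not part of the XRTJ tooling, the condition cost of 5 cycles and the loop body cost of 20 cycles are invented, and only the branch bounds of 70 and 100 cycles come from Figure 6.2.

```java
// Sketch only: the class and method names are illustrative, not part of the XRTJ tooling.
final class StructuralGainTime {

    /** Selection code S with expression Z: gain = WCET(S) - (WCET(Z) + WCET(P)) for the taken path P. */
    static long selectionGain(long wcetS, long wcetZ, long wcetTakenPath) {
        return wcetS - (wcetZ + wcetTakenPath);
    }

    /** Repetition code R: gain = (n - n') * (WCET(Z) + WCET(X)), n = analysed bound, n' = actual count. */
    static long repetitionGain(int maxIterations, int actualIterations, long wcetZ, long wcetBody) {
        if (actualIterations < 0 || actualIterations > maxIterations) {
            throw new IllegalArgumentException("actual iterations must lie in [0, max bound]");
        }
        return (long) (maxIterations - actualIterations) * (wcetZ + wcetBody);
    }

    public static void main(String[] args) {
        long wcetZ = 5;                      // assumed cost of evaluating the condition
        long wcetIf = 70, wcetElse = 100;    // branch bounds quoted in Figure 6.2
        long wcetS = wcetZ + Math.max(wcetIf, wcetElse);
        // The if branch is taken, so the difference to the worst branch is reclaimed.
        System.out.println("selection gain: " + selectionGain(wcetS, wcetZ, wcetIf));
        // Assumed loop: bound 10, 4 iterations executed, body of 20 cycles.
        System.out.println("repetition gain: " + repetitionGain(10, 4, wcetZ, 20));
    }
}
```

With these numbers the selection rule yields the 30 cycles mentioned for Line 10 of Figure 6.2, independently of the assumed condition cost, because that cost appears in both WCET(S) and the subtracted term.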
     1  public check_data() {
     2    int i, morecheck, wrongone;
     3    i=0; morecheck=1; wrongone=-1;
     4
     5    while (morecheck) {
     6
     7      /* Say WCET(if) = 70 cycles
     8         WCET(else) = 100 cycles. */
     9      if (data[i] < 0) {
    10        /* Here 30 cycles of the structural
    11           gain time of the current if-else
    12           statement can be reclaimed.
    13           (i.e. 100-70 cycles) */
    14
    15        wrongone=i; morecheck=0;
    16      }
    17      else {
    18        /* Here 50 cycles of the functional
    19           gain time of the below if-else
    20           statement can be reclaimed.
    21           (i.e. 100-50 cycles) */
    22
    23        if (++i >= DATASIZE)
    24          morecheck=0;
    25      }
    26    }
    27
    28    /* Say WCET(if) = 100 cycles
    29       WCET(else) = 50 cycles. */
    30    if (wrongone >= 0) {
    31      // Error path;
    32
    33      return 0;
    34    }
    35    else {
    36      // Normal path;
    37
    38      return 1;
    39    }
    40  }

Figure 6.2: An example of gain time reclaiming [71]

We have argued for the need to use dynamic dispatching and demonstrated how to estimate a tight WCET

estimation of hard real-time tasks in Chapter 4 [53]. In our approach [53], a //@maxwcet() annotation is used to indicate the WCET of a dynamically dispatched method call. However, //@maxwcet() may give relatively pessimistic results if the class family is extremely large or the WCET estimations for different classes are spread over a wide range. To compensate for this cost of the flexibility of object-oriented programming, object gain time reclaiming is required.

This section is mainly concerned with analysing the types of objects in order to reclaim the gain time of object constraints. Note that although modifying the reference of a parameter object inside a method call cannot change the type of the parameter object, the type of the objects associated with the parameter object could be modified by the method call. Therefore, our approach assumes that the type of an associated object in a parameter object cannot be modified.

The gain time of the object constraints can be defined formally as follows:

Consider an object X with a method m that can be overridden by child classes, and assume that the WCET of X.m must take into account the WCET of the class family F. Let i be an instance of X and A be the actual type of the instance executed in a particular execution. Then the gain time of i.m in that execution is calculated by subtracting the WCET of A.m from the WCET of F.m.

Based on the thread-based CFG (see footnote 2), all instances of the various objects created in each real-time thread can be identified. Then, by analysing the assignment or type changing code of each instance, we can distinguish the lifetimes of the particular types of each instance and produce an Object Type Lifetime Graph (OTLG), a diagram which represents the lifetimes of the types of particular instances in a specific thread.
Footnote 2: A thread-based CFG is a control flow graph which illustrates in detail not only the local flow of control of the thread, but also the local flow of control of each method invocation of the thread.

Formally, an OTLG is made up of two types of component: nodes and edges. A node, named a type changing node in Figure 6.4, denotes a place where the type of an instance is changed, whereas an edge represents the lifetime of a particular type of an instance between two nodes. An OTLG can be represented with a number of type changing nodes, formally expressed as Θ(l, t), where l is the offset number in the Java bytecode and t denotes the possible types of the instance at run-time. In the OTLG, symbolic type references are used to represent the relationship between the dynamically dispatched objects of the same class family during run-time.

After discriminating between the lifetimes of the specific types of each instance, analysing the method invocations on each type of the instance determines the amount of gain time that can be reclaimed. Here, the symbolic references represented in OTLGs are resolved incrementally by analysing the associated instances in the specific thread, so that gain time can be reclaimed as early as possible. Following this, the exact places and amounts of object gain time reclaiming can be identified and illustrated in an Object Gain Time Reclaiming Graph (OGTRG), a diagram which illustrates the places where the type of the instance should be traced or where object constraint reclaiming may take place.

Formally, an OGTRG is made up of two types of node: type changing nodes and gain time reclaiming nodes. A type changing node denotes a place where the type of an instance is changed but where it is not possible to identify the exact amount of gain time, whereas a gain time reclaiming node indicates a place where the exact amount of gain time for a particular type of the instance is known. As mentioned above, type changing nodes and gain time reclaiming nodes are denoted by Θ(l, t) and (l, g) respectively in an OGTRG. The gain time reclaiming nodes of all instances in the real-time task can be merged together and provided to the run-time environment for reclaiming.
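The two node kinds and the object gain time definition can be sketched in Java as follows. This is a hedged illustration rather than the thesis implementation: the class names, the bytecode offset 42 and the per-class WCET figures are hypothetical, the family bound WCET(F.m) is assumed to be the maximum over the class family, and for simplicity the reclaiming node is placed at the offset of the type changing node.

```java
import java.util.Map;
import java.util.Set;

// Sketch only: all names, offsets and WCET figures are hypothetical.
final class ObjectGainTime {

    /** Type changing node Θ(l, t): bytecode offset l and the possible run-time types t. */
    static final class TypeChangingNode {
        final int offset;
        final Set<String> possibleTypes;
        TypeChangingNode(int offset, Set<String> possibleTypes) {
            this.offset = offset;
            this.possibleTypes = possibleTypes;
        }
    }

    /** Gain time reclaiming node (l, g): bytecode offset l and reclaimable gain time g. */
    static final class ReclaimingNode {
        final int offset;
        final long gain;
        ReclaimingNode(int offset, long gain) {
            this.offset = offset;
            this.gain = gain;
        }
    }

    /** WCET(F.m), assumed here to be the maximum WCET of m over the class family. */
    static long familyWcet(Map<String, Long> wcetOfM) {
        long worst = 0;
        for (long w : wcetOfM.values()) {
            worst = Math.max(worst, w);
        }
        return worst;
    }

    /** Once the actual type A is known: gain per call = WCET(F.m) - WCET(A.m). */
    static ReclaimingNode resolve(TypeChangingNode node, String actualType,
                                  int invocations, Map<String, Long> wcetOfM) {
        if (!node.possibleTypes.contains(actualType) || !wcetOfM.containsKey(actualType)) {
            throw new IllegalArgumentException("unexpected type: " + actualType);
        }
        long perCall = familyWcet(wcetOfM) - wcetOfM.get(actualType);
        return new ReclaimingNode(node.offset, invocations * perCall);
    }

    public static void main(String[] args) {
        Map<String, Long> wcetOfM1 = Map.of("A", 120L, "B", 80L, "C", 200L, "D", 150L);
        TypeChangingNode node = new TypeChangingNode(42, Set.of("B", "D"));
        ReclaimingNode r = resolve(node, "B", 2, wcetOfM1);
        System.out.println("reclaim " + r.gain + " cycles at offset " + r.offset);
    }
}
```

A merged graph for a whole task could then be represented as a list of such nodes ordered by offset, although the thesis does not prescribe a concrete data structure.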
    /*
       Assume that Class A is a parent class.
       Class B, C and D extend A, and
       override the m1() method.
    */
    class App extends RealtimeThread {

      public void run() {
        A aa = new A(); B bb = new B();
        C cc = new C(); D dd = new D();

        /*
           Initial values of x, y and z
           are from the environment.
        */
        if (x > 5) {
          cc = dd; ...;
        }

        if (y == 5) {
          aa = dd; ...;
        }
        else {
          aa = bb; ...;
        }
        bb = cc;

        if (z == true) {
          aa.m1; ...;
          aa.m1;
        } else {
          aa.m1; ...;
        }
        bb.m1; ...;
        bb.m1;
      }
    }

Figure 6.3: An example of object constraints

Considering the example in Figure 6.3, four instances (i.e. aa, bb, cc and dd) need to be analysed to carry out the object constraint reclaiming analysis in the App real-time thread. Using the object constraint reclaiming mechanism mentioned above, an object

gain time reclaiming graph for each instance is produced automatically by a modified compiler or tool. A diagram illustrating the transformation of two instances (i.e. aa and bb) from CFG to OGTRG is given in Figure 6.4. In the figure, the type of instance aa can be identified at the second if-then-else statement (i.e. a type changing node) and the exact number of method invocations can be determined at the last if-then-else statement (i.e. a gain time reclaiming node). Therefore, as soon as the expression of the last if-then-else statement is evaluated, the object gain time of instance aa can be reclaimed. Note that solving the symbolic expression of an associated class family allows gain time to be reclaimed as early as possible. As shown in Figure 6.4, the gain time of instance bb can be reclaimed as soon as the type of instance cc is determined. Therefore, the gain time reclaiming node of instance bb can be placed at the first if-then-else statement.

[Figure 6.4: Producing OGTRG from Figure 6.3. For each of the instances aa and bb, the figure shows the CFG of the run method, the derived OTLG and the resulting OGTRG. Object aa: aa = A initially and aa = B or D after the y == 5 statement; based on the type of aa identified there, 1 or 2 times the gain time of aa.m1 can be reclaimed at the z == true statement. Object bb: bb = B initially, then bb = @cc with cc = C or D after the x > 5 statement; the type of bb is known as soon as the type of cc is determined, so, based on the type of cc, 2 times the gain time of bb.m1 can be reclaimed. Legend: begin/end, basic block, expression, type changing node, gain time reclaiming node.]
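The early resolution of bb through the symbolic reference bb = @cc can be illustrated with a small run-time tracer. This is a sketch under assumptions: the class and its methods are hypothetical, and it relies on static analysis having already established that cc is not reassigned between the point where its type is determined and the later assignment bb = cc.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical run-time type tracer, not the XRTJ implementation.
final class SymbolicTypeTracer {
    private final Map<String, String> knownType = new HashMap<>();   // instance -> concrete class
    private final Map<String, String> symbolicRef = new HashMap<>(); // instance -> instance it will alias

    /** Records that the concrete type of an instance is now known (a type changing node was passed). */
    void typeDetermined(String instance, String className) {
        knownType.put(instance, className);
        symbolicRef.remove(instance);
    }

    /** Records a symbolic reference such as bb = @cc found by static analysis. */
    void willAlias(String instance, String target) {
        symbolicRef.put(instance, target);
        knownType.remove(instance);
    }

    /** Follows symbolic references until a concrete type is found; null while unresolved. */
    String resolve(String instance) {
        String current = instance;
        // The hop limit guards against cyclic references.
        for (int hops = 0; current != null && hops <= symbolicRef.size(); hops++) {
            String type = knownType.get(current);
            if (type != null) {
                return type;
            }
            current = symbolicRef.get(current);
        }
        return null;
    }

    public static void main(String[] args) {
        SymbolicTypeTracer tracer = new SymbolicTypeTracer();
        tracer.willAlias("bb", "cc");                 // from static analysis: bb = @cc
        System.out.println("before first if: " + tracer.resolve("bb"));
        tracer.typeDetermined("cc", "D");             // run-time: x > 5, so cc = dd
        System.out.println("after first if: " + tracer.resolve("bb"));
    }
}
```

Once resolve returns a concrete type, the gain for both bb.m1 invocations can be reported at the first if-then-else statement, which is the behaviour described for instance bb.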

Our approach can also be applied to the analysis of method-based applications that are built as libraries or packages in object-oriented programming. In such library packages, the types of the objects can be generic and, therefore, the WCET estimation of the library package will be pessimistic. This can be compensated for by integrating our approach, so that gain time is reclaimed as soon as caller methods provide the types of the input parameters.

6.4 Functional Constraint Reclaiming

Functional Constraint Reclaiming is a reclaiming technique for the gain time arising from restrictions on the possible directions of execution flows. Identifying exclusive paths [71] or various modes [18], based on design knowledge, to calculate the WCET estimations of real-time applications has been widely used in the WCET field. By analysing real-time threads with annotations that indicate exclusive paths or modes, relatively safe and tight WCET bounds can be estimated. We acknowledge such efforts and contributions in the WCET domain. However, it is possible that the WCET estimations of the exclusive paths or different modes are spread over a wide range, and that the exact execution path or mode cannot be determined during the design phase. In such cases, the reduction in pessimism can still be limited.

Moreover, analysing functional constraints in object-oriented programming is much more complicated than in procedural programming, because data dependency issues in object-oriented programming languages can lead to pessimistic WCET estimations that suffer from not only structural constraints but also object constraints. Hence, as well as analysing the exclusive paths and modes in the structure of the programs, the analysis of functional constraints must take into account the pessimistic bounds caused by the dynamic nature of object types in object-oriented programming.

For the most part, the pessimistic WCET estimations caused by data dependencies can be addressed with the structural and object constraint reclaiming mechanisms. However, to be able to reclaim gain time as early as possible, these mechanisms can be integrated with WCET analysis approaches that take data dependency issues into account.

The rationale of functional constraint reclaiming is based on the concept of identifying the exclusive paths or various modes used by the WCET analysis approaches. Therefore, we assume that the exclusive paths or modes can be distinguished with annotations introduced by WCET analysis. Taking advantage of these annotations, a modified compiler or gain time reclaiming tool derives these exclusive (or inclusive) paths and modes from the programs. Based on the identified paths and modes, we can determine the places where they can be distinguished. As soon as the exact path or mode can be discriminated or is executed, the gain time of the path or mode can be reclaimed.

Here, the gain time of particular paths can be reclaimed if the exclusive path of the currently executing path is the worst-case execution path, or if the inclusive path of the current path is not the worst-case execution path. In other words, the gain time of the exclusive or inclusive path of the currently executing path can be reclaimed by determining whether the current path is the worst-case execution path. Similarly, as soon as a specific mode is identified and it is not the worst-case execution mode, the gain time of the mode can be reclaimed. In both cases, functional constraint reclaiming should consider the data dependency issues connected with object constraints.

Formally, the gain time of the functional constraints can be defined as follows:

Suppose that S is a section of code that includes exclusive paths, and P is the path of S actually executed in a particular execution. Then the gain time of P can be computed by subtracting the WCET of P from the WCET of S. This schema can also be used for the exclusive modes of functional constraints.
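As a worked instance of this definition, consider the second if-then-else statement of Figure 6.2, whose comment gives WCET(if) = 100 cycles for the error path and WCET(else) = 50 cycles for the normal path. Assuming, as the figure's comment does, that WCET(S) is the bound of the more expensive branch and that the cost of the condition is ignored, the gain time of the normal path is:

```latex
\begin{align*}
\mathrm{GainTime}(P) &= \mathrm{WCET}(S) - \mathrm{WCET}(P) \\
\mathrm{GainTime}(P_{\mathrm{normal}}) &= 100 - 50 = 50 \text{ cycles}
\end{align*}
```

This matches the 50 cycles of functional gain time annotated in the figure.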

The functional constraint reclaiming of a specific thread may be represented with a Functional Gain Time Reclaiming Graph (FGTRG), a gain time reclaiming graph that illustrates the exact places and amounts of gain time that may be reclaimed. Functional gain time reclaiming can be regarded as an advanced mechanism that combines the structural and object constraint reclaiming mechanisms for a particular path or mode. Therefore, the formal definitions of the nodes introduced for the OGTRG (i.e. Θ(l, t) and (l, g)) can also be applied to the FGTRG.

For example, in Figure 6.2 the two if-then-else statements (i.e. Lines 9-25 and Lines 30-39) are exclusive to each other, but the structure of the program may increase the WCET estimation. In other words, the worst-case paths of these two if-then-else statements will not both be executed in the same execution, owing to the functional constraints. Therefore, based on WCET annotations identifying the exclusive paths or modes, the gain time associated with the normal mode (i.e. the else statement at Lines 35-39) can be reclaimed at Line 18 at run-time.

Note that functional constraint reclaiming may reclaim gain time earlier than structural constraint reclaiming. However, the gain time reclaimed by functional constraint reclaiming could overlap with that reclaimed by structural and object constraint reclaiming. Therefore, combining different constraint reclaiming techniques makes it necessary to eliminate the gain time that overlaps with other constraints. This is performed by combining the gain time reclaiming graphs into the CFG. Then the overlapping gain time of each basic block can be examined and eliminated.

6.5 Implementation Guideline

We use the XRTJ architecture [55] to demonstrate how gain time reclaiming is implemented for object-oriented programs. However, the implementation approach is not

restricted to our architecture. Following the philosophy of portable WCET analysis, the gain time reclaiming approach may be divided into two stages: analysing gain time reclaiming graphs, which can be conducted at compilation, and run-time reclaiming, which is carried out on the target platform. To accommodate different target environments, two alternative implementations are suggested: instrumented method reclaiming and run-time support reclaiming. The former adds instrumented methods that are capable of reclaiming the gain time and of cooperating with scheduling algorithms into Java class files during compilation, whereas the latter extracts gain time reclaiming graphs as annotations during compilation and transforms these reclaiming graphs into reclaiming methods at run-time. Further details are discussed in the following subsections.

Note that the actual reclaimed gain time may be less than the run-time overhead of the reclaiming. In this situation, the gain time should be either neglected or accumulated until it is worth reporting. The gain time reclaiming nodes that need to be removed can be identified on the basis of an acceptance value, which may be used to examine whether the overhead of the gain time reclaiming is acceptable at run-time. The technique for determining this value is outside the scope of this thesis.

Furthermore, to support repetition code with the gain time reclaiming mechanism, either the run-time system, such as the virtual machine, must provide a mechanism to count the exact number of loop iterations at run-time, or additional code must be introduced by a modified compiler to count them. In addition, a type tracing mechanism needs to be provided for type changing nodes where the exact amount of gain time cannot be identified. This can be implemented by identifying the positions where the reference to the object is changed.
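The idea of accumulating small amounts of gain time until they are worth reporting, together with compiler-inserted loop counting, might look as follows. This is a minimal sketch under stated assumptions: the class name, the acceptance value of 40 cycles, the loop bound and the per-iteration cost are all invented for illustration, and the policy of reporting everything accumulated once the acceptance value is exceeded is only one possible choice.

```java
// Sketch only: the acceptance value and the reporting policy are assumptions for illustration.
final class GainTimeAccumulator {
    private final long acceptanceCycles;
    private long pendingCycles;

    GainTimeAccumulator(long acceptanceCycles) {
        if (acceptanceCycles < 0) {
            throw new IllegalArgumentException("acceptance value must be non-negative");
        }
        this.acceptanceCycles = acceptanceCycles;
    }

    /** Records gain at a reclaiming node; returns the cycles to report now, or 0 while accumulating. */
    long reclaim(long gainCycles) {
        if (gainCycles < 0) {
            throw new IllegalArgumentException("gain time cannot be negative");
        }
        pendingCycles += gainCycles;
        if (pendingCycles <= acceptanceCycles) {
            return 0;
        }
        long report = pendingCycles;
        pendingCycles = 0;
        return report;
    }

    long pending() {
        return pendingCycles;
    }

    public static void main(String[] args) {
        GainTimeAccumulator accumulator = new GainTimeAccumulator(40);
        final int maxIterations = 8;    // loop bound assumed by the WCET analysis
        final long perIteration = 12;   // assumed WCET(Z) + WCET(X) in cycles
        int[] data = {3, 7, -1, 4};

        int iterations = 0;             // counter that a modified compiler could insert
        for (int i = 0; i < data.length && data[i] >= 0; i++) {
            iterations++;
        }
        long gain = (maxIterations - iterations) * perIteration;
        System.out.println("reported cycles: " + accumulator.reclaim(gain));
    }
}
```

In a real system the reported value would be handed to the scheduler rather than printed, and the acceptance value would have to be derived from the measured overhead of the reclaiming call itself.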

[Figure 6.5: Analysing gain time reclaiming during compilation. (a) Instrumented Method Reclaiming Approach: a Java program (+ annotations) is processed by the XRTJ Compiler (traditional Java compiler and XAC translator) with gain time reclaiming, producing a Java class file (+ instrumented gain time reclaiming methods). (b) Run-time Support Reclaiming Approach: a Java program (+ annotations) is processed by the XRTJ Compiler (traditional Java compiler and XAC translator) with gain time reclaiming, producing a Java class file and an Extensible Annotation Class (+ gain time reclaiming annotations).]

Analysing Gain Time Reclaiming Graphs

Based on the thread-based CFG of each real-time thread, gain time reclaiming graphs can be produced from analysable source programs or Java class files by a modified compiler or tool that supports the gain time reclaiming mechanisms discussed above. As shown in Figure 6.5, these gain time reclaiming graphs can be extracted during compilation in order to reduce the run-time overhead. The three gain time reclaiming graphs of a particular thread can be merged into a single reclaiming graph, called the Thread Gain Time Reclaiming Graph (TGTRG), which illustrates the places where gain time reclaiming may take place in the thread. Formally, since it is a combination of SGTRG, OGTRG and FGTRG, a TGTRG is made up of gain time reclaiming nodes (l, g), which carry the amount of gain time g that can be reclaimed, and type changing nodes Θ(l, t).

At this stage, during compilation, instrumented methods that are capable of reclaiming the gain time and of cooperating with scheduling algorithms are added in the instrumented method reclaiming approach, and gain time reclaiming graphs

are generated as annotations in the run-time support reclaiming approach.

Run-Time Reclaiming

[Figure 6.6: Gain time reclaiming at run-time. (a) Instrumented Method Reclaiming Approach: input is a Java class file (+ instrumented gain time reclaiming methods); Reconstructing: translating WCEF to machine cycles if necessary; Reclaiming: reclaiming gain time as soon as the instrumented methods are executed; both stages run on the Real-Time Java Virtual Machine. (b) Run-time Support Reclaiming Approach: inputs are a Java class file and an Extensible Annotation Class (+ gain time reclaiming annotations); Reconstructing: translating WCEF to machine cycles and instrumenting gain time reclaiming methods; Reclaiming: reclaiming gain time as soon as the instrumented methods are executed; both stages run on the Real-Time Java Virtual Machine.]

As shown in Figure 6.6, gain time reclaiming may be carried out on the target platform in two stages: Reconstructing and Reclaiming. The aim of reconstructing is to compute the information that is necessary to enable gain time reclaiming on the target platform. Note that reconstructing is implemented differently in the instrumented method reclaiming and run-time support reclaiming implementations.

The reconstructing stage of instrumented method reclaiming is relatively simple: the exact machine cycles of the gain time, which is provided in the instrumented methods as WCEF vectors, are calculated. In contrast, the reconstructing stage of the run-time support reclaiming implementation includes mapping Java bytecode to the associated gain time reclaiming nodes, and instrumenting gain time reclaiming methods. Here, a modified Java virtual machine or tool loads the associated Java class files and the

TGTRG of each task-based real-time thread stored in the XAC files, and reproduces the relationship between gain time reclaiming nodes and Java bytecodes. Then the gain time reclaiming graphs can be translated into method invocations that support integration with the scheduling algorithms. The exact machine cycles of the gain time provided as WCEF vectors can be calculated at the reconstructing stage.

Based on the instrumented gain time reclaiming methods, gain time is reclaimed automatically as soon as the reclaiming methods are executed. The reclaimed gain time can be collected by the associated scheduling algorithm and used to improve the overall performance of the whole system. How to integrate gain time reclaiming with the scheduling algorithm is outside the scope of this chapter. Techniques such as Dual-Priority Scheduling [25] and Dynamic Sporadic Server [13] are applicable.

6.6 Summary

This chapter has presented a gain time reclaiming framework integrated with WCET analysis for high performance real-time Java systems. Our approach shows that integrating WCET analysis with gain time reclaiming can provide a more flexible environment for developing real-time Java applications and may also improve the utilisation of the whole system. As discussed above, three types of gain time reclaiming mechanism can be applied to real-time Java programs. The analysis for structural constraint reclaiming and object constraint reclaiming can be fully automated with a modified compiler or tool that supports our mechanisms, whereas functional constraint reclaiming partly depends on integration with the annotations used in WCET analysis. Furthermore, these mechanisms can also be applied to the development of library packages or generic applications to improve the utilisation and overall performance of the whole system. This will be discussed in Chapter 8 with two case studies.

Attention should be drawn to the fact that the granularity of gain time reclaiming should be balanced to avoid large run-time overheads. This is discussed as future work in Chapter 9. In addition, the impact of gain time identification on static WCET analysis may depend on the implementation technique and on the calculation method of the WCET analysis with which it is integrated. The impact of automatic gain time identification on the tree-based approach has been discussed in [3]. Although some implementations may have a non-negligible impact on WCET analysis, particular methods may offer acceptable complexity of integration with WCET analysis tools.


Chapter 7

Prototype Implementation

This chapter describes the prototype implementations and summarises the tools that we have used in the thesis to evaluate the feasibility of our proposed approach. The discussion is divided into two main sections based on the analysis framework proposed in Chapter 3. Section 7.1 describes the prototype implementation of the platform-independent analysis phase, in which an annotation-aware Java compiler, called XRTJ-Compiler, is discussed. Section 7.2 discusses how VMTMs can be derived with two different approaches for various target platforms. The tool chain of the XRTJ environment is presented in Figure 7.1.

7.1 XRTJ-Compiler

The prototype implementation of XRTJ-Compiler is based on an open source Java compiler called Kopi Java Compiler (KJC), which is available under the terms of the GNU Public License and is part of the Kopi project [66], a set of tools to edit and generate Java class files. KJC is a Java compiler written in Java. The basic classes for the KJC compiler are contained in the package at.dms.compiler. All classes necessary to read or write a class file are defined in the package at.dms.classfile. The package at.dms.kjc consists of all classes of the KJC compiler. The KJC compiler

translates a Java source file into a class file in five steps. As shown in Figure 7.2, the compiler scans the tokens and parses them into a Java-semantic syntax tree, then checks interfaces, initializers and code bodies, before generating a Java class file.

[Figure 7.1: Tool chain of the XRTJ environment. Platform-independent analysis: Java program + annotations, XRTJ Compiler, XAC files and Java class files, gain time reclaiming. Platform-dependent analysis, profiling based: JVM, adding instrumentation code, target JVM + instrumentation code, profiling timing information. Platform-dependent analysis, benchmark based: instrumented Java template, adding sections of code to be measured, instrumented code to be measured, extracting timing information. Further elements: measurement values, analysing measurements, VMTM, WCET calculation.]

Extracting XAC files

As the annotations proposed in our approach are represented as comments in Java, the XRTJ-Compiler can derive annotations from comment statements in the source code. In

the KJC compiler, a JavaStyleComment object array is defined in JStatement to handle Java comment strings for various purposes. During the parsing stage, Java comments are derived from the source token strings and stored as JavaStyleComment objects with the associated JStatement objects.

[Figure 7.2: XRTJ-Compiler. Artefacts, in order: 1. Java files; 2. Tokens; 3. ASTs; 4. Annotated ASTs; 5. Code Sequence; 6. Instructions; 7. Java bytecodes; 8. Class files; 9. XAC files. Stages: Scanning; Parsing; Interface Checking; Initializers Checking; Code Body Checking; Code Generating; XAC Generating. Annotation-related steps noted in the figure: extract comments as JavaStyleComment objects; extract annotation tokens, add them into the parse trees and relocate them to the corresponding syntax tokens; analyse the AST to carry over annotations with the corresponding nodes; store annotations with the corresponding instruction; produce bytecodes from the instruction codes and dump them into the class files; produce the template of the XAC file, load the bytecodes from the memory buffer, calculate the offset number of the bytecodes, update the offset number of each annotation and generate the XAC annotations of each method; dump into a separate XAC file for each class.]

However, the syntax tree or object representation of the source is recursively built with Java semantic-like objects. Therefore, the parser needs to be modified to keep the annotations with the associated token objects rather than storing them with the JStatement. Here, the XRTJ-Compiler

assumes that an annotation located before a statement is associated with that statement. Therefore, annotations are passed from the JStatement to the descendant tokens when the object representation of the Java statement is recursively built as a tree. The annotation is stored in the first token of the Java statement that follows it.

During the code generation phase, Java-semantic objects (i.e. the intermediate object representation) are translated into CodeSequence objects. The CodeSequence objects are stored in the AttributeList as code attributes in the CodeInfo objects. As in the Java class file format, CodeInfo objects are stored in MethodInfo objects and the MethodInfo objects are stored in ClassInfo. We have modified the at.dms.kjc and at.dms.classfile packages to extract XAC files after generating class files.

The extraction of annotations in the XRTJ-Compiler is implemented similarly to the generation of Java class files in the KJC compiler. The derivation of an XAC file is invoked in ClassInfo, and the associated annotations of each bytecode are generated recursively. Note that an XAC file is generated after the associated class file has been produced, since the offset numbers of the annotations need to be provided in the XAC file so that static analysis tools can match them with the associated bytecode. Examples of XAC files generated by the XRTJ-Compiler are given in Figure 5.11 and Figure [?].

Deriving WCEF Vectors and CFG

The prototype implementation also derives control flow graphs (CFGs) and Worst-Case Execution Frequency (WCEF) vectors (see footnote 2). Generating control flow graphs during compilation is relatively straightforward. A ControlFlowGraph object is provided in the at.dms.ssa package to support static single assignment (SSA) analysis in the KJC compiler. When generating XAC files, the XRTJ-Compiler creates a control flow graph for each method.
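To make the notion of a WCEF vector more tangible, the sketch below treats it as a per-opcode execution count for one basic block and combines it with a per-bytecode timing table to obtain machine cycles. Both the representation and the timing figures are assumptions made for illustration: the class and method names are hypothetical, and the table merely stands in for the kind of information a VMTM would supply for a target platform.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: the vector representation and the timing table are illustrative assumptions.
final class WcefVectorSketch {

    /** Counts how often each bytecode mnemonic occurs in one basic block. */
    static Map<String, Integer> wcefVector(List<String> basicBlockOpcodes) {
        Map<String, Integer> vector = new TreeMap<>();
        for (String opcode : basicBlockOpcodes) {
            vector.merge(opcode, 1, Integer::sum);
        }
        return vector;
    }

    /** Cycles = sum over opcodes of count(opcode) * cycles(opcode) on the target platform. */
    static long toCycles(Map<String, Integer> vector, Map<String, Long> cyclesPerOpcode) {
        long total = 0;
        for (Map.Entry<String, Integer> entry : vector.entrySet()) {
            Long cycles = cyclesPerOpcode.get(entry.getKey());
            if (cycles == null) {
                throw new IllegalArgumentException("no timing for " + entry.getKey());
            }
            total += cycles * entry.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // A hypothetical basic block ending in a conditional branch.
        Map<String, Integer> vector = wcefVector(List.of("iload_1", "iload_1", "if_icmple"));
        // Hypothetical per-bytecode costs in machine cycles.
        Map<String, Long> timing = Map.of("iload_1", 2L, "if_icmple", 5L);
        System.out.println(vector + " -> " + toCycles(vector, timing) + " cycles");
    }
}
```

Keeping the vector platform-independent and applying the timing table only on the target is what allows the same XAC file to be reused across different virtual machines.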
Footnote 2: WCEF vectors represent execution-frequency information about basic blocks and more complex code structures that have been collapsed during the first part of the portable WCET analysis.

To be able to do this, the individual basic blocks need to be identified in each method. Then the WCEF vectors of individual blocks can be generated simply by examining the CFGs. From the CFG of each method, the total number of identical bytecode instructions in each basic block can be calculated. The XRTJ-Compiler then inserts the WCEF vectors and control flow graphs into the XAC file. A sample XAC file containing WCEF vectors is given in Figure 5.11 and its control flow graph is given in Figure 7.3. The XAC file is derived from the bubble sort algorithm given in Figure 5.9, and its individual basic blocks with their offset numbers are given in Figure [?].

    <WCEFVectors + CFG>
    <SubTAG=1>
    <Method=bb_Sort:([I)V>
    <SubTAG BODY>
    BB:(0 9)
      Successors:[1.BB:( )]
    BB:( )
      Successors:[1.BB:( )]
    BB:( )
      if_icmple:1
      Successors:[1.BB:( ) 2.BB:( )]
    BB:( )
      Successors:[1.BB:( )]
    BB:( )
      iinc:1
      Successors:[1.BB:( )]
    BB:( )
      if_icmple:1
      Successors:[1.BB:( ) 2.BB:( )]
    BB:( )
      iinc:1
      Successors:[1.BB:( )]
    BB:( )
      iload_1:1
      ifgt:1
      Successors:[1.BB:( ) 2.BB:( )]
    BB:( )
      return:1
      Successors:[]
    </SubTAG BODY>
    </SubTAG=1>
    </WCEFVectors + CFG>

Figure 7.3: An example of CFG in an XAC file

Identifying Gain Time Reclaiming Points

As discussed in Chapter 6, the gain time reclaiming approach can be implemented in two alternative ways: instrumented method reclaiming and run-time support reclaiming. Due to the limited amount of time and resources, we have developed a simplified version of the instrumented method reclaiming approach and a part of the run-time support reclaiming approach. The purpose of these implementations is merely to demonstrate how the gain time reclaiming approach can be integrated into the Java architecture. Note that the derivation of fully optimised gain time reclaiming graphs is outside the scope of this prototype implementation.

The implementation is integrated into the XRTJ-Compiler. During compilation, the compiler needs to identify the possible gain time reclaiming nodes of applications to be able to extract gain time reclaiming graphs. In line with our approach, three types of gain time reclaiming graphs can be derived by the XRTJ-Compiler.

Identifying structural gain time reclaiming nodes includes identifying selection code (i.e. if-then-else and switch-case) and repetition code (i.e. for-loop, while-loop and do-while). In the XRTJ-Compiler, the intermediate representation is built up from Java-semantic objects with associated annotations.
Therefore, based on the object representation syntax tree, structural gain time reclaiming points can be identified. We used CodeLabel objects in the prototype implementation to insert labels for analysing and extracting structural reclaiming annotations. These reclaiming annotations are then stored as structural reclaiming nodes with the offset number of the associated bytecodes. These gain time reclaiming nodes can be optimised together with other reclaiming nodes by analysing them against the control flow graph of the method. The prototype does not implement this optimisation.

The identification of object gain time reclaiming points is implemented in the prototype, again without the optimisation phase. The object reclaiming points are identified at object assignment points, where JAssignmentExpression objects are used in the syntax tree of the XRTJ-Compiler. However, as this assignment object is used for all data types, including primitive data, arrays and object types, the XRTJ-Compiler needs to analyse each assignment expression and pick out the object type assignments. This yields the type changing nodes of the application. Once the exact locations of the type changing nodes have been collected, these annotations can be stored as object gain time reclaiming nodes for the static analysis tools.

Implementation of the functional gain time reclaiming part involves two stages. One is the identification of functional constraint reclaiming annotations and the other is the insertion of gain time reclaiming nodes. The implementation of the former is similar to the extraction of manual WCET annotations discussed earlier in this chapter. However, as the implementation of functional gain time reclaiming nodes involves integration and optimisation with other reclaiming points, we have not included this in the prototype. Therefore, in this prototype the identification of functional gain time reclaiming points is simply the extraction of WCET annotations.
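The filtering of assignment expressions described above can be sketched as follows. This is a minimal illustration and not the XRTJ-Compiler code: the Assignment class is a hypothetical stand-in for a syntax tree node, and the use of JVM type descriptors to tell object types apart from primitives and arrays is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: pick out object-type assignments as candidate type changing nodes. */
final class ReclaimPointSketch {

    /** Hypothetical stand-in for an assignment node in the syntax tree. */
    static final class Assignment {
        final int bytecodeOffset;     // offset of the associated store instruction
        final String typeDescriptor;  // JVM descriptor of the assigned variable

        Assignment(int bytecodeOffset, String typeDescriptor) {
            this.bytecodeOffset = bytecodeOffset;
            this.typeDescriptor = typeDescriptor;
        }
    }

    /** True for class and interface types ("Lpkg/Name;"), false for primitives and arrays. */
    static boolean isObjectType(String descriptor) {
        return descriptor.startsWith("L") && descriptor.endsWith(";");
    }

    /** Returns the bytecode offsets of all object-type assignments, in input order. */
    static List<Integer> typeChangingOffsets(List<Assignment> assignments) {
        List<Integer> offsets = new ArrayList<>();
        for (Assignment a : assignments) {
            if (isObjectType(a.typeDescriptor)) {
                offsets.add(a.bytecodeOffset);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        List<Assignment> assignments = List.of(
                new Assignment(4, "I"),                 // int: skipped
                new Assignment(9, "[I"),                // int[]: skipped
                new Assignment(17, "LTelemeterable;")); // object reference: kept
        System.out.println(typeChangingOffsets(assignments)); // prints [17]
    }
}
```

The offsets returned here play the role of the type changing nodes that are stored for the static analysis tools; any optimisation of those nodes is left out, as in the prototype.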
All these gain time reclaiming nodes are stored with the offset number of the associated bytecode. Therefore, annotation-aware tools or the run-time support virtual machine can load them to analyse the applications statically or dynamically.

As discussed in Chapter 6, the gain time reclaiming approach can be implemented without modifications to the Java virtual machine. The instrumented method reclaiming approach is implemented in the XRTJ-Compiler without any optimisation techniques. The prototype inserts Java methods and native methods to gauge the gain time at the reclaiming nodes. The implementation is straightforward, as it only requires the compiler to insert additional methods into the Java application.

7.2 Virtual Machine Timing Model

As discussed in Chapter 5, virtual machine timing models (VMTMs) for various target platforms are derived using a measurement approach. Program execution time can be extracted either by cycle counting (i.e. a cycle counter) or by interval counting (i.e. timers). These measuring mechanisms can be realised with software instructions, hardware interfaces, or both. Here, we use the rdtsc instruction provided in the x86 architecture [58], which has high resolution and very low run-time overhead, to read the time-stamp counter of the processor. To ensure that the instructions to be measured are executed in order during profiling, a serializing instruction (e.g. cpuid) is invoked before the time-stamp counter is read. The prototype implementations of the two measurement approaches proposed in Chapter 5 are described below.

Profiling-based Approach

The major aim of this implementation is to evaluate the profiling-based approach proposed in Chapter 5. The experimental implementation of this approach has been carried out on the reference implementation of the RTSJ provided by TimeSys [109]. Basically, the instrumenting code, including the serializing and time-stamp counter instructions, is added to the interpreter engine. Before the interpretation of a method starts, a buffer to store run-time information is prepared. The execution time of each bytecode is then measured as the interval between the point at which an opcode is fetched and the point just before the next one is fetched. The run-time information captured by the interpreter is classified by opcode mnemonic. On completion of the method, statistical analysis is applied to the captured run-time information to produce the VMTM. The experimental results have been discussed in Chapter

Benchmark-based Approach

To minimise the cost of developing the benchmark-based approach, we used the tools offered in the Kopi Compiler Suite for editing and generating Java class files. One is the Java disassembler (Dis), which converts Java class files into Kopi assembly language files. The other is the Java assembler (KSM), which translates Kopi assembly language files into Java class files.

Figure 7.4: Instrumenting a specific set of Java bytecode (diagram labels: Java program + JNI; javac; Java class file; disassembler; Kopi assembly language; instrumenting bytecode(s); assembler; Java class file + additional bytecode(s) to be measured)

As shown in Figure 7.4, a Java program with a native method that can access the time-stamp counter is translated into Java bytecode by a standard compiler. The class file is then translated into Kopi assembly language so that a specific set of Java bytecodes can easily be inserted in text form. As discussed in Chapter 5, additional complementary bytecodes need to be provided in order to maintain the integrity of the Java stack for the whole program.

Figure 7.5: Measuring WCET of the specific set of bytecodes on the target VM and generating VMTMs (diagram labels: RDTSC's JNI library for the target platform; Java class file + additional bytecode(s) to be measured; Java APIs + (Real-Time APIs); target virtual machine; generates; Virtual Machine Timing Model)

After injecting the specific bytecodes, comprising the bytecodes to be measured and the additional complementary bytecodes, the file saved in Kopi assembly language format is translated back into a standard Java class file. As presented in Figure 7.5, these individual instrumented Java programs are combined into a comprehensive benchmark that can generate a VMTM automatically. The individual Java programs or the benchmark are then ready to be used for measuring the execution time of the specific set of bytecodes on any target platform. The evaluation of the benchmark was discussed in Chapter 5.
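Both measurement approaches end with a set of timing samples per bytecode that has to be reduced to a VMTM, which can then be combined with the WCEF vector of a basic block. The sketch below illustrates that last step under stated assumptions: the reduction simply keeps the largest observed sample (the thesis only says that statistical analysis is applied), the class and method names are hypothetical, and the cycle counts are invented for the example. A maximum over measurements is not a guaranteed upper bound.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: reduce per-bytecode timing samples to a VMTM and apply it to a WCEF vector. */
final class VmtmSketch {

    /** Keeps the largest observed cycle count per opcode mnemonic (assumed reduction). */
    static Map<String, Long> buildVmtm(Map<String, List<Long>> samplesByMnemonic) {
        Map<String, Long> vmtm = new LinkedHashMap<>();
        for (Map.Entry<String, List<Long>> e : samplesByMnemonic.entrySet()) {
            long worst = 0L;
            for (long cycles : e.getValue()) {
                worst = Math.max(worst, cycles);
            }
            vmtm.put(e.getKey(), worst);
        }
        return vmtm;
    }

    /** Sums frequency times worst-case cost over every opcode in a WCEF vector. */
    static long blockWcet(Map<String, Integer> wcefVector, Map<String, Long> vmtm) {
        long total = 0L;
        for (Map.Entry<String, Integer> e : wcefVector.entrySet()) {
            Long cost = vmtm.get(e.getKey());
            if (cost == null) {
                throw new IllegalArgumentException("no timing for " + e.getKey());
            }
            total += e.getValue() * cost;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical measurements in cycles; not taken from the thesis.
        Map<String, List<Long>> samples = new LinkedHashMap<>();
        samples.put("iload_1", List.of(40L, 42L, 55L));
        samples.put("ifgt", List.of(60L, 71L));
        Map<String, Long> vmtm = buildVmtm(samples);

        // WCEF vector of a basic block containing one iload_1 and one ifgt.
        Map<String, Integer> wcef = new LinkedHashMap<>();
        wcef.put("iload_1", 1);
        wcef.put("ifgt", 1);

        System.out.println(blockWcet(wcef, vmtm)); // 55 + 71 = 126
    }
}
```

Keeping the WCEF vector and the timing model as separate maps mirrors the split between the platform-independent and platform-dependent parts of the analysis: only the second map changes when the target virtual machine changes.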

7.3 Summary

This chapter has described the prototype implementation of the research proposed in this thesis. The implementation mainly involved modifying a traditional Java compiler to extract timing information (i.e. WCET annotations), control flow information (i.e. control flow graphs), and WCEF vectors for portable WCET analysis. In addition, gain time reclaiming mechanisms are implemented in the prototype compiler. As the aim of the prototype implementation is to evaluate the proposal, there is still room for improvement, for example refining the validation of the correctness of annotations, optimising the gain time reclaiming graphs, and reducing the run-time overheads of reclaiming on the target architecture. Nevertheless, the current prototype has demonstrated the feasibility of applying our framework to the real-time Java architecture.

In addition to developing an annotation-aware compiler with gain time reclaiming techniques, two measurement approaches have been evaluated. The evaluation was carried out by modifying the RTSJ-RI, using additional tools that can manipulate Java class files, and developing a Java native method to access the time-stamp counters of x86 processors. The prototype implementation has shown that the two proposed approaches can be used to derive virtual machine timing models for various target platforms with their range of available resources.


Chapter 8

Case Studies

This chapter presents two case studies that demonstrate how our approach can be applied to practical systems in order to balance the flexibility, predictability and utility of the system. Section 8.1 discusses a case study on a real-time Java project for the implementation of the Attitude and Orbit Control Systems (AOCS). The aim of this case study is to demonstrate how our approach can be applied to real-time Java architectures in which dynamically dispatched methods are used by means of the interface facility. Section 8.2 describes how our approach can also be used in other applications where dynamically dispatched methods are applied through the class facility.

8.1 Case Study One: Attitude and Orbit Control Systems

Background of AOCS

Typically, an autonomous satellite system (Figure 8.1) is controlled from a ground station through a radio-frequency link. The communication between satellite systems and ground stations may depend on the specific mission assigned to the satellite. It could be continuous or extend only over a section of the orbit. The communications may be classified into two groups: telecommands and telemetry data. Telecommands are instructions sent by the ground station to the satellite, whereas telemetry data are sent by the satellite to the ground station. Telecommands are normally sent as asynchronous data packets to override internal decisions taken by the on-board software. However, telemetry data could be sent both synchronously and asynchronously. Telemetry data can be classified into two broad categories: payload telemetry and housekeeping telemetry. The former provides the data collected by the satellite for specific missions and the latter delivers information about the general status of the satellite.

Figure 8.1: A typical structure of Satellite System [92]

Figure 8.2: A block diagram of Satellite System [92]

As shown in Figure 8.2, the AOCS is a subsystem of the satellite system, and its main purpose is to control the attitude and orbit of the satellite. A summary of its main functions is given as follows [92]:

Attitude Control Function: This keeps the satellite at a nominal attitude, which is defined by the ground station or generated internally by the system. The attitude is maintained in closed loop and autonomously on board the satellite. It is a cyclical function and includes acquiring data from attitude sensors, processing the data to estimate the current attitude, computing the deviation of the attitude, and controlling attitude actuators.

Orbit Control Function: This maintains the satellite at a nominal orbit defined by the ground. The function is mainly performed by applying forces to the satellite.

Telecommand Processing: This mainly handles commands sent from the ground through the central satellite computer. Commands may include setting the nominal attitude, commanding reconfiguration of AOCS units, changing AOCS operation modes and commanding orbit control manoeuvres.

Telemetry Processing: This is a cyclical function in which accumulated telemetry data are sent to the ground through the central satellite computer. The telemetry data contained in a telemetry packet include power status, health status, current operational mode, the latest estimate of satellite attitude, the latest readings from AOCS sensors, the latest set of commands sent to the AOCS actuators and the telecommand log.

Failure Detection and Isolation: This is a mission-specific cyclical function in which the AOCS attempts to detect anomalies and isolate their cause.

Failure Recovery: This carries out a failure recovery action when a failure is detected.

Reconfigurations: This performs a reconfiguration when the system has detected a failure. The major aim of a reconfiguration is to exclude the faulty unit.

Manoeuvre Execution: This executes a sequence of actions triggered by the ground through telecommands at a specified time.

Figure 8.3: AOCS System [92]

The AOCS software is generally built as a single load module that is burned into PROM on the AOCS computer or uplinked by telecommand after launch. The AOCS software is typically organised as a collection of tasks. Cyclic scheduling without preemption is used. The cycle period is the same as the AOCS control cycle. The AOCS software was originally developed in Ada83. A prototype built for the AOCS Framework Project was written in C++ to design an object-oriented software framework for the AOCS software. The AOCS Framework aims to offer reusable components covering general functionalities, such as telemetry handling, telecommand handling and failure detection, which are common in satellite applications. As the framework was designed primarily with reusability in mind, some of the functionalities can also be applied to other subsystems of mission-critical systems, such as spacecraft subsystems. The original AOCS Framework, written in C++, has since been ported to Java to offer greater flexibility, reusability and maintainability. Both versions (i.e. C++ and Java) use the dynamic dispatching features provided by the languages. This is what led to our interest in using the project in our case study.

Our case study is carried out on the Real-Time Java Project [92], a follow-up to the AOCS Framework Project that ports the framework to the Java architecture. A summary of the Real-Time Java Project is given in the next section.

Real-Time Java Project for AOCS

The primary concept of the AOCS framework is the use of interfaces that externally define the components of the application to be derived from the framework, which means that the framework design does not need to make assumptions about the way the interfaces are implemented. Therefore, the framework can provide greater reusability and maintainability. This is achieved by using the abstract interfaces supported in Java. In the telemetry processing unit, for example, the AOCS collects various types of data and sends them back to the ground. Different types of data need to be handled in different ways, which could imply that the data manager needs to be updated whenever the telemetry data change. To deal with different telemetry data formats and avoid modifications to the system whenever telemetry data formats or types are updated, abstract interfaces are used in the Java implementation of the AOCS. In the AOCS Framework, an abstract interface called Telemeterable is introduced to make the telemetry data manager reusable. The interface defines a method called writeToTelemetry(), which implementing classes can realise with their own mechanisms to deal with different telemetry data formats. The classes of objects whose state might have to be sent to telemetry implement the Telemeterable interface. Therefore, the telemetry manager does not need any knowledge of how each telemetry data object is implemented. As shown in Figure 8.4, the telemetry manager only needs a reference to the list of telemetry data objects to be able to handle various telemetry data formats.
It can be observed that this implementation of the telemetry manager is totally independent of the telemetry layout and content.
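The design described above can be sketched in a few lines of Java. Only the names Telemeterable and writeToTelemetry() come from the framework; the TelemetryStream stand-in, the two implementing classes and the data they write are hypothetical and serve only to show that the manager code depends on the interface alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of interface-based telemetry dispatch; all method bodies are illustrative. */
final class TelemetrySketch {

    /** Hypothetical stand-in for the framework's telemetry stream. */
    static final class TelemetryStream {
        final StringBuilder buffer = new StringBuilder();

        void write(String image) {
            buffer.append(image).append(';');
        }
    }

    /** Anything whose state may have to be sent to telemetry. */
    interface Telemeterable {
        void writeToTelemetry(TelemetryStream stream);
    }

    // Two hypothetical implementations with different data formats.
    static final class ModeReport implements Telemeterable {
        public void writeToTelemetry(TelemetryStream stream) {
            stream.write("mode=SAFE");
        }
    }

    static final class AttitudeReport implements Telemeterable {
        public void writeToTelemetry(TelemetryStream stream) {
            stream.write("att=0.1,0.2,0.3");
        }
    }

    /** The manager sees only the interface, never the concrete classes. */
    static String sendAll(List<Telemeterable> list) {
        TelemetryStream stream = new TelemetryStream();
        for (Telemeterable t : list) {
            t.writeToTelemetry(stream); // dynamically dispatched call
        }
        return stream.buffer.toString();
    }

    public static void main(String[] args) {
        List<Telemeterable> list = new ArrayList<>();
        list.add(new ModeReport());
        list.add(new AttitudeReport());
        System.out.println(sendAll(list)); // mode=SAFE;att=0.1,0.2,0.3;
    }
}
```

Adding a new telemetry format means adding a class, with no change to sendAll(). The same property is what makes the call hard to bound statically, since the analysis cannot tell from the call site which implementation will run.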

Telemeterable list[N];
for (all objects in list) do
    list[i].writeToTelemetry();

Figure 8.4: TelemetryManager

The prototype implementation of the AOCS project in Java is organised as three packages: the RtjAocsFramework package, the JbedSmmApplication package, and the GroundStation package. The first package contains the code of the application layer of the AOCS framework. It is self-contained except for the part that relates to linking Matlab-generated routines into the framework. The second package contains the code for instantiating the prototype SMM application in the Jbed environment [40]. It can only be used if the Jbed environment is available. The third package provides a simulator of the ground stations.

Estimating WCET bounds of TelemetryManager

This section illustrates how the WCET bounds of the TelemetryManager thread were estimated. The class hierarchy of the Telemeterable objects is given in Figure 8.5 and the source code of the TelemetryManager object, which is in line with the RTSJ, is given in Figure 8.6. As shown in Figure 8.6, each execution cycle of the TelemetryManager thread is relatively simple. It loads a list of Telemeterable objects and writes the telemetry data to the central computer according to the telemetry type of the data. However, it should be noted that the objects in the telemetry list all implement the Telemeterable interface. Therefore, the writeToTelemetry() method invoked in the for-loop of the TelemetryManager could belong to any object that implements the Telemeterable interface in the RtjAocsFramework. As a result, the WCET estimation of the thread needs to consider all Telemeterable objects.

Figure 8.5: The class hierarchy of the Telemeterable Object

In the RtjAocsFramework package, there are two main classes (i.e. AocsObject and AocsEvent) that implement the Telemeterable interface directly. AocsObject has 31 direct subclasses and 46 indirect subclasses. In addition, AocsEvent

 1  /* TelemetryManager: A schedulable object */
 2  package RtjAocsFramework.TelemetryManagement;
 3
 4  import RtjAocsFramework.TelemetryManagement.Telemeterable;
 5  import RtjAocsFramework.TelemetryManagement.Telemeterable;
 6  import RtjAocsFramework.TelemetryManagement.TelemetryStream;
 7  import RtjAocsFramework.TelemetryManagement.TelemetryModeManager;
 8  import javax.realtime.*;
 9
10  public class TelemetryManager extends AocsObject
11          implements ActiveObject, Schedulable {
12
13      public synchronized void run() {
14          // Load the list to be sent to the TM stream in this frame
15          tmlist = modemanager.getTelemetryList();
16          // Initialize the counter of bits sent to the TM stream
17          int tmdatasize=0;
18          // Reset the telemetry buffer
19          tmstream.resetBuffer();
20          // Get telemetry buffer size
21          int tmstreamsize=8*tmstream.getBufferSize();
22          // Send the TM list to the TM stream
23
24          Telemeterable t;
25          //@Loopcount(10)
26          for (t=tmlist.first(); !tmlist.isLast(); t=tmlist.next()) {
27              tmdatasize+=t.gettelemetryimagelength();
28              if (tmdatasize>tmstreamsize) {
29                  reportFailure(EventTypeId.TOO_MANY_TM_DATA, this, toomanytmdata);
30                  break;
31              }
32              /* Telemeterable.writeToTelemetry(tmstream)
33                 &ObjectMonitoring.writeToTelemetry(tmstream) */
34              t.writeToTelemetry(tmstream);
35          }
36
37          // Flush the TM buffer
38          tmstream.flushBuffer();
39      }
40  }

Figure 8.6: Telemetry Manager Class
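The loop in Figure 8.6 calls writeToTelemetry() through the Telemeterable interface, so a bound on run() depends on which implementing classes are treated as candidates. The sketch below illustrates one way such a bound could be computed. It assumes the bound has the form loop bound times worst candidate WCET plus a fixed cost for the remaining code; that form, the class names used as keys and every cycle count are illustrative assumptions, not values from Table 8.1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: WCET bound of a loop whose body is a dynamically dispatched call. */
final class DispatchWcetSketch {

    /** loopBound * (worst WCET among non-excluded candidates) + cost of the remaining code. */
    static long runBound(int loopBound, Map<String, Long> candidateWcet,
                         Set<String> excluded, long restCycles) {
        long worst = 0L;
        for (Map.Entry<String, Long> e : candidateWcet.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                worst = Math.max(worst, e.getValue());
            }
        }
        return loopBound * worst + restCycles;
    }

    public static void main(String[] args) {
        // Hypothetical per-class WCETs of writeToTelemetry(), in cycles.
        Map<String, Long> wcet = new LinkedHashMap<>();
        wcet.put("AocsEvent.ManoeuvreEvent", 1800L);
        wcet.put("ManoeuvreManagement.AttitudeSlew", 2500L);
        wcet.put("ObjectMonitoring.DeltaChange", 9000L);

        // Every implementing class is a candidate: 10 * 9000 + 5000.
        System.out.println(runBound(10, wcet, Set.of(), 5000L)); // 95000

        // An annotation rules out the ObjectMonitoring class: 10 * 2500 + 5000.
        System.out.println(runBound(10, wcet, Set.of("ObjectMonitoring.DeltaChange"), 5000L)); // 30000
    }
}
```

The second call shows the effect of narrowing the candidate set with design knowledge: the code and the loop bound are unchanged, yet the bound drops because the most expensive override can no longer be selected.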

has 9 direct subclasses. Hence, 88 classes implement the Telemeterable interface directly or indirectly. Among those, the 61 classes given in Table 8.1 override the writeToTelemetry() method. Thus, if we consider the WCET of every writeToTelemetry() method that could be invoked in the for-loop in Figure 8.6, the WCET bounds could be extremely pessimistic. Estimating the WCET value of each writeToTelemetry() method is not the major aim of these case studies, so the procedure for calculating the WCET bound of each method is not described here. We base our discussion on the simulated WCET bounds given in Table 8.1. For example, if we assume that the rest of the code in the TelemetryManager, apart from writeToTelemetry(), takes 5000 machine cycles, then using the WCET bounds of writeToTelemetry() given in Table 8.1, the WCET estimation of the thread can be calculated as below:

WCET(TelemetryManager.run()) = maxlist WCET(Telemeterable.writeToTelemetry()) [...] cycles = [...] = [...] cycles

Table 8.1: Objects overriding the writeToTelemetry() method in the AOCS framework

No.  Object  Simulated WCET Estimations
1.   RtjAocsFramework.AocsData.AocsData
2.   RtjAocsFramework.AocsData.AocsDataPool
3.   RtjAocsFramework.AocsEvent.AocsEvent
4.   RtjAocsFramework.AocsEvent.ChangeEvent
5.   RtjAocsFramework.AocsEvent.EventRepository
6.   RtjAocsFramework.AocsEvent.FailureEvent
7.   RtjAocsFramework.AocsEvent.ManoeuvreEvent
8.   RtjAocsFramework.AocsEvent.ModeEvent
9.   RtjAocsFramework.AocsEvent.ReconfigurationEvent
10.  RtjAocsFramework.AocsEvent.RecoveryEvent
11.  RtjAocsFramework.AocsEvent.TelecommandEvent
12.  RtjAocsFramework.BasicObjects.AocsObject
13.  RtjAocsFramework.ControllerManagement.Controller
14.  RtjAocsFramework.ControllerManagement.ControllerManager
15.  RtjAocsFramework.FailureDetection.FailureDetectionManager
16.  RtjAocsFramework.FailureDetection.MonitoringCheck
17.  RtjAocsFramework.FailureRecovery.FailureRecoveryManager
18.  RtjAocsFramework.FailureRecovery.LocalRecoveryActions
19.  RtjAocsFramework.FailureRecovery.ModeChange
20.  RtjAocsFramework.FailureRecovery.ObjectReset
21.  RtjAocsFramework.FailureRecovery.Reconfiguration
22.  RtjAocsFramework.FailureRecovery.RecoveryAction
23.  RtjAocsFramework.FailureRecovery.RecoveryStrategy
24.  RtjAocsFramework.FailureRecovery.SystemResetOnConfigurationError
25.  RtjAocsFramework.FailureRecovery.SystemResetOnTooManyFailures
26.  RtjAocsFramework.ManoeuvreManagement.AttitudeSlew
27.  RtjAocsFramework.ManoeuvreManagement.Manoeuvre
28.  RtjAocsFramework.ManoeuvreManagement.ManoeuvreManager
29.  RtjAocsFramework.ModeManagement.AocsMissionModeManager
30.  RtjAocsFramework.ModeManagement.ModeChangeAction
31.  RtjAocsFramework.ModeManagement.ModeManager
32.  RtjAocsFramework.ObjectList.ObjectList
33.  RtjAocsFramework.ObjectMonitoring.DeltaChange
34.  RtjAocsFramework.ObjectMonitoring.OutOfRangeChange
35.  RtjAocsFramework.ObjectMonitoring.SimpleChange
36.  RtjAocsFramework.OperatingSystemObjects.DummyAocsClock
37.  RtjAocsFramework.ReconfigurationManagement.ReconfigurerHelper
38.  RtjAocsFramework.SequentialDataProcessing.AdderBlock
39.  RtjAocsFramework.SequentialDataProcessing.BiasScalingCompensator
40.  RtjAocsFramework.SequentialDataProcessing.DifferenceBlock
41.  RtjAocsFramework.SequentialDataProcessing.D_Block
42.  RtjAocsFramework.SequentialDataProcessing.I_Block
43.  RtjAocsFramework.SequentialDataProcessing.LimitBlock
44.  RtjAocsFramework.SequentialDataProcessing.MatlabBlock
45.  RtjAocsFramework.SequentialDataProcessing.MatlabGainBlock
46.  RtjAocsFramework.SequentialDataProcessing.MatlabIntegratorBlock
47.  RtjAocsFramework.SequentialDataProcessing.PassThruBlock
48.  RtjAocsFramework.SequentialDataProcessing.P_Block
49.  RtjAocsFramework.SequentialDataProcessing.SplitterBlock
50.  RtjAocsFramework.SequentialDataProcessing.TwoByTwoMatrixBlock
51.  RtjAocsFramework.SequentialDataProcessing.XmathUcbBlock
52.  RtjAocsFramework.SequentialDataProcessing.XmathUcbPidBlock
53.  RtjAocsFramework.TelecommandManagement.TelecommandManager
54.  RtjAocsFramework.TelemetryManagement.Telemeterable
55.  RtjAocsFramework.TelemetryManagement.TelemetryManager
56.  RtjAocsFramework.TelemetryManagement.TestTelemetryStream
57.  RtjAocsFramework.UnitManagement.BasicUnitReconfigurer
58.  RtjAocsFramework.UnitManagement.TestAnalogActuator
59.  RtjAocsFramework.UnitManagement.TestAnalogSensor
60.  RtjAocsFramework.UnitManagement.Unit
61.  RtjAocsFramework.UnitManagement.UnitManager  2100

For the most part, each AOCS system has a specific mission to accumulate particular telemetry data. Therefore, taking account of the WCET values of all the Telemeterable objects in the TelemetryManager thread can result in very pessimistic WCET estimations. To achieve tight WCET estimations, WCET annotations can be used. In line with the design knowledge of the system, we can define the set of telemetry objects whose WCET estimations need to be considered for a particular AOCS system, for example in the case where the AOCS system is aimed at collecting specific data and is not responsible for monitoring the telemetry objects. Here, we can place the annotation //@maxwect(&Telemeterable.writeToTelemetry()-ObjectMonitoring.writeToTelemetry()) at Line 32 of the TelemetryManager thread. Then, the WCET of the TelemetryManager can be calculated as follows:

WCET(TelemetryManager.run()) = maxlist WCET(Telemeterable.writeToTelemetry()) [...] cycles = [...] = [...] cycles

Therefore, using WCET annotations yields a tighter estimation than an analysis that does not take dynamic dispatching into account. Also, the effort of writing WCET annotations for dynamic dispatching is similar to that of writing WCET annotations for maximum loop bounds.

Analysis with the Gain Time Reclaiming Approach

The WCET estimation of the TelemetryManager can be integrated with the gain time reclaiming approach as follows. Figure 8.7 illustrates the derivation of the object gain time reclaiming graph from the control flow graph of the TelemetryManager periodic thread. As shown in the figure, type changing nodes can be identified in the for-loop expressions (Line 26 in Figure 8.6), and gain time reclaiming instrumenting methods are added during compilation. Therefore, as soon as the t object, which is declared as a Telemeterable, is assigned a Telemeterable object from tmlist, the object gain time can be reclaimed. In addition, a structural constraint can also be identified in the for-loop. Since there is no overlap between the structural and object gain time, no elimination of overlapped gain time is necessary. Based on the thread gain time reclaiming graph, which includes a structural gain time reclaiming node at Line 15 and an object gain time reclaiming node at Line 26 in Figure 8.6, the gain time of each period of the TelemetryManager thread can be reclaimed at run-time. For

example, as shown in Figure 8.8, if the tmlist for the thread contains four objects, namely ManoeuvreEvent, Manoeuvre, AttitudeSlew and ChangeEvent, the gain time that can be reclaimed in that particular period is [...] cycles (i.e. [...] cycles). Figure 8.8 compares the WCET estimation of the TelemetryManager thread with the gain time that can be reclaimed at run-time for a particular list of Telemeterable objects in tmlist. Although optimisation of the gain time reclaiming graph is not implemented in our prototype, it can be observed that both structural and object gain time can be reclaimed at Line 15 in Figure 8.6 by optimising the gain time reclaiming graph as suggested in Figure

Figure 8.7: Object gain time reclaiming in the TelemetryManager Object (panels: CFG of the run method; OTLG of object t in the run method; OGTRG of object t in the run method; optimised OGTRG of object t in the run method. Note in the figure: the types of the t object and the size of the Telemeterable object list can be known as soon as the list is loaded.)

Figure 8.8: Comparing the simulations of the estimated WCET bounds and actual WCET bounds (execution time of the for-loop at Lines 26 to 35 and of the rest of the code in TelemetryManager.run(), for a tmlist containing ManoeuvreEvent, Manoeuvre, AttitudeSlew and ChangeEvent. Notes in the figure: structural gain time can be reclaimed as soon as the number of iterations is known; object gain time can be reclaimed as soon as the types of the Telemeterable objects are identified.)

8.2 Case Study Two: FIDO Rover

This section presents another case study to show how our approach may be applied to high performance object-oriented real-time applications in order to improve the overall performance of the whole system without any loss of flexibility or predictability. The goal of this study is to demonstrate the use of the class hierarchy, whereas

the previous emphasis was on the interface.

Figure 8.9: FIDO Rover prototype and its software implementation layers [59] (layers shown: Application Layer, Device Layer and Device Driver on the rover, and the Rover User Interface connected over wireless Ethernet)

Considering the Rover Tasks part of the application layer of the FIDO Rover system (footnote 1) in Figure 8.9, the command processor sends discrete instructions to particular periodic tasks at run-time to interact with the environment. Therefore, any sequence of commands may be sent by the command processor to a specific periodic task. In addition, it is possible that not all instructions of the various types of robot can be known beforehand (i.e. during the design phase of the command processor). Consequently, if a new command is introduced or a new type of robot is developed, the command processor and the specific periodic task need to be revised and the whole system needs to be retested. Unfortunately, redevelopment and retesting are relatively expensive in high performance real-time applications. In order to provide the applications with greater flexibility, reusability and extensibility, the application layer may be redeveloped for a real-time Java environment. Based on the application layer of the FIDO Rover's architecture, Instructions classes can be grouped into subclass families according to their similar functionalities or characteristics.
Part of the

Footnote 1: The FIDO (Field Integrated Design and Operations) Rover system is an autonomous planetary exploration system that is being used in ongoing NASA field tests to simulate driving conditions on Mars.
