COdesign and power Management in PLatformbased design space EXploration. Final report on run-time management


FP7-ICT (247999) COMPLEX
COdesign and power Management in PLatformbased design space EXploration

Project Duration:
Type: IP
WP no.: WP3
Deliverable no.: D3.5.3
Lead participant: IMEC
Prepared by: Chantal Ykman-Couvreur (IMEC), Kai Hylla, Sven Rosinger (OFFIS), Gianluca Palermo (POLIMI), Alberto Rosti, Sara Bocchio (ST-I)
Issued by: IMEC
Document Number/Rev.: COMPLEX/IMEC/R/D3.5.3/1.0
Classification: COMPLEX
Submission Date:
Due Date:

Project co-funded by the European Commission within the Seventh Framework Programme ( )

Copyright 2012 OFFIS e.v., STMicroelectronics srl., STMicroelectronics Beijing R&D Inc, Thales Communications SA, GMV Aerospace and Defence SA, SNPS Belgium NV, EDALab srl, Magillem Design Services SAS, Politecnico di Milano, Universidad de Cantabria, Politecnico di Torino, Interuniversitair Micro-Electronica Centrum vzw, European Electronic Chips & Systems design Initiative.

This document may be copied freely for use in the public domain. Sections of it may be copied provided that acknowledgement is given of this original work. No responsibility is assumed by COMPLEX or its members for any application or design, nor for any infringements of patents or rights of others which may result from the use of this document.

History of Changes

ED. | REV. | DATE | PAGES | REASON FOR CHANGES
IMEC First release of public report

Table of Contents

Executive summary
Abbreviations and glossary
Introduction
RRM framework
Application terminology and assumptions
Run-time decisions
GRM implementation
Interface between GRM and application
Interface between GRM and user
Interface between GRM and platform
GRM software task
Format of GRM input files
High-level platform specification
IP core type specification
Application specification
Interaction flow between GRM and CM
Communication with GRM and CM
Dedicated switching points
Initialization of the application
Actual execution of the application
Power management of individual IP cores
HW accelerators and black-box IP cores
Model of computation
Register Interface
API documentation
API implementation
ReISC core
Virtual Platform Power Monitoring APIs
Application Level Power Driver APIs
Power Manager Registers
RCCU Registers
ADC Registers
RRM for audio-driven video surveillance domain
Overview
Experimental Results
Binary size of GRM implementation
Performance overhead and energy gain
RRM for ultra-low power platforms
Overview of the Methodology
Tool-Flow
Experimental Results
References

1 Executive summary

This deliverable is the third report of Task 3.5, dealing with Run-time Resource Management (RRM). This task is coordinated by IMEC and also involves POLIMI and OFFIS. It started at month M7 and it ends at month M30. The goals of Task 3.5 are to develop a lightweight architecture for RRM in tightly constrained systems and sample run-time resource managers for both COMPLEX use cases 1 and 2. In addition to the RRM architecture, Task 3.5 also develops services and optimization heuristics to be supported by the RRM for alleviating the burden of the application programmer.

Figure 1: COMPLEX design flow

This RRM corresponds with the pre-optimized power controller (m) in the COMPLEX design flow illustrated in Figure 1. For more details, see the COMPLEX Description of Work [1].

The goals of the first deliverable D3.5.1 [3] were to provide a preliminary vision of a generic and structured architecture for the RRM and to introduce both sample RRMs for COMPLEX use cases 1 and 2 respectively.

The goals of the second deliverable D3.5.2 [2] were to provide an updated vision of this RRM architecture and to present the status of the RRM implementation for both COMPLEX use cases.

The goals of the public version of this third deliverable D3.5.3 are to describe the entire work performed in Task T3.5, including the description of the experiments performed to analyze the efficiency of the RRM implementation for both COMPLEX use cases. To that end, the writing policy is as follows:

- Sections already provided in Deliverables D3.5.1 and D3.5.2 are explicitly mentioned below.
- For the sake of clarity, to give an overall picture of the RRM developed in COMPLEX, and to make this deliverable standalone, all RRM features resulting from work performed in other tasks are unified in this deliverable. Links to these tasks are explicitly mentioned in the corresponding sections of this deliverable.
- Two versions of this deliverable are available: this one, being confidential, and the other one, being public, where confidential parts are removed. These confidential parts are explicitly mentioned in the corresponding sections of this deliverable.

The content of this deliverable is organized as follows:

- Section 2 (from D3.5.1 and D3.5.2) defines the abbreviations and some relevant terms used in the deliverable.
- Section 3 (from D3.5.1) overviews the challenges to be addressed by the RRM in future embedded computing.
- Section 4 (updated from D3.5.1 and D3.5.2) updates the RRM framework developed in Task T3.5. This RRM follows a distributed and hierarchical approach: it consists of both a Central Manager (CM) and a Global Resource Manager (GRM) at the platform level, and of Local Resource Managers (LRMs) at the Intellectual Property (IP) core level.
- Section 5 (from D3.5.1 and D3.5.2) describes the GRM implementation. It also describes the GRM interfaces with the application, the user, and the platform.
- Section 6 (from D3.5.2) describes the format of the input files needed by the GRM for both GRM databases, covering the high-level platform specification and the available application configurations.
- Section 7 (from D3.5.2) describes the interaction flow between the GRM and the CM, which allows the platform resources and the application functionalities to be managed in parallel.
- Section 8 (updated from D3.5.2) characterizes the Application Programming Interfaces (APIs) for power management of individual IP cores, as required by the GRM.
- Section 9 (updated from D3.5.2) presents the demonstrator used to instantiate the RRM in the COMPLEX use case 2. This demonstrator is taken from the audio-driven video surveillance domain. This section also describes the experiments performed to analyze the efficiency of the RRM for this demonstrator.

- Section 10 (new) describes the RRM framework adopted for the power management of an ultra-low power platform, as used in the COMPLEX use case 1, together with some initial results.

2 Abbreviations and glossary

The table below lists the abbreviations used in the deliverable, with their definitions.

ADC     Analog Digital Converter
ADVT    Advanced Timer
API     Application Programming Interface
CM      Central Manager
DMA     Direct Memory Access
DPM     Dynamic Power Management
DSE     Design Space Exploration
DSU     Debug Support Unit
DTD     Document Type Definition
DVFS    Dynamic Voltage and Frequency Scaling
GPIO    General Purpose Input Output
GPT     General Purpose Timer
GRM     Global Resource Manager
I2C     Inter Integrated Circuit
IP      Intellectual Property
ISS     Instruction Set Simulator
ITIM    Internal Timer
HW      Hardware
LLVM    Low-Level Virtual Machine
LRM     Local Resource Manager
QoE     Quality of user Experience
QoS     Quality of Service
RCCU    Reset Clock Control Unit
ReISC   Reduced energy Instruction Set Computer
RRM     Run-time Resource Management
RTC     Real-Time Clock
SPI     Serial Parallel Interface
SW      Software
UART    Universal Asynchronous Receiver Transmitter
USART   Universal Synchronous Asynchronous Receiver Transmitter
USB     Universal Serial Bus
WD      Watch Dog
WWD     Window Watch Dog

Some relevant terms used in the deliverable are also briefly described below. More information can be found in Section 3.

- The application functionality is specified at different granularity levels: (1) the application is organized into application modes, each one specifying a different subset of functionalities; (2) each application mode consists of communicating jobs, each one mapped entirely on one IP core; (3) each job can consist of communicating tasks, all of them running on the same IP core.
- An application configuration specifies an application mapping on the platform. It is mainly characterized by: its Quality of Service (QoS) required by the user, its application mode, the job implementation, the assignment of its jobs on the IP cores of the platform, its average execution time and energy consumption, and its user benefit/value.

3 Introduction

To address the challenges introduced by future embedded computing, a generic and structured architecture for RRM of embedded multi-core platforms is refined in Task T3.5. This RRM needs to provide the following features.

First, the RRM has to support a variety of applications: mobile communications, networking, automotive and avionic applications, multimedia in the automobile and Internet interfaced with many embedded control systems. These applications may run concurrently, and may start and stop at any time. Each application may have multiple configurations, with different constraints imposed by the external world or the user (deadlines and quality requirements, such as audio and video quality, output accuracy), different usages of various types of platform resources (processing elements, memories and communication bandwidth) and different costs (performance, power consumption).

Second, this RRM should support a holistic view of platform resources. This is needed for global resource allocation decisions that optimize a utility function (also called Quality of user Experience (QoE)), given the available platform resources. This QoE allows a trade-off, negotiated with the user, between diverse QoS requirements and costs. For example, in the COMPLEX use case 2, this QoE should enable careful management of the energy stored in the battery.

Third, this RRM should transparently optimize the platform resource usage and the application mapping on the platform. This is needed to facilitate the application development and to manage the QoS requirements without rewriting the application.

Next, this RRM should dynamically adapt to a changing context. This is needed to achieve high efficiency in a changing environment. QoS requirements and platform resources must be scaled dynamically (e.g., by adjusting the clock frequencies and voltages, or by switching off some functions) in order to control the energy/power consumption and the heat dissipation of the platform.

Finally, this RRM should allow different heuristics (e.g., for platform resource allocation and task scheduling), since a single heuristic cannot be expected to fit all application domains and optimization goals.

Software development productivity is also of paramount importance. To address this challenge, and to facilitate the RRM implementation, a generic and structured architecture for the RRM is required. It should be valid for any design flow, for any target platform, and for any application domain. The development of an RRM architecture is the first goal of Task T3.5.

Nevertheless, since the RRM is intended for embedded platforms, only a lightweight implementation is acceptable. To address this challenge:

- This RRM should interface with design-time exploration to alleviate its run-time decision making. This is the goal of the RRM interface with the tool developed in Task T3.4.
- The RRM implementation should be instantiated from the RRM architecture, based on the target platform and the application domain. The development of sample run-time resource managers for both COMPLEX use cases 1 and 2 is the second goal of Task 3.5, in collaboration with Task 4.1.

4 RRM framework

Figure 2: Two-level optimization flow

The outcome of Task T3.5 is an RRM framework, intended for embedded heterogeneous MP-SoC platforms. It is based on a two-level optimization flow outlined in Figure 2. At design time, a set of Pareto-optimal application configurations is derived by an automated design space exploration, based on the tool MOST, and developed in Task 3.4. The RRM then dynamically switches between these predefined configurations in order to continuously maximize the QoS of the application, while meeting the platform constraints.

Figure 3: Distributed and hierarchical RRM approach

The target platform for our RRM framework is an embedded heterogeneous MP-SoC platform. This platform consists of multiple IP cores, and these IP cores can be of different types (e.g., HW accelerator, FPGA, multi-CPUs). No task migration between different IP cores is considered in COMPLEX. Hence, the management of the communication infrastructure is not considered in COMPLEX.

Our RRM architecture follows a distributed and hierarchical approach, illustrated in Figure 3. On the one hand, both CM and GRM are loaded on the host processor of the platform. They are software tasks, with the same priority, specified in C, and running in parallel. They are used to adapt both platform and applications at run time and to find global and optimal trade-offs in application mapping based on a given optimization goal. On the other hand, each IP core can execute its own resource management without any restriction, through an LRM. Such an LRM encapsulates the local policies and mechanisms used to initiate, monitor and control computation on its IP core.

Figure 4: Communication with GRM and CM

As illustrated in Figure 4, the GRM manages the platform resources, whereas the CM, in parallel with the GRM, manages the application functionalities:

- The platform consists of multiple IP cores, and these IP cores can be of different types, but from the GRM viewpoint, these different types are managed similarly through APIs.
- The GRM selects the application configurations and reconfigures the IP cores (i.e., either switches them on/off or performs DVFS) accordingly. This allows the GRM to be a generic entity, unaware of the application functionalities, and hence reusable for other embedded platforms.
- The CM informs the GRM about actions to be done, creates the threads of the application on the slave IP cores, and performs some pre-processing before thread execution in parallel with the GRM.

A detailed description of the communication with the GRM and the CM is given in Section 7.1.

4.1 Application terminology and assumptions

The target applications have to comply with the following terminology and assumptions, to enable the run-time management strategy based on the GRM/LRMs.

- Ideally, for any application, all functionalities should be accessible at any time. However, based on the user requirements, the available platform resources, the limited energy/power budget of the platform, and the target platform autonomy, it may not be possible to integrate all these functionalities on the platform at the same time. Hence the application developer has to organize the application into application modes, each one specifying a different subset of functionalities.
- The application, within a selected mode, consists of jobs communicating with each other, where:
  o In view of system robustness, and to make dynamic power management more lightweight, each job is mapped entirely on one IP core. Nevertheless, a job can consist of multiple tasks communicating with each other, but all of them have to run on the same IP core.
  o Whereas the functional specification of a job is fixed, there may be several specific algorithms or implementations for a given job. A job implementation can also take several forms (fixed logic, configurable logic, SW) and offer different characteristics.
- To conform to the hierarchical approach of the RRM, jobs and communication between them are managed at the platform level by the GRM, whereas tasks and communication between them are managed at the IP core level by the LRM.
- An application configuration specifies an application mapping on the platform. It is mainly characterized by: its QoS required by the user, its application mode, the job implementation, the assignment of its jobs on the IP cores of the platform, its average execution time and energy consumption, and its user benefit/value. These available application configurations are provided at design time, structured and stored in a GRM database to enable fast exploration during run-time decisions.
- The user benefit/value is the returned value of the utility function (see below) applied to the application configuration. It is computed at the initialization of the application (see Section 5.2).
- The QoS requirements and the optimization goal are defined through the Quality of user Experience (QoE) manager. This goal is translated into an abstract mathematical function, called the utility function. Examples of utility functions are: performance of the application, energy/power consumption of the platform, battery life, revenue if the user has to pay for the application, QoS of the application, or a weighted combination of them.

In the COMPLEX use cases:

- The considered IP cores are either processors or custom HW blocks.
- Only one application is considered, but the application consists of several application modes.
- There is no task migration between different IP cores.
- Each job consists of only one task.

4.2 Run-time decisions

Figure 5: Run-time decisions

In our RRM framework, the run-time decisions are illustrated in Figure 5. The selection of application configurations and the task mapping and scheduling are coarse-grain run-time decisions: they have an impact on the usage of the platform resources and hence require dynamic reconfiguration. In contrast, fine-grain DVFS does not require reconfiguration, is cheaper, and can be performed more frequently.

The optimal selection of application configurations is the focus of the RRM implementation in the COMPLEX use case 2, whereas the fine-grain DVFS is the focus of the RRM implementation in the COMPLEX use case 1.

5 GRM implementation

Figure 6: Architecture of the GRM implementation

The GRM should be a middleware providing a bridge between the application, the user, and the platform. As mentioned in Section 4, the GRM focuses on the management of the platform resources, leaving the CM in charge of the management of the application functionalities. Also, in COMPLEX, no task migration between different IP cores is considered, and each job consists of only one task. Taking these considerations into account, the architecture of the GRM implementation developed in COMPLEX is illustrated in Figure 6.

To provide a bridge between the application, the user, and the platform, generic services are supported by the GRM. These services are classified into managers to structure the interface between the GRM and the application, the user, and the platform, respectively.

5.1 Interface between GRM and application

The interface with the application is provided by the application manager. This manager provides the following main services: GRM_ConfigureApplication(), GRM_SelectApplicationConfiguration(), and GRM_ReconfigurePlatform().

- GRM_ConfigureApplication() loads the available application configurations derived by the design-time exploration. The input required for this service is an XML file characterizing these application configurations (see Section 6.3).
- GRM_SelectApplicationConfiguration() selects an application configuration with the maximum user value, while meeting the platform constraints and taking the available platform resources into account. This function is frequently executed at run time, so a lightweight implementation is mandatory. It relies on a QoS-aware optimization heuristic, such as the one implemented for the COMPLEX use case 2, which provides the user with the maximum QoS according to the energy budget and the battery duration of the platform. The pseudo-code of this heuristic is as follows:

    GRM_SelectApplicationConfiguration() {
      elapsed_time = host_clock();
      remaining_time = battery_duration(platform) - elapsed_time;
      GRM_EstimateElapsedEnergy();
      remaining_energy = energy_budget(platform) - elapsed_energy(platform);
      remaining_frames = remaining_time / audio_frame_proc;
      max_energy_per_frame = remaining_energy / remaining_frames;
      /* See Figure 8 */
      for each appl_config in sorted_pareto_set {
        if (energy_per_frame(appl_config) >= max_energy_per_frame) break;
      }
    }

  where it is assumed that:
  o The application starts at time 0.
  o audio_frame_proc is a QoS required by the user. It corresponds to the maximum allowed execution time to process one audio frame.
  o energy_budget(platform) is a constraint initially provided in the high-level platform specification input file (see Section 6.1). Nevertheless, whenever the battery is recharged, this energy budget has to be updated accordingly.

  The loop stops at the first configuration whose energy per frame reaches the per-frame budget, so the configurations visited before it are those that fit the budget. Due to the translation of the application configuration space into a two-dimension design space [user value, cost] and the preprocessing performed by GRM_SortApplicationConfigurations() (see Section 5.2), the complexity of GRM_SelectApplicationConfiguration() is only O(n), where n is the size of the sorted Pareto set.
- GRM_ReconfigurePlatform() requests each IP core of the platform to switch to the power mode specified in the newly selected application configuration. Such a request is performed by the service GRM_SwitchToPowerMode() (see Section 5.3).

5.2 Interface between GRM and user

The interface with the user (or external entity accessing the application specification) is provided by the QoE manager. The QoE is a subjective measure of the application value from the user perspective. It is influenced by the user terminal device (e.g., low- or high-definition TV), the user's environment (e.g., in the car or at home), the user's expectations, and the nature of the content and its importance (e.g., a simple yes/no message or an orchestral concert). Changes in user preferences may involve (re)negotiation between user and QoE manager. Indeed, the platform resources may not be sufficient to provide the desired QoS to the application. The user needs a simple way to communicate with the QoE manager in order to control and customize the QoS of the application.
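As a hedged illustration of the selection heuristic of Section 5.1, the sketch below restates its pseudo-code in plain C. The type `appl_config_t`, the function name `select_configuration`, and the explicit millisecond/millijoule parameters are assumptions made for this example only; in the GRM these quantities come from host_clock(), the platform specification, and GRM_EstimateElapsedEnergy(). The sketch also assumes that the set has already been sorted by ascending user value, as done by GRM_SortApplicationConfigurations(), and that the selected configuration is the last one visited before the loop stops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one application configuration; only the two
 * dimensions used by the heuristic are kept. */
typedef struct {
    double user_value;        /* result of the utility function */
    double energy_per_frame;  /* cost, in millijoules per audio frame */
} appl_config_t;

/* Returns the index of the selected configuration in a set sorted by
 * ascending user value, or -1 if no configuration fits the budget.
 * All parameter names are illustrative assumptions. */
int select_configuration(const appl_config_t *sorted, size_t n,
                         double remaining_energy_mj,
                         double remaining_time_ms,
                         double audio_frame_proc_ms)
{
    assert(audio_frame_proc_ms > 0.0);

    double remaining_frames = remaining_time_ms / audio_frame_proc_ms;
    if (remaining_frames <= 0.0)
        return -1;  /* battery duration already exhausted */

    double max_energy_per_frame = remaining_energy_mj / remaining_frames;

    int selected = -1;
    for (size_t i = 0; i < n; i++) {
        /* Same stop condition as the pseudo-code: O(n) scan. */
        if (sorted[i].energy_per_frame >= max_energy_per_frame)
            break;
        selected = (int)i;  /* best configuration seen so far */
    }
    return selected;
}
```

For instance, with 600 mJ and 1000 ms remaining and 10 ms per audio frame, the per-frame budget is 6 mJ, so a set with costs of 2, 5, and 9 mJ per frame yields index 1.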

The QoE manager provides the following main services: GRM_DeriveUserValues() and GRM_SortApplicationConfigurations().

The negotiation between user and QoE manager involves the selection of the utility function and of the optimization heuristic:

- The utility function models the user benefit for the application in an abstract and mathematical way. It allows a trade-off between diverse QoS requirements and costs. Examples of utility functions are: performance of the application, energy/power consumption of the platform, battery life, revenue if the user has to pay for the application, fair sharing of platform resources, and weighted combination of them. Once selected, the utility function is applied to each application configuration, to derive its user value. This is performed by the service GRM_DeriveUserValues(). This utility function is then optimized by the GRM in the service GRM_SelectApplicationConfiguration() of the application manager.
- The selection of the optimization heuristic allows fitting the current application domain and optimization goal.

In the COMPLEX use case 2, the optimization goal is to maximize the QoS of the application, whereas the platform constraints are the energy budget and the battery duration of the platform. To that end, the utility function models the QoS of an application configuration as a weighted sum of its audio and image frequency and resolution and of the amount of application functionalities provided by its application mode. Its pseudo-code is currently as follows:

    user_value(appl_config) = appl_mode_id * appl_mode_id + audio_frequency
                              + image_resolution + image_frequency

GRM_SortApplicationConfigurations() is developed as follows:

o The input for this service is the set of Pareto-optimal application configurations in the multi-dimension space, such as the one illustrated in Figure 7 for the COMPLEX use case 2. This set is derived by a design-time exploration, such as MOST in Task T3.4.
o This service keeps the Pareto-optimal application configurations in the two-dimension design space [user value, cost] and sorts them in ascending order according to the user value. In the COMPLEX use case 2, the considered cost is the energy consumption per audio frame, and the considered two-dimension design space is illustrated in Figure 8.
o This service is a preprocessing step that speeds up the run-time execution of GRM_SelectApplicationConfiguration(). Moreover, the sorting is performed through the efficient standard C function qsort().
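The preprocessing described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the COMPLEX implementation: the type `qoe_config_t`, the function names, and the way dominated points are pruned after sorting are all hypothetical. Only the utility formula and the use of qsort() come from the text.

```c
#include <assert.h>
#include <float.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record; field names follow the utility pseudo-code. */
typedef struct {
    int    appl_mode_id;
    double audio_frequency;
    double image_resolution;
    double image_frequency;
    double energy_per_frame;  /* cost dimension */
    double user_value;        /* filled in by derive_sort_prune() */
} qoe_config_t;

/* Utility function of the COMPLEX use case 2, as given in the text. */
double derive_user_value(const qoe_config_t *c)
{
    return (double)c->appl_mode_id * c->appl_mode_id
         + c->audio_frequency + c->image_resolution + c->image_frequency;
}

/* qsort() comparator: ascending user value; on ties, higher cost first
 * so that the backward pruning scan meets the cheapest tie first. */
int by_user_value(const void *pa, const void *pb)
{
    const qoe_config_t *a = pa, *b = pb;
    if (a->user_value != b->user_value)
        return (a->user_value > b->user_value) - (a->user_value < b->user_value);
    return (a->energy_per_frame < b->energy_per_frame)
         - (a->energy_per_frame > b->energy_per_frame);
}

/* Derives user values, sorts, and keeps only the points that are
 * Pareto-optimal in [user value, cost]. Returns the number kept;
 * the kept points are stored at the front of the array. */
size_t derive_sort_prune(qoe_config_t *c, size_t n)
{
    assert(c != NULL || n == 0);
    if (n == 0)
        return 0;

    for (size_t i = 0; i < n; i++)
        c[i].user_value = derive_user_value(&c[i]);
    qsort(c, n, sizeof *c, by_user_value);

    /* Scan from the highest user value down: a point survives only if
     * it is strictly cheaper than every higher-valued point kept. */
    size_t w = n;
    double min_cost = DBL_MAX;
    for (size_t i = n; i-- > 0; ) {
        if (c[i].energy_per_frame < min_cost) {
            min_cost = c[i].energy_per_frame;
            c[--w] = c[i];
        }
    }
    memmove(c, c + w, (n - w) * sizeof *c);
    return n - w;
}
```

Because this work is done once at initialization, the run-time selection can remain a single linear scan over the sorted set.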

Figure 7: Multi-dimension space of application configurations

Figure 8: Two-dimension design space of application configurations

5.3 Interface between GRM and platform

The interface with the platform is provided by the platform manager and the IP core manager. On the one hand, the platform manager provides GRM services related to platform configuration and resource monitoring. On the other hand, the IP core manager conforms to practices of each IP core separately and mainly provides GRM services to set the power mode of an IP core.

The platform manager provides the following main services: GRM_ConfigurePlatform() and GRM_EstimateElapsedEnergy(). The IP core manager provides the following main services: GRM_ConfigureIPcoreType() and GRM_SwitchToPowerMode().

- GRM_ConfigurePlatform() loads the high-level platform specification, the platform constraints (e.g., battery duration, energy budget), and the power mode table of each IP core. A power mode of an IP core is characterized by: its supply voltage and clock frequency, its average dynamic and leakage power consumption, and its available power mode transitions. A power mode transition also specifies its switching time and power consumption. The inputs required for this service are a textual file for the platform specification and an XML file for each power mode table (see Sections 6.1 and 6.2).
- GRM_EstimateElapsedEnergy() estimates the energy consumed by the platform from the application start up to this function call. Currently no sensor is used; instead, a very simple energy model is applied:

    elapsed_energy(platform) = Σ_{IP core} elapsed_energy(ip_core)

  The pseudo-code of this function is as follows:

    GRM_EstimateElapsedEnergy() {
      en = 0;
      cur_time = host_clock();
      for each IP core {
        en += elapsed_energy(ip_core);
        pm = current_power_mode(ip_core);
        en += (avg_leakage(pm) + avg_dyn_power(pm)) * (cur_time - switching_time(ip_core));
      }
    }

  where en denotes the estimated elapsed energy of the platform, switching_time(ip_core) denotes host_clock() at the last switching point (see Section 7.2) of the IP core, and elapsed_energy(ip_core) denotes the estimated energy consumed by the IP core from the system start up to the last switching point. The latter is updated during the run of the application whenever the IP core switches to a new power mode. This update is computed as follows:

    cur_pm = current_power_mode(ip_core);
    new_pm = new_power_mode(ip_core);
    tm = power_mode_transition(cur_pm, new_pm);
    cur_time = host_clock();
    elapsed_energy(ip_core) += (avg_leakage(cur_pm) + avg_dyn_power(cur_pm)) * (cur_time - switching_time(ip_core));
    elapsed_energy(ip_core) += switching_power(tm) * switching_time(tm);

- The pseudo-code of GRM_SwitchToPowerMode() is as follows:

    GRM_SwitchToPowerMode(ip_core) {
      cur_pm = current_power_mode(ip_core);
      new_pm = new_power_mode(ip_core);
      tm = power_mode_transition(cur_pm, new_pm);
      PerformDVFS(ip_core, new_pm);
      Update elapsed_energy(ip_core);
    }

  where PerformDVFS() is implemented in conformity with the IP core practice and makes use of the corresponding API. If the IP core is an HW block or a black-box IP core, this API is implemented through the function lrm_request_mode() (see Section 8.1). If the IP core is a SW platform core, this API is coordinated with the platform provider. The current implementation status is described in Section 8.2 for the ReISC DSP core of the platform used in COMPLEX use cases 1 and 2.

5.4 GRM software task

As mentioned in Section 4, the GRM is a SW task running in parallel with the CM on the host processor. This SW task is implemented through GRM_Execute(), in conformity with the interaction flow between the GRM and CM described in Section 7. The pseudo-code of this function is as follows:

    GRM_Execute() {
      GRM_ConfigurePlatform("platform.dat");
      GRM_ConfigureApplication("application.xml");
      GRM_DeriveUserValues(UTILITY);
      GRM_SortApplicationConfigurations();
      while (1) {
        sem_wait(grm_action); /* GRM is woken up */
        /* Select an application configuration */
        if (grm_action == SIG_selection) {
          GRM_SelectApplicationConfiguration();
        }
        /* Reconfigure the platform */
        else if (grm_action == SIG_recrequest) {
          GRM_ReconfigurePlatform();
        }
      }
    }

At the initialization of the application, and thus without run-time overhead, the GRM executes the following services only once: GRM_ConfigurePlatform(), GRM_ConfigureApplication(), GRM_DeriveUserValues(), and GRM_SortApplicationConfigurations(). During the run of the application, however, the GRM frequently executes the following services: GRM_SelectApplicationConfiguration() and GRM_ReconfigurePlatform(). A lightweight implementation is therefore mandatory. Experiments on the GRM overhead and feasibility are reported in Section 9.2.
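The sensorless energy bookkeeping of Section 5.3 can be illustrated with the following C sketch. The struct layouts, function names, and units (milliwatts and milliseconds, giving microjoules) are assumptions made for this example. The final step of `switch_to_power_mode()`, which records the new switching point, is also an assumption: the pseudo-code in the text does not show it, but the estimate relies on the time of the last switching point being kept up to date.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical power mode and transition records (see Section 6.2). */
typedef struct {
    double avg_dyn_power_mw;
    double avg_leakage_mw;
} power_mode_t;

typedef struct {
    double switching_time_ms;   /* duration of the transition */
    double switching_power_mw;  /* power drawn during the transition */
} pm_transition_t;

/* Hypothetical per-core bookkeeping state. */
typedef struct {
    const power_mode_t *mode;   /* current power mode */
    double last_switch_ms;      /* host clock at the last switching point */
    double elapsed_energy_uj;   /* energy up to the last switching point */
} ip_core_t;

double mode_power_mw(const power_mode_t *pm)
{
    return pm->avg_leakage_mw + pm->avg_dyn_power_mw;
}

/* Platform estimate: sum over all IP cores of the stored energy plus
 * the energy spent in the current mode since the last switch. */
double estimate_elapsed_energy_uj(const ip_core_t *cores, size_t n,
                                  double now_ms)
{
    double en = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(now_ms >= cores[i].last_switch_ms);
        en += cores[i].elapsed_energy_uj;
        en += mode_power_mw(cores[i].mode)
            * (now_ms - cores[i].last_switch_ms);
    }
    return en;
}

/* Update performed whenever a core switches to a new power mode. */
void switch_to_power_mode(ip_core_t *core, const power_mode_t *new_pm,
                          const pm_transition_t *tm, double now_ms)
{
    assert(now_ms >= core->last_switch_ms);
    core->elapsed_energy_uj += mode_power_mw(core->mode)
                             * (now_ms - core->last_switch_ms);
    core->elapsed_energy_uj += tm->switching_power_mw * tm->switching_time_ms;
    core->mode = new_pm;
    core->last_switch_ms = now_ms;  /* assumption: not shown in the text */
}
```

As a worked case, a core drawing 15 mW for 100 ms accumulates 1500 µJ; a 2 ms transition at 20 mW adds 40 µJ; a further 100 ms at 5 mW brings the estimate to 2040 µJ.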

6 Format of GRM input files

As mentioned in Section 5, the GRM needs three types of input file:

- A textual file describing the high-level platform specification and one XML file for each IP core type of the platform. Such an XML file characterizes the power modes and power mode transitions available on the IP core. These input files are needed to execute the service GRM_ConfigurePlatform().
- An XML file characterizing the available application configurations. It is needed to execute the service GRM_ConfigureApplication().

The updated format of these three types of input file is summarized in the following subsections. A first update is the introduction of the field unit: the GRM needs to combine measures consistently, and these measures are specified in different input files and derived from different independent external tools. A second update is the use of the same XML file format for each IP core type. A third update is the use of XML file formats with similar constructs for both the IP core type specification and the application specification.

6.1 High-level platform specification

The textual file for platform specification is as follows:

    # PLATFORM
    Platform platform_stm
    Number_of_IP_cores 6
    Number_of_IP_core_types 2
    Energy_budget joules
    Battery_duration 24 hours
    # IPCORES
    IP_core ipcore_0 REISC HOST
    IP_core ipcore_1 REISC SLAVE
    IP_core ipcore_2 REISC SLAVE
    IP_core ipcore_3 REISC SLAVE
    IP_core ipcore_4 REISC SLAVE
    IP_core ipcore_5 HW SLAVE
    # IPCORE TYPES
    IP_core_type REISC REISC.xml
    IP_core_type HW HW.xml

6.2 IP core type specification

A unified XML format is used for any IP core type. The XML file characterizes the power modes and power mode transitions available on the IP core type. It is illustrated below, where two power modes and the corresponding power mode transitions are specified:

    <Power_mode_table>
      <Power_mode>
        <parameters>
          <parameter name="ID" value="0" unit="no" />
          <parameter name="clock_frequency" value="0" unit="MHz" />
          <parameter name="supply_voltage" value="0.0" unit="volts" />
          <parameter name="avg_dyn_power" value="0.0" unit="milli_watts" />
          <parameter name="avg_leakage" value="5.0" unit="milli_watts" />
        </parameters>
        <Power_mode_transitions>
          <pm_trans>
            <parameter name="pm_id" value="1" unit="no" />
            <parameter name="switching_time" value="2.0" unit="milli_sec" />
            <parameter name="switching_power" value= unit="milli_watts" />
          </pm_trans>
        </Power_mode_transitions>
      </Power_mode>
      <Power_mode>
        <parameters>
          <parameter name="ID" value="1" unit="no" />
          <parameter name="clock_frequency" value="300" unit="MHz" />
          <parameter name="supply_voltage" value="1.2" unit="volts" />
          <parameter name="avg_dyn_power" value= unit="milli_watts" />
          <parameter name="avg_leakage" value= unit="milli_watts" />
        </parameters>
        <Power_mode_transitions>
          <pm_trans>
            <parameter name="pm_id" value="0" unit="no" />
            <parameter name="switching_time" value="0.1" unit="milli_sec" />
            <parameter name="switching_power" value="20.0" unit="milli_watts" />
          </pm_trans>
        </Power_mode_transitions>
      </Power_mode>
    </Power_mode_table>

6.3 Application specification

The format of the XML file characterizing the available application configurations is illustrated below for one application configuration in the COMPLEX use case 2:

    <point>
      <parameters>
        <parameter name="appl_mode" value="0" unit="no" />
        <parameter name="audio_frequency" value="128" unit="kbits_per_sec" />
        <parameter name="image_resolution" value="101376" unit="pixels_per_image" />
        <parameter name="image_rate" value="10" unit="frames_per_sec" />
      </parameters>
      <scheduling>
        <sched name="task_id" value="0" name="ipcore_id" value="1" name="power_mode_id" value="1"/>
        <sched name="task_id" value="1" name="ipcore_id" value="2" name="power_mode_id" value="1"/>
      </scheduling>
      <system_metrics>
        <system_metric name="execution_time" value="7.56" unit="milli_sec" />
        <system_metric name="energy_consumption" value="11.89" unit="milli_joule" />
      </system_metrics>
    </point>
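The unit field exists so that measures coming from different files can be combined consistently. The sketch below shows one way this could be done: every value is scaled to an SI base unit before use. The functions unit_scale() and transition_energy_j() are hypothetical helpers, not part of the GRM, and only unit strings with an obvious SI base that appear in Section 6 are covered. As a worked example, the transition from power mode 1 to power mode 0 in Section 6.2 (switching_time 0.1 milli_sec, switching_power 20.0 milli_watts) costs 0.1e-3 s x 20.0e-3 W = 2e-6 J.

```c
#include <string.h>

/* Scale factor from a unit string of the input files to its SI base unit.
 * Returns 0.0 for a unit that this sketch does not know. */
double unit_scale(const char *unit)
{
    if (strcmp(unit, "milli_sec") == 0)   return 1e-3;    /* -> seconds */
    if (strcmp(unit, "milli_watts") == 0) return 1e-3;    /* -> watts   */
    if (strcmp(unit, "milli_joule") == 0) return 1e-3;    /* -> joules  */
    if (strcmp(unit, "joules") == 0)      return 1.0;
    if (strcmp(unit, "volts") == 0)       return 1.0;
    if (strcmp(unit, "MHz") == 0)         return 1e6;     /* -> hertz   */
    if (strcmp(unit, "hours") == 0)       return 3600.0;  /* -> seconds */
    return 0.0;
}

/* Energy in joules spent by one power mode transition:
 * switching time multiplied by switching power, both normalized first. */
double transition_energy_j(double time_val, const char *time_unit,
                           double power_val, const char *power_unit)
{
    return time_val * unit_scale(time_unit)
         * power_val * unit_scale(power_unit);
}
```

Returning 0.0 for an unknown unit is only a convenience of the sketch; a real parser would more likely reject the input file.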

7 Interaction flow between GRM and CM

This section describes the interaction flow between the GRM and the CM, which allows the platform resources and the application functionalities to be managed in parallel, as advertised in Section 4. One issue is to optimize the division of work between the GRM and the CM in order to be as efficient and reactive as possible. Correct synchronization between the GRM and the CM is required to guarantee that the IP cores are completely reconfigured before the thread executions are launched.

7.1 Communication with GRM and CM

The following legend is used in the next figures of Section 7:

Figure 9: Communications with GRM and CM

Communications with the GRM and the CM are performed either through signals or through shared variables. As illustrated in Figure 9, two shared variables are used:

- new_ac is shared between the GRM and the CM. It is written by the GRM and read by the CM. It stores the ID of the application configuration selected by GRM_SelectApplicationConfiguration(). Whenever the CM wants to update the application configuration, it copies new_ac into current_ac.
- current_ac is shared between the CM and the slave threads. It is written by the CM and read by the slave threads. It stores the ID of the application configuration currently executed on the platform. It is through current_ac that the CM communicates the application configuration to the slave threads.

Three types of signal communication are also used:

- Communication between the GRM and the CM, whose interrupt signals are defined as follows:
  o As soon as GRM_ReconfigurePlatform() is completed for the last selected application configuration, the GRM starts waiting for sig_selection.
  o Regularly, the CM wants to update the current application configuration. To that end, it sends sig_selection to the GRM to request the execution of GRM_SelectApplicationConfiguration().
  o ack_selection is sent by the GRM to the CM to indicate that the variable new_ac has been updated:
      The CM can read new_ac at any time to be aware of the newly selected application mode.
      As soon as the CM agrees to perform the reconfiguration, the CM needs to copy new_ac into the variable current_ac.
      Before each new processing, the active slave threads need to read current_ac to be aware of the options to be executed.
  o As soon as the CM agrees to switch to the newly selected application configuration, it sends sig_request to the GRM to request the execution of GRM_ReconfigurePlatform().
  o As soon as sig_terminating is received, sig_reconfig(ipcore) is sent by the CM to the GRM to indicate that the IP core can be reconfigured.
  o IPC_ready(ipcore) is sent by the GRM to the CM to indicate that the IP core is ready to execute the thread.
- Communication between the GRM and the IP cores of the platform, whose interrupt signals are defined as follows:
  o sig_switch_pm(power_mode) is sent by the GRM to the IP core to request the switching to the given power mode.
  o ack_switch_pm is sent by the IP core to the GRM to indicate that the reconfiguration is completed with the new power mode.
- Communication between the CM and the slave threads, whose interrupt signals are defined as follows:
  o sig_wakeup is sent by the CM:
      Either to create the thread and start its execution.
      Or to reactivate the thread after a new power mode switching.
  o sig_terminating is sent by the slave thread to the CM as soon as a switching point (see Section 7.2) is met. This signal indicates that the thread has completed its current processing, is in a stable state, and can deal with hardware adaptations. This signal also implies that the thread has read current_ac and is aware whether it has to continue its execution or to enter a sleep mode before executing a new configuration.

Detailed interaction flows, both at the initialization of the application and during its actual execution, are given in Sections 7.3 and 7.4. As illustrated in Figure 10 and Figure 12, a timeout must also be used for robustness in case of faulty IP cores or slave threads, in which case some special action must be taken by the CM. See, e.g., wait(sig_terminating and ack_switch_pm for 50ms).
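The timeout mentioned above, wait(sig_terminating and ack_switch_pm for 50ms), can be sketched as a bounded wait on two conditions. This is only an illustration under strong assumptions: the two signals are modelled as atomic flags rather than interrupts, the function wait_both() is hypothetical, and the loop spins on the C11 wall clock where a real implementation would block on an OS primitive.

```c
#include <stdatomic.h>
#include <time.h>

/* Milliseconds elapsed since *t0, measured with the C11 wall clock. */
static long elapsed_ms(const struct timespec *t0)
{
    struct timespec now;
    timespec_get(&now, TIME_UTC);
    return (long)(now.tv_sec - t0->tv_sec) * 1000L
         + (now.tv_nsec - t0->tv_nsec) / 1000000L;
}

/* Wait until both flags are nonzero or until timeout_ms has elapsed.
 * Returns 1 if both conditions were observed, 0 on timeout, in which
 * case the caller (the CM) would take its special recovery action. */
int wait_both(atomic_int *sig_terminating, atomic_int *ack_switch_pm,
              long timeout_ms)
{
    struct timespec t0;
    timespec_get(&t0, TIME_UTC);
    for (;;) {
        if (atomic_load(sig_terminating) && atomic_load(ack_switch_pm))
            return 1;
        if (elapsed_ms(&t0) >= timeout_ms)
            return 0;
    }
}
```

The flags are checked before the clock, so a condition that is already satisfied is reported even with a timeout of zero.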

7.2 Dedicated switching points

Figure 10: Application configuration switching

Activation of a new application configuration must be seamless to avoid damage and to maintain the real-time behaviour and the data integrity of the application. Obviously, this cannot be done at any time: the threads should be in a stable state (e.g., not yet started, having completed some processing, and being in a waiting state). This means that the threads will probably not all switch at the same time. Hence the only robust solution is to reconfigure the IP cores and the threads one by one, as soon as each thread is ready for it, until the complete platform and application are reconfigured.

As a consequence, the main issue in application configuration switching is to find appropriate moments for performing such a switching. To that end, dedicated switching points must be specified by the application developer inside the source code of the job. A smooth transition from the current application configuration to the new one is illustrated in Figure 10 and performed as follows, where each thread performs the following common steps:

1. Through the signal ack_selection, the GRM indicates to the CM that an application configuration switching is required.
2. Whenever a thread reaches a dedicated switching point (i.e., after some reaction time), the thread checks whether a switching is requested by reading the shared variable current_ac.
3. If a switching is requested:
   a. The thread sends a signal sig_terminating to the CM.
   b. The thread enters a sleep mode until reception of the signal sig_wakeup, which requests the activation of the thread after reconfiguration of the IP core. The IP core reconfiguration and the thread activation are not immediate. They need some freeze time, due to DVFS for example. During this freeze time, the CM can perform some pre-processing before the next thread activations (e.g., prepare a list of next actions for the GRM, keep sending signals sig_reconfig).
   c. The thread starts its execution according to the newly selected configuration.

Hence the switching mechanism takes place at two levels:

- At the platform level: all IP cores, managed by the GRM, are reconfigured according to the newly selected power modes.
- At the application level: all jobs, managed by the CM, switch safely to the newly selected configuration.

In the COMPLEX use case 2, where the application consists of three for-loop jobs (i.e., alarm processing, audio activity detection, and video image processing), the switching points are set between two successive iterations of each job.
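A switching point placed between two successive iterations of a for-loop job can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the use case code: the types shared_state and work_fn, the function run_job(), and the flag that stands in for the signal sig_terminating are hypothetical, and the stub cm_switches_at_third_iteration() merely emulates the CM writing a new value into current_ac.

```c
/* Stand-ins for the shared variable and the signal of Section 7.1. */
typedef struct {
    int current_ac;       /* shared variable, written by the CM */
    int sig_terminating;  /* set to 1 where the thread would signal the CM */
} shared_state;

/* One iteration of a job under application configuration ac. */
typedef void (*work_fn)(int ac, int iteration, shared_state *s);

/* Run up to max_iter iterations under configuration my_ac. After each
 * complete iteration (a stable state) the switching point compares
 * current_ac with my_ac and stops the job if a switch is requested.
 * Returns the number of iterations completed. */
int run_job(shared_state *s, int my_ac, int max_iter, work_fn work)
{
    int done = 0;
    while (done < max_iter) {
        work(my_ac, done, s);
        ++done;
        /* Dedicated switching point between two successive iterations. */
        if (s->current_ac != my_ac) {
            s->sig_terminating = 1;  /* step 3a: notify the CM */
            break;                   /* step 3b: would now sleep until sig_wakeup */
        }
    }
    return done;
}

/* Test stub emulating the CM: requests configuration 1 during iteration 2. */
void cm_switches_at_third_iteration(int ac, int iteration, shared_state *s)
{
    (void)ac;
    if (iteration == 2)
        s->current_ac = 1;
}
```

Because the check sits after a complete iteration, the job never abandons work in progress, which is the property the text requires of a stable state.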

7.3 Initialization of the application

Figure 11: Initialization of the application

The initialization of the application is illustrated in Figure 11, where it is assumed that the currently selected application configuration consists of executing Job 1 and Job 2.

7.4 Actual execution of the application

Figure 12: Actual execution of the application

The actual execution of the application is illustrated in Figure 12, where it is assumed that Job 1 has to switch to a new configuration, Job 2 has to stop its execution, and Job 3 has to start its execution.

8 Power management of individual IP cores

Individual IP cores of the platform are controlled by the GRM through an LRM (see Section 4). Each IP core provides a set of power modes that can be activated by the GRM. These modes (typically a combination of supply voltage and clock frequency) allow the core to operate at different performance levels or to be deactivated completely. During the characterisation phase, a power mode table (see Section 6.2) is generated. This XML file contains all information about the available power modes and the allowed transitions between them. It also contains information about the switching overhead in terms of delay and power.

In both COMPLEX use cases 1 and 2, the platform consists of HW accelerators and SW cores (e.g., ReISC DSP cores). The APIs and power mode tables for HW accelerators and for black-box IP cores are provided in Section 8.1. Those of SW platform cores must be provided by the platform provider. The current implementation status is described in Section 8.2 for the ReISC DSP core of the platform used in COMPLEX use cases 1 and 2.

8.1 HW accelerators and black-box IP cores

For HW modules as well as black-box IP cores of the virtual prototype, information about the available power modes is handled by the non-functional model. This model can be accessed by the GRM using a TLM2-based interface, which is implemented as a register interface accessible through a TLM2 socket. This is shown in Figure 13.

[Figure 13 blocks: TLM2 communication interface with the registers Desired, Recent and Status; communication adapter; functional model (augmented behaviour); non-functional model (Vdd, Vth, clock tree, leakage, etc.); observer (calculates power and timing)]

Figure 13: TLM-based LRM interface for HW accelerator modules

Using the register interface, power modes can be requested and the actual state of the power mode management can be obtained. The registers are accessed through the TLM generic_payload pattern. If one of the registers is read or written, the interface adapter communicates directly with the non-functional model, using methods of a generic base class. All non-functional models are derived from that class, so a generic approach for accessing the models is available and only one type of interface adapter must be provided. In order to keep the interface as simple as possible, the interface simply calls the appropriate getter and setter methods for each register. The register file has the structure shown in Table 1.

8.1.1 Model of computation

Power mode switching can only be performed if the particular module has completed its computation, i.e., is idle. Whenever a power mode is requested, it is checked whether the mode is a valid one (i.e., the mode ID is known) and whether there exists a valid transition from the current mode to the requested one. If the mode is not known or if the transition is not possible, the status register is set accordingly (see the description of the Status register in Section 8.1.2). If the requested mode is valid and a transition is possible but the module is currently active, the requested mode is accepted but pending. When the module completes its computation and becomes idle, a pending power mode switch is performed. That is, the new supply voltage and clock frequency are applied and the observer is informed about the overhead, in terms of power and timing, caused by the mode switching. As long as the requested mode is pending, the request can be revoked by requesting the currently active power mode. The same is true for unknown states or impossible transitions. Figure 14 shows the flow for requesting and revoking power mode switching.

Figure 14: Power mode request/revoke flow

For modelling individual power modes, an approach similar to the power state machine for black-box IP cores (see COMPLEX deliverable D2.3.2 [4]) is used. Each power mode is represented by a state of the state machine. A state, i.e., a power mode, is enriched with attributes such as supply voltage and clock frequency. A power mode switch is done by executing a state transition. Such a transition is also enriched with attributes, in this case attributes describing the overhead of the transition: a timing overhead (delay) and a power overhead. When switching to a state with a lower supply voltage, no overhead is given, since the module can be used immediately. Functionality is retained if internal capacitances have a higher voltage level. The correct voltage level is automatically reached during operation. Figure 15 shows such an annotated power mode machine.
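The request/revoke rules of this model of computation can be sketched as a small state holder. The sketch below is an interpretation of the text, not the COMPLEX non-functional model: the type lrm_model, the functions lrm_model_request() and lrm_model_idle(), the fixed number of modes, and the choice to leave the accepted request untouched on an invalid request are assumptions. The numeric status values follow the encoding of the Status register.

```c
#include <stdbool.h>

#define N_MODES 2  /* number of known power modes in this sketch */

typedef enum {
    ST_OK = 0,             /* desired mode accepted and active */
    ST_PENDING = 1,        /* accepted, applied once the module is idle */
    ST_INVALID_TRANS = 2,  /* mode known, transition not allowed */
    ST_INVALID_MODE = 3    /* mode ID unknown */
} lrm_status;

typedef struct {
    unsigned current;                    /* active power mode */
    unsigned desired;                    /* last accepted request */
    bool     busy;                       /* module is computing */
    bool     allowed[N_MODES][N_MODES];  /* allowed[from][to] */
} lrm_model;

/* Handle one power mode request according to the rules of Section 8.1.1. */
lrm_status lrm_model_request(lrm_model *m, unsigned mode)
{
    if (mode >= N_MODES)
        return ST_INVALID_MODE;
    if (mode == m->current) {        /* also revokes a pending request */
        m->desired = mode;
        return ST_OK;
    }
    if (!m->allowed[m->current][mode])
        return ST_INVALID_TRANS;
    m->desired = mode;
    if (m->busy)
        return ST_PENDING;           /* switch deferred until idle */
    m->current = mode;               /* module idle: switch immediately */
    return ST_OK;
}

/* Called when the module completes its computation and becomes idle:
 * a pending switch, if any, is performed now. */
void lrm_model_idle(lrm_model *m)
{
    m->busy = false;
    m->current = m->desired;
}
```

A fuller model would also report the transition overhead to the observer at the point where current is updated.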

Figure 15: Example power mode machine

Transitions have guards assigned. These guards are responsible for consuming the input word, i.e., the power mode switch requests. The power mode switch is triggered by the simulation logic shown in Figure 14. If a switch should be performed, an event (which is equal to the desired power mode) is fired by the simulation logic and consumed by the appropriate guard.

8.1.2 Register Interface

Table 1: LRM register interface

    Register   Bits    Field
    Desired    31:0    Desired
    Recent     31:0    Recent
    Status     31      V (Valid)
               30:2    Reserved
               1:0     S (Status)

The following sections describe the functionality of each register.

Register Desired

    31:0   Desired   Contains the ID of the desired power mode, as requested by the GRM. The register is r/w.

Register Recent

    31:0   Recent    Contains the ID of the recent power mode. The register is read-only. It is only valid if the valid bit of the status register is set.

Register Status

    31     Valid      If set, the content of the register file is valid. If any of the registers of the interface is written, this bit is set immediately. That is, this bit can be read in the next cycle and will then contain a valid value. This bit is read-only.
    30:2   Reserved   Reserved bits. The content is not defined.
    1:0    Status     Determines the current status of the LRM. These bits are read-only. The content of these bits is only valid if the valid bit of this register is set. The following values are possible:
                      00: OK; the desired power mode is accepted and active.
                      01: Pending; the desired power mode is accepted, but not activated yet. It will become active as soon as possible.
                      10: Invalid transition; the desired power mode is known, but it is not possible to switch from the current power mode to the desired one.
                      11: Invalid mode; the desired power mode is not known.

8.1.3 API documentation

A generic API for accessing the registers is provided. This API can be used by all software cores (i.e., the software running on them) to access the LRM interface of a particular HW module. All methods of the API return a response type, conveying whether the call was successful or not. The definition of this response is shown in the listing below.

    // The return type of all API calls
    typedef enum {
        // No error occurred. LRM settings are correct and no power mode
        // switch is pending.
        LRM_STATUS_OK = 0x0,
        // Power mode request accepted. The mode is switched as soon as a
        // power mode switch is possible.
        LRM_STATUS_PENDING = 0x1,
        // The requested mode is known, but the transition from the actual
        // mode to the requested one is not allowed.
        LRM_INVALID_TRANS = 0x2,
        // The requested power mode is unknown.
        LRM_INVALID_MODE = 0x3
    } lrm_response;

The ID of a power mode is simply an integer number:

    // The C/C++ type of the power mode ID.
    // The ID of a power mode equals its ID given in the power mode table.
    typedef unsigned int power_mode;

The GRM can request a HW or IP module to switch to a certain power mode. This is done using the method shown below. If the requested power mode is not known, or the transition is not possible, an error is returned. If mode and transition are valid, the mode request is acknowledged and the module will switch to the requested mode as soon as possible.

    // Request a certain power mode.
    //   lrm_addr: The base address of the LRM register interface
    //   pm      : The ID of the requested power mode.
    // Returns typically 0x1, 0x2, or 0x3. It might happen that the mode
    // switch is performed immediately. In this case 0x0 is returned.
    lrm_response lrm_request_mode( volatile void* lrm_addr, power_mode pm );

The current and the requested power mode can be obtained from the interface using the following two methods:

    // Gets the content of the current power mode register.
    //   lrm_addr: The base address of the LRM register interface
    //   pm      : The ID of the module's current power mode.
    // Returns: should always be 0x0. Later some more error codes might
    // be added.
    lrm_response lrm_get_current_mode( volatile void* lrm_addr, power_mode* pm );

    // Gets the content of the requested power mode register.
    // If not equal to the content of the current power mode register, a
    // power mode switch is pending and is performed as soon as possible.
    // If equal, no switch is pending.
    //   lrm_addr: The base address of the LRM register interface
    //   pm      : The ID of the module's requested power mode.
    // Returns: should always be 0x0. Later some more error codes might
    // be added.
    lrm_response lrm_get_requested_mode( volatile void* lrm_addr, power_mode* pm );

The content of the status register is available using the following method:

    // Gets the status of the LRM.
    //   lrm_addr: The base address of the LRM register interface
    lrm_response lrm_get_status( volatile void* lrm_addr );

In the COMPLEX use case 2, a HW accelerator is used to implement an FFT. Its power mode table is the one given in Section 6.2.

8.1.4 API implementation

Two versions of the LRM API have been implemented. The first one implements the API as free C functions, which could for instance be used by code running in an ISS. These plain C functions perform the register accesses directly via the volatile pointer that is passed as the first argument. The pointer is interpreted as pointing to a struct type that reproduces the layout of the LRM register interface. Listing 1 shows the declaration of this structure.

    /*! 32bit register 'status' of the LRM register interface with 3 sub fields */
    union lrm_status_register_type {
        struct {
            unsigned int status:2;
            unsigned int reserved:29;
            unsigned int valid:1;
        } fields;
        unsigned int all;
    };

    /*! layout of the LRM register interface */
    typedef struct {
        unsigned int desired;                    // 32bit register 'desired' mode
        unsigned int recent;                     // 32bit register 'recent' mode
        union lrm_status_register_type status;   // 32bit register 'status'
    } lrm_register_if_type;

Listing 1: C implementation of the LRM register interface

As an example of a C-style implementation of an LRM API function, Listing 2 shows the definition of the lrm_request_mode function. Note that, after writing to the desired register, it waits for the valid bit to be set by the power mode model before returning the LRM status. The helper function lrm_status_to_response, which is not shown, simply converts the value of the status field into an enumerator of the lrm_response enumeration. This conversion step was added in order to avoid a direct dependency between the possible values of the status field, which might change in the future, and the integer values of the response enumerators.

    lrm_response lrm_request_mode(volatile void *lrm_addr, power_mode pm)
    {
        volatile lrm_register_if_type *lrm_reg_p =
            (volatile lrm_register_if_type*)lrm_addr;
        union lrm_status_register_type status;

        lrm_reg_p->desired = pm;
        do {
            status.all = lrm_reg_p->status.all;
        } while (!status.fields.valid);
        return lrm_status_to_response(status.fields.status);
    }

Listing 2: C implementation of the lrm_request_mode API function

A second implementation of the API consists of namespace-encapsulated C++ functions that construct a TLM2 generic payload and pass it to the socket of the currently active initiator module. The address used in the TLM transaction is derived from the given lrm_addr pointer. The corresponding initiator module is obtained from the SystemC simulation kernel using a helper function called lrm_initiator. Listing 3 shows the TLM version of the lrm_request_mode API function.

It must be noted that the actual TLM implementation of the API makes some assumptions about the initiator module. It is assumed that the initiator module is derived from the corresponding wrapper base class of the COMPLEX library and provides the interface methods synchronize and adjust_global_cycle_count, which are used to control the consumption of execution time in the initiator module, as well as tlm_read and tlm_write methods, which perform the actual construction of a tlm_generic_payload object and its transportation over the initiator module's socket.

    namespace cplx {
    namespace vp_tlm {

    /*! helper function returning the actual initiator module as obtained
        from the SystemC kernel */
    inline tlm_initiator_wrapper_base * lrm_initiator()
    {
        sc_core::sc_process_handle hndl = sc_get_current_process_handle();
        tlm_initiator_wrapper_base *initiator =
            dynamic_cast<tlm_initiator_wrapper_base*>(hndl.get_parent_object());
        sc_assert(initiator);
        return initiator;
    }

    lrm_response lrm_request_mode_tlm(volatile void* lrm_addr, power_mode pm)
    {
        volatile lrm_register_if_type* lrm_reg_p =
            (volatile lrm_register_if_type*)lrm_addr;
        unsigned int buscycles = 0;
        tlm_initiator_wrapper_base *initiator = lrm_initiator();

        // synchronize with initiator
        // (let cpu time that elapsed before this transaction pass):
        initiator->synchronize();
        initiator->simple_tlm_write<unsigned int>(
            sc_dt::uint64((unsigned long)&(lrm_reg_p->desired)), &pm, buscycles);

        lrm_status_register_type status;
        do {
            initiator->simple_tlm_read<unsigned int>(
                sc_dt::uint64((unsigned long)&(lrm_reg_p->status.all)),
                &status.all, buscycles);
        } while (!status.fields.valid);  // \todo maybe add some timeout or delay?

        // notify initiator on consumed bus cycles for this communication:
        initiator->adjust_global_cycle_count(buscycles);
        return lrm_status_to_response(status.fields.status);
    }

    } // namespace vp_tlm
    } // namespace cplx

Listing 3: TLM version of the lrm_request_mode API function

Both implementations of the LRM API have been compiled into the COMPLEX library libcomplex-osci.a.
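The helper lrm_status_to_response is not shown in the report, and both listings poll the valid bit without a bound. The sketch below illustrates one way these two points could look; it is an assumption, not the COMPLEX library code. It decodes the Status register with explicit masks (bit 31 for Valid, bits 1:0 for Status, as in Section 8.1.2) instead of bit-fields, whose layout is compiler dependent, and it gives up after a caller-chosen number of polls. The function lrm_poll_status(), the mask macros and the max_polls parameter are hypothetical.

```c
#include <stdint.h>

/* Response codes as listed in the API documentation. */
typedef enum {
    LRM_STATUS_OK      = 0x0,
    LRM_STATUS_PENDING = 0x1,
    LRM_INVALID_TRANS  = 0x2,
    LRM_INVALID_MODE   = 0x3
} lrm_response;

#define LRM_STATUS_VALID_BIT  (UINT32_C(1) << 31)  /* bit 31: Valid   */
#define LRM_STATUS_FIELD_MASK UINT32_C(0x3)        /* bits 1:0: Status */

/* Assumed shape of the helper: map the 2-bit status field onto the
 * enumerators explicitly, so that the two encodings can evolve apart. */
lrm_response lrm_status_to_response(unsigned status_field)
{
    switch (status_field & 0x3u) {
    case 0x0: return LRM_STATUS_OK;
    case 0x1: return LRM_STATUS_PENDING;
    case 0x2: return LRM_INVALID_TRANS;
    default:  return LRM_INVALID_MODE;
    }
}

/* Poll the status register at most max_polls times. Returns 1 and stores
 * the decoded response in *out once the valid bit is seen, 0 otherwise. */
int lrm_poll_status(const volatile uint32_t *status_reg, unsigned max_polls,
                    lrm_response *out)
{
    for (unsigned i = 0; i < max_polls; ++i) {
        uint32_t raw = *status_reg;
        if (raw & LRM_STATUS_VALID_BIT) {
            *out = lrm_status_to_response((unsigned)(raw & LRM_STATUS_FIELD_MASK));
            return 1;
        }
    }
    return 0;
}
```

Bounding the loop lets the caller distinguish a model that never raises the valid bit from a legitimate pending answer.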

8.2 ReISC core

The ReISC SoC is a system on chip taped out by STMicroelectronics at the end of 2009 in a 90 nm technology. It is the first system on chip of a new family of ultra-low power products. It encompasses the proprietary ReISC 3 core (Reduced energy Instruction Set Computer), providing hardware support for 8/16/20/32 data sizes, variable 16 bit-based instruction length and secure data. ReISC 3 is a micro-controller core targeted at ultra-low power applications. It operates at frequencies up to 50 MHz, and contains embedded memories (1 Mbytes Flash memory and 32 Kbytes SRAM) and an extensive range of enhanced I/Os and peripherals. The ReISC SoC contains one 12-bit ADC, three general purpose 16-bit timers plus one internal timer, as well as standard and advanced communication interfaces: one I2C, two GPIOs, two SPIs, one USART, and one USB.

A comprehensive set of power-saving modes, internal to the ReISC SoC platform, allows the design of low-power applications. The platform can apply different power reduction techniques such as clock gating and power gating; it can also select among four clock sources. The architecture is hierarchically organized in power islands that can be switched off under the control of the Power Manager unit. Finer control of the power consumption can be obtained through the RCCU, which allows setting the enabling status of the peripherals and enabling or disabling their clocks. Moreover, the power status of a peripheral also depends on the status defined in its registers. The organization in power islands is summarized here and shown in Figure 16:

- An ALWAYS ON power island includes the ReISC core, the RCCU, the Power Manager, the Timers, and all the other components that are kept always enabled.
- A FUNCTIONAL STATE power island (with retention flip-flops) contains the other peripherals that can be switched on/off, e.g., the SPI and the GPIOs.
- An ANALOG power island includes the ADC and the clock sources.

[Figure 16 legend: power islands Always ON, Functional State (RUN, SNOOZE, SLEEP; FF retention implemented) and Analog. Figure notes: no connection between I2 and T3 inside the XBAR (fetch from peripherals not allowed); no connection between I1 and T2 inside the XBAR (moving data with DMA from flash not allowed).]

Figure 16: ReISC SoC architecture showing power islands

The overall power consumption within the ReISC SoC can be controlled by two peripherals, the Power Manager and the Reset and Clock Control Unit (RCCU), by setting their registers described in Sections 8.2.3 and 8.2.4.

The Power Manager controls and monitors the overall power consumption at the SoC level. It allows putting the processor in deep sleep mode or in snooze mode. The Power Manager also provides the possibility to power down the RAM, the Flash memory and the ADC.

The Reset and Clock Control Unit provides the ability to enable and select one of the four available clock sources and to enable or disable the clock of the peripherals.

Moreover, the power state of a peripheral depends on its status registers. For instance, the power consumption of an ADC depends on whether it is enabled and sampling. An excerpt of the registers that control an ADC is shown in Section 8.2.5.

The functional simulation of an application, such as the application in the COMPLEX use case 1, is performed on a virtual platform simulation framework of the ReISC SoC, shown in Figure 17. It consists of an ISS of the ReISC 3 processor, which communicates with the hardware models of the peripherals through a bus model. A SystemC wrapper implements the interface between the instruction-set simulator (ISS) and the rest of the system, i.e., the peripherals, which are mostly modelled in SystemC. Only the components that are closely linked to the ISS or to the memory have been left under the direct control of the ISS. In Figure 17 the SystemC peripherals are shown in orange, while the parts in yellow are modelled in C within the ISS.

Figure 17: Architecture of the ReISC SoC virtual platform

The power consumption status is determined by the registers that control the peripherals. To perform power profiling of an application running on the ReISC processor, it is necessary to provide two sets of APIs:

- Virtual Platform Power Monitoring APIs: enhance the virtual platform with the capability of monitoring power consumption.
- Application Level Power Driver APIs: provide the possibility to control the power consumption from the application.

8.2.1 Virtual Platform Power Monitoring APIs

The APIs that dynamically trace the power consumption of an application running on the ReISC SoC are developed as state machines that monitor the evolution of the system components. These APIs are written in SystemC (they belong to the orange domain in Figure 17), have access to the registers of the peripherals, and are added as an extension of the virtual platform just for the purpose of providing the ability to monitor the power consumption. For instance, to compute the power profile of the ADC, it is necessary to observe the events from the Power Manager, the RCCU and the ADC. A SystemC model that implements the FSM tracing the power state transitions is shown in the following code. Since the virtual platform has access to the registers that control the peripherals, the evolution of the power consumption can be monitored by functions such as the following.

    SC_MODULE(POWER_FSM_ADC) {
        sc_in<bool> clock;
        sc_in<bool> PWRMNG_CMD_HM_12;
        sc_in<bool> RCCU_PERIPHCKEN_8;
        sc_in<bool> ADC0_CR2_0;
        sc_signal<power_state_adc> next_state;
        sc_signal<power_state_adc> current_state;

        void getnextst();
        void setstate();

        SC_CTOR(POWER_FSM_ADC) {
            current_state = IDLE;
            // Next-state logic reacts to the observed register bits.
            SC_METHOD(getnextst);
            dont_initialize();
            sensitive << PWRMNG_CMD_HM_12 << RCCU_PERIPHCKEN_8 << ADC0_CR2_0;
            // State register is updated on the rising clock edge.
            SC_METHOD(setstate);
            dont_initialize();
            sensitive << clock.pos();
        }
    };

    void POWER_FSM_ADC::getnextst()
    {
        switch (current_state) {
        case IDLE:
            if (PWRMNG_CMD_HM_12 == 1) next_state = OFF;
            else if (RCCU_PERIPHCKEN_8 == 0) next_state = NOCLOCK;
            else if (ADC0_CR2_0 == 1) next_state = SAMPLE;
            break;
        case SAMPLE:
            if (PWRMNG_CMD_HM_12 == 1) next_state = OFF;
            else if (RCCU_PERIPHCKEN_8 == 0) next_state = NOCLOCK;
            else if (ADC0_CR2_0 == 0) next_state = IDLE;
            break;
        case OFF:
            if (PWRMNG_CMD_HM_12 == 0) {
                next_state = IDLE;
                if (RCCU_PERIPHCKEN_8 == 0) next_state = NOCLOCK;
                else if (ADC0_CR2_0 == 1) next_state = SAMPLE;
            }
            break;
        case NOCLOCK:
            if (PWRMNG_CMD_HM_12 == 1) next_state = OFF;
            else if (RCCU_PERIPHCKEN_8 == 1) {
                next_state = IDLE;
                if (ADC0_CR2_0 == 1) next_state = SAMPLE;
            }
            break;
        } // end switch
    } // end getnextst

    void POWER_FSM_ADC::setstate()
    {
        current_state = next_state;
        trace_power_state(current_state);
    }

The transitions among the power states are registered with their corresponding timestamps, so that it is possible to compute the power profile of the system components. A state machine similar to the one described in this example is needed for each component whose power consumption is to be profiled. It is therefore possible to have a set of concurrent power state machines that monitor all the components of the SoC. The power profile is obtained by integrating, with respect to time, the power consumed in the power states.

8.2.2 Application Level Power Driver APIs

On the application side it is necessary to provide a set of functions that control the registers governing the peripherals. Such functions provide power driver APIs written in C and running on the ISS (they belong to the yellow domain in Figure 17). These functions have access to the registers that control the peripherals, and they work under the control of the operating system. An example of a driver for the ADC is shown in the following code excerpt.

void ADC_powerControl(int targetstate)
{
    if (targetstate == ADCSAMPLE) // CR2 CONT ON
    {
        // CLOCK enable, POWER on
        ADC0->ADC_CR2 |= 0x1; // ue on
        ADC0->ADC_CR2 |= 0x2; // cont on
        RCCU0->RCCU_PERIPHCKEN |= EXT_RCCU_PERIPHCKEN_ADC0; // CLOCK ENABLE
        PWRMNG0->PWRMNG_PD_HM &= ~PWR_MNG_ADCOKINV33; // POWER ON
    }
    if (targetstate == ADCIDLE) // CR2 CONT OFF
    {
        // CLOCK enable, POWER on
        ADC0->ADC_CR2 |= 0x1; // ue on
        ADC0->ADC_CR2 &= ~0x2; // cont off
        RCCU0->RCCU_PERIPHCKEN |= EXT_RCCU_PERIPHCKEN_ADC0; // CLOCK ENABLE
        PWRMNG0->PWRMNG_PD_HM &= ~PWR_MNG_ADCOKINV33; // POWER ON
    }
    if (targetstate == ADCOFF)
    {
        // CLOCK disable, POWER off
        ADC0->ADC_CR2 &= ~0x1; // ue off
        ADC0->ADC_CR2 &= ~0x2; // cont off
        PWRMNG0->PWRMNG_PD_HM |= PWR_MNG_ADCOKINV33; // POWER OFF
        RCCU0->RCCU_PERIPHCKEN &= ~EXT_RCCU_PERIPHCKEN_ADC0; // CLOCK DISABLE
    }
    if (targetstate == ADCNOCLOCK)
    {
        // CLOCK disable, POWER on
        RCCU0->RCCU_PERIPHCKEN &= ~EXT_RCCU_PERIPHCKEN_ADC0; // CLOCK DISABLE
        ADC0->ADC_CR2 &= ~0x2; // cont off
        ADC0->ADC_CR2 &= ~0x1; // ue off
        PWRMNG0->PWRMNG_PD_HM &= ~PWR_MNG_ADCOKINV33; // POWER ON
    }
}

An example of how the application level power driver APIs are used follows.

// ADC POWER PLATFORM TEST STARTING
vtaskdelay(10);
ADC_powerControl(ADCIDLE);
vtaskdelay(10);
ADC_powerControl(ADCSAMPLE);
vtaskdelay(10);
ADC_powerControl(ADCNOCLOCK);
vtaskdelay(10);
ADC_powerControl(ADCOFF);
vtaskdelay(10);
// UARTGPIO POWER PLATFORM TEST STARTING
vtaskdelay(10);
UARTGPIO_powerControl(UARTGPIOIDLE);

Power Manager Registers

Register name    Address    Function
PWRMNG_STATE     0xFE800
PWRMNG_PD        0xFE804
PWRMNG_CMD       0xFE808
PWRMNG_FLHPDT    0xFE80C
PWRMNG_PD_HM     0xFE810

PWRMNG_STATE

Fields (MSB to LSB): A

Name  Bit   Rights  Reset  Description
A     3:0   R       0x0    Power manager functional state
                           0x0 -> rst
                           0x1 -> rst2
                           0x2 -> run
                           0x3 -> clk_gate
                           0x4 -> retention
                           0x5 -> snooze
                           0x6 -> waiting flash and psw wake-up time
                           0x7 -> wakeup
                           0x8 -> no retention
                           0x9 -> sleep

PWRMNG_PD_HM

Fields (MSB to LSB): H G F E D C B A

Name  Bit   Rights  Reset  Description
A     1:0   RW      0x0    D2 ram pd_mode (msb), pd (lsb)
B     3:2   RW      0x0    D1 ram pd_mode (msb), pd (lsb)
C     5:4   RW      0x0    I2 ram pd_mode (msb), pd (lsb)
D     7:6   RW      0x0    I1 ram pd_mode (msb), pd (lsb)
E     9:8   RW      0x0    usb ram pd_mode (msb), pd (lsb)
F     10    RW      0x0    Flash power down (Stop)
G     11    RW      0x0    Flash deep power down (DeepPD)
H     12    RW      0x0    Adc_okinV33 power down
                           0 -> adc functional
                           1 -> adc power down

RCCU Registers

Register name      Address    Function
RCCU_CKEN          0xFEC00
RCCU_CKSEL         0xFEC04
RCCU_CKRDY         0xFEC08
RCCU_CKRDYIE       0xFEC0C
RCCU_CKRDYF        0xFEC10
RCCU_PLLDIV        0xFEC14
RCCU_RCHSTRIM      0xFEC18
RCCU_PERIPHRST     0xFEC1C
RCCU_PERIPHCKEN    0xFEC20

RCCU_PERIPHCKEN

Fields (MSB to LSB): R Q P O N M L K J I H G F E D C B A

Name  Bit   Rights  Reset  Description
A     0     RW      0x0    wwdg0
B     1     RW      0x0    spi0
C     2     RW      0x0    spi1
D     3     RW      0x0    sci0
E     4     RW      0x0    i2c0
F     5     RW      0x0    gptim0
G     6     RW      0x0    gptim1
H     7     RW      0x0    gptim2
I     8     RW      0x0    adc0
J     9     RW      0x0    usb0
K     10    RW      0x0    iwdg0
L     11    RW      0x0    dma0
M     12    RW      0x0    gpio0
N     13    RW      0x0    gpio1
O     14    RW      0x0    gptim3
P     15    RW      0x0    rtc0
Q     16    RW      0x0    evctl0
R     17    RW      0x1    dsu

For every field: 0 -> clock disable; 1 -> clock enable

ADC Registers

Register name    Address    Function
ADC0_SR          0xFDC00
ADC0_CR1         0xFDC04
ADC0_CR2         0xFDC08

ADC0_CR2

Fields (MSB to LSB): E D C B A

Name  Bit   Rights  Reset  Description
A     0     RW      0x0    ADCON
                           0 -> OFF
                           1 -> ON
B     1     RW      0x0    CONT (Continuous conversion)
                           0 -> single conversion mode
                           1 -> continuous conversion mode
C     2     RW      0x0    DMA mode
                           0 -> disable
                           1 -> enable
D     3     RW      0x0    EXT_TRIG (Conversion on external event)
                           0 -> disable
                           1 -> enable
E     6:4   RW      0x0    EXT_SEL (External Event Select)
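The bit positions listed in the register tables map directly onto the masks a driver would use. The following is a minimal sketch of that mapping, not the project's driver: the macro, type, and function names are hypothetical, and a plain struct stands in for the memory-mapped registers so that the logic can be exercised on a host. Only the bit positions (bit 12 of PWRMNG_PD_HM, bit 8 of RCCU_PERIPHCKEN) and the register addresses in the comments are taken from the tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks derived from the register tables (names are illustrative). */
#define PD_HM_ADC_PD     (1u << 12) /* PWRMNG_PD_HM field H: 1 -> adc power down */
#define PERIPHCKEN_ADC0  (1u << 8)  /* RCCU_PERIPHCKEN field I: 1 -> clock enable */

/* Host-side stand-in for the registers. Assumption: on the target these
   would be volatile accesses to 0xFE810 (PWRMNG_PD_HM) and 0xFEC20
   (RCCU_PERIPHCKEN). */
typedef struct {
    uint32_t pd_hm;
    uint32_t periphcken;
} power_regs_t;

/* Set or clear the ADC power-down bit without touching the other fields. */
void adc_set_power_down(power_regs_t *r, int down)
{
    assert(r != NULL);
    if (down)
        r->pd_hm |= PD_HM_ADC_PD;
    else
        r->pd_hm &= ~PD_HM_ADC_PD;
}

/* Enable or gate the ADC peripheral clock. */
void adc_set_clock(power_regs_t *r, int enable)
{
    assert(r != NULL);
    if (enable)
        r->periphcken |= PERIPHCKEN_ADC0;
    else
        r->periphcken &= ~PERIPHCKEN_ADC0;
}

/* The ADC can operate only when it is powered and clocked. */
int adc_is_usable(const power_regs_t *r)
{
    assert(r != NULL);
    return !(r->pd_hm & PD_HM_ADC_PD) && (r->periphcken & PERIPHCKEN_ADC0);
}
```

Using read-modify-write operations (`|=` and `&= ~mask`) matters here because each register packs several independent fields; a plain assignment would overwrite the power-down or clock-enable bits of the other peripherals.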

9 RRM for audio-driven video surveillance domain

9.1 Overview

Figure 18: Audio-driven surveillance application

Figure 19: Country border protection

The application, represented in Figure 18, comes from the audio-driven video surveillance domain and targets the surveillance of critical areas. In particular, assume the situation illustrated in Figure 19. A territory hundreds of kilometers long is monitored by a set of fixed cameras. Whenever a border fence is damaged, the territory becomes vulnerable, and the current surveillance system needs to be reinforced. To that end, a short-term solution is needed: it must be deployable instantly, and it must not depend on a complex and costly infrastructure.

Figure 20: Platform architecture

On the platform side, the hardware architecture is an embedded MP-SoC platform with one core acting as host processor and controlling different HW/SW Processing Units (PUs). The programming model is component-based, which means that each PU can be seen as a coprocessor. Additional HW accelerators can be included to perform the computationally intensive tasks. To enable dynamic power management, different power islands are taken into account: the host processor, the image acquisition pipeline, and each PU. The overall target platform, represented in Figure 20, consists of three slave SW cores and one host SW processor. The platform constraints are the battery duration (e.g., 24 hours) and the energy budget in the battery.

In the application, three jobs are considered: audio activity detection on the first SW core for environment noise detection, video processing on the second SW core for motion detection and image selection, and alarm on the third SW core. Two application modes are allowed: either audio with alarm, or audio with video. QoS options are characterized at two levels in the application configurations. These options influence the number of operations executed by the jobs. Hence they offer the possibility to apply dynamic voltage and frequency scaling (DVFS) to each power island in order to control the power consumption of the platform.

The first QoS level is related to the amount of application functionality provided by the application configuration. This QoS level should get the highest priority. The second level is related to the QoS of the inputs and outputs of the application:

- Sampling frequency of audio inputs: two sampling frequencies can be configured, 16 bits at 8 or 16 kHz, corresponding to a bit rate of 128 or 256 kbit/s.
- Image resolution and rate: this depends on the camera packaged with the system. However, in order to reduce the power consumption, image resolution and rate can be reduced. For video surveillance systems, it is usually not necessary to store high-definition images. An image resolution of 352x288 pixels (CIF format) or 704x576 pixels (4CIF format) at a rate of 16 or 25 images per second is considered.

Since this application is embedded, autonomy is a critical requirement, and controlling and reducing energy consumption is crucial. The optimization goal is to maximize the QoS of the application, whereas the platform constraints are the energy budget and the battery duration of the platform. To that end, the utility function models the QoS of an application configuration as a weighted sum of its audio and image frequency and resolution and of the amount of application functionality provided by its application mode.

9.2 Experimental Results

In order to perform initial experiments on the overhead and feasibility of the presented run-time management approach, both the GRM and the CM have been integrated in a POSIX implementation of the audio-driven video surveillance application. First, this implementation has been deployed and tested on an X86-based platform running at 800 MHz. This section analyzes the obtained results.
Second, due to the current unavailability of the ST-I platform (including the host, the four ReISC DSP cores, and the Free RTOS), and since the RRM framework needs a host and an OS, the RRM framework will be evaluated on an ARM-based TI OMAP 4460 embedded platform running at 700 MHz. The results will be reported in Deliverable D4.2.2, entitled Final report on evaluation of design tools.

Binary size of GRM implementation

As explained in the section on the GRM implementation, the GRM is implemented in C and compiled into a library libgrm.a, which is then linked to the application. The current binary size of libgrm.a is 107 KB on the X86-based platform, without taking the GRM databases into account. Two GRM databases are required to store the high-level specification of the platform, the IP core types, and the application configurations:

The current binary size of the platform database is: * ipcore_nr + 54 * ipcore_type_nr + 20 * ipcore_type_nr * power_mode_nr * (1 + power_mode_nr) bytes, where ipcore_nr denotes the number of IP cores in the platform, ipcore_type_nr denotes the number of IP core types, and power_mode_nr denotes the maximum number of power modes per IP core type. For example, in our demonstrator, where the platform consists of 4 SW cores with at most 5 available power modes, the binary size of the platform database is 1506 bytes.

Similarly, the current binary size of the application configuration database is: ( * job_nr) * appl_config_nr bytes, where job_nr denotes the number of jobs in the application and appl_config_nr denotes the number of application configurations. For example, in our demonstrator, where the application consists of three jobs (i.e., alarm processing, audio activity detection, and video image processing), the binary size of an application configuration is 88 bytes.

Performance overhead and energy gain

Figure 21: Energy-per-frame evolution with and without GRM

Figure 21 illustrates the energy-per-frame evolution of our demonstrator for two platform constraints (different energy budgets, same battery duration) with and without the GRM. Thanks to an optimized adaptive selection of application configurations, our GRM optimizes the QoS of the application while keeping the platform battery alive during its whole required duration. This cannot be ensured without such an RRM framework. Without the GRM, only one application configuration may be activated, from the start of the application. With the GRM, several configurations may be successively activated during the run of the application: in this demonstrator, among the 16 available application configurations, 4 (resp. 9) are activated to satisfy platform constraint 1 (resp. 2).

Figure 22: Performance of GRM initialization

Figure 22 illustrates the CPU processing of the GRM services executed at initialization on the X86-based platform. Both GRM_ConfigurePlatform() and GRM_ConfigureApplication() require more processing because they parse the high-level platform specification and the available application configurations. Nevertheless, these services are executed only once, at initialization, and therefore add no run-time overhead.
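The adaptive selection discussed for Figure 21 can be pictured as a constrained choice: among the available application configurations, pick the one with the highest utility whose average power still fits the energy left over the remaining battery duration. The sketch below illustrates only that idea. The struct fields, the function name select_config(), and the selection rule itself are assumptions made for illustration; they are not the GRM API and not necessarily the heuristic used by the GRM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one application configuration. */
typedef struct {
    double utility;  /* QoS value of the configuration (weighted sum) */
    double power_w;  /* estimated average platform power, in watts */
} app_config_t;

/* Return the index of the feasible configuration with the highest utility,
   or -1 when no configuration fits the remaining budget. A configuration is
   feasible when its average power does not exceed energy_left / time_left. */
int select_config(const app_config_t *cfg, size_t n,
                  double energy_left_j, double time_left_s)
{
    assert(cfg != NULL);
    assert(time_left_s > 0.0);

    double power_budget_w = energy_left_j / time_left_s;
    int best = -1;

    for (size_t i = 0; i < n; ++i) {
        if (cfg[i].power_w > power_budget_w)
            continue; /* would drain the battery too early */
        if (best < 0 || cfg[i].utility > cfg[best].utility)
            best = (int)i;
    }
    return best;
}
```

Calling such a function periodically with the updated remaining energy and remaining time is one way in which several configurations could end up being activated in succession during a single run.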

Figure 23: Performance of GRM run-time services

Figure 23 illustrates the CPU processing of the GRM services executed at run time on the X86-based platform. The performance of GRM_SelectApplicationConfiguration() includes that of GRM_EstimateElapsedEnergy(). The performance of GRM_ReconfigurePlatform() includes all waiting times for IP cores to become ready and for IP core reconfiguration. Nevertheless, the average execution time on the X86-based platform is still below 0.5 ms.

Figure 24: GRM CPU processing overhead

Finally, a global analysis of the GRM CPU processing overhead compared to the application processing shows an overhead of only 1.16% on the X86-based platform running at 800 MHz (see Figure 24). This overhead is only 0.6% on a TI OMAP 4460 embedded platform running at 700 MHz. The overhead of the proposed run-time energy management mechanism is therefore negligible, with no significant impact on the application. In conclusion, the experiments performed so far indicate that the proposed combination of design-time exploration of application configurations with run-time optimization can improve the overall QoE of the system.

10 RRM for ultra-low power platforms

This section presents the RRM implementation in the COMPLEX use case 1. Since the target platform consists of a single core, the RRM instantiation is very simple. It focuses on a new heuristic for fine-grain DVFS in the set of required run-time decisions introduced in Section 4.2. This heuristic consists of a methodology and a tool-chain developed to optimize the energy consumption associated with software execution on a tiny embedded system. The optimization combines the SW estimation process (developed in Task 3.2 Embedded Software Optimization) with design space exploration methodologies (developed in Task 3.4 Design Space Exploration) in order to exploit fine-grain DVFS. The proposed approach operates at compile time, with the granularity of a single C function, by augmenting the source code with calls that drive the voltage and frequency scaling of the core at run time.

The design-time part of the methodology uses the concepts developed in Task 3.2 (see Deliverable D3.2.2 [5] for details) for software estimation and in Task 3.4 (see Deliverable D3.4.2 [2] for details) for exploration. Regarding the SW estimation part, the methodology developed in T3.2 has been enhanced by expressing the energy costs of the basic entities of the LLVM intermediate representation in terms of effective capacitance, rather than as average current absorption per clock cycle. This choice allows energy figures to be accumulated independently of the actual clock frequency and core supply voltage, which is crucial for exploring the different voltage/frequency operating modes.

Overview of the Methodology

The methodology presented here uses, as its knobs (parameters), the power modes of the target platform at predetermined positions in the code.
In fact, due to the single-core version of the platform and to the application characteristics, no different mappings or application reconfigurations are needed. Thus, the instantiation of the global view of the RRM manages only the power states of the platform, making it act as a Power Manager.

The methodology assumes that the target processor provides a set of operating modes (voltage and frequency pairs), which will be referred to in the following as explicit modes. Obviously, at any point in time, the target processor can run in only a single operating mode. In the approach, the function is the smallest granularity considered for the analysis. This means that if a function is assigned to an operating mode OM1=<V1, F1>, the processor runs at the frequency F1 with the supply voltage V1.

Moreover, when a function is assigned to a specific explicit mode, the tool-chain augments the C source code by inserting RRM calls (which wrap the platform-specific library function calls) that switch to the selected mode on entering the function and back to the previous mode on exiting.

In addition to the explicit modes, the proposed methodology adds two implicit modes (described in the following) that the application developer can use to guide the run-time manager without explicitly forcing a particular frequency, deriving it instead from the execution context.

- Force. When the mode of a function is set to force a specific explicit mode, all its callees will be executed in the same operating condition as the caller, regardless of their specific explicit assignments. The force mode is especially useful as a means to classify a function as having such a high importance that its execution should not undergo any change of operating condition.
- Inherit. The inherit mode has, in a sense, a dual meaning. It specifies that the mode of a function is not explicitly set, but is rather inherited from its caller. Thanks to this mode, small functions that do not constitute a critical portion of the task on their own can be left free to operate under the control of their callers.

These special modes provide a simple yet powerful means to have a function executed in different modes depending on its context, thus providing more flexibility to the approach. The timing diagram of Figure 25 shows the effects of three different mode assignments on the processor operating conditions. In the figure, the labels X1 and X2 indicate explicit mode assignments, F1 and F2 force modes, and I the inherit mode.

Figure 25: Mode Assignment Effects

Apart from the obvious effects of explicit assignments shown in Figure 25-(a), it is interesting to observe the behaviour resulting from forcing and inheriting modes. To this purpose, we concentrate on function f1(). When f2() forces mode 1 and calls f1(), Figure 25-(b) shows that the explicit mode assignment of f1() is ignored. On the other hand, when called directly from f3(), f1() is executed in mode 2, as specified by its explicit assignment. Furthermore, observing the diagram in Figure 25-(c), it can be noted that the operating mode in which f1() is executed is always that of its caller.

Though the behaviour in these last two cases is the same, it is obtained in two dual ways: in the first case the caller imposes its mode on all its callees, while in the second it is the callee that delegates the decision on the operating mode to its caller.

Having explained the meaning of the different operating modes (both explicit and implicit), the optimization goal can now be defined. The goal of the design-time optimization is to find an assignment of a mode to each function that minimizes the overall energy of a program run, subject to a constraint on the maximum allowed time for the task. On the other side, as already explained, the goal of the run-time part of the methodology is to deploy at run time the operating modes defined at design time.

Considering N possible processor operating modes and F functions, the exact solution of the problem requires examining N^F assignments. Given the exponential complexity, this problem soon becomes intractable. For 3 modes and 20 functions, for example, the number of assignments is close to 3.5 billion, which makes a heuristic design space exploration approach necessary.

Tool-Flow

This section describes the estimation and optimization flow adopted. More detail about the estimation methodology and tool and about the exploration tool can be found in D3.2.2 Final report on software and hardware optimization and in D3.4.3 Final report on design space exploration, respectively.

The implemented estimation flow is based on the LLVM compiler infrastructure, upon which the toolset SWAT has been developed. A simplified view of the portion of the flow strictly related to the estimation process necessary for the target problem is outlined in Figure 26. The input is the set of C source files containing the code of the task being considered, a model of the target CPU (cpu.lib), and an assignment of modes to functions (task.modes).
By performing a sequence of transformations, the flow produces the time and energy estimates T and E. Note that, in this context, the mode assignment file is considered constant; later, when discussing the exploration part, this constraint will be removed.

Figure 26: Simplified SWAT estimation flow

The transformations performed by the tools of the flow have been grouped into four phases, indicated by numbered black boxes, and are detailed in the following.

1) Front-End. This phase compiles each source file into architecture-independent LLVM assembly code, which is then used to build a model (*.bbmodel) of each basic block consisting of the list of op-codes, the functions called, the size, the execution time in clock cycles, and the effective capacitance. Data for timing and energy characterization is in the target CPU library (cpu.lib), which is the result of the processor characterization.

2) Instrumentation. Instrumentation is performed by first enriching each basic block of the LLVM code with all the relevant figures in the form of a special comment (meta-instrumentation), then by translating the comments into actual calls to tracing functions based on expansion rules collected in an instrumentation library. The output of this phase is a new, instrumented LLVM assembly file (*.i.ll).

3) Back-End. The back-end of the SWAT flow performs two main operations. First, it translates all the instrumented LLVM files into host assembly code, which is then assembled and linked into an executable program. Second, it runs the executable and collects the execution trace (bbtrace), consisting of a list of the identifiers of the basic blocks that have been executed.

4) Post-Processing. The post-processing phase analyzes the execution trace and combines the dynamic information with the static cost models, accounting for the operating modes specified in the allocation file (task.modes). This produces the total time and energy of the specific run of the task.
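The post-processing phase described in step 4 can be sketched as a single pass over the execution trace. The code below assumes a simple cost model in which the energy of a basic block is its effective capacitance times the square of the supply voltage, and its time is its cycle count divided by the clock frequency, both evaluated in the mode assigned to the enclosing function. That model, the neglect of mode-switching costs and of the force and inherit modes, and all type, field, and function names are assumptions made for illustration; this is not the SWAT implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double volt; double freq_hz; } op_mode_t;  /* <V, F> pair */

typedef struct {
    int    func;     /* index of the function containing the basic block */
    double cycles;   /* execution time in clock cycles */
    double ceff_f;   /* effective capacitance, in farads */
} bb_model_t;

typedef struct { double time_s; double energy_j; } estimate_t;

/* Accumulate time and energy over a trace of basic-block identifiers.
   func_mode[f] is the index of the operating mode assigned to function f. */
estimate_t estimate_run(const int *trace, size_t trace_len,
                        const bb_model_t *bb,
                        const int *func_mode,
                        const op_mode_t *modes)
{
    assert(trace != NULL && bb != NULL);
    assert(func_mode != NULL && modes != NULL);

    estimate_t e = { 0.0, 0.0 };
    for (size_t i = 0; i < trace_len; ++i) {
        const bb_model_t *b = &bb[trace[i]];
        const op_mode_t *m = &modes[func_mode[b->func]];
        e.time_s   += b->cycles / m->freq_hz;          /* t = cycles / F   */
        e.energy_j += b->ceff_f * m->volt * m->volt;   /* E = C_eff * V^2  */
    }
    return e;
}
```

In this sketch neither the trace nor the basic-block models depend on the mode assignment; only func_mode changes from one candidate assignment to the next, so evaluating a new assignment costs one pass over the trace.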

This estimation flow is then combined with the optimization engine MOST, which performs design space exploration over the possible mode assignments. It is worth noting that steps 1-3 of the estimation flow need not be repeated for each assignment. They are, in fact, performed only once, with the goal of producing an execution trace and a set of cost models. Steps 1-3 (and in particular step 3, which involves task execution) are much more time-consuming than the post-processing phase alone. The proposed tool-chain is thus efficient enough to enable design space exploration with simulation in the loop.

Figure 27: SWAT/MOST optimization flow

The optimization flow, sketched in Figure 27, is built around the design exploration engine MOST.

5) Design space exploration. The tool requires a configuration file (task.dse) specifying the parameters and the values each parameter can assume. In our case the parameters are the modes of each function, and the values are integers in the range between 1 and N corresponding to the target processor modes. Based on this configuration, the DSE engine generates a specific mode assignment (task.modes), which is fed as input, together with the execution trace and the basic block models, to the SWAT post-processor. The execution time and energy estimated by SWAT are used by MOST to select a new, potentially better, assignment. This loop is repeated until a sub-optimal assignment (task.opt.modes) is found.

6) Code augmentation. Using a set of predefined macros and a simple code generator, this tool adds to the original source code, at the beginning and at the end of each function, the code that performs mode switching. These inserted calls are the lightweight instantiation of the RRM.

To clarify the code augmentation step, a simple example of how it has been implemented is presented in the following.

The code generation is the last step of the optimization flow. Its goal is to augment the original source code with calls to suitable, user-definable APIs that change the operating mode of the processor on entry to and/or on exit from a function. This process requires adding two macros, one at the beginning and one at the exit of each function that is considered in the exploration process. Considering, for example, a function:

and assuming that the function has a single exit point, the only task left to the programmer is to modify the function definition as follows:

The expansion of the two macros generates new code that depends in turn on other macros built from the function name passed as argument, which, in our example, would be VFS_FMODE_foo. Since functions can not only be assigned explicit modes, but can also be defined as forcing or inheriting the operating mode of caller/callee, it is necessary to implement a sort of mode stack in which to save the mode of the current function before entering one of its callees and to restore this mode on exit. Rather than using a separate stack, our mechanism is based on four entities, namely:

- A global variable vfs_cm storing the current mode. The variable is static in a support library that needs to be compiled along with the application.
- A global variable vfs_fm indicating whether the current mode is being forced or if it is explicit/inherited. This variable is also static in the support library.
- A variable vfs_sm, local to each function, holding the saved mode, i.e. the current mode upon function entry.
- A macro VFS_SET_MODE(m) that is platform specific and will be expanded to a call to the suitable function exposed by the target API and responsible for changing the operating mode. Such a function will usually write the relevant CPU registers.
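A minimal sketch of how these four entities could fit together is given below. It is an illustration under stated assumptions, not the code generated by the tool-chain: the macro names VFS_ENTER and VFS_EXIT, the encoding of force and inherit as VFS_FORCE(m) and VFS_INHERIT, the extra local vfs_sf that saves the force flag, the rule that the outermost forcing function wins, and the host-side stand-in hw_mode are all hypothetical. Only vfs_cm, vfs_fm, vfs_sm, VFS_SET_MODE(m), and the VFS_FMODE_<function> naming come from the text.

```c
#include <assert.h>

/* Assumed encoding of an assigned mode:
   m > 0 explicit mode m, 0 inherit, -m force mode m. */
#define VFS_INHERIT   0
#define VFS_FORCE(m)  (-(m))

int vfs_cm = 1;   /* current mode */
int vfs_fm = 0;   /* nonzero while an enclosing function forces its mode */
int hw_mode = 1;  /* host stand-in for the CPU register written on the target */

/* Platform specific on a real target; here it only records the request. */
#define VFS_SET_MODE(m) (hw_mode = (m))

/* Preamble: save the context, then switch unless inheriting or forced.
   VFS_FMODE_<f> is a constant, so the untaken branches are dead code. */
#define VFS_ENTER(f)                                                        \
    int vfs_sm = vfs_cm;                                                    \
    int vfs_sf = vfs_fm;                                                    \
    if (VFS_FMODE_##f != VFS_INHERIT && !vfs_fm) {                          \
        if (VFS_FMODE_##f < 0) vfs_fm = 1;                                  \
        vfs_cm = (VFS_FMODE_##f < 0) ? -(VFS_FMODE_##f) : (VFS_FMODE_##f);  \
        if (vfs_cm != vfs_sm) VFS_SET_MODE(vfs_cm);                         \
    }

/* Epilogue: restore the mode and force flag seen on entry. */
#define VFS_EXIT(f)                                                         \
    if (vfs_cm != vfs_sm) {                                                 \
        vfs_cm = vfs_sm;                                                    \
        VFS_SET_MODE(vfs_cm);                                               \
    }                                                                       \
    vfs_fm = vfs_sf;

/* Example assignment, mirroring the f1/f2/f3 discussion of Figure 25-(b). */
#define VFS_FMODE_f1 2             /* explicit mode 2 */
#define VFS_FMODE_f2 VFS_FORCE(1)  /* force mode 1 */
#define VFS_FMODE_f3 3             /* explicit mode 3 */

int mode_seen_in_f1 = 0;

void f1(void) { VFS_ENTER(f1) mode_seen_in_f1 = hw_mode; VFS_EXIT(f1) }
void f2(void) { VFS_ENTER(f2) f1(); VFS_EXIT(f2) }
void f3(void) { VFS_ENTER(f3) f1(); VFS_EXIT(f3) }

/* Self-check of the two call paths: forced by f2, explicit under f3. */
void vfs_demo(void)
{
    f2();
    assert(mode_seen_in_f1 == 1);  /* f1's explicit mode is ignored */
    f3();
    assert(mode_seen_in_f1 == 2);  /* f1 runs in its own explicit mode */
}
```

Calling vfs_demo() from main() exercises both paths. Because each function keeps its saved mode in its own activation frame, nested calls restore the caller's mode on return without any separate stack structure.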

Thanks to the local variable added to each function, a separate stack for modes is not necessary, as it is distributed into the activation frames of the functions themselves. Using these variables and exploiting the macros, the code of the original function is transformed into that shown hereafter.

It is worth noting that, since macros are expanded before compile time, only one of the branches of the conditional constructs in the preamble will actually be compiled, the other being dead code. Although this mechanism introduces some overhead, the macro-based approach keeps it to a minimum. Although the methodology has so far been tested only on the power states of the software processor, it can be applied, with simple extensions, to the state of peripherals in different voltage and frequency islands.

Experimental Results

The experimental results presented hereafter refer to the STMicroelectronics ultra-low-power ReISC core. Due to the public nature of the document, energy and timing figures appearing in the graphs of this section have been scaled by a constant factor in order not to disclose proprietary information.

The ReISC core considered in this work provides dynamic voltage and frequency scaling capabilities over three different modes. In order to validate the approach on a large number of differently structured tasks, synthetic code has been used. To this purpose, a parametric tool for code generation has been developed. It can generate random programs based on the parameters summarized in the following table, along with the ranges used to generate the specific tasks for which results are reported.

Let us start by considering a simple example, with three functions only. In this case the possible assignments are 5^3=125. Since the exploration engine does not perform an exhaustive analysis, far fewer assignments have been generated, as shown in the plot of Figure 28.

Figure 28: Energy and Time for a 3-functions task

As can be noted, a reduction of the execution time corresponds to an increase in the energy consumption. In this example the execution time constraint was set to 1.65us. The solution found, highlighted in the plot, was characterized by an execution time of 1.626us and an energy consumption of 412nJ. This corresponds to an average power consumption of 253uW, as shown in the plot of Figure 29.

Figure 29: Average Power for a 3-functions task
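The size of the search space follows directly from the number of options per function. The helper below, whose name is hypothetical and which is written only to check the arithmetic, reproduces the count quoted for this example (five options for each of three functions) and shows that three options for each of 20 functions gives 3,486,784,401 assignments, the figure of roughly 3.5 billion mentioned in the overview of the methodology.

```c
#include <assert.h>
#include <stdint.h>

/* Number of possible assignments: options raised to the number of functions.
   Returns 0 when the result does not fit in 64 bits. */
uint64_t assignment_count(uint64_t options, unsigned functions)
{
    assert(options > 0);

    uint64_t n = 1;
    for (unsigned i = 0; i < functions; ++i) {
        if (n > UINT64_MAX / options)
            return 0;  /* overflow: the space is too large to count here */
        n *= options;
    }
    return n;
}
```

For comparison, four options per function over 20 functions would already give 4^20 = 1,099,511,627,776 assignments, which shows how quickly exhaustive enumeration becomes impractical.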

Finally, the results obtained by applying the proposed optimization methodology to a set of 33 randomly generated tasks are reported in Figure 30. The plot compares the energy consumption of the optimized task (black bars) with that obtained by keeping the system either in the highest voltage/frequency mode (white bars) or in its deepest low-power mode. It must be noted that the optimized tasks (and of course the tasks run in full active mode) respect their deadlines, while the tasks run in the lowest power mode do not. The energy gains obtained by the mode allocation technique with respect to the full active mode of the processor are shown in Figure 31, where a maximum energy saving of 29.4% can be observed. The average gain for the test cases considered is 20.1%.

Figure 30: Absolute energy consumption comparison

Figure 31: Energy consumption gain w.r.t full voltage and frequency mode


More information

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer) ESE Back End 2.0 D. Gajski, S. Abdi (with contributions from H. Cho, D. Shin, A. Gerstlauer) Center for Embedded Computer Systems University of California, Irvine http://www.cecs.uci.edu 1 Technology advantages

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

Hello, and welcome to this presentation of the STM32L4 power controller. The STM32L4 s power management functions and all power modes will also be

Hello, and welcome to this presentation of the STM32L4 power controller. The STM32L4 s power management functions and all power modes will also be Hello, and welcome to this presentation of the STM32L4 power controller. The STM32L4 s power management functions and all power modes will also be covered in this presentation. 1 Please note that this

More information

AN4749 Application note

AN4749 Application note Application note Managing low-power consumption on STM32F7 Series microcontrollers Introduction The STM32F7 Series microcontrollers embed a smart architecture taking advantage of the ST s ART- accelerator

More information

Ultra Low Power Microcontroller - Design Criteria - June 2017

Ultra Low Power Microcontroller - Design Criteria - June 2017 Ultra Low Power Microcontroller - Design Criteria - June 2017 Agenda 1. Low power technology features 2. Intelligent Clock Generator 3. Short wake-up times 4. Intelligent memory access 5. Use case scenario

More information
