CAP-OS: Operating System for Runtime Scheduling, Task Mapping and Resource Management on Reconfigurable Multiprocessor Architectures


Diana Göhringer (1), Michael Hübner (2), Etienne Nguepi Zeutebouo (1), Jürgen Becker (2)
(1) Fraunhofer IOSB, Germany
(2) ITIV, Karlsruhe Institute of Technology (KIT), Germany
(1) {dgoehringer, zeutebouo}@fom.fgan.de, (2) {michael.huebner, becker}@kit.edu

Abstract

Operating systems traditionally handle the task scheduling of one or more application instances on a processor-like hardware architecture. Novel runtime adaptive hardware exploits dynamic reconfiguration on FPGAs, where hardware blocks are generated, started and terminated. This is similar to software tasks in well-established operating system approaches. The hardware counterparts of the software tasks have to be transferred to the reconfigurable hardware via a configuration access port. This port enables the allocation of hardware blocks on the FPGA. Current reconfigurable hardware, e.g. the Xilinx Virtex-5, provides two internal configuration access ports (ICAPs), of which only one can be accessed at any point in time. In a multiprocessor system on an FPGA, for example, multiple instances may try to access these ports simultaneously. To prevent conflicts, the access to these ports as well as the hardware resource management need to be controlled by a special purpose operating system running on an embedded processor. This special purpose operating system, called CAP-OS (Configuration Access Port-Operating System), is presented in this paper. It supports the clients using the configuration port with priority-based access scheduling, hardware task mapping and resource management.

Keywords: Operating System, MPSoC, Reconfigurable Computing, FPGA, Scheduling, Task Mapping

I. INTRODUCTION

Scheduling tasks within a given time frame and with respect to a required deadline imposed by real-time constraints is well known in computer science from operating systems (OSs), especially from real-time operating systems (RTOSs). Scheduling strategies of conventional OSs are either pre-emptive or non-pre-emptive and are further classified, e.g., into earliest deadline first and rate monotonic algorithms (see [1] for detailed descriptions). The classical scheduling and task mapping process of software-based systems with a traditional OS has its counterpart in novel runtime reconfigurable hardware systems. Within these systems, tasks can be represented, in addition to the traditional software representation, as a physical hardware realization, e.g. on an FPGA. This gives the OS layer a further degree of freedom for mapping tasks onto hardware resources. For example, a task that a traditional software-based system maps and executes as a software thread could, in the hardware reconfigurable variant of such a system, also run as a hardware block realized with logic resources on an FPGA. This difference and the new degree of freedom in task representation require a novel concept for hardware task scheduling and mapping. To handle this process, the consequences of, e.g., data dependencies, priority and real-time aspects have to be analyzed in detail and formalized in a feasible algorithm for an efficient special purpose OS. Furthermore, the underlying hardware resources, including the internal configuration access port (ICAP), have to be characterized in terms of timing, determinism, behavior in termination cases, etc. These results also have to be accounted for in the special purpose OS approach by a cost function.
The described investigation and its results can be exploited efficiently in the runtime adaptive Multiprocessor System-on-Chip (RAMPSoC) approach described in [2]. In this approach, several processors, co-processors and hardware accelerators are available for concurrent task realization on an FPGA. The approach presented in this paper schedules the tasks of a control dataflow graph (CDG) and maps them either in hardware or in software onto reconfigurable multicore hardware on the FPGA. To do so, the algorithm considers data dependencies, physical constraints of the configuration interface and of the reconfigurable resources, and additionally the parallel data processing capabilities of the RAMPSoC hardware.

The paper is organized as follows: Related work is presented in Section II. Section III briefly describes the RAMPSoC approach and its features. In Section IV the concept and the features of CAP-OS (Configuration Access Port-Operating System) are described. Section V presents how CAP-OS is integrated into the RAMPSoC hardware architecture. A case study and first results are presented in Section VI. Finally, Section VII closes the paper with the conclusions and an outlook.

II. RELATED WORK

Scheduling for hardware reconfigurable architectures has been reported in various publications. The publications discussed in this paper are only a subset of the numerous approaches developed in academic and industrial environments. However, the papers referenced in this section reflect the aspects that are significant with respect to the presented approach and allow an objective comparison of the benefits achieved by the proposed special purpose OS named CAP-OS.

/10/$ IEEE

Dittmann et al. [3] describe a scheduling approach for a single processor and several accelerators, which can be configured at runtime. The solution provides pre-emptive reconfiguration, which is important if a task with a higher priority has to interrupt the configuration process of a lower-priority task. The scheduling strategy is based on a deadline monotonic (DM) algorithm with some extensions related to the fact that a hardware/software reconfigurable system is targeted. The approach has some restrictions, because only homogeneously shaped reconfigurable areas are supported. Consequently, only a fixed (non-varying) time frame for the reconfiguration of the hardware is considered in the algorithms. A further restriction is that data dependencies between the tasks are not considered within the scheduling algorithm. Furthermore, the approach requires drivers supporting the physical reconfiguration of the FPGA. This could be a standard ICAP driver with the related IP cores.

Ullmann et al. [4] also target, similar to the previously described approach, a single processor solution with reconfigurable accelerators of homogeneous shape and size. The scheduling is priority-based and non-pre-emptive, because this approach was developed for automotive applications, where the pre-emption of certain tasks is not allowed. The runtime system reported in the paper includes the hardware drivers for the configuration access port. It also includes features such as context load and save, which allow the resumption of tasks in hardware or software.

ReconOS [5] uses the eCos real-time operating system as the basis for its own solution. Here, too, the target hardware architecture is a single processor with reconfigurable accelerators loosely connected to it.
In comparison to the previously described approach, the authors use a fixed priority scheduling approach. For synchronization purposes, a communication method for the software and hardware threads over the eCos RTOS was developed. An interesting aspect is that a task graph with dependent and independent tasks is used as input description for the scheduler.

The approaches reported above show that a novel OS approach has to be introduced for a reconfigurable multiprocessor System-on-Chip like RAMPSoC. One simple example of this necessity is that the reconfigurable regions are no longer homogeneous in their footprint, and therefore the configuration times vary between the different tasks that may have to be allocated to the hardware. This and other parameters have to be handled by the novel CAP-OS approach.

III. THE RAMPSOC APPROACH

CAP-OS is used for runtime scheduling, task mapping and resource management on a RAMPSoC [2]. Fig. 1 shows an example of a RAMPSoC architecture at one point in time. As can be seen, the RAMPSoC is a heterogeneous multiprocessor System-on-Chip (MPSoC), consisting of a number of different processors connected over a communication infrastructure, which is a switch-based Network-on-Chip (NoC) in this example. The processors can be extended with one or several hardware accelerators. Furthermore, a Finite State Machine (FSM) together with a hardware function can be used instead of a processor, if desired.

Figure 1. Example of a RAMPSoC architecture at one point in time (blocks shown: FPGA, Virtual-I/O, elements labeled Type 1 and Type 2, FSM + Hardware Function)

Dynamic and partial reconfiguration is used to adapt the hardware architecture of the RAMPSoC at runtime. The following runtime adaptations are supported by the RAMPSoC:
- Number and characteristics of processors
- Communication infrastructure (e.g. size, bandwidth, topology)
- Number and functionality of hardware accelerators
- Software for the processors

This way, a good trade-off between performance, power consumption and area requirements can be achieved through runtime adaptation of the hardware architecture to the needs of the applications. More details about the hardware architecture of the RAMPSoC and its benefits can be found in [2]. To program such a flexible hardware architecture efficiently, an easy-to-use toolflow is required, which guides the user in partitioning the application at design time. It also generates the partial bitstreams for the hardware modules. An overview of this toolflow can be found in [6]. These partial bitstreams, together with the task graphs of the applications, are required by CAP-OS, which is presented in detail in the next section. CAP-OS is responsible for the runtime scheduling of the configurations of the different tasks, for allocating the tasks to the processing elements and for resource management. Furthermore, CAP-OS needs to respond to runtime demands of the application, such as one or several processors needing different accelerators.

IV. CONCEPT OF THE CAP-OS

For an adaptive MPSoC like RAMPSoC, a flexible RTOS is required, which schedules the reconfiguration of the tasks and their runtime allocation to a specific processing element. Furthermore, this RTOS has to assure that the different applications meet their real-time requirements and that the utilization of the hardware resources, and therefore the power consumption, is kept low. Fig. 2 shows how CAP-OS manages the underlying RAMPSoC hardware architecture to fulfill the real-time requirements of the user applications. CAP-OS further hides the complexity of the underlying dynamic RAMPSoC architecture from the user.

Figure 2. CAP-OS embedded in the several abstraction layers of the system approach (layers shown: applications with task graphs and bitstreams; CAP-OS with runtime scheduling, resource allocation and configuration management; Xilkernel with thread scheduling and hardware drivers; RAMPSoC hardware architecture; FPGA hardware architecture)

Resource allocation at runtime is done by partial and dynamic reconfiguration using the ICAP. Therefore, the scheduling algorithm has to consider the time required for reconfiguring a module, which depends on the data throughput of the ICAP interface and on the size of the module. This time is not negligible, since the configuration data of a hardware module can be very small but can also amount to several hundred kilobytes. For each task, two different implementation options exist: it can be executed either in software on a processor or in hardware as a hardware accelerator. For either option, different implementation choices can exist, varying in size, performance and reconfiguration time. The scheduling algorithm has to choose the appropriate implementation type to fulfill the real-time constraints. Moreover, the presented scheduling approach tries to reuse existing resources that were already configured onto the chip at an earlier point in time, with the goal of reducing the overall reconfiguration overhead.
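As a rough illustration of why the reconfiguration time matters, its dependence on module size and port throughput can be sketched as a simple division. The function name and the throughput of 100 MB/s are assumptions made for this sketch only; the text does not state the effective ICAP throughput of the target system.

```python
def reconfiguration_time_ms(bitstream_bytes: int, throughput_bytes_per_s: float) -> float:
    """Estimate how long it takes to stream a partial bitstream through the ICAP.

    The estimate is size divided by throughput and ignores any software
    or bus overhead.
    """
    if bitstream_bytes < 0:
        raise ValueError("bitstream size must not be negative")
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return bitstream_bytes / throughput_bytes_per_s * 1000.0


# Hypothetical effective throughput (assumption, not taken from the paper).
ASSUMED_THROUGHPUT = 100e6  # bytes per second

# Module sizes ranging from very small to several hundred kilobytes.
for size_bytes in (2_000, 50_000, 300_000):
    t_ms = reconfiguration_time_ms(size_bytes, ASSUMED_THROUGHPUT)
    print(f"{size_bytes:>7} bytes -> {t_ms:.2f} ms")
```

Under this assumed throughput, a module of 300,000 bytes would occupy the ICAP for about 3 ms, which is a noticeable share of a deadline in the range of tens of milliseconds.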
Furthermore, the scheduling algorithm has to support pre-emptive reconfiguration, because a request for the reconfiguration of a task with higher priority can occur while another task is being reconfigured. As only one ICAP is available, the reconfiguration of the previous task has to be terminated and the new task needs to be reconfigured. After this, the reconfiguration of the interrupted task has to restart from the beginning, because the continuation of a terminated reconfiguration is not supported by the FPGA vendor.

This scheduling approach can handle both independent and dependent tasks. A group of interrelated tasks is called a task graph (TG). Each TG must fulfill the following requirements:
- The TG is a directed acyclic graph (DAG)
- Each task runs on processors/hardware accelerators
- Each task has an identity (ID)
- Each task has the following information:
  o Neighborhood relation (predecessor/successor)
  o Algorithm type or hardware constraints (Algo-ID)
  o Execution time, reconfiguration time
  o Communication costs
- The TG has a global deadline (D)
- The TG has either hard or soft real-time constraints, which are inherited by the tasks belonging to the TG

For the configuration of a task the following two rules apply:
- It can be terminated
- It is only feasible after all predecessor tasks are completely reconfigured

Fig. 3 shows an example of such a TG including the global deadline, the interrelation and the communication costs.

Figure 3. Example task graph with global deadline, interrelation and communication costs (legend: T: task; D: global deadline; K_ij: communication costs between task T_i and task T_j; edges shown: K_12, K_13, K_24, K_35, K_36, K_45, K_56)

Within CAP-OS, each task within a TG has a life cycle as shown in Fig. 4.

Figure 4. Life cycle states of a task (Not_Ready, Ready, Config, Exec, Exit)

Table 1 describes in detail each of the states that a task traverses during its life cycle.

Table 1. Description of the life cycle states of a task

Task State | Description
Not_Ready | This task is not ready for reconfiguration, because its predecessors are not completely reconfigured.
Ready | This task is ready for reconfiguration and competes with the other Ready tasks for the access to the ICAP. Only tasks without predecessors, or whose predecessors have already been reconfigured, can enter this state.
Config | The task is under configuration via the ICAP onto the RAMPSoC. If a task with higher priority becomes Ready, the reconfiguration process is terminated, the task returns to the Ready state and waits for a new possibility to access the ICAP.
Exec | After successful configuration the task starts execution and enters this state. An execution cannot be interrupted.
Exit | After the execution the task enters this state. The allocated processing element is now free for the next task.

It is important to note that if the configuration of a task is interrupted, the task returns to the Ready state, the configuration is lost and has to start all over again. As already mentioned in the previous section, the multiprocessor model used for the scheduling is a heterogeneous runtime adaptive MPSoC that uses a message passing communication scheme. The runtime scheduling algorithm is only performed for tasks that are in the state Ready. The novel runtime scheduling approach is described in detail in the next subsection.

A. The Novel Runtime Scheduling Approach

The novel runtime scheduling algorithm is divided into two main steps. First, a static scheduling algorithm is used to roughly assign priorities to the tasks of each TG using the information given by the TG description. For this, the list scheduling algorithm is used, because it is a priority-based static scheduling algorithm that respects resource constraints. The available resources are the single ICAP and the maximum number of possible processors, which depends on the size of the chosen FPGA. First, conservative estimates for the ASAP (As Soon As Possible) and the ALAP (As Late As Possible) start times of each task of a TG consisting of m tasks are calculated using the formulas:

\[ \mathrm{ASAP}(T_i) = \sum_{T_j \in \mathrm{pre}(T_i)} \bigl( t_{rec}(T_j) + t_{exe}(T_j) \bigr) \tag{1} \]

pre(T_i): predecessors of task T_i
t_rec(T_j): reconfiguration time of task T_j
t_exe(T_j): execution time of task T_j

\[ \mathrm{ALAP}(T_i) = D - \sum_{T_j \in \mathrm{succ}(T_i)} \bigl( t_{rec}(T_j) + t_{exe}(T_j) \bigr) \tag{2} \]

succ(T_i): successors of task T_i
D: global deadline of the task graph

\[ \mu(T_i) = \mathrm{ALAP}(T_i) - \mathrm{ASAP}(T_i) \tag{3} \]

µ(T_i): mobility of task T_i

Based on the ASAP and ALAP start times, a priority can be assigned to each task in the TG using either the urgency or the mobility of the task. The urgency depends on the maximum number of successors of a task. The mobility of a task (see Formula (3)) is the difference between its ALAP and ASAP start times and favors the tasks along the critical path. The TG in Fig. 5, for example, has a critical path that ends in T6. Because of this, the mobility is used here to assign the priorities to the tasks.
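The following sketch evaluates Formulas (1) to (3) for the edges listed in the legend of Fig. 3. It is an illustrative model, not the authors' implementation: all time values and the deadline are invented, and pre(T) and succ(T) are interpreted here as all direct and indirect predecessors and successors, which is an assumption.

```python
from typing import Dict, List, Set

# Successor lists taken from the edge labels of Fig. 3 (K_12, K_13, ...).
SUCC: Dict[str, List[str]] = {
    "T1": ["T2", "T3"],
    "T2": ["T4"],
    "T3": ["T5", "T6"],
    "T4": ["T5"],
    "T5": ["T6"],
    "T6": [],
}

# Predecessor lists obtained by reversing every edge.
PRED: Dict[str, List[str]] = {t: [] for t in SUCC}
for src, targets in SUCC.items():
    for dst in targets:
        PRED[dst].append(src)

# Hypothetical reconfiguration and execution times in ms (assumptions).
T_REC = {"T1": 2, "T2": 2, "T3": 5, "T4": 2, "T5": 2, "T6": 5}
T_EXE = {"T1": 3, "T2": 4, "T3": 2, "T4": 3, "T5": 6, "T6": 1}
DEADLINE = 40  # hypothetical global deadline D in ms


def reachable(task: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Return every task reachable from `task` along `edges`, excluding itself."""
    seen: Set[str] = set()
    stack = list(edges[task])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen


def load(tasks: Set[str]) -> int:
    """Sum of reconfiguration and execution times over a set of tasks."""
    return sum(T_REC[t] + T_EXE[t] for t in tasks)


def asap(task: str) -> int:
    """Formula (1): time consumed by all predecessors."""
    return load(reachable(task, PRED))


def alap(task: str) -> int:
    """Formula (2): deadline minus the time needed by all successors."""
    return DEADLINE - load(reachable(task, SUCC))


def mobility(task: str) -> int:
    """Formula (3): slack between latest and earliest start."""
    return alap(task) - asap(task)


# Smaller mobility means higher priority.
priority_order = sorted(SUCC, key=mobility)
for t in priority_order:
    print(t, "ASAP", asap(t), "ALAP", alap(t), "mobility", mobility(t))
```

With these invented numbers the static order is T1, T6, T5, T4, T2, T3. At runtime only tasks in the Ready state compete for the ICAP, so T6 would still not be configured before its predecessors.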
The smaller the mobility, the higher the priority of the task. At runtime, only the Ready tasks are scheduled for configuration according to their priorities, which have been calculated with the list scheduling algorithm. Fig. 5 shows such a TG, which is processed by CAP-OS to schedule the reconfiguration of the different tasks. In the current time step shown in Fig. 5, one task has already been reconfigured, and therefore two tasks that depend on it are now in the Ready state. Normally, the task with the highest priority is reconfigured first. If there are two or more Ready tasks and the difference between the mobilities of the two tasks with the highest priority is smaller than the reconfiguration time of the task with the lower priority (see Formula (4)), a dynamic cost function K(T_i) (Formula (5)) is used to reassign the priorities of these two tasks.

Figure 5. Task graph to illustrate the functionality of the scheduling (legend: task in state Exec, task in state Ready, task in state Not_Ready; the current scheduling step is marked)

K(T_i) considers the ratio between the mobilities of the two tasks, K_1(T_i, T_j) (Formula (6)), and the ratio between the numbers of successors of the two tasks, K_2(T_i, T_j) (Formula (7)). K(T_i) is computed using Formulas (5) to (7), and only for the two tasks with the highest priority that are currently to be scheduled.

T_i gets the highest priority if:

\[ \mu(T_j) - \mu(T_i) > RT(T_j), \qquad \mu(T_i) < \mu(T_j) \tag{4} \]

RT(T_j): reconfiguration time of task T_j

Otherwise the decision is made using K(T_i):
- K(T_i) > K(T_j): T_i gets the highest priority
- K(T_i) ≤ K(T_j): T_j gets the highest priority

\[ K(T_i) = \omega_1 \cdot K_1(T_i, T_j) + \omega_2 \cdot K_2(T_i, T_j) \tag{5} \]

ω_1, ω_2: weighting factors

\[ K_1(T_i, T_j) = \begin{cases} \mu(T_j) / \mu(T_i), & \mu(T_i) < \mu(T_j) \wedge \mu(T_i) \neq 0 \\ 0, & \text{else} \end{cases} \tag{6} \]

µ(T_i): mobility of task T_i

\[ K_2(T_i, T_j) = \begin{cases} N(T_i) / N(T_j), & N(T_i) > N(T_j) \wedge N(T_j) \neq 0 \\ 0, & \text{else} \end{cases} \tag{7} \]

N(T_i): number of successors of task T_i

K_1 gets a greater weight in the cost function than K_2, because for real-time applications the execution time is the most important factor.
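To make the decision rule of Formulas (4) to (7) concrete, the sketch below compares two Ready tasks. It is a hedged illustration: the `Task` record, the function names and the example numbers are hypothetical, the index placement follows the prose description (reconfiguration time of the lower-priority task in Formula (4)), and the weights 0.6 and 0.4 are the default values reported for CAP-OS.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    mobility: float     # mu(T), from the static list scheduling
    reconf_time: float  # RT(T) in ms
    successors: int     # N(T)


def k1(mu_i: float, mu_j: float) -> float:
    """Mobility ratio, Formula (6)."""
    return mu_j / mu_i if mu_i < mu_j and mu_i != 0 else 0.0


def k2(n_i: int, n_j: int) -> float:
    """Successor ratio, Formula (7)."""
    return n_i / n_j if n_i > n_j and n_j != 0 else 0.0


def cost(t_i: Task, t_j: Task, w1: float = 0.6, w2: float = 0.4) -> float:
    """Weighted cost K(T_i) relative to the competitor T_j, Formula (5)."""
    return (w1 * k1(t_i.mobility, t_j.mobility)
            + w2 * k2(t_i.successors, t_j.successors))


def pick_first(a: Task, b: Task) -> Task:
    """Return the task that should be configured first."""
    # t_i is the task with the smaller mobility (statically higher priority).
    t_i, t_j = (a, b) if a.mobility <= b.mobility else (b, a)
    # Formula (4): a clear mobility gap keeps the static order.
    if t_j.mobility - t_i.mobility > t_j.reconf_time:
        return t_i
    # Otherwise the dynamic cost function decides.
    return t_i if cost(t_i, t_j) > cost(t_j, t_i) else t_j


# Hypothetical example tasks.
task_a = Task("A", mobility=8, reconf_time=2, successors=1)
task_b = Task("B", mobility=9, reconf_time=5, successors=3)
task_c = Task("C", mobility=20, reconf_time=5, successors=3)

print(pick_first(task_a, task_b).name)  # close mobilities: cost function decides
print(pick_first(task_a, task_c).name)  # large gap: static order is kept
```

In the first call the mobilities differ by less than the reconfiguration time of B, so the cost function is evaluated; B wins because it has three times as many successors. In the second call the gap is large enough that A keeps its static priority.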
Because K_1 carries the greater weight, the default values were set to 0.6 for ω_1 and 0.4 for ω_2. These weights can be modified by the user depending on the requirements of the application.

Additionally, multiple TGs can be scheduled at runtime. If some of these TGs have hard real-time and others only soft real-time requirements, all tasks of the TGs with soft real-time constraints are delayed. They are reconfigured after the tasks with hard real-time constraints, even though they might have a higher priority according to the list scheduling algorithm. This is important to assure that the hard real-time TGs meet their constraints.

Finally, CAP-OS supports an additional feature: the clock frequency of a processing element can be increased at runtime by reconfiguring the corresponding digital clock manager (DCM). This reconfiguration is faster than reconfiguring a new hardware module, and it is used to shorten the execution time of a task. It is assumed here that the execution time is closely related to the clock frequency. This DCM reconfiguration is used if a task cannot complete within its ALAP time or if another task urgently requires the same processor.

The individual steps of the scheduling algorithm can be summarized as follows:
(1) Calculate the ASAP and ALAP start times for each task in the task graph
(2) Calculate the mobility of each task and assign the priorities using a list scheduling algorithm
(3) Select the Ready tasks and schedule them dynamically:
  a. delay tasks with soft real-time constraints
  b. reassign priorities using the cost function, if necessary
  c. reconfigure the DCM, if necessary
  d. terminate the current reconfiguration, if a task with a higher priority occurs

This results in a pre-emptive scheduling approach, which allows the termination of a configuration. Furthermore, it combines static list scheduling with a novel dynamic scheduling approach. It considers resource constraints, such as the single ICAP or the maximum number of possible processors. Moreover, the clock frequency of processing elements can be increased at runtime if necessary, and the reconfiguration times as well as the communication costs between tasks are considered.

B. Resource Allocation of the CAP-OS

After the scheduling, CAP-OS tries to allocate a resource for the Ready task with the highest priority. The decision is made as shown in Fig. 6.

Figure 6. Decision tree for resource allocation (decisions shown: new task present? processor present? processor blocked? blocked processor soon free? space for reconfiguring a new processor?; actions shown: allocate an existing processor, wait for a processor to finish, configure and allocate a new processor)

First, CAP-OS analyzes whether a processor is present and available on the reconfigurable hardware. If no processor is present, a new one is configured and allocated for the new task. If processors are present in the system, CAP-OS searches for one that is not blocked by another task. If all existing processors are blocked, it checks whether one of them will finish its execution soon.
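The decision tree of Fig. 6 can be sketched as a single function. This is a simplified model under stated assumptions: the `Processor` record, the action names and the numeric values are hypothetical, and "soon" is interpreted as finishing earlier than a new processor could be reconfigured.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Processor:
    name: str
    busy: bool
    time_to_finish_ms: float = 0.0  # remaining execution time if busy


def allocate(processors: List[Processor],
             max_processors: int,
             new_proc_reconfig_ms: float) -> Tuple[str, Optional[str]]:
    """Decide where the highest-priority Ready task should run.

    Returns an (action, processor_name) pair; the name is None when no
    existing processor is selected.
    """
    # 1. Prefer an existing processor that is not blocked.
    free = [p for p in processors if not p.busy]
    if free:
        return ("allocate_existing", free[0].name)

    # 2. All processors are blocked: wait if one finishes sooner than a
    #    new processor could be reconfigured.
    if processors:
        soonest = min(processors, key=lambda p: p.time_to_finish_ms)
        if soonest.time_to_finish_ms < new_proc_reconfig_ms:
            return ("wait_and_reuse", soonest.name)

    # 3. Otherwise configure a new processor if there is still space.
    if len(processors) < max_processors:
        return ("configure_new", None)

    # 4. No space left: the task has to wait for any processor.
    return ("wait", None)


# Hypothetical situation: one busy processor that needs 2 ms more,
# while configuring a new processor would take 5 ms.
print(allocate([Processor("P0", busy=True, time_to_finish_ms=2.0)], 4, 5.0))
```

Returning an action instead of performing it keeps the sketch free of any hardware access; in a real system the caller would trigger the configuration management accordingly.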
Checking whether a blocked processor will finish soon is important, because reconfiguration and allocation take a certain amount of time. If an existing processor finishes in less time than the reconfiguration of a new processor would take, the reuse of this existing processor is preferred. This also has the benefit of reducing the area utilization and therefore the overall power consumption. If none of the existing processors will finish soon, CAP-OS analyzes whether the maximum number of processors is reached or whether there is still space to reconfigure a new processor. If there is space on the reconfigurable hardware, a new processor is reconfigured and allocated for the new task. If not, the new task has to wait until one of the processors becomes available.

C. Configuration Management

After the Ready task with the highest priority has been successfully assigned to a processor, this task is handed over to the configuration management. The configuration management is responsible for handling the configuration of the tasks via the ICAP. It is also responsible for pre-empting a current configuration if another task with higher priority needs to be reconfigured. As mentioned before, a terminated configuration has to restart from the beginning, because Xilinx FPGAs do not support the continuation of a terminated configuration so far. Therefore, the configuration management of CAP-OS distinguishes between two types of configurations, as shown in Table 2.

Table 2. Configuration types

Configuration Type | Features | Elements
Soft | Interruptible until 80% of the bitstream is reconfigured | Software, hardware accelerators
Hard | Not interruptible | Processors, DCM

The term soft means an interruptible and hard a non-interruptible configuration. Soft configuration types are, e.g., the configuration of software tasks or of hardware accelerators for existing processors. As soon as 80% of the corresponding bitstream of a soft configuration type is configured, this element becomes a hard configuration type.
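The soft/hard distinction of Table 2 can be expressed as a small predicate. The sketch below is illustrative only: the element names, the function names and the use of mobility as the priority measure are assumptions chosen for this example.

```python
# Element kinds that are never interruptible (hard configuration types).
HARD_ELEMENTS = {"processor", "dcm"}


def is_interruptible(element: str, bytes_configured: int, bitstream_bytes: int) -> bool:
    """Return True while a running configuration may still be terminated."""
    if element in HARD_ELEMENTS:
        return False
    # A soft configuration turns hard once 80% of the bitstream is written.
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return bytes_configured * 100 < bitstream_bytes * 80


def should_preempt(running_mobility: float, incoming_mobility: float,
                   element: str, bytes_configured: int, bitstream_bytes: int) -> bool:
    """Pre-empt only for a higher-priority task and only if still allowed.

    A smaller mobility is treated as the higher priority.
    """
    return (incoming_mobility < running_mobility
            and is_interruptible(element, bytes_configured, bitstream_bytes))


# Hypothetical accelerator bitstream of 100,000 bytes.
print(should_preempt(15, 8, "accelerator", 10_000, 100_000))  # early: may pre-empt
print(should_preempt(15, 8, "accelerator", 90_000, 100_000))  # past 80%: must finish
```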
The reason is to prevent the termination of a nearly finished configuration, because the already configured data would be lost. Other examples of hard configuration types are the configuration of the DCMs and of the processors, because the configuration of a DCM is urgent and fast, and a processor is far less task-specific than an accelerator.

D. Communication Establishment between Tasks

After successfully configuring a task, CAP-OS tries to establish a communication with this task and to transfer to it the IDs of its communication partners. Fig. 7 illustrates the steps required to successfully establish a communication between the different tasks at runtime.

Figure 7. Runtime communication establishment steps between different tasks (steps 1 and 2: Sync; 3: Task Info; 4: Task ID; 5: End)
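The message sequence named in Fig. 7 (Sync, Task Info, Task ID, End) can be modeled as a small exchange between plain objects. This is only a sketch under assumptions: the class, the function names, the log labels and the sync word value are hypothetical, and real links would be hardware channels rather than method calls.

```python
from typing import List

SYNC_WORD = 0x5A5A5A5A  # hypothetical value; the paper does not give one


class Processor:
    """Minimal stand-in for a RAMPSoC processing element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.task_id = -1
        self.partner_ids: List[int] = []
        self.seen_task_ids: List[int] = []  # Task IDs announced by others
        self.free = True

    def echo_sync(self, word: int) -> int:
        """Answer a sync request with the same word."""
        return word


def establish(target: Processor, others: List[Processor],
              task_id: int, partner_ids: List[int]) -> List[str]:
    """Run the Sync, Task Info and Task ID steps; return a message log."""
    log = ["SYNC"]                                # sync word is sent
    if target.echo_sync(SYNC_WORD) != SYNC_WORD:  # same word must come back
        raise RuntimeError("sync failed")
    log.append("SYNC_ECHO")
    target.task_id = task_id                      # task info is delivered
    target.partner_ids = list(partner_ids)
    target.free = False
    log.append("TASK_INFO")
    for other in others:                          # Task ID is announced
        other.seen_task_ids.append(task_id)
    log.append("TASK_ID")
    return log


def finish(target: Processor) -> str:
    """End step: the processor reports that it is free again."""
    target.free = True
    return "END"


p0, p1, p2 = Processor("P0"), Processor("P1"), Processor("P2")
print(establish(p0, [p1, p2], task_id=3, partner_ids=[1, 5]))
print(finish(p0))
```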

The five runtime communication establishment steps required after a task has been mapped onto a processor are:
(1) CAP-OS sends a sync word to the processor.
(2) The processor responds with the same sync word to ensure a correct communication.
(3) CAP-OS sends the task info (Task ID, number of predecessor/successor tasks and their IDs) to the processor. This task info is required by the task to find its communication partners at runtime.
(4) The processor sends its Task ID to all other processors and checks each of its communication links for the Task IDs of its communication partners. It has to send its Task ID to all other processors, because a predecessor and a successor could be mapped onto the same processor. An example of such a case is given in Section VI.
(5) After execution, the processor informs CAP-OS that it is now free for a new task.

V. INTEGRATION OF CAP-OS ON RAMPSOC

CAP-OS is integrated into a RAMPSoC by implementing it in software on one of the microprocessors. On the selected microprocessor, a state-of-the-art RTOS with multithreading capabilities is implemented. On top of this RTOS, CAP-OS is implemented using different threads for the different functionalities. As shown in Fig. 8, this microprocessor is directly connected to the Xilinx ICAP primitive and to an external memory, in which the partial bitstreams of the tasks are stored.

Figure 8. Integration of the CAP-OS on the RAMPSoC (blocks shown: user applications, FPGA, external memory, CAP-OS + RTOS + microprocessor, ICAP, Virtual-I/O)

The microprocessor is connected to the other processors over a switch-based NoC in this example, but a point-to-point connection to each of the other partners or a connection over a different NoC is also supported. Several choices for an on-chip microprocessor exist. As the processor running CAP-OS, the IBM PowerPC 405 (PPC405) [7] was chosen. It is available on Xilinx Virtex-4 FX FPGAs as a hard core IP.
The main reasons for choosing the PPC405 are its support of high frequencies of up to 450 MHz and its availability on the Virtex-4 FX100 FPGA of the target FPGA board from Alpha-Data [8]. High frequencies are important to execute CAP-OS fast and to support the real-time requirements. Other possible microprocessors would be soft core IPs, such as the Xilinx MicroBlaze or the LEON SPARC, but they lack support for such high frequencies.

After selecting the processor, an appropriate RTOS was chosen. The demands on the RTOS are:
- support of the PPC405 and well-tested multithreading capabilities
- small memory footprint

Several different RTOSs exist, but for the reasons above, the Xilkernel [9] from Xilinx was selected. CAP-OS is programmed in C and its functionalities are implemented in several threads, which are executed in Xilkernel using multithreading. For scheduling the threads, Xilkernel offers two policies: round-robin or priority-based scheduling. Priority-based scheduling was chosen in order to execute the different threads according to their priorities. Furthermore, the PowerPC is directly connected to the ICAP primitive and to an external memory (DDR2 SDRAM), in which the bitstreams are stored. CAP-OS and Xilkernel are executed from on-chip memory for maximum performance. In the following subsection, the implementation of the different threads is described in detail.

A. Implementation of the CAP-OS

CAP-OS is programmed using six threads, as shown in Table 3.

Table 3. Realized threads of the CAP-OS

Thread | Priority | Description
Test_main | 0 | Initial thread. Launches the other five threads.
Init_proc | 1 | Generates a list containing all possible processors and their attributes. Executes only once.
Task_graph | 2 | Initialization of the tasks and generation of the task graphs. Calculation of the ALAP and ASAP start times and of the mobility of each task. Matching of tasks with equal requirements (HW constraints, same algorithm).
Schedule | 3 | Scheduling of the Ready tasks and processor allocation.
Configure | 3 | Configuration management for the scheduled and allocated task and communication establishment between the newly configured task and its neighbors.
Contr_Exit_Task | 3 | Controls the executing tasks. If a task finishes execution, the occupied processing element is freed.

A lower priority number means a higher priority. Test_main is the startup thread and has a fixed priority. The priorities of the other five threads can change at runtime depending on the demands of the applications. The three threads with priority level 3 (Schedule, Configure and Contr_Exit_Task) compete against each other after the first three threads with higher priority have finished executing. While the other threads execute only once at the beginning, these three competing threads execute until the last task finishes executing.

VI. CASE STUDY AND RESULTS

The correct functionality of CAP-OS was evaluated by implementing a RAMPSoC system on the target Alpha-Data FPGA board. CAP-OS was implemented using one of the available PPC405s and the Xilkernel RTOS. The maximum number of reconfigurable processors was set to four, to be below the number of tasks within our evaluation task graphs. As the target Virtex-4 FX100 FPGA is quite big, a higher number of processors could be used if necessary. For the reconfigurable processors the Xilinx MicroBlaze (µBlaze) [10] was chosen, due to its small area footprint and its good compatibility with the PPC405. As shown in Fig. 9, Fast Simplex Links (FSLs) [11] are used for communication between the processors. They offer FIFO-based unidirectional communication, and for the limited number of processors a NoC is not required. The PPC405 can be connected via FSL to 32 partners, while each µBlaze can be connected to 16 partners.

Figure 9. Implemented RAMPSoC system (blocks shown: external DDR2 SDRAM, FPGA, CAP-OS + Xilkernel + PPC405, XPS-ICAP, µBlaze0 to µBlaze3, PCI; regions: static region and dynamic reconfigurable region; connections: FSL point-to-point links between the µBlazes, FSL point-to-point links between the PPC405 and the µBlazes, PLB bus, RS232 communication between user and CAP-OS)

Additionally, the XPS-ICAP IP core from Xilinx together with an external DDR2 SDRAM is connected via the PLB bus to the PPC405. The user communicates with CAP-OS via RS232. For the test, dynamic and partial reconfiguration was not used, because the goal was to verify CAP-OS and not the ICAP primitive. Instead of sending the partial bitstreams to the ICAP core, a counter within the Configure thread was used to simulate the reconfiguration times of the different tasks. A reconfiguration time of 5 ms was assumed for a whole processor and of 2 ms for a software task on an existing processor.
These reconfiguration times are worst-case scenarios. Software is assumed to be transferred via the ICAP core to the BRAMs of the corresponding processor, as shown for example in [12]. The reconfiguration times could also be reduced by using an ICAP with direct DMA access to the external memory, such as presented in [13]. At system startup, it is assumed that only the static part is present and that the other processors are reconfigured on demand. Physically, the system shown in Fig. 9 was present from the beginning, and after the simulated reconfiguration time has elapsed, the corresponding processor is activated.

To verify the functionality of CAP-OS and the implemented system, the two TGs of Fig. 10 are used. TG1 has hard real-time constraints. This could be, e.g., an image processing application that receives images from a camera and has to present the results to the user on a monitor in real time. Therefore the global deadline (D_1) of TG1 is 40 ms, corresponding to a camera with a frame rate of 25 Hz. If this deadline is missed, frames will be lost. TG2 is a soft real-time application, whose global deadline (D_2) can be missed without causing problems. D_2 is set to 50 ms here.

Figure 10. Two task graphs for the evaluation: D_1 = 40 ms, D_2 = 50 ms (Task Graph 1, hard real-time: edges K_12, K_13, K_14, K_25, K_35, K_45; Task Graph 2, soft real-time: tasks T6, T7 and T8 with edges K_67, K_78; Algo-ID 0: same hardware requirements, e.g. receive/send data via PCI, includes T6 and T8; Algo-ID 1: same algorithm, e.g. same image processing filter, includes T7; Algo-IDs 2 and 3: different algorithms, e.g. different image processing filters)

To measure the timing overhead, CAP-OS was executed on the FPGA using TG1. To test whether CAP-OS correctly reuses existing resources, two tasks of TG1 were set to have the same algorithm (same Algo-ID), as shown in Fig. 10. During the execution on the FPGA, the number of clock cycles required per call by each thread was measured.
The results for the timing overhead introduced by the CAP-OS are shown in Table 4.

Table 4. Timing overhead of CAP-OS for processing TG1

Thread             Average number of clock cycles per call
Init_proc          2118
Task_graph         9022
Schedule           650
Contr_Exit_Task    227

The clock cycles of the Configure thread depend on the size of the bitstream and on the speed of the ICAP primitive. Therefore, they are not given here. Test_main only launches the other five threads; it does not itself produce timing overhead and is therefore not listed either. Init_proc depends on the number of processors (here four) and Task_graph depends on the TG (here TG1 with five tasks), so these numbers are just an example for the given TG. The clock cycles required by the Schedule thread depend on the complexity of the scheduling; e.g. they increase slightly if the cost function needs to be evaluated for two tasks. The cycle count of Contr_Exit_Task is very stable.
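To relate the cycle counts of Table 4 to the deadlines, they can be converted into time for a given clock frequency. The following sketch does this conversion; the clock frequency of the PPC405 is not stated in this section, so `ASSUMED_CLOCK_HZ` is a hypothetical value, and the function names are illustrative only.

```python
# Cycle counts per call, taken from Table 4.
CYCLES_PER_CALL = {
    "Init_proc": 2118,
    "Task_graph": 9022,
    "Schedule": 650,
    "Contr_Exit_Task": 227,
}

ASSUMED_CLOCK_HZ = 100_000_000  # hypothetical processor clock, not from the paper


def cycles_to_us(cycles: int, clock_hz: int = ASSUMED_CLOCK_HZ) -> float:
    """Convert a cycle count into microseconds at the given clock frequency."""
    return cycles * 1_000_000 / clock_hz


def share_of_deadline(cycles: int, deadline_ms: float,
                      clock_hz: int = ASSUMED_CLOCK_HZ) -> float:
    """Fraction of a deadline consumed by the given number of cycles."""
    return cycles_to_us(cycles, clock_hz) / (deadline_ms * 1000)


if __name__ == "__main__":
    for thread, cycles in CYCLES_PER_CALL.items():
        print(f"{thread}: {cycles_to_us(cycles):.2f} us per call")
    one_call_each = sum(CYCLES_PER_CALL.values())
    print(f"share of D1 (40 ms): {share_of_deadline(one_call_each, 40):.4%}")
```

The sum covers only one call of each listed thread; since Schedule and Contr_Exit_Task are called repeatedly, the total overhead for a task graph would be higher than this single-call figure.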

With this example it can be shown that the CAP-OS worked correctly and assigned the tasks of TG1 without violating the global deadline. The resource reuse also worked correctly: one task was allocated onto the same processor as the other task with the same algorithm, so that the reconfiguration time could be saved.

Finally, a case study using image processing tasks within the task graphs TG1 and TG2 was done. The execution times of the individual tasks were measured on a single µBlaze. Fig. 11 shows the calculated results of CAP-OS and compares them against the ones calculated using the scheduling approach of Dittmann et al. [3]. The scheduling overhead is not included in these results, because the overhead of the approach of Dittmann et al. was not known. Here, it was assumed that the approach of Dittmann et al. can also differentiate between a SW and a HW reconfiguration and can therefore reuse existing processors, which is not the case in [3].

VII. CONCLUSIONS AND OUTLOOK

In this paper the concept and the features of a special-purpose OS called CAP-OS were presented. The CAP-OS is responsible for scheduling, for resource allocation and reconfiguration, and for managing the access to the configuration access port. The CAP-OS has been integrated into the RAMPSoC approach to handle the runtime organization of the adaptive RAMPSoC hardware architecture. The CAP-OS was implemented using six threads on the Xilkernel RTOS running on a PPC405. The correct functionality and the timing overheads of the CAP-OS were measured on the FPGA using an exemplary TG. The benefits of the CAP-OS were shown using a case study with two TGs and comparing the results against the scheduling approach of Dittmann et al. [3].

Future work will be the extension of the CAP-OS to support the reconfiguration of the communication infrastructure. Furthermore, it will be extended to handle not only the demands of the user, but also the reconfiguration demands of the other processors within the RAMPSoC.
These demands are mainly the reconfiguration of the accelerators, for example if a different accelerator is required at runtime depending on the currently processed data. Furthermore, the CAP-OS will be further evaluated and will also be tested using real dynamic and partial reconfiguration. Additional extensions of CAP-OS will be support for merging several bitstreams and for bitstream relocation. Bitstream relocation is important to reduce the amount of external memory required for storing each bitstream for each possible location.

Figure 11. Theoretical results of CAP-OS and Dittmann et al. [3]. Schedules over time (in ms) on the processors P0 to P3: a) CAP-OS, b) Dittmann et al. [3], which misses the deadline. The legend distinguishes the reconfiguration time of a task, the execution time of a task, and C, the reconfiguration time for a DCM (the DCM of P1 was reconfigured for 150 MHz). In the accompanying comparison, the approach of Dittmann et al. [3] does not satisfy the real-time requirements (TG1 < 40 ms, TG2 > 50 ms), whereas CAP-OS does; both are rated positively with respect to resources.

REFERENCES

[1] J. Blazewicz, K.H. Ecker, E. Pesch, G. Schmidt, J. Weglarz: Scheduling Computer and Manufacturing Processes; Berlin (Springer) 2001, ISBN
[2] D. Göhringer, M. Hübner, V. Schatz, J. Becker: Runtime Adaptive Multi-Processor System-on-Chip: RAMPSoC; In Proc. of RAW 2008 at the IPDPS 2008, April
[3] F. Dittmann, S. Frank: Hard Real-Time Reconfiguration Port Scheduling; In Proc. of DATE 2007, p , April
[4] M. Ullmann, M. Hübner, B. Grimm, J. Becker: On-Demand FPGA Run-Time System for Dynamical Reconfiguration with Adaptive Priorities; In Proc. of FPL 2004, pp , August
[5] E. Lübbers, M. Platzner: ReconOS: An RTOS supporting Hard- and Software Threads; In Proc. of FPL 2007, August
[6] D. Göhringer, M. Hübner, T. Perschke, J. Becker: New Dimensions for Multiprocessor Architectures: On Demand Heterogeneity, Infrastructure and Performance through Reconfigurability: The RAMPSoC Approach; In Proc. of FPL 2008, pp , Sept
[7] PowerPC Reference Guide; UG011 (v.1.2), Jan. 19, Available at
[8] Alpha Data:
[9] Xilkernel v3_00_a; EDK 9.1i, December 12, Available at
[10] MicroBlaze Reference Guide, Embedded Development Kit, EDK 9.2i, UG081 (v8.1). Available at
[11] Fast Simplex Link (FSL) Bus (v2.11a); DS449, June 25, Available at
[12] O. Sander, L. Braun, M. Huebner, J. Becker: Data Reallocation by Exploiting FPGA Configuration Mechanisms; In Proc. of ARC 2008, Springer Volume 4943/2008, March
[13] C. Claus, B. Zhang, W. Stechele, L. Braun, M. Hübner, J. Becker: A multi-platform controller allowing for maximum dynamic partial reconfiguration throughput; In Proc. of FPL 2008, Sept

Year: 2010
Author(s): Göhringer, D.; Hübner, M.; Zeutebouo, E.N.; Becker, J.
Title: CAP-OS: Operating system for runtime scheduling, task mapping and resource management on reconfigurable multiprocessor architectures
DOI: /IPDPSW
© IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Details: Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society: IEEE International Symposium on Parallel & Distributed Processing Workshops and Phd Forum, IPDPSW Vol.1: Atlanta, Georgia, USA, April 2010. Piscataway/NJ: IEEE, 2010


More information

AN OCM BASED SHARED MEMORY CONTROLLER FOR VIRTEX 4. Bas Breijer, Filipa Duarte, and Stephan Wong

AN OCM BASED SHARED MEMORY CONTROLLER FOR VIRTEX 4. Bas Breijer, Filipa Duarte, and Stephan Wong AN OCM BASED SHARED MEMORY CONTROLLER FOR VIRTEX 4 Bas Breijer, Filipa Duarte, and Stephan Wong Computer Engineering, EEMCS Delft University of Technology Mekelweg 4, 2826CD, Delft, The Netherlands email:

More information

Subject Name:Operating system. Subject Code:10EC35. Prepared By:Remya Ramesan and Kala H.S. Department:ECE. Date:

Subject Name:Operating system. Subject Code:10EC35. Prepared By:Remya Ramesan and Kala H.S. Department:ECE. Date: Subject Name:Operating system Subject Code:10EC35 Prepared By:Remya Ramesan and Kala H.S. Department:ECE Date:24-02-2015 UNIT 1 INTRODUCTION AND OVERVIEW OF OPERATING SYSTEM Operating system, Goals of

More information

Lookahead Widening. Denis Gopan 1 and Thomas Reps 1,2. 1 University of Wisconsin. 2 GrammaTech, Inc.

Lookahead Widening. Denis Gopan 1 and Thomas Reps 1,2. 1 University of Wisconsin. 2 GrammaTech, Inc. Lookahead Widening Denis Gopan and Thomas Reps,2 Universit of Wisconsin. 2 GrammaTech, Inc. {gopan,reps}@cs.wisc.edu Abstract. We present lookahead widening, a novel technique for using eisting widening

More information

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs

Agenda. Introduction FPGA DSP platforms Design challenges New programming models for FPGAs New Directions in Programming FPGAs for DSP Dr. Jim Hwang Xilinx, Inc. Agenda Introduction FPGA DSP platforms Design challenges New programming models for FPGAs System Generator Getting your math into

More information

Introduction to Operating Systems

Introduction to Operating Systems Module- 1 Introduction to Operating Systems by S Pramod Kumar Assistant Professor, Dept.of ECE,KIT, Tiptur Images 2006 D. M.Dhamdhare 1 What is an OS? Abstract views To a college student: S/W that permits

More information

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved

Hardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved Hardware Design MicroBlaze 7.1 This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: List the MicroBlaze 7.1 Features List

More information

Subject Name: OPERATING SYSTEMS. Subject Code: 10EC65. Prepared By: Kala H S and Remya R. Department: ECE. Date:

Subject Name: OPERATING SYSTEMS. Subject Code: 10EC65. Prepared By: Kala H S and Remya R. Department: ECE. Date: Subject Name: OPERATING SYSTEMS Subject Code: 10EC65 Prepared By: Kala H S and Remya R Department: ECE Date: Unit 7 SCHEDULING TOPICS TO BE COVERED Preliminaries Non-preemptive scheduling policies Preemptive

More information

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems International Journal of Information and Education Technology, Vol., No. 5, December A Level-wise Priority Based Task Scheduling for Heterogeneous Systems R. Eswari and S. Nickolas, Member IACSIT Abstract

More information

Center Concentrated X-Torus Topology

Center Concentrated X-Torus Topology Center Concentrated X-Torus Topolog Dinesh Kumar 1, Vive Kumar Sehgal, and Nitin 3 1 Department of Comp. Sci. & Eng., Japee Universit of Information Technolog, Wanaghat, Solan, Himachal Pradesh, INDIA-17334.

More information

Chapter 5 CPU scheduling

Chapter 5 CPU scheduling Chapter 5 CPU scheduling Contents Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Thread Scheduling Operating Systems Examples Java Thread Scheduling

More information

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD

Leveraging OpenSPARC. ESA Round Table 2006 on Next Generation Microprocessors for Space Applications EDD Leveraging OpenSPARC ESA Round Table 2006 on Next Generation Microprocessors for Space Applications G.Furano, L.Messina TEC- OpenSPARC T1 The T1 is a new-from-the-ground-up SPARC microprocessor implementation

More information

Clock Speed Optimization of Runtime Reconfigurable Systems by Signal Latency Measurement

Clock Speed Optimization of Runtime Reconfigurable Systems by Signal Latency Measurement Department of Electrical Engineering Computer Engineering Helmut Schmidt University, Hamburg University of the Federal Armed Forces of Germany Meyer, Haase, Eckert, Klauer Clock Speed Optimization of Runtime

More information

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web.

Codesign Framework. Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web. Codesign Framework Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available in their web. Embedded Processor Types General Purpose Expensive, requires

More information

Investigation Free Fall

Investigation Free Fall Investigation Free Fall Name Period Date You will need: a motion sensor, a small pillow or other soft object What function models the height of an object falling due to the force of gravit? Use a motion

More information

Discussion: Clustering Random Curves Under Spatial Dependence

Discussion: Clustering Random Curves Under Spatial Dependence Discussion: Clustering Random Curves Under Spatial Dependence Gareth M. James, Wenguang Sun and Xinghao Qiao Abstract We discuss the advantages and disadvantages of a functional approach to clustering

More information

Modeling and Simulation Exam

Modeling and Simulation Exam Modeling and Simulation am Facult of Computers & Information Department: Computer Science Grade: Fourth Course code: CSC Total Mark: 75 Date: Time: hours Answer the following questions: - a Define the

More information

2.2 Absolute Value Functions

2.2 Absolute Value Functions . Absolute Value Functions 7. Absolute Value Functions There are a few was to describe what is meant b the absolute value of a real number. You ma have been taught that is the distance from the real number

More information

IBM Netfinity Availability Extensions for Microsoft Cluster Server

IBM Netfinity Availability Extensions for Microsoft Cluster Server Enhanced server availabilit for Windows NT environments IBM Netfinit Availabilit Extensions for Microsoft Cluster Server Continuing leadership in clustering technolog Executive Summar In toda s business

More information

Received 23 November 2010; Revised 12 April 2011; Accepted 7 June 2011

Received 23 November 2010; Revised 12 April 2011; Accepted 7 June 2011 Hindawi Publishing Corporation International Journal of Reconfigurable Computing Volume 2011, Article ID 439072, 10 pages doi:10.1155/2011/439072 Research Article A High-Speed Dynamic Partial Reconfiguration

More information

Precision Peg-in-Hole Assembly Strategy Using Force-Guided Robot

Precision Peg-in-Hole Assembly Strategy Using Force-Guided Robot 3rd International Conference on Machiner, Materials and Information Technolog Applications (ICMMITA 2015) Precision Peg-in-Hole Assembl Strateg Using Force-Guided Robot Yin u a, Yue Hu b, Lei Hu c BeiHang

More information

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA

MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA MULTI-PROCESSOR SYSTEM-LEVEL SYNTHESIS FOR MULTIPLE APPLICATIONS ON PLATFORM FPGA Akash Kumar,, Shakith Fernando, Yajun Ha, Bart Mesman and Henk Corporaal Eindhoven University of Technology, Eindhoven,

More information

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain

Lecture 4: Synchronous Data Flow Graphs - HJ94 goal: Skiing down a mountain Lecture 4: Synchronous ata Flow Graphs - I. Verbauwhede, 05-06 K.U.Leuven HJ94 goal: Skiing down a mountain SPW, Matlab, C pipelining, unrolling Specification Algorithm Transformations loop merging, compaction

More information

Operating System Approaches for Dynamically Reconfigurable Hardware

Operating System Approaches for Dynamically Reconfigurable Hardware Operating System Approaches for Dynamically Reconfigurable Hardware Marco Platzner Computer Engineering Group University of Paderborn platzner@upb.de Outline operating systems for reconfigurable hardware

More information

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking Di-Shi Sun and Douglas M. Blough School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA

More information

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.

More information

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling

Uniprocessor Scheduling. Basic Concepts Scheduling Criteria Scheduling Algorithms. Three level scheduling Uniprocessor Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Three level scheduling 2 1 Types of Scheduling 3 Long- and Medium-Term Schedulers Long-term scheduler Determines which programs

More information

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA

Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Design and Implementation of Low Complexity Router for 2D Mesh Topology using FPGA Maheswari Murali * and Seetharaman Gopalakrishnan # * Assistant professor, J. J. College of Engineering and Technology,

More information