Pareto-Based Application Specification for MP-SoC Customized Run-Time Management
Ch. Ykman-Couvreur (1), V. Nollet (1), Th. Marescaux (1), E. Brockmeyer (1), Fr. Catthoor (1, 2), H. Corporaal (3)
(1) IMEC V.Z.W., Kapeldreef 75, 3001 Leuven, Belgium
(2) Also professor at Katholieke Univ. Leuven, Belgium
(3) Professor at Technical Univ. Eindhoven, The Netherlands

Abstract: In an MP-SoC environment, a customized run-time management layer should be incorporated on top of the basic OS services. It should globally optimize costs (e.g. energy consumption) across all active applications, according to constraints (e.g. performance, user requirements) and available platform resources. To that end, we have proposed a Pareto-based approach that combines a design-time application mapping and platform exploration with a low-complexity run-time manager. This relieves the OS of part of its run-time decision making and avoids conservative worst-case assumptions. In this paper, we focus on the characterization of the Pareto-based application specification resulting from our design-time exploration. This specification is an essential input for our run-time manager. A representative video codec multimedia application, simulated on our MP-SoC platform simulator, is used as a case study. For the resulting Pareto-based specification, both the binary size overhead and the performance overhead are negligible.

[Fig. 1. MP-SoC environment: application aspect (dynamic set of applications, e.g. multimedia), platform aspect (distributed PEs interconnected by a NoC), and non-functional aspect (low power, RT behavior, small memory footprint).]

I. INTRODUCTION
In a Multi-Processor System-on-Chip (MP-SoC) environment, an ideal Operating System (OS), also called run-time management layer, should efficiently combine all application, platform, and non-functional aspects (Fig. 1). First, the OS should enable a dynamic set of multimedia applications (e.g. video messaging, web browsing, video conferencing), 3D games, and many other compute-intensive tasks [1].
These applications are becoming more heterogeneous, dynamic, and data-intensive. When they run on mobile devices, which are typically battery-powered, energy consumption is a major design issue. The OS also has to fulfill the Quality-of-Service (QoS) requirements of the user (e.g. reliability, performance, and video quality).

Second, the OS has to support platforms [2] (e.g. TI OMAP and ST Nomadik) that consist of a large number of heterogeneous Processing Elements (PEs). These platforms combine the parallel-computing advantages of multiple processors with the single-chip integration of SoCs. They provide high computational performance at a low energy cost, whereas typical embedded systems (e.g. handheld devices such as PDAs and smartphones) are limited by their restricted processing power and memory. Since application complexity keeps growing, the major challenges are the right parallelization of these applications and their efficient mapping on the MP-SoC platform.

Third, growing SoC complexity makes communication subsystem design as important as computation subsystem design [3], [4]. To provide reliable and scalable communication [5], a flexible Network-on-Chip (NoC) interconnect must be adopted. Designing such an NoC becomes another major task for future MP-SoCs.

Finally, for memory-intensive applications such as multimedia applications, the memory subsystem represents an important part of the overall energy cost. ScratchPad Memories (SPMs) are used in the memory subsystem [6], [7], [8], since they outperform caches in terms of energy per access, performance, on-chip area, and predictability. However, unlike caches, SPMs require complex design-time application analysis to decide which data to assign to the SPM. They also require software allocation techniques.
We have proposed a customized run-time management approach [9] to map the applications on the platform. It relieves the OS of part of its run-time decision making and avoids conservative worst-case assumptions. It consists of two phases.

First, a design-time mapping and platform exploration per application leads to a multi-dimensional Pareto set of optimal mappings. Each mapping is characterized by a code version together with an optimal combination of used platform resources, costs, and constraints. The different code versions refer to different parallelizations of the application into parallel tasks and to data transfers between SPMs and local memories.

Second, a low-complexity run-time manager, incorporated on top of the basic OS services, maintains the high quality of the exploration. The environment can change, for example when a new application or use case starts, or when the user requirements change. Whenever it does, our run-time manager reacts as follows for each active application:
1) It selects, in a predictable way, a mapping from the application's Pareto set according to the available platform resources. The goal is to minimize the total energy consumption of the platform while respecting all constraints.
2) It performs Pareto point switches (Fig. 2, restricted to two dimensions). That is, it assigns the platform resources and adapts the platform parameters. It then loads the task binaries from the shared memory into the corresponding local memories. Finally, it issues the execution of the code versions according to the newly selected Pareto points.

[Fig. 2. Pareto point switches for Applications A and B over time: energy and processor usage (Proc 0 to Proc 3, clocks Ck1 and Ck2); events: A starts, B starts, A stops, B stops.]

When Application A starts, it is assigned to three PEs with a slow clock (Ck2). As soon as Application B starts, a Pareto point switch is needed to map A on only two PEs. Because the clock is sped up (Ck1), the application deadline is still met. After A stops, B can be spread over three PEs in order to reduce the energy consumption.

In [9], we presented the design-time exploration phase of our approach, restricted to the usage of one processor. The main new contribution of this paper is the characterization of the Pareto-based application specification. This specification efficiently merges all code versions present in the Pareto set resulting from our design-time exploration. It is stored in the MP-SoC platform and is an essential input for our run-time manager. A representative video codec multimedia application, simulated on our MP-SoC platform simulator, is used as a case study to illustrate this application specification. The resulting binary size and performance overheads are negligible.

The remainder of this paper is organized as follows. Section II summarizes the related work in the MP-SoC domain. Section III presents our customized run-time management approach. Section IV introduces our case-study application and our platform simulator.
Section V characterizes the Pareto-based application specification used in our approach and describes the experiments performed on our case study. Conclusions and future work are given in Section VI.

II. RELATED WORK
In recent years, companies like Texas Instruments and ST Microelectronics have introduced industrial MP-SoC components. Embedded systems are limited by the number of PEs, and Real-Time OSs (RTOSs) for them focus on execution determinism, speed, and small memory footprint. Current OSs like the TI DSP/BIOS kernel, the Quadros RTXC RTOS, and the Enea Systems OSE RTOS clearly focus on low-level run-time management, i.e. multiplexing the hardware and providing uniform communication primitives. They only provide an abstraction layer on top of the hardware. They extend and link together existing technologies, but they are not designed for the emerging MP-SoC environment. They lack support for SPMs, NoCs, dynamic power management, and QoS-aware, application-specific run-time management. Hence none of these existing OSs represents the ideal glue layer for MP-SoCs, and the user is expected to implement their own run-time manager on top of the OS services. State-of-the-art tools and design practice are not yet in shape to meet the needs presented in Section I either.

In the academic world, two diverging strategies [10] are currently being developed to cope with the design complexity of application-specific and heterogeneous MP-SoC platforms. The first is the IP-driven approach [11], [12]; the second is the design-flow-driven approach. In the IP-driven approaches, each application is synthesized separately, and synthesis has no integral view of the entire system on the MP-SoC platform. The design-flow-driven approach considers several global optimization issues: application parallelization, task scheduling, communication management, and dynamic reconfiguration.
In this paper, we focus on task scheduling and dynamic reconfiguration, for which our approach offers trade-offs. For MP-SoC platforms, task scheduling becomes more complicated [13], and its impact on performance and energy consumption becomes more significant. Task scheduling consists of two parts:
(1) Mapping. This means determining the order in which the tasks are executed (temporal mapping) and the processor on which each task must be executed (spatial mapping).
(2) Dynamic Voltage/Frequency Scaling (DVS/DFS). This means determining the processor supply voltage and clock frequency, if the platform allows it.

Energy consumption is increasingly an issue, and not only for battery-operated devices. Even if unlimited power is available, a large number of components tightly packed onto a chip poses cooling and reliability problems. An important way to reduce energy consumption is to shut down or slow down functional components that are idle or underutilized, by combining DVS with Dynamic Power Management (DPM). System-level design techniques for DVS and DPM are surveyed in [14] and [15], respectively. The most recent scheduling approaches combining application mapping with DVS can be found in [16], [17], [18].

Supporting the massive data traffic makes run-time communication management a challenging task. Inter-processor communication is responsible for a significant share of execution time and energy consumption. Approaches combining application mapping with some communication management aspects can be found in [19], [20], [21], [22].

[Fig. 3. Pareto set generated by our design-time exploration; dimensions: energy, memory usage, PE usage, and others.]
[Fig. 4. Our MP-SoC run-time management. For each application (A, B, ...), the design-time exploration yields refined application code versions (Version 1, Version 2, ...) and a Pareto set. At run time, a low-complexity layer uses the constraints and platform information; this layer is the customized run-time manager on top of the RTOS kernel.]

Some aspects of dynamic reconfiguration are currently being considered. Multimedia applications are becoming more versatile, and dynamic applications with multiple use cases need to be supported. Switching from one use case to another at run time involves changing the application task graph configuration [23], [24]. The platform also needs to support a wide and dynamic set of applications. This requires efficient run-time support for platform resource management, task relocation, and reconfiguration of inter-task communication [25].

III. OVERVIEW OF OUR CUSTOMIZED RUN-TIME MANAGEMENT
To meet the needs presented in Section I, we propose a customized run-time management approach to map the applications on the platform. It consists of two phases:
(1) a design-time mapping and platform exploration per application;
(2) a low-complexity run-time manager incorporated on top of the basic OS services.
This run-time manager globally optimizes costs (e.g. energy consumption) across all active applications, according to constraints (e.g. performance, user requirements) and available platform resources. It also performs low-cost switches between possible mappings of the same application, as required by environment changes. A similar conceptual approach was already developed for scheduling concurrent tasks on embedded systems [18], [14]. That approach optimizes only the energy consumption while respecting the application deadlines.
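The Python below is our own minimal sketch of the two phases just described, not the authors' implementation. At design time, it keeps only the non-dominated mappings. At run time, it picks one Pareto point per active application so that the total energy is minimal and the PE budget is respected.

Several parts are hypothetical and serve only to mirror the Fig. 2 scenario loosely: the `Mapping` fields, the exhaustive search, and all numbers. A real run-time manager would also check memory, bandwidth, and deadline constraints, and would use a low-complexity heuristic.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mapping:
    """One candidate mapping of an application (illustrative fields only)."""
    name: str
    energy: float    # cost, lower is better (arbitrary units)
    pes: int         # platform resource: number of PEs used
    mem_kb: int      # platform resource: local memory used
    clock: str = ""  # platform setting attached to the point, not compared

    def costs(self):
        # Every dimension is "lower is better"; a "higher is better" metric
        # (e.g. throughput) would be negated before comparison.
        return (self.energy, self.pes, self.mem_kb)


def dominates(a, b):
    """a dominates b: no worse in every dimension and strictly better in one."""
    ca, cb = a.costs(), b.costs()
    return all(x <= y for x, y in zip(ca, cb)) and any(x < y for x, y in zip(ca, cb))


def pareto_set(candidates):
    """Design-time phase: keep only the non-dominated mappings (O(n^2))."""
    return [m for m in candidates if not any(dominates(o, m) for o in candidates)]


def select(pareto_sets, available_pes):
    """Run-time phase: one Pareto point per active application, minimizing
    the total energy while the PEs used fit on the platform.
    Exhaustive search, only meant for tiny examples."""
    apps = list(pareto_sets)
    best, best_energy = None, float("inf")
    for combo in product(*(pareto_sets[a] for a in apps)):
        if sum(m.pes for m in combo) > available_pes:
            continue  # violates the platform resource constraint
        energy = sum(m.energy for m in combo)
        if energy < best_energy:
            best, best_energy = dict(zip(apps, combo)), energy
    return best  # None when no combination fits


def switches(old, new):
    """Applications that keep running but need a Pareto point switch."""
    return sorted(a for a in new if a in old and old[a] != new[a])


# Hypothetical numbers loosely following the Fig. 2 scenario (4 PEs).
A = pareto_set([Mapping("A3", 5.0, 3, 40, "Ck2"),
                Mapping("A2", 6.0, 2, 40, "Ck1"),
                Mapping("A2_big", 6.5, 2, 48, "Ck1")])  # dominated by A2
B = [Mapping("B3", 4.0, 3, 40, "Ck2"), Mapping("B2", 6.5, 2, 40, "Ck1")]

only_a = select({"A": A}, 4)        # A alone
both = select({"A": A, "B": B}, 4)  # B starts
only_b = select({"B": B}, 4)        # A stops
print([m.name for m in A])                                               # ['A3', 'A2']
print(only_a["A"].name, both["A"].name, both["B"].name, only_b["B"].name)  # A3 A2 B2 B3
print(switches(only_a, both))                                            # ['A']
```

When B starts, A is moved from three PEs to two with the faster clock, and once A stops, B spreads over three PEs. The cost of `select` grows exponentially with the number of active applications. A practical run-time manager therefore needs a low-complexity heuristic; the sketch only pins down the objective being optimized.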
Conventional approaches generate only one solution for each application. In contrast, the first phase of our approach is a design-time application mapping and platform exploration. For each application, this exploration generates a set of optimal mappings in a multi-dimensional design space (Fig. 3), instead of a two-dimensional one. The current dimensions are:
- costs (e.g. energy consumption);
- constraints (e.g. performance, user requirements);
- used platform resources (e.g. memory usage, processors, communication bandwidth, clocks, and processor supply voltage, if the platform allows it).
A point is retained only if, compared with any other point, it is better in at least one dimension. The retained points are called Pareto points, and the resulting set of Pareto points is called the Pareto set. We presented this design-time exploration phase, restricted to the usage of one processor, in [9]. Each Pareto point represents an application mapping. Depending on the application constraints and on the availability of platform resources, any one of these points may turn out to be the best one, and the run-time manager can select it.

Each Pareto point is also annotated with a code version. The different code versions refer to different parallelizations of the application into parallel tasks and to data transfers between SPMs and local memories. The main contribution of this paper is the characterization and merging of all these code versions, called the Pareto-based application specification. It is presented in Section V. Hence, for any application, our Pareto set is made up of optimal mappings. Each mapping is characterized by a code version together with an optimal combination of used platform resources, costs, and constraints. The description of the data structures storing information related to this Pareto set and its Pareto points is out of the scope of this paper.

The full exploration is done at design time, whereas the critical decisions are taken during the second phase by a low-complexity run-time manager (Fig. 4).
The run-time manager provides the following services.

Whenever a new application is activated, our run-time manager parses the Pareto set provided by the design-time exploration. It stores this set, including all task binaries, in the shared memory of the MP-SoC platform.

The environment can change, for example when a new application or use case starts, or when the user requirements change. Whenever it does, our run-time manager reacts as follows for each active application. First, it selects, in a predictable way, a mapping from the application's Pareto set according to the available platform resources. The goal is to minimize the total energy consumption of the platform while respecting all constraints. Second, it performs Pareto point switches (Fig. 2, restricted to two dimensions), as explained in Section I. The Pareto point switch technique bears some resemblance to dynamic reconfiguration, since it can switch to other mappings. In contrast to dynamic reconfiguration, however, it involves more complex run-time trade-offs.

IV. DEMONSTRATOR
Our driver application is Quadtree Structured Difference Pulse Code Modulation (QSDPCM) [26], an inter-frame compression technique for video images. It is representative of many of today's video codec multimedia applications. It involves a three-stage hierarchical Motion Estimation (ME4, ME2, and ME1). This is followed by a quadtree-based encoding of the motion-compensated frame-to-frame difference signal, a Quantization, and a Huffman-based Compression (QC). Two image resolutions are allowed: QCIF, with an image size of 176×144 pixels, or VGA, with an image size of 640×480 pixels. In our experiments, the QCIF resolution is used.

The starting algorithm, expressed in C code, takes two image frames (the previous and current ones) as input and produces one bit stream as output. The code is already tuned for efficient data management and processing by:
(1) minimizing the size of internal arrays;
(2) optimizing the loop performance and achieving software pipelining.
To preserve these optimizations in later code refinements, each optimized loop is encapsulated in a function, called a kernel in the remainder of this paper. Fig. 5(a) illustrates the resulting algorithm. Each module is a loop that manipulates two pixel blocks at each iteration: one from the current frame and one from the previous frame.
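The kernel loops above can overlap data movement with computation through the issue and sync block transfer (BT) calls described in Section V-A. The sketch below is a hypothetical Python simulation of that pattern, commonly called double buffering. Several parts are our own assumptions, not the simulator's API:
- a single worker thread plays the role of the DMA-like communication assist;
- `bt_issue` and `bt_sync` are invented names standing in for the BT calls;
- a sum of absolute differences stands in for a motion-estimation kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def bt_issue(dma, spm, index):
    """Hypothetical BT issue: start copying one block from the SPM into a
    local-memory copy; returns a handle used later to wait for completion."""
    return dma.submit(lambda: list(spm[index]))


def bt_sync(handle):
    """Hypothetical BT sync: block until the copy has arrived."""
    return handle.result()


def sad(cur, prev):
    """Stand-in kernel: sum of absolute differences of two pixel blocks."""
    return sum(abs(c - p) for c, p in zip(cur, prev))


def kernel_loop(cur_blocks, prev_spm):
    """Process (current, previous) block pairs while the copy of the next
    previous-frame block is already in flight."""
    if len(cur_blocks) != len(prev_spm):
        raise ValueError("one previous-frame block is needed per current block")
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = bt_issue(dma, prev_spm, 0) if prev_spm else None
        for i, cur in enumerate(cur_blocks):
            prev = bt_sync(pending)                       # wait for block i
            if i + 1 < len(prev_spm):
                pending = bt_issue(dma, prev_spm, i + 1)  # prefetch block i+1
            results.append(sad(cur, prev))                # compute on block i
    return results


cur_frame = [[1, 2, 3], [4, 5, 6]]
prev_frame_spm = [[1, 1, 1], [6, 5, 4]]
print(kernel_loop(cur_frame, prev_frame_spm))  # [3, 4]
```

With a single worker, the copies complete in order, mirroring one communication assist. The `bt_sync` at the start of each iteration keeps the kernel from reading a block that has not arrived yet. The next copy is issued before the kernel runs, so it proceeds while the current block is being processed.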
Our MP-SoC simulator assumes a platform composed of:
(1) processor nodes with local memories and buses;
(2) distributed shared memory nodes;
(3) communication assists similar to Direct Memory Access (DMA) controllers, providing high-level services to processors and shared memories for efficient data transfers;
(4) I/O nodes;
(5) a communication architecture, namely the AEthereal NoC [27].
The main platform parameters that can currently be explored are:
- the network clock;
- the maximum number of time slots;
- the number of routers;
- the processor clock and supply voltage (if the platform allows it);
- the memory clock;
- the communication bandwidth between a processor and a shared memory;
- the number of processors used by the application;
- the memory usage;
- the QoS requirement (either guaranteed throughput or best effort).

[Fig. 5. QSDPCM application: (a) starting algorithm, on 1 processor; (b) relevant parallelizations, on 2 processors, on 3 processors, ..., on k+2 processors (1 < k < 6), with parallel ME1 tasks ME1_1 to ME1_k.]

V. PARETO-BASED APPLICATION SPECIFICATION
Our design-time exploration generates a multi-dimensional Pareto set of optimal mappings for any application to be mapped on the MP-SoC platform. Each mapping is characterized by a code version together with an optimal combination of used platform resources, costs, and constraints. Section V-A first describes the structure of any standalone application code version. Section V-B then characterizes the Pareto-based specification, which merges all these code versions.

A. Standalone Code Version Structure
Any application code version present in the Pareto set refers to different parallelizations of the application into parallel tasks and to data transfers between SPMs and local memories. These are derived from the design-time exploration as follows.

Parallelization exploration: An application can be parallelized both at the functional level and at the data level.
At the functional level, the algorithm is partitioned into smaller tasks, and the synchronization requirements between them are identified to allow pipelined execution of these tasks. At the data level, the data processed by a task are split among parallel instances of that task. For instance, in video applications, images can be divided into blocks of rows, and each task parallelized at the data level deals with its own block of rows.

Block transfer exploration: To optimize both performance and energy consumption in the memory subsystem, parts of the data arrays stored in the SPM are copied into the processor local memory. They are then accessed there multiple times [28]. These copy operations, called Block Transfers (BTs), are performed through function calls in the application code. One call issues a BT, and a later call synchronizes its completion with processing. This allows BTs to be performed in parallel with processing and hence improves the application performance. Fig. 6 illustrates this: a BT into a copy cp_prev_frame is performed in parallel with a for loop. This reduces the waiting time for the BT completion and yields a performance gain of 16 cycles per iteration. Several efficient solutions exist for the copy sizes and for the places in the code where these BT calls are inserted, each yielding a different local memory usage and performance.

[Fig. 6. BT from SPM to processor local memory.]

Such a code version (Fig. 7) is made up of a task set. Each task is made up of a skeleton that glues together the kernel calls (Section IV), the BT calls, and the task synchronization for parallelization.

[Fig. 7. Application code version structure.]

Experiments: At the functional level, the QSDPCM can be naturally partitioned into either three tasks (ME42, ME1, and QC) or two tasks (ME42, and ME1 merged with QC). To further reduce the computation effort of ME1, the input frames can be divided into row blocks to parallelize ME1 at the data level. Up to five parallel ME1 tasks have been considered. Beyond that, the task synchronization and BT overhead become too large, and no further performance gain is reached. Fig. 5(b) illustrates these QSDPCM parallelizations.

Three arrays are too large for the local memory and must be stored in the SPM: the current image frame, the previous one, and some internal data required in …. Several efficient BT solutions are explored. For the task ME1, Table I reports two results. The first is the resulting processor local memory usage for copies, which differs by up to a factor of 2 between solutions. The second is the binary size overhead for BT calls, which reaches up to 16% of the total ME1 binary. In our MP-SoC platform simulator, the current implementation of a BT issue (resp. sync) call costs about 378 (resp. 20) bytes. This explains the significant BT call size overhead, which we will optimize in our near-future work. Similar BT solutions are derived for the tasks QC and ME1_QC, whereas only one efficient BT solution is derived for ME42.

[TABLE I. Block transfer impact on ME1 binary size (bytes); columns: BT solution, size of copies, binary size of BT calls, total ME1 binary size.]

Considering all combinations of BT solutions in all tasks of any parallelized application therefore gives rise to a huge number of different application code versions. A Pareto-based specification is required that merges all of them and allows efficient loading of any task binary into the platform. Section V-B characterizes this specification.

B. Merging Code Versions
All code versions of the same application derived from the design-time exploration are merged into a generic one, called the Pareto-based specification. It is made up of a set of tasks, derived from the functional-level parallelization exploration of the application. For each task, it contains:
(1) the block of image rows, derived from the data-level parallelization exploration and used as input argument of the task;
(2) an extended task skeleton integrating all BT solutions, derived from the block transfer exploration;
(3) for each BT solution, implementation details specifying the size of the copies to be allocated in the processor local memory and the BT calls to be executed in the task skeleton.

This Pareto-based specification is stored in the shared memory of the MP-SoC platform. However, only the required task binaries are loaded into the corresponding local memories during Pareto point switches, as explained in Section I. We illustrate this Pareto-based specification on the QSDPCM to show that, for this application, both the code size overhead and the performance overhead are negligible. Analyzing the energy consumption overhead requires an energy model in our MP-SoC platform simulator, which is currently under investigation.

[Fig. 8. Task binary sizes (bytes) in the Pareto-based QSDPCM specification.]

Experiments: From the QSDPCM parallelization exploration (Fig. 5(b)), four different tasks are considered: ME42, ME1, QC, and ME1_QC. Fig. 8 details the binary sizes for these tasks, integrating all BT solutions. These sizes include all needed kernels, the extended skeleton, and all implementation details. The kernels are independent of the standalone code versions and represent the major component of any task binary. Fig. 8 also reports the size of the task synchronization and all BT calls, which are part of the extended task skeleton.

The code size overhead of the Pareto-based specification has two sources:
(1) the size of the implementation details, which is negligible;
(2) the size overhead of the extended skeletons, due to the integration of all BT solutions.
Fig. 9(a) details the size overhead for each task binary. Merging a standalone code version into the Pareto-based specification yields less than 5% size overhead.

To analyze the performance overhead of the Pareto-based specification, we simulate the QSDPCM mapping on six processors (Fig. 5(b)) on our MP-SoC platform simulator. We use both the standalone code version and the Pareto-based specification. Fig. 9(b) compares their processing and BT waiting times. The performance overhead is less than 0.17% on each processor.

[Fig. 9. Comparison between standalone code versions and Pareto-based specification: (a) binary size overhead per task; (b) processing and BT waiting times per processor.]

VI. CONCLUSION
In this paper, we characterize the Pareto-based application specification used as input for our run-time manager. This specification merges all code versions of a single application derived from the design-time exploration. These code versions refer to different parallelizations of the application and to data transfers between SPMs and local memories. We illustrate the specification on a video codec multimedia application, simulated on our MP-SoC platform simulator. For this application, we observe less than 5% binary size overhead per merged code version and less than 0.17% performance overhead. Our future work includes:
- optimizing the data transfer implementation in the NoC of our MP-SoC platform, to further reduce the binary size overhead;
- integrating the run-time support needed for Pareto point switches at run time, and analyzing the resulting run-time overhead;
- developing an energy model in our MP-SoC platform simulator;
- testing on other real-life applications.

REFERENCES
[1] P. Cumming, "The TI OMAP platform approach to SoC," Kluwer Academic.
[2] W. Wolf, "The future of multiprocessor systems-on-chips," in Proceedings of the Design Automation Conference.
[3] D. Bertozzi, A. Jalabert, M. Srinivasan, R. Tamhankar, S. Stergiou, L. Benini, and G. De Micheli, "NoC synthesis flow for customized domain specific multiprocessor systems-on-chip," IEEE Trans. Parallel Distrib. Syst., vol. 16, February.
[4] S. Murali and G. De Micheli, "Bandwidth-constrained mapping of cores onto NoC architectures," in Proceedings of the Conference on Design, Automation and Test in Europe, Paris, France, February.
[5] L. Benini and G. De Micheli, "Networks on chips: a new SoC paradigm," IEEE Computer.
[6] S. Mamagkakis, D. Atienza, C. Poucet, F. Catthoor, D. Soudris, and J. Mendias, "Custom design of multi-level dynamic memory management subsystem for embedded systems," in Proceedings of the IEEE Workshop on Signal Processing Systems, October 2004.
[7] F. Poletti, P. Marchal, D. Atienza, L. Benini, F. Catthoor, and J. Mendias, "An integrated hardware/software approach for run-time scratchpad management," in Proceedings of the Design Automation Conference, June.
[8] M. Verma, L. Wehmeyer, and P. Marwedel, "Dynamic overlay of scratchpad memory for energy minimization," in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2004.
[9] C. Ykman-Couvreur, E. Brockmeyer, V. Nollet, T. Marescaux, F. Catthoor, and H. Corporaal, "Design-time application exploration for MP-SoC customized run-time management," in Proceedings of the International Symposium on System-on-Chip, November.
[10] T. Kogel and H. Meyr, "Heterogeneous MP-SoC - the solution to energy-efficient signal processing," in Proceedings of the Design Automation Conference, 2004.
[11] T. Henriksson, J. Kang, and P. van der Wolf, "Implementation of dynamic streaming applications on heterogeneous multi-processor applications," in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, Jersey City, NJ, September 2005.
[12] S. Yoo, M. Youssef, A. Bouchhima, and A. Jerraya, "Multi-processor SoC design methodology using a concept of two-layer hardware-dependent software," in Proceedings of the Conference on Design, Automation and Test in Europe, Paris, February.
[13] Y. Cho, S. Yoo, K. Choi, N.-E. Zergainoh, and A. Jerraya, "Scheduler implementation in MP SoC design," in Proceedings of the Asia South Pacific Design Automation Conference, Shanghai, China, January.
[14] C. Ykman-Couvreur, F. Catthoor, J. Vounckx, A. Folens, and F. Louagie, "Energy-aware dynamic task scheduling applied to a real-time multimedia application on an Xscale board," Journal of Low Power Electronics, vol. 1, December.
[15] L. Benini, A. Bogliolo, and G. De Micheli, "A survey of design techniques for system-level dynamic power management," IEEE Trans. VLSI Syst., vol. 8, June.
[16] A. Andrei, M. Schmitz, P. Eles, Z. Peng, and B. Al-Hashimi, "Overhead-conscious voltage selection for dynamic and leakage energy reduction of time-constrained systems," in Proceedings of the Conference on Design, Automation and Test in Europe, February.
[17] P. Schaumont, B.-C. C. Lai, W. Qin, and I. Verbauwhede, "Cooperative multithreading on embedded multiprocessor architectures enables energy-scalable design," in Proceedings of the Design Automation Conference, June.
[18] P. Yang and F. Catthoor, "Dynamic mapping and ordering tasks of embedded real-time systems on multiprocessor platforms," in Proceedings of the International Workshop on Software and Compilers for Embedded Systems, September.
[19] L. Smit, G. Smit, J. Hurink, H. Boersma, D. Paulusma, and P. Wolkotte, "Run-time mapping of applications to a heterogeneous reconfigurable tiled system on chip architecture," in Proceedings of the International Symposium on System-on-Chip, November.
[20] A. Hansson, K. Goossens, and A. Radulescu, "A unified approach to constrained mapping and routing on Network-on-Chip architectures," in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, September.
[21] J. Hu and R. Marculescu, "Communication and task scheduling of application-specific networks-on-chip," IEE Proceedings - Computers and Digital Techniques, vol. 152, September.
[22] O. Bringmann, A. Siebenborn, and W. Rosenstiel, "Conflict analysis in multiprocess synthesis for optimized system integration," in Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, September.
[23] J. Kang, T. Henriksson, and P. van der Wolf, "An interface for the design and implementation of dynamic applications on multi-processor architectures," in Proceedings of the Workshop on Embedded Systems for Real-Time Multimedia, September.
[24] M. Rutten et al., "Dynamic reconfiguration of streaming graphs on a heterogeneous multiprocessor architecture," in Proceedings of the IS&T/SPIE's Annual Symposium on Electronic Imaging: Multimedia Processing and Applications, January.
[25] V. Nollet, T. Marescaux, P. Avasare, J.-Y. Mignolet, and D. Verkest, "Centralized run-time resource management in a network-on-chip containing reconfigurable hardware tiles," in Proceedings of the Conference on Design, Automation and Test in Europe, March.
[26] P. Strobach, "QSDPCM - a new technique in scene adaptive coding," in Proceedings of the Eur. Signal Processing Conference, September.
[27] J. Dielissen, A. Radulescu, K. Goossens, and E. Rijpkema, "Concepts and implementation of the Philips network-on-chip," in Proceedings of IP-based SoC Design, November.
[28] E. Brockmeyer, M. Miranda, H. Corporaal, and F. Catthoor, "Layer assignment techniques for low energy in multi-layered memory organisations," in Proceedings of the Conference on Design, Automation and Test in Europe.
More informationSDR Forum Technical Conference 2007
THE APPLICATION OF A NOVEL ADAPTIVE DYNAMIC VOLTAGE SCALING SCHEME TO SOFTWARE DEFINED RADIO Craig Dolwin (Toshiba Research Europe Ltd, Bristol, UK, craig.dolwin@toshiba-trel.com) ABSTRACT This paper presents
More informationDATA REUSE DRIVEN MEMORY AND NETWORK-ON-CHIP CO-SYNTHESIS *
DATA REUSE DRIVEN MEMORY AND NETWORK-ON-CHIP CO-SYNTHESIS * University of California, Irvine, CA 92697 Abstract: Key words: NoCs present a possible communication infrastructure solution to deal with increased
More informationLow Power Mapping of Video Processing Applications on VLIW Multimedia Processors
Low Power Mapping of Video Processing Applications on VLIW Multimedia Processors K. Masselos 1,2, F. Catthoor 2, C. E. Goutis 1, H. DeMan 2 1 VLSI Design Laboratory, Department of Electrical and Computer
More informationLong Term Trends for Embedded System Design
Long Term Trends for Embedded System Design Ahmed Amine JERRAYA Laboratoire TIMA, 46 Avenue Félix Viallet, 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr Abstract. An embedded system is an application
More informationDesign of network adapter compatible OCP for high-throughput NOC
Applied Mechanics and Materials Vols. 313-314 (2013) pp 1341-1346 Online available since 2013/Mar/25 at www.scientific.net (2013) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/amm.313-314.1341
More informationSingle-Path Programming on a Chip-Multiprocessor System
Single-Path Programming on a Chip-Multiprocessor System Martin Schoeberl, Peter Puschner, and Raimund Kirner Vienna University of Technology, Austria mschoebe@mail.tuwien.ac.at, {peter,raimund}@vmars.tuwien.ac.at
More informationEffective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management
International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,
More informationHARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK
DOI: 10.21917/ijct.2012.0092 HARDWARE IMPLEMENTATION OF PIPELINE BASED ROUTER DESIGN FOR ON- CHIP NETWORK U. Saravanakumar 1, R. Rangarajan 2 and K. Rajasekar 3 1,3 Department of Electronics and Communication
More informationDesign Space Exploration of Real-time Multi-media MPSoCs with Heterogeneous Scheduling Policies
Design Space Exploration of Real-time Multi-media MPSoCs with Heterogeneous Scheduling Policies Minyoung Kim, Sudarshan Banerjee, Nikil Dutt, Nalini Venkatasubramanian School of Information and Computer
More informationWITH the development of the semiconductor technology,
Dual-Link Hierarchical Cluster-Based Interconnect Architecture for 3D Network on Chip Guang Sun, Yong Li, Yuanyuan Zhang, Shijun Lin, Li Su, Depeng Jin and Lieguang zeng Abstract Network on Chip (NoC)
More informationSystem Modeling and Implementation of MPEG-4. Encoder under Fine-Granular-Scalability Framework
System Modeling and Implementation of MPEG-4 Encoder under Fine-Granular-Scalability Framework Literature Survey Embedded Software Systems Prof. B. L. Evans by Wei Li and Zhenxun Xiao March 25, 2002 Abstract
More informationImproving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection
Improving Routing Efficiency for Network-on-Chip through Contention-Aware Input Selection Dong Wu, Bashir M. Al-Hashimi, Marcus T. Schmitz School of Electronics and Computer Science University of Southampton
More informationAn Application Mapping Scheme over Distributed Reconfigurable System
An Application Mapping Scheme over Distributed Reconfigurable System Chao Wang Lianghua Miao Bin Xie and Tianzhou Chen College of Computer Science Zhejiang University Hangzhou Zhejiang 310027 P. R. China
More informationBehavioral Array Mapping into Multiport Memories Targeting Low Power 3
Behavioral Array Mapping into Multiport Memories Targeting Low Power 3 Preeti Ranjan Panda and Nikil D. Dutt Department of Information and Computer Science University of California, Irvine, CA 92697-3425,
More informationThe S6000 Family of Processors
The S6000 Family of Processors Today s Design Challenges The advent of software configurable processors In recent years, the widespread adoption of digital technologies has revolutionized the way in which
More informationSTG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology
STG-NoC: A Tool for Generating Energy Optimized Custom Built NoC Topology Surbhi Jain Naveen Choudhary Dharm Singh ABSTRACT Network on Chip (NoC) has emerged as a viable solution to the complex communication
More informationProfiling Driven Scenario Detection and Prediction for Multimedia Applications
Profiling Driven Scenario Detection and Prediction for Multimedia Applications Stefan Valentin Gheorghita, Twan Basten and Henk Corporaal EE Department, Electronic Systems Group Eindhoven University of
More informationISSN Vol.04,Issue.01, January-2016, Pages:
WWW.IJITECH.ORG ISSN 2321-8665 Vol.04,Issue.01, January-2016, Pages:0077-0082 Implementation of Data Encoding and Decoding Techniques for Energy Consumption Reduction in NoC GORANTLA CHAITHANYA 1, VENKATA
More informationWorst Case Execution Time Analysis for Synthesized Hardware
Worst Case Execution Time Analysis for Synthesized Hardware Jun-hee Yoo ihavnoid@poppy.snu.ac.kr Seoul National University, Seoul, Republic of Korea Xingguang Feng fengxg@poppy.snu.ac.kr Seoul National
More informationSoftware Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors Francisco Barat, Murali Jayapala, Pieter Op de Beeck and Geert Deconinck K.U.Leuven, Belgium. {f-barat, j4murali}@ieee.org,
More informationSession: Configurable Systems. Tailored SoC building using reconfigurable IP blocks
IP 08 Session: Configurable Systems Tailored SoC building using reconfigurable IP blocks Lodewijk T. Smit, Gerard K. Rauwerda, Jochem H. Rutgers, Maciej Portalski and Reinier Kuipers Recore Systems www.recoresystems.com
More informationPerformance of Multihop Communications Using Logical Topologies on Optical Torus Networks
Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,
More informationAn Application-Specific Design Methodology for STbus Crossbar Generation
An Application-Specific Design Methodology for STbus Crossbar Generation Srinivasan Murali, Giovanni De Micheli Computer Systems Lab Stanford University Stanford, California 935 {smurali, nanni}@stanford.edu
More informationA Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup
A Hybrid Approach to CAM-Based Longest Prefix Matching for IP Route Lookup Yan Sun and Min Sik Kim School of Electrical Engineering and Computer Science Washington State University Pullman, Washington
More informationMapping C code on MPSoC for Nomadic Embedded Systems
-1 - ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 8 2008 Mapping C code on MPSoC for Nomadic Embedded Systems http://www.artist-embedded.org/ Lecturer: Diederik
More informationResource-efficient Routing and Scheduling of Time-constrained Network-on-Chip Communication
Resource-efficient Routing and Scheduling of Time-constrained Network-on-Chip Communication Sander Stuijk, Twan Basten, Marc Geilen, Amir Hossein Ghamarian and Bart Theelen Eindhoven University of Technology,
More informationOptimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation
Optimization of Dynamic Data Structures in Multimedia Embedded Systems Using Evolutionary Computation D. Atienza, C. Baloukas, L. Papadopoulos, C. Poucet, S. Mamagkakis, J. I. Hidalgo, F. Catthoor, D.
More informationFPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP
FPGA BASED ADAPTIVE RESOURCE EFFICIENT ERROR CONTROL METHODOLOGY FOR NETWORK ON CHIP 1 M.DEIVAKANI, 2 D.SHANTHI 1 Associate Professor, Department of Electronics and Communication Engineering PSNA College
More informationDesign guidelines for embedded real time face detection application
Design guidelines for embedded real time face detection application White paper for Embedded Vision Alliance By Eldad Melamed Much like the human visual system, embedded computer vision systems perform
More informationISSN Vol.05,Issue.09, September-2017, Pages:
WWW.IJITECH.ORG ISSN 2321-8665 Vol.05,Issue.09, September-2017, Pages:1693-1697 AJJAM PUSHPA 1, C. H. RAMA MOHAN 2 1 PG Scholar, Dept of ECE(DECS), Shirdi Sai Institute of Science and Technology, Anantapuramu,
More informationEnergy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS
Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory
More informationA Novel Technique to Use Scratch-pad Memory for Stack Management
A Novel Technique to Use Scratch-pad Memory for Stack Management Soyoung Park Hae-woo Park Soonhoi Ha School of EECS, Seoul National University, Seoul, Korea {soy, starlet, sha}@iris.snu.ac.kr Abstract
More informationMultimedia Decoder Using the Nios II Processor
Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra
More informationCross Clock-Domain TDM Virtual Circuits for Networks on Chips
Cross Clock-Domain TDM Virtual Circuits for Networks on Chips Zhonghai Lu Dept. of Electronic Systems School for Information and Communication Technology KTH - Royal Institute of Technology, Stockholm
More informationA Simplified Executable Model to Evaluate Latency and Throughput of Networks-on-Chip
A Simplified Executable Model to Evaluate Latency and Throughput of Networks-on-Chip Leandro Möller Luciano Ost, Leandro Soares Indrusiak Sanna Määttä Fernando G. Moraes Manfred Glesner Jari Nurmi {ost,
More informationLow-Power Data Address Bus Encoding Method
Low-Power Data Address Bus Encoding Method Tsung-Hsi Weng, Wei-Hao Chiao, Jean Jyh-Jiun Shann, Chung-Ping Chung, and Jimmy Lu Dept. of Computer Science and Information Engineering, National Chao Tung University,
More informationComputer-Aided Recoding for Multi-Core Systems
Computer-Aided Recoding for Multi-Core Systems Rainer Dömer doemer@uci.edu With contributions by P. Chandraiah Center for Embedded Computer Systems University of California, Irvine Outline Embedded System
More informationReconfigurable Architecture Requirements for Co-Designed Virtual Machines
Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra
More informationA Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design
A Unified /SW Interface Model to Remove Discontinuities between and SW Design Aimen Bouchhima, Xi Chen, Frédéric Pétrot, Wander O. Cesário, Ahmed A. Jerraya TIMA Laboratory 46 Avenue Félix Viallet 38031
More information342 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH /$ IEEE
342 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 3, MARCH 2009 Custom Networks-on-Chip Architectures With Multicast Routing Shan Yan, Student Member, IEEE, and Bill Lin,
More informationExploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors
Exploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors Matteo Monchiero Gianluca Palermo Cristina Silvano Oreste Villa Dipartimento di Elettronica e Informazione Politecnico
More informationReal-Time Dynamic Voltage Hopping on MPSoCs
Real-Time Dynamic Voltage Hopping on MPSoCs Tohru Ishihara System LSI Research Center, Kyushu University 2009/08/05 The 9 th International Forum on MPSoC and Multicore 1 Background Low Power / Low Energy
More informationRuntime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays
Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann
More informationOperating system integrated energy aware scratchpad allocation strategies for multiprocess applications
University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel
More informationReal Time NoC Based Pipelined Architectonics With Efficient TDM Schema
Real Time NoC Based Pipelined Architectonics With Efficient TDM Schema [1] Laila A, [2] Ajeesh R V [1] PG Student [VLSI & ES] [2] Assistant professor, Department of ECE, TKM Institute of Technology, Kollam
More informationFPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC)
FPGA based Design of Low Power Reconfigurable Router for Network on Chip (NoC) D.Udhayasheela, pg student [Communication system],dept.ofece,,as-salam engineering and technology, N.MageshwariAssistant Professor
More informationSingle Pass Connected Components Analysis
D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 007, pp. 8 87, Hamilton, New Zealand, December 007. Single Pass Connected
More informationNetwork-on-Chip Architecture
Multiple Processor Systems(CMPE-655) Network-on-Chip Architecture Performance aspect and Firefly network architecture By Siva Shankar Chandrasekaran and SreeGowri Shankar Agenda (Enhancing performance)
More informationDesign of a System-on-Chip Switched Network and its Design Support Λ
Design of a System-on-Chip Switched Network and its Design Support Λ Daniel Wiklund y, Dake Liu Dept. of Electrical Engineering Linköping University S-581 83 Linköping, Sweden Abstract As the degree of
More informationFunctional modeling style for efficient SW code generation of video codec applications
Functional modeling style for efficient SW code generation of video codec applications Sang-Il Han 1)2) Soo-Ik Chae 1) Ahmed. A. Jerraya 2) SD Group 1) SLS Group 2) Seoul National Univ., Korea TIMA laboratory,
More informationEnergy-Aware Cosynthesis of Real-Time Multimedia Applications on MPSoCs Using Heterogeneous Scheduling Policies
Energy-Aware Cosynthesis of Real-Time Multimedia Applications on MPSoCs Using Heterogeneous Scheduling Policies 9 MINYOUNG KIM, SUDARSHAN BANERJEE, NIKIL DUTT, and NALINI VENKATASUBRAMANIAN University
More informationCo-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,
Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms SAMOS XIV July 14-17, 2014 1 Outline Introduction + Motivation Design requirements for many-accelerator SoCs Design problems
More informationUsing Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology
Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore Association Enabling the Multicore Ecosystem Multicore
More informationLow-Power Video Codec Design
International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn : 2278-800X, www.ijerd.com Volume 5, Issue 8 (January 2013), PP. 81-85 Low-Power Video Codec Design R.Kamalakkannan
More informationLOW POWER REDUCED ROUTER NOC ARCHITECTURE DESIGN WITH CLASSICAL BUS BASED SYSTEM
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 5, May 2015, pg.705
More informationThe Design and Implementation of a Low-Latency On-Chip Network
The Design and Implementation of a Low-Latency On-Chip Network Robert Mullins 11 th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 24-27 th, 2006, Yokohama, Japan. Introduction Current
More informationA Framework for Video Streaming to Resource- Constrained Terminals
A Framework for Video Streaming to Resource- Constrained Terminals Dmitri Jarnikov 1, Johan Lukkien 1, Peter van der Stok 1 Dept. of Mathematics and Computer Science, Eindhoven University of Technology
More informationCaching video contents in IPTV systems with hierarchical architecture
Caching video contents in IPTV systems with hierarchical architecture Lydia Chen 1, Michela Meo 2 and Alessandra Scicchitano 1 1. IBM Zurich Research Lab email: {yic,als}@zurich.ibm.com 2. Politecnico
More informationNETWORKS on CHIP A NEW PARADIGM for SYSTEMS on CHIPS DESIGN
NETWORKS on CHIP A NEW PARADIGM for SYSTEMS on CHIPS DESIGN Giovanni De Micheli Luca Benini CSL - Stanford University DEIS - Bologna University Electronic systems Systems on chip are everywhere Technology
More informationMapping Array Communication onto FIFO Communication - Towards an Implementation
Mapping Array Communication onto Communication - Towards an Implementation Jeffrey Kang Albert van der Werf Paul Lippens Philips Research Laboratories Prof. Holstlaan 4, 5656 AA Eindhoven, The Netherlands
More informationAn Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling
An Integration of Imprecise Computation Model and Real-Time Voltage and Frequency Scaling Keigo Mizotani, Yusuke Hatori, Yusuke Kumura, Masayoshi Takasu, Hiroyuki Chishiro, and Nobuyuki Yamasaki Graduate
More informationISSN Vol.03, Issue.02, March-2015, Pages:
ISSN 2322-0929 Vol.03, Issue.02, March-2015, Pages:0122-0126 www.ijvdcs.org Design and Simulation Five Port Router using Verilog HDL CH.KARTHIK 1, R.S.UMA SUSEELA 2 1 PG Scholar, Dept of VLSI, Gokaraju
More informationDesigning Area and Performance Constrained SIMD/VLIW Image Processing Architectures
Designing Area and Performance Constrained SIMD/VLIW Image Processing Architectures Hamed Fatemi 1,2, Henk Corporaal 2, Twan Basten 2, Richard Kleihorst 3,and Pieter Jonker 4 1 h.fatemi@tue.nl 2 Eindhoven
More informationHigh performance, power-efficient DSPs based on the TI C64x
High performance, power-efficient DSPs based on the TI C64x Sridhar Rajagopal, Joseph R. Cavallaro, Scott Rixner Rice University {sridhar,cavallar,rixner}@rice.edu RICE UNIVERSITY Recent (2003) Research
More informationHiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.
HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation
More informationEfficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip
ASP-DAC 2010 20 Jan 2010 Session 6C Efficient Throughput-Guarantees for Latency-Sensitive Networks-On-Chip Jonas Diemer, Rolf Ernst TU Braunschweig, Germany diemer@ida.ing.tu-bs.de Michael Kauschke Intel,
More informationClustering-Based Topology Generation Approach for Application-Specific Network on Chip
Proceedings of the World Congress on Engineering and Computer Science Vol II WCECS, October 9-,, San Francisco, USA Clustering-Based Topology Generation Approach for Application-Specific Network on Chip
More informationCosimulation of ITRON-Based Embedded Software with SystemC
Cosimulation of ITRON-Based Embedded Software with SystemC Shin-ichiro Chikada, Shinya Honda, Hiroyuki Tomiyama, Hiroaki Takada Graduate School of Information Science, Nagoya University Information Technology
More informationEnergy Aware Computing in Cooperative Wireless Networks
Energy Aware Computing in Cooperative Wireless Networks Anders Brødløs Olsen, Frank H.P. Fitzek, Peter Koch Department of Communication Technology, Aalborg University Niels Jernes Vej 12, 9220 Aalborg
More informationAutomatic Generation of Communication Architectures
i Topic: Network and communication system Automatic Generation of Communication Architectures Dongwan Shin, Andreas Gerstlauer, Rainer Dömer and Daniel Gajski Center for Embedded Computer Systems University
More informationManaging Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks
Managing Dynamic Reconfiguration Overhead in Systems-on-a-Chip Design Using Reconfigurable Datapaths and Optimized Interconnection Networks Zhining Huang, Sharad Malik Electrical Engineering Department
More informationENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AND RESOURCE CONSTRAINTS
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AND RESOURCE CONSTRAINTS Santhi Baskaran 1 and P. Thambidurai 2 1 Department of Information Technology, Pondicherry Engineering
More informationA METHODOLOGY FOR THE OPTIMIZATION OF MULTI- PROGRAM SHARED SCRATCHPAD MEMORY
INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS VOL. 4, NO. 1, MARCH 2011 A METHODOLOGY FOR THE OPTIMIZATION OF MULTI- PROGRAM SHARED SCRATCHPAD MEMORY J. F. Yang, H. Jiang School of Electronic
More informationHardware Scheduling Support in SMP Architectures
Hardware Scheduling Support in SMP Architectures André C. Nácul Center for Embedded Systems University of California, Irvine nacul@uci.edu Francesco Regazzoni ALaRI, University of Lugano Lugano, Switzerland
More informationPower Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study
Power Estimation of System-Level Buses for Microprocessor-Based Architectures: A Case Study William Fornaciari Politecnico di Milano, DEI Milano (Italy) fornacia@elet.polimi.it Donatella Sciuto Politecnico
More informationMulti-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture
The 51st Annual IEEE/ACM International Symposium on Microarchitecture Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture Byungchul Hong Yeonju Ro John Kim FuriosaAI Samsung
More informationIntegrating MRPSOC with multigrain parallelism for improvement of performance
Integrating MRPSOC with multigrain parallelism for improvement of performance 1 Swathi S T, 2 Kavitha V 1 PG Student [VLSI], Dept. of ECE, CMRIT, Bangalore, Karnataka, India 2 Ph.D Scholar, Jain University,
More informationkickoff 15 oct 2004 Project Overview Henk Corporaal
PreMaDoNA kickoff 15 oct 2004 Project Overview Henk Corporaal Agenda 15.00 Opening and Overview 15.30 Implementation and Demonstrator 15.40 Project Management 15.55 Application track 16.05 Simulation track
More informationAn Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks
An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks ABSTRACT High end System-on-Chip (SoC) architectures consist of tens of processing engines. These processing engines have varied
More informationEnabling Scheduling Analysis of Heterogeneous Systems with Multi-Rate Data Dependencies and Rate Intervals
28.2 Enabling Scheduling Analysis of Heterogeneous Systems with Multi-Rate Data Dependencies and Rate Intervals Marek Jersak, Rolf Ernst Technical University of Braunschweig Institute of Computer and Communication
More informationDesign methodology for multi processor systems design on regular platforms
Design methodology for multi processor systems design on regular platforms Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline
More informationMulti-Level Cache Hierarchy Evaluation for Programmable Media Processors. Overview
Multi-Level Cache Hierarchy Evaluation for Programmable Media Processors Jason Fritts Assistant Professor Department of Computer Science Co-Author: Prof. Wayne Wolf Overview Why Programmable Media Processors?
More informationSystem-on-Chip Architecture for Mobile Applications. Sabyasachi Dey
System-on-Chip Architecture for Mobile Applications Sabyasachi Dey Email: sabyasachi.dey@gmail.com Agenda What is Mobile Application Platform Challenges Key Architecture Focus Areas Conclusion Mobile Revolution
More informationA Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design
A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design Ahmed Amine JERRAYA EPFL November 2005 TIMA Laboratory 46 Avenue Felix Viallet 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr
More informationENERGY EFFICIENT SCHEDULING SIMULATOR FOR DISTRIBUTED REAL-TIME SYSTEMS
I J I T E ISSN: 2229-7367 3(1-2), 2012, pp. 409-414 ENERGY EFFICIENT SCHEDULING SIMULATOR FOR DISTRIBUTED REAL-TIME SYSTEMS SANTHI BASKARAN 1, VARUN KUMAR P. 2, VEVAKE B. 2 & KARTHIKEYAN A. 2 1 Assistant
More informationDesign For High Performance Flexray Protocol For Fpga Based System
IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) e-issn: 2319 4200, p-issn No. : 2319 4197 PP 83-88 www.iosrjournals.org Design For High Performance Flexray Protocol For Fpga Based System E. Singaravelan
More informationImplementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications
46 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.3, March 2008 Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications
More informationCo-synthesis and Accelerator based Embedded System Design
Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationProfiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency
Profiling-Based L1 Data Cache Bypassing to Improve GPU Performance and Energy Efficiency Yijie Huangfu and Wei Zhang Department of Electrical and Computer Engineering Virginia Commonwealth University {huangfuy2,wzhang4}@vcu.edu
More informationA Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System
A Data Centered Approach for Cache Partitioning in Embedded Real- Time Database System HU WEI, CHEN TIANZHOU, SHI QINGSONG, JIANG NING College of Computer Science Zhejiang University College of Computer
More informationTowards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing
Towards a Dynamically Reconfigurable System-on-Chip Platform for Video Signal Processing Walter Stechele, Stephan Herrmann, Andreas Herkersdorf Technische Universität München 80290 München Germany Walter.Stechele@ei.tum.de
More information