Matisse: A system-on-chip design methodology emphasizing dynamic memory management

Diederik Verkest, Julio Leao da Silva Jr., Chantal Ykman, Kris Croes, Miguel Miranda, Sven Wuytack, Gjalt de Jong, Francky Catthoor, Hugo De Man

Affiliation footnote: Alcatel Telecom, F. Wellesplein 1, B-2018 Antwerp, Belgium

Abstract

MATISSE is a design environment for developing systems characterized by tight interaction between control and data-flow behavior, intensive data storage and transfer, and stringent real-time requirements. MATISSE bridges the gap from a system specification written in a concurrent object-oriented language to an optimized embedded single-chip hardware/software implementation. It supports stepwise exploration and refinement of dynamic memory management, memory architecture exploration, and gradual incorporation of timing constraints before handing over to traditional tools for hardware synthesis, software compilation, and inter-processor communication synthesis. With this approach, specifications of embedded systems can be written in a high-level programming language using data abstraction. Applying MATISSE to telecom protocol processing systems in the ATM area shows significant improvements in area usage and power consumption.

1 Introduction

The complexity of modern telecommunication systems is rapidly increasing. A wide variety of services has to be transported, and elaborate network management is needed. Such complex systems require a combination of hardware and software components to implement the required functionality at the desired performance level. For applications in this domain, the desired behavior is often characterized by complex algorithms that operate on large, dynamically allocated data structures (e.g. linked lists and trees), resulting in intensive data transfer and storage.

Ideally, the specification should reflect the conceptual partitioning of the problem, which typically corresponds to abstract data types (ADTs), the services provided on them, and algorithms for the different processing tasks. Because these conceptual entities can be readily specified in an object-oriented programming model using data abstraction and class inheritance, the MATISSE system uses the C++ programming language as the basis for the behavioral specification. The MATISSE language extends standard C++ with features for expressing concurrent tasks and synchronization.

Behavioral hardware synthesis has been an active area of research for more than a decade (see e.g. [10]), but commercial behavioral synthesis tools offer only very limited support for complex data structures: usually only statically declared arrays and records are supported. These synthesis environments provide scheduling and resource allocation capabilities that let the designer abstract from timing and hardware partitioning details. However, the designer is still largely responsible for specifying the memory hierarchy and organization, and manipulation and modification of stored data structures must be specified in terms of explicit memory I/O operations.

For signal processing oriented applications, significant progress has been made towards high-level memory management and synthesis capabilities [5, 7, 8]. However, these techniques typically rely on the stream-based nature of the applications to minimize the size of large intermediate arrays. The applications targeted by the MATISSE system require support for irregular data structures, such as heaps, hash tables, trees, and linked lists, that are dynamically created and destroyed at run-time. In a traditional software run-time environment, the underlying operating system is responsible for all the background memory and storage related tasks.
In addition, the memory hierarchy is usually fixed. For embedded system solutions, however, relying on software run-time support may be expensive in terms of area, performance, and power, and dedicated distributed memory architectures may be used. Hence, dynamic memory management behavior must be synthesized in the embedded system implementation itself.

In this paper we discuss MATISSE, a design environment that takes care of the background memory management problem for dynamic data structure intensive applications by bridging the gap between the conceptual design entry specification, based on C++, and traditional behavioral synthesis. The MATISSE environment addresses all the aforementioned tasks to synthesize a custom distributed memory architecture. It allows the designer to explore different architectures so that an optimal choice can be made, which is crucial because memory bandwidth is often the main performance bottleneck in this type of application. Another important benefit of the MATISSE environment is that the specification is lifted above the level currently used for behavioral synthesis. The design entry point can be a high-level program using data abstraction, so the designer is not burdened with the details of implementing data structures in a memory architecture.

In the next section we present the MATISSE specification model and design flow. In subsequent sections, we elaborate on the steps in the design flow that are directly related to dynamic memory management. The extensive exploration feasible with the MATISSE environment is illustrated on an industrial ATM application.

2 Matisse System

The system design flow starts from a concurrent object-oriented system specification using the MATISSE model [6] and targets an optimized embedded single-chip hardware/software implementation. We first introduce the MATISSE model and then discuss the MATISSE design flow.

2.1 Model

Protocol processing applications are conceptually seen as sets of concurrent processes that access data (defined as sets of records). Although the target implementation of protocol processing applications is often a mixture of hardware and software components, they are best conceived at the top level from a software perspective. Concurrent object-oriented models play a central role in large-scale hardware/software system design, since they allow system specification and fast system-level simulation. The MATISSE language is presented in detail in [6]. It is a concurrent object-oriented specification language, extended from the widely used programming language C++.
Minimal syntactic extensions to C++ are introduced to allow the specification of concurrent processes, inter-process communication, and synchronization. We summarize the main features of the underlying MATISSE model below.

Processes and concurrency - Processes are called active objects, and data are called passive objects. Processes have their own local virtual memory space and default thread of control, and they are created only at compile-time. Concurrency is specified at the process level, by the concurrent execution of the default threads of control of all created processes. Data may be created and destroyed in the local virtual memory space of the processes, either at compile-time or at run-time.

Communication - Within one process, communication is specified using C++ pointers. Between processes, communication is specified using global pointers. Except for their potentially higher cost of use, global pointers are used just like C++ pointers.

Synchronization - Because computations are concurrent, simultaneous accesses to data should be synchronized using atomic functions. Whenever several threads call an atomic function, the function is executed the required number of times in sequential order. The execution of an atomic function never interleaves with the execution of another atomic function within one process.

2.2 Design flow

The MATISSE design flow is depicted in Figure 1. The input to the design flow is a system together with its environment, specified at the algorithmic level using the MATISSE language.

[Figure 1. Matisse design flow. Blocks: Matisse specification, abstract machine generation, dynamic memory management, process concurrency management, physical memory management, HW synthesis, HW/SW i/f synthesis, SW synthesis.]

Abstract Machine (AM) generation creates an executable specification, suitable for simulation, exploration, and refinement of the system specification.
The AM consists of a set of communicating concurrent processes, an ultra-light operating system to manage the execution of these processes, and a user interface that lets the designer refine the MATISSE specification. The AM allows profiling of record accesses, inter-process communication, and virtual memory accesses. These profiling data are used, respectively, to select an optimized implementation for the records, to perform process concurrency management, and to perform physical memory management.

Dynamic memory management - Protocol processing applications are often characterized by algorithms that operate on large, dynamically allocated data structures. The MATISSE language allows the designer to define these data structures using Abstract Data Types (ADTs), without low-level specification details. When implementing these applications on a chip, efficient organization and implementation of the ADTs is crucial [2, 9, 13], and dynamic memory allocation must be handled efficiently in terms of both time and number of memory accesses. Therefore, refinement of the specification of the ADTs (ADT refinement) and of the memory management (Virtual Memory Management) is required before proceeding with synthesis.

Process concurrency management - The goal of process concurrency management is to meet the overall real-time requirements imposed on the application. This step involves process concurrency extraction, thread scheduling, processor allocation, process-to-processor assignment, and inter-process communication refinement.

Physical memory management - Protocol processing applications typically require large storage capacities and very high I/O bandwidth to meet the real-time requirements. This step aims to synthesize area- and power-efficient distributed memory architectures and memory management units that meet the real-time requirements.

Finally, software compilation proceeds using traditional software compilers, and hardware synthesis proceeds using high-level synthesis tools. Interface synthesis generates software device drivers for each software processor and VHDL specifications of the hardware blocks needed for communication between hardware and software processors. The interface synthesis is performed using the hardware/software co-design environment COWARE [4, 1].

In the next three sections, we elaborate on the three steps that are relevant for dynamic memory management: ADT refinement (section 3), virtual memory management (section 4), and physical memory management (section 5).

3 ADT refinement

In an implementation-independent specification, complex data structures are typically specified by means of ADTs that represent a certain functionality without imposing implementation decisions. A dictionary type, i.e. a set of records indexed by means of keys, is a typical example of an ADT occurring in transport layer network interface applications. The ADT provides a number of services (e.g. inserting, locating, or removing a record from a set) that can be used to specify the functionality of an application without knowing their implementation.

A set of records accessible through one or more keys can be represented by many different data structures. These differ in memory occupation, number of memory accesses needed to locate a record, power dissipation, and so on. To allow the designer to make a motivated choice, all possible data structures have to be represented in a model in which the best solutions for a given application can be searched for.

3.1 A hierarchical ADT model

In our model there are four primitive data structures (linked lists, trees, arrays, and pointer arrays) that can be combined to create more complex data structures. A complex ADT is represented as a tree composed of primitive data structures. Every key corresponds to a layer in the tree. The bottom layer is the record layer, which has no key associated with it. The top layer (i.e. the root of the tree) represents the entire set of records. Each layer below represents a partitioning of the whole set into a number of subsets. Specifying a value for the key corresponding to a layer selects the subset of records for which the key has the specified value. This process can be applied hierarchically from the top layer until the records are selected. Each node in the tree (except on the bottom layer) has to associate values of the corresponding key with a node on the next layer. This functionality can be implemented with a single primitive data structure.

Up to this point, we have assumed that every key corresponds to one layer in the hierarchy. This is not necessary, however. Keys can also be split into sub-keys, or several keys can be combined into one super key.
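As an illustration of the layered model, the following is a minimal C++ sketch of one point in the search space: a two-layer set of records in which the first key indexes a pointer array and the second key is searched in a linked list. The names `Record`, `TwoLayerSet`, `key1`, `key2`, and `payload` are hypothetical and chosen for this example only; the sketch assumes that the caller never inserts two records with the same key pair, and it is not the data structure generated by MATISSE.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <memory>

// Hypothetical record with two keys; field names are illustrative only.
struct Record {
    unsigned key1;  // selects a subset on the top layer
    unsigned key2;  // selects a record within that subset
    int payload;
};

// Two-layer set of records:
//   layer 1: pointer array indexed by key1 (one slot per key value)
//   layer 2: linked list searched sequentially on key2
template <std::size_t Key1Range>
class TwoLayerSet {
public:
    // Inserts a record; assumes the (key1, key2) pair is not yet present.
    void insert(const Record& r) {
        std::unique_ptr<Bucket>& slot = top_.at(r.key1);  // throws if out of range
        if (!slot) slot = std::make_unique<Bucket>();
        slot->push_front(r);
    }

    // Returns a pointer to the matching record, or nullptr if absent.
    const Record* locate(unsigned key1, unsigned key2) const {
        const std::unique_ptr<Bucket>& slot = top_.at(key1);
        if (!slot) return nullptr;  // empty subset for this key1 value
        for (const Record& r : *slot)
            if (r.key2 == key2) return &r;
        return nullptr;
    }

    // Removes the matching record; returns true if one was removed.
    bool remove(unsigned key1, unsigned key2) {
        if (locate(key1, key2) == nullptr) return false;
        top_.at(key1)->remove_if(
            [key2](const Record& r) { return r.key2 == key2; });
        return true;
    }

private:
    using Bucket = std::forward_list<Record>;
    std::array<std::unique_ptr<Bucket>, Key1Range> top_{};  // all slots start empty
};
```

Replacing the linked list on the second layer by an array or a tree, or swapping the roles of the two keys, yields other points in the same search space. Splitting a key into sub-keys adds layers to such a structure, and combining keys into a super key removes them.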
Splitting or combining keys may heavily affect the implementation cost. The order in which the keys are used to access the data structures also heavily affects the required memory size, the average number of memory accesses to locate a record, and the power cost. Therefore, it is important to find the optimal key ordering for the given application as well as the optimal number of layers.

When the keys are not uniformly distributed, hashing can be used to improve the results (hashing applies a permutation function to a key or combination of keys). Hashing can be combined with any of the primitive data structures, thereby providing an orthogonal axis of freedom in the search space. It is especially useful in combination with key splitting, because it reduces the (average) size of the primitive data structures associated with the sub-keys after splitting.

Many possible data structures within the model can realize a given set of records. Each one can be seen as a combination of different major options that are relatively orthogonal (Figure 2). Within each option, more detailed choices can still be made. Finding the best combination for a given application is not trivial, since it depends on the parameters in the model. Moreover, the full search space is too large to scan exhaustively. To determine the optimal data structure we have to define the number of layers in the hierarchy, the key ordering, the hashing function for each key, and the primitive data structure for each layer in the hierarchy.

[Figure 2. ADT refinement search space. Decisions, numbered as in the figure: 1 hashing (no / yes, with hashing function), 2 key splitting, 3 key ordering, 4 primitive data structure (array, pointer array, binary tree, linked list).]

Experiments showed that some decisions are much more important than others, and that the heuristic decision ordering indicated in Figure 2 leads to near-optimal solutions without exhaustively exploring all combinations. For a detailed description of the full optimization methodology we refer to [16].

3.2 Experiments

The set-of-records ADT in the ATM application was optimized for power using two realistic parameter sets. The first assumed storage of the records in a memory built from 1 Mbit SRAMs, the second a memory built from 4 Mbit SRAMs. The optimal solution for the ADT data structures differs between the two cases. Both are two-layer structures with two keys. The first key indexes a pointer array, whereas the primitive data structure (DS) on the second layer is a pointer array for the first solution and an array of records for the second. Applying the optimal DS for one parameter set in the context for which the other DS was optimized results in a power consumption more than 2.5 times that of the optimal DS. Moreover, the entire search space spans a power range of four orders of magnitude, which clearly substantiates the importance of a thorough exploration before deciding on a solution.

4 Virtual memory management

The VMM step reserves storage space for each concrete data type obtained during the ADT refinement step, by defining a virtual memory segment for each concrete data type. Subsequently, it determines a custom virtual memory manager (VMM) for each data type that is dynamically allocated in the application. A VMM takes care of allocating and recycling blocks from the virtual memory segments.

Allocation is the mechanism that searches the pool of free blocks and returns a free block large enough to satisfy a request of a given application. Recycling is the mechanism that returns a block that is no longer used to the pool of free blocks for later reuse. Much literature is available on possible implementation choices for allocation mechanisms [3, 14], but none of the earlier work provides a complete search space suitable for systematic exploration.

4.1 VMM search space

Similar to the ADT refinement problem, such an exploration is only feasible in practice by identifying the orthogonal decision trees in the available search space¹. Below we present the decision trees for allocation and recycling mechanisms.

¹ We do not consider implicit recycling mechanisms, known as garbage collectors, in our search space.

[Figure 3. Search space for VMM mechanisms. Panels: (a) free blocks tracking (lookup table or link fields; index order: none, address, size), (b) choosing a free block (free pool: entire pool or sector per type/size; exact or approximate match; first fit or best fit), (c) free blocks reusage (LIFO, FIFO, indexed), (d) block splitting (when; part of free block used), (e) block merging (when: never, immediate, deferred; how much: all, enough).]

Keeping track of free blocks - The allocator keeps track of free blocks using either link fields within free blocks or lookup tables (Figure 3.a). Link fields within free blocks introduce no memory overhead as long as a minimum block size is respected, whereas lookup tables always imply a memory overhead. Allocators are further differentiated by the indexing mechanism (by size, by address, etc.).

Choosing a free block - Different mechanisms exist for choosing a free block from the pool (Figure 3.b). The pool may be partitioned in sectors per size or type. The chosen block may be an exact match or an approximate match for the requested size. The allocator tries to satisfy an allocation request by returning either the first free block that is large enough (first fit) or the free block that is closest in size to the requested one (best fit).

Freeing used blocks - A block that is freed by the application has to be returned to the pool of free blocks (Figure 3.c). Obvious mechanisms that provide good performance are LIFO or FIFO schemes. A scheme that respects an index order (e.g. size) may avoid wasting memory when combined with the splitting or merging techniques described below, at the cost of a performance penalty.

Splitting the block being allocated - When the free block chosen to satisfy a request is larger than required, a policy for splitting the block can be implemented (Figure 3.d). The remainder of the split block is returned to the pool of free blocks. Splitting mechanisms are differentiated by which part of the free block is used and by whether or not splitting respects an index order (e.g. size).

Merging free blocks - When adjacent blocks are free, the allocator may decide to merge them in order to have more opportunities to accommodate a subsequent larger allocation request (Figure 3.e). In general it is worthwhile to defer the merging in order to avoid subsequent splitting operations. Deferred merging may be implemented in different ways: wait for a fixed or variable number of allocation requests before merging, or wait for an unsatisfied allocation request before merging. The number of blocks to be merged can vary between merging all blocks and merging only enough blocks to satisfy the last request.

4.2 Experiments

The three data types in the ATM application that contribute most to the background memory are the Internal Packet Identifier (IPI), the Routing Record (RR), and the ATM cell. The virtual memory segments for these data types range in size from 3K to 12K words. For each virtual memory segment, a VMM mechanism has to be selected. Different choices result in power figures differing by up to a factor of 5 for the IPI, 11 for the RR, and 25 for the ATM cell. In this application, the VMM with the minimal power figure is the same for each data type. However, power is not the only parameter in the trade-off.
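To make the decision trees of Section 4.1 concrete, the sketch below fixes one choice per tree: free blocks are tracked with link fields stored inside the free blocks themselves, a free block is chosen by first fit or best fit, freed blocks are reused in LIFO order, a chosen block is always split when the remainder can still hold the link fields, and merging is never performed. The class name `SegmentVmm`, the word-array representation, and the rule that the caller passes the block size to `release` are assumptions made for this example; it is not the manager generated by MATISSE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical manager for one virtual memory segment, measured in words.
// A free block stores its length in word 0 and the index of the next free
// block in word 1 (link fields), so the minimum block size is two words.
class SegmentVmm {
public:
    enum class Fit { First, Best };
    static constexpr std::size_t kNone = static_cast<std::size_t>(-1);
    static constexpr std::size_t kMinBlock = 2;

    SegmentVmm(std::size_t words, Fit fit) : mem_(words, 0), fit_(fit), head_(0) {
        assert(words >= kMinBlock);
        set_block(0, words, kNone);  // the whole segment is one free block
    }

    // Returns the start index of a block of `size` words, or kNone.
    std::size_t allocate(std::size_t size) {
        if (size < kMinBlock) size = kMinBlock;
        std::size_t prev = kNone, best = kNone, best_prev = kNone;
        for (std::size_t cur = head_; cur != kNone; prev = cur, cur = next(cur)) {
            const std::size_t len = length(cur);
            // A block fits if it matches exactly or leaves a usable remainder.
            const bool fits = (len == size) || (len >= size + kMinBlock);
            if (!fits) continue;
            if (best == kNone || len < length(best)) { best = cur; best_prev = prev; }
            if (fit_ == Fit::First) break;  // first fit stops at the first candidate
        }
        if (best == kNone) return kNone;
        const std::size_t rest = length(best) - size;
        if (rest == 0) {  // exact match: unlink the whole block
            if (best_prev == kNone) head_ = next(best);
            else mem_[best_prev + 1] = next(best);
            return best;
        }
        mem_[best] = rest;   // split: the first part stays in the free list
        return best + rest;  // the last part is handed out
    }

    // Returns a block to the head of the free list (LIFO reuse, no merging).
    void release(std::size_t addr, std::size_t size) {
        if (size < kMinBlock) size = kMinBlock;
        set_block(addr, size, head_);
        head_ = addr;
    }

    // Total number of words currently on the free list.
    std::size_t free_words() const {
        std::size_t total = 0;
        for (std::size_t cur = head_; cur != kNone; cur = next(cur)) total += length(cur);
        return total;
    }

private:
    std::size_t length(std::size_t b) const { return mem_[b]; }
    std::size_t next(std::size_t b) const { return mem_[b + 1]; }
    void set_block(std::size_t b, std::size_t len, std::size_t nxt) {
        mem_[b] = len;
        mem_[b + 1] = nxt;
    }

    std::vector<std::size_t> mem_;
    Fit fit_;
    std::size_t head_;
};
```

With a free list holding an 8-word block followed by a 4-word block, a request for 4 words is served by splitting the 8-word block under first fit and by the exact 4-word block under best fit. Because this sketch never merges, repeated splitting fragments the segment, which is the kind of cost the exploration has to weigh against the extra accesses that merging requires.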
When the amount of storage in use for two data types reaches its maximum at different moments during the lifetime of the application, their virtual memory segments can be combined, provided the VMM mechanism allows it. A second VMM mechanism, with only a slightly higher power figure for the IPI and RR data types, offers this possibility. It might therefore be possible to save area by combining the virtual memory segments for the IPI and RR data types without noticeably affecting the power consumption. Unfortunately, in this application the IPI and RR data types reach their maximal use in overlapping periods of time.

5 Physical Memory Management

For data-intensive algorithms, the available cycle budget is usually insufficient to perform all memory accesses sequentially. Hence a number of accesses have to be done in parallel. Distributed memory architectures make it possible to exploit parallelism, thus alleviating memory access bottlenecks. However, as the required memory bandwidth increases, the cycle budget available for each individual access becomes smaller, since more addresses have to be generated in parallel per processed data item, which leads to an addressing overhead.

5.1 PMM methodology

Signals accessed in parallel have to be assigned to different memories, or they have to be accessed through different ports of a multi-port memory. Many different orders of the memory accesses are possible within the given cycle budget. Manually exploring all ordering possibilities and memory configurations for area and power efficiency is a very tedious task. Therefore, an automated methodology [12, 15] has been developed.

Basic groups - The virtual memory segments are split into smaller groups of data called basic groups. Every data item belongs to exactly one basic group, so that basic groups can be assigned to physical memories independently of each other.
Basic groups are kept as small as possible, to increase the freedom of assigning basic groups to physical memories and to increase the parallel accessibility of the data in a virtual memory segment.

Access ordering - The access ordering step optimizes the memory cost for the required storage bandwidth by determining which basic groups should be made simultaneously accessible in the memory architecture so that the imposed timing constraints can be met. For this purpose, the data accesses are ordered within a given cycle budget. Whenever accesses to two basic groups occur in the same cycle, there is an access conflict, because the basic groups cannot share the same memory port. These conflicts have to be resolved during the subsequent memory allocation and assignment step by assigning conflicting basic groups either to different memories or to a multi-port memory, so that they are simultaneously accessible.

Memory allocation and assignment - Memory allocation and assignment determines the number and type of the memories, the number and type of their ports, and an assignment of basic groups to the allocated memories in a power and/or area optimized memory architecture. The conflict relations between the basic groups are used to restrict the search space to memory architectures that provide enough memory bandwidth to meet the timing constraints.

Address optimization - Address manipulation forms a crucial component of any architecture that deals with data transfer intensive algorithms. Efficient access to the memories within real-time constraints requires an optimized mapping of the address expressions in the algorithm onto address arithmetic optimized for both area and power. A methodology [11] has been developed to reduce the cost overhead of address generation for both custom and instruction-set processors. This methodology includes address expression splitting/clustering, induction variable analysis, target architecture selection, and global scope algebraic optimizations. In addition, high-level controller synthesis and optimal partitioning of the arithmetic unit are incorporated for the synthesis of custom memory management units.

5.2 Experiments

Several experiments were performed on the ATM application with varying cycle budgets for the memory accesses. The virtual memory segments from the ATM application can be split into 14 basic groups. This reduces the critical path from 15 cycles to 9 cycles. The access ordering showed 13 conflicts between the basic groups. Several memory architectures were generated that satisfy the cycle budget constraints derived from the previous steps. The best solution is a trade-off between area and power. The best solution for power is a configuration with 6 memories. A configuration with 3 memories consumes 1.98 times more power, and a configuration with a single memory 6.85 times more.

To show the impact of the high-level address optimization, the resulting solutions were compared to those obtained by traditional synthesis tools. The RT-VHDL description that generates a hardwired solution (every address expression mapped on a separate unit) results in an area of 1.7 mm² after synthesis with Synopsys DC. A behavioral VHDL description synthesized with high-level synthesis (Synopsys BC) results in an area of 1.48 mm², subject to the constraint of generating one address expression in every clock cycle. When the high-level address optimization of MATISSE is applied before high-level synthesis, an area of 0.42 mm² is obtained.
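The role of access conflicts in memory allocation and assignment (Section 5.1) can be illustrated with a small sketch. It treats basic groups as nodes and access conflicts as edges, and greedily assigns each group to the lowest-numbered single-port memory that holds no conflicting group. This greedy colouring is a stand-in chosen for illustration: it ignores multi-port memories, memory sizes, and the power and area cost functions, so it is not the automated methodology of [12, 15]. The function names `assign_memories` and `memory_count` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assigns each basic group to a single-port memory so that no two
// conflicting groups share a memory. Returns one memory index per group.
// Greedy and order-dependent: the result is valid but not necessarily minimal.
inline std::vector<int> assign_memories(
    std::size_t groups,
    const std::vector<std::pair<std::size_t, std::size_t>>& conflicts) {
    // Build the conflict graph as adjacency lists.
    std::vector<std::vector<std::size_t>> adj(groups);
    for (const auto& c : conflicts) {
        adj.at(c.first).push_back(c.second);
        adj.at(c.second).push_back(c.first);
    }
    std::vector<int> memory(groups, -1);  // -1 means not yet assigned
    for (std::size_t g = 0; g < groups; ++g) {
        // Mark the memories already used by conflicting groups.
        std::vector<bool> taken(groups, false);
        for (std::size_t n : adj[g])
            if (memory[n] >= 0) taken[static_cast<std::size_t>(memory[n])] = true;
        // Pick the lowest-numbered memory that is still allowed.
        int m = 0;
        while (taken[static_cast<std::size_t>(m)]) ++m;
        memory[g] = m;
    }
    return memory;
}

// Number of memories used by an assignment.
inline int memory_count(const std::vector<int>& memory) {
    int highest = -1;
    for (int m : memory)
        if (m > highest) highest = m;
    return highest + 1;
}
```

For four basic groups with conflicts (0,1), (1,2), (0,2), and (2,3), the sketch uses three memories and places group 3 in the same memory as group 0, since those two groups are never accessed in the same cycle. A real exploration would additionally weigh the power and area of each candidate architecture, as the experiments in Section 5.2 do.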
6 Conclusions

In this paper, we have addressed the support for system design exploration for applications that require manipulation of a large amount of dynamically allocated stored data, as found in e.g. protocol processing applications used in telecom networks. Using the MATISSE language, the designer is able to write a system specification that abstracts from low-level details and is easily retargetable to different embedded hardware/software realizations. The MATISSE design flow assists the designer in exploring the design space at system level for different ADT implementations and memory managers, and in exploring different memory architectures for mixed hardware/software realizations. We demonstrated the results of the system design exploration using an industrial ATM application. We also demonstrated that, despite the higher level of abstraction of our input with respect to e.g. high-level synthesis (HLS), we achieve more efficient implementations.

Acknowledgments

This work is partly funded by the Flemish IWT in the HASTEC project and the European Commission in the MEDIA project. Julio Leao da Silva Junior is supported by a Brazilian Government Fellowship - CAPES. We would further like to thank Bill Lin (University of California, San Diego) and Mark Genoe (Alcatel Telecom) for many insightful discussions.

References

[1] Coware.
[2] A. Alles. ATM in private networking, a tutorial. Proc. INTEROP 93.
[3] G. Attardi et al. A customisable memory management framework. Proc. USENIX C++ Conf., Cambridge, MA.
[4] I. Bolsens et al. Hardware-software codesign of telecommunication systems. Proceedings of the IEEE, 85(3), Mar.
[5] J. T. Buck et al. PTOLEMY: A framework for simulating and prototyping heterogeneous systems. Int'l Journal on Computer Simulation, Jan.
[6] J. da Silva et al. Matisse: A concurrent and object-oriented system specification language. Int. Conf. on VLSI.
[7] H. De Man et al. Architecture-driven synthesis techniques for VLSI implementation of DSP algorithms. Proceedings of the IEEE, 72(2), Feb.
[8] R. Lauwereins et al. GRAPE-II: A system level prototyping environment for DSP applications. IEEE Computer, Feb.
[9] J.-Y. Le Boudec. The Asynchronous Transfer Mode: A tutorial. Computer Networks and ISDN Systems, 24.
[10] P. Lippens et al. Allocation of multiport memories for hierarchical data streams. Proc. of ICCAD, Santa Clara, CA, Nov.
[11] M. Miranda et al. ADOPT: Efficient hardware address generation in distributed memory architectures. Proc. of the Int'l Symposium on System Level Synthesis.
[12] P. Slock et al. Fast and extensive system-level memory exploration for ATM applications. Proc. of the Int'l Symposium on System Synthesis, Sep.
[13] Y. Therasse et al. VLSI architecture of a SDMS/ATM router. Annales des Telecommunications, 48(3-4).
[14] P. R. Wilson et al. Dynamic storage allocation: A survey and critical review. Proc. Int'l Workshop on Memory Management, Kinross, Scotland, UK, Sep.
[15] S. Wuytack et al. Flow graph balancing for minimizing the required memory bandwidth. Proc. of the Int'l Symposium on System Synthesis, Nov.
[16] S. Wuytack et al. Transforming set data types to power optimal data structures. IEEE Transactions on Computer-aided Design, CAD-15(6), June 1996.

Matisse System Multithread lib. initial Matisse specification. 3 Abstract Machine Generation. Matisse lib. abstract machine

Matisse System Multithread lib. initial Matisse specification. 3 Abstract Machine Generation. Matisse lib. abstract machine Ecient System Exploration and Synthesis of Applications with Dynamic Data Storage and Intensive Data Transfer Julio Leao da Silva Jr., Chantal Ykman-Couvreur, Miguel Miranda, Kris Croes, Sven Wuytack,

More information

Abstract We present Matisse, a concurrent object-oriented system specication language,

Abstract We present Matisse, a concurrent object-oriented system specication language, Matisse: a concurrent and object-oriented system specication language Julio Leao da Silva Jr. 1 Chantal Ykman-Couvreur 1 Gjalt de Jong 2 fsilva,ykmang@imec.be jongg@sh.bel.alcatel.be 1 IMEC, Kapeldreef

More information

ADTs. ADT refinement. concrete DTs VMM Refinement. VMSes. Physical Memory Mngnt. memories ADT ADT ADT ADT. M e m o r y. D y n a m i c.

ADTs. ADT refinement. concrete DTs VMM Refinement. VMSes. Physical Memory Mngnt. memories ADT ADT ADT ADT. M e m o r y. D y n a m i c. Power Exploration for Dynamic Data Types through Virtual Memory Management Renement Julio L. da Silva Jr, Francky Catthoor, Diederik Verkest, Hugo De Man IMEC, Kapeldreef 75, B-3001 Leuven, Belgium Abstract

More information


More information

Design methodology for multi processor systems design on regular platforms

Design methodology for multi processor systems design on regular platforms Design methodology for multi processor systems design on regular platforms Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline

More information

Architecture Implementation Using the Machine Description Language LISA

Architecture Implementation Using the Machine Description Language LISA Architecture Implementation Using the Machine Description Language LISA Oliver Schliebusch, Andreas Hoffmann, Achim Nohl, Gunnar Braun and Heinrich Meyr Integrated Signal Processing Systems, RWTH Aachen,

More information