ED&TC /96 $ IEEE


Thread-Based Software Synthesis for Embedded System Design

Youngsoo Shin, Kiyoung Choi
Department of Electronics Engineering, Seoul National University, Seoul, Korea

Abstract

We propose in this paper a thread-based software synthesis technique that reduces the communication overhead incurred by the hardware-software interface in a system. We start from a CDFG that models the system. The CDFG is analyzed and partitioned into a set of threads. We then generate a mixed static-dynamic thread scheduler. The scheduler statically schedules as many threads as possible to minimize the scheduling overhead, and dynamically schedules the remaining threads. Reduction of the total execution time, including the communication overhead, is demonstrated with some examples.

1 Introduction

In this paper we present an efficient software synthesis technique for mixed hardware-software systems. Such heterogeneous systems frequently involve unbounded or unknown delays caused by data-dependent loops or interactions with environments. Therefore, some kind of synchronization mechanism is required. In [5], idle-wait type polling synchronization is used to perform blocking i/o. In [2], polling is implemented as an investigation of trigger signals in sequence. But these strategies raise the problem of communication overhead when delays are quite long. In [3], communication overhead of up to 50% is reported, under the assumption of mutual exclusion between software running on a processor and hardware implemented as a type of coprocessor.

For this reason, we focus on minimizing the communication overhead caused by this synchronization requirement. The problem can also be viewed as maximization of resource utilization. To achieve this objective, we generate program routines in the form of multiple threads. Multithreading has been widely used in parallel processor systems.
Heterogeneous systems have similar problems in that processors running software and other hardware components are inherently executable in parallel. Communication overhead is reduced by overlapping the execution of threads with hardware execution, which is made possible by thread partitioning and scheduling.

The overall structure of our software synthesis is as follows. First, the system specification is transformed into a control data flow graph (CDFG) model. Currently, we are experimenting with system specifications described in VHDL, but conceptually, specifications are not limited to VHDL. Our future work includes allowing mixed descriptions in VHDL, Ptolemy [1], and/or C. The CDFG is partitioned into two parts, to be implemented in hardware and software. Then the interface between the two parts is generated and annotated onto the partitioned CDFG. Currently, the partitioning and interface annotation are performed manually. The CDFG representing the software part consists of basic operations, control constructs, and system interface operations. It is then partitioned into a set of threads, from which a thread scheduler is generated. The scheduling is a mixture of static and dynamic scheduling, but it is maximally static in that all threads that can be given a static ordering are scheduled statically. Dynamic scheduling is performed when there is dynamic control flow, such as a data-dependent loop, or an unbounded delay. Examples of unbounded delays are delays incurred by environments or hardware components.

The rest of this paper is organized as follows. In Section 2, we define the CDFG and the thread and discuss some related issues. Section 3 describes the thread generation procedure. Section 4 discusses thread scheduling. In Section 5, we present some experimental results and the effectiveness of our methodology. Finally, Section 6 concludes with some remarks and future work.

2 Preliminaries

In this section we present some terminology and related issues.
2.1 CDFG

A CDFG is defined as a directed graph G = <N, E>, where N is a set of nodes and E = {n_i > n_j | n_i, n_j ∈ N} is a set of directed edges; n_i > n_j denotes a directed edge from n_i to n_j. There are four types of nodes, as shown in Table 1.

Table 1: Types of nodes

node types     description
n_s, n_e       start node, end node
n_op           operation node
n_ct           control node
               interface operation nodes (read, write)

A pair of a start node and an end node is introduced as polar nodes to the whole CDFG, to each condition clause, and to each conditional branch. By using these polar nodes we can take advantage of hierarchical and recursive graph structures, thereby simplifying theoretical graph problems. n_ct is a dummy node introduced for each control construct. It connects subgraphs: a G_c for the condition clause and G_s's for the conditional branches. This allows the subgraphs to be hierarchically nested in G. The interface operation nodes represent abstract read and write operations. They represent any sequence of operations required to satisfy the selected interface protocol, so their granularity can vary according to the complexity of the interface protocol. We have experimented with the handshaking protocol used in our prototyping system [6].

There are two types of edges between node pairs: data dependency edges and control dependency edges. When n_j is data dependent on n_i, we represent the relation as n_i >_d n_j. When n_j is control dependent on n_i, we represent the relation as n_i >_c n_j. Figure 1 shows an example of a CDFG, where solid arrows specify data dependency and dotted arrows specify control dependency.

We define P(n_i) to denote the set of nodes which have paths neither from n_i nor to n_i along data dependency and control dependency edges. P(n_i) can be defined recursively as follows.

\[ \mathrm{Pred}(n_i) = \Bigl( \bigcup_{n_j > n_i} \mathrm{Pred}(n_j) \Bigr) \cup \{ n_i \} \]
\[ \mathrm{Succ}(n_i) = \Bigl( \bigcup_{n_j < n_i} \mathrm{Succ}(n_j) \Bigr) \cup \{ n_i \} \]
\[ P(n_i) = N - \mathrm{Pred}(n_i) - \mathrm{Succ}(n_i) \]

Using the above formulas we can recursively find P() in the CDFG. Nodes in P() are candidate operations to be executed concurrently with hardware components.

Figure 1: An example of a CDFG.
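The definitions of Pred, Succ, and P above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' C implementation: the function name `independent_nodes`, the edge-list representation, and the five-node example graph are assumptions made for the illustration, and the recursion is replaced by an equivalent iterative traversal.

```python
def independent_nodes(nodes, edges, target):
    """Return P(target): nodes with no dependency path to or from target.

    edges is a list of (src, dst) pairs, meaning dst depends on src
    (data and control dependencies are treated alike here).
    """
    succs = {n: set() for n in nodes}
    preds = {n: set() for n in nodes}
    for src, dst in edges:
        succs[src].add(dst)
        preds[dst].add(src)

    def closure(start, neighbours):
        # Transitive closure that includes start itself, like Pred and Succ.
        seen, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    pred = closure(target, preds)   # Pred(target)
    succ = closure(target, succs)   # Succ(target)
    return set(nodes) - pred - succ  # P = N - Pred - Succ


# Hypothetical graph: a feeds the interface node io, io feeds c,
# and b -> d is an unrelated chain that can overlap with the hardware.
nodes = ["a", "b", "c", "d", "io"]
edges = [("a", "io"), ("io", "c"), ("b", "d")]
print(sorted(independent_nodes(nodes, edges, "io")))  # ['b', 'd']
```

In this toy graph only `b` and `d` are free of any dependency on `io`, so they are the candidates for execution while the hardware operation is pending.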
2.2 Thread

A thread is defined as a sequence of successively connected nodes with the property that once the first node fires, the remaining nodes execute to the end in fixed latency. Therefore, we do not support preemption of threads. The execution of threads is performed by a thread scheduler. We distinguish three thread types, as presented in Table 2. T_s is a thread whose nodes are in neither G_c nor G_s. T_c is a thread whose nodes are all condition evaluation operations. T_b is a thread whose nodes are all operations in a conditional branch.

We define T(n_i) as the thread starting from node n_i. ST(n_i) is defined as the set of threads constructed from nodes in P(n_i), i.e. the set of threads whose nodes can execute in parallel with T(n_i). Dependency relations between threads are denoted in the same way as node relations: T_i > T_j specifies that T_j is dependent on T_i. There are two types of dependency: T_i >_d T_j when T_j is data dependent on T_i, and T_i >_c T_j when T_j is control dependent on T_i.

A thread consists only of operation nodes (n_op) and interface operation nodes. n_ct affects only thread scheduling. n_s and n_e are dummy nodes and are not included in a thread. An interface operation node always starts a new thread. T() contains only elements between the n_s and n_e of the polar subgraph that directly encloses the interface operation node, and T() does not contain another interface operation node. We schedule the threads of ST() only after the predecessors of all threads in ST() have been scheduled. To guarantee this scheduling scheme, we introduce the following lemma.

Table 2: Types of threads

thread type    description
T_s            simple thread
T_c            condition clause thread
T_b            conditional branch thread

Lemma 1. For any T_j, T_k ∈ ST(n_i), if a thread T_l satisfies both T_j > T_l and T_l > T_k, then T_l ∈ ST(n_i).

Proof: We prove it by contradiction. Assume T_l ∉ ST(n_i). Then either n_i > n_m or n_m > n_i for some n_m ∈ T_l. First, assume n_i > n_m. Then there is a path from n_i to nodes of T_k, because T_l and T_k satisfy T_l > T_k. Therefore, T_k cannot be included in ST(n_i), which is a contradiction. Now, assume n_m > n_i. Then there is a path from nodes of T_j to n_i, because T_j and T_l satisfy T_j > T_l. Therefore, T_j cannot be included in ST(n_i), which is again a contradiction. □

3 Thread Generation

There is a tradeoff between the number of threads and the average length of threads. To reduce the cost of thread switching, it is important to keep the average length of threads long. But it is also important to have a sufficient number of threads which are neither directly nor indirectly dependent on T(), so that they can be executed while the interface operation node is waiting for the completion of the corresponding hardware operation having unbounded delay. To achieve this objective, we construct threads using nodes in P(). Thread generation consists of thread partitioning, thread clustering, and variable assignment.

3.1 Thread partitioning

The thread partitioning process consists of the following four steps.

Step 1: for each interface operation node, construct a node set P()

Construction of P() is accomplished through the analysis of the data dependency and control dependency of the interface operation node. For example, in Figure 1, we identify {n_12, n_13, n_14, n_15, n_16} as its data dependent successors, {1, 2, n_8, n_9, n_2, n_3, n_4} as its data dependent predecessors, {n_5, n_6} as its control dependent predecessors, and {n_16, n_17} as its control dependent successors. Note that n_16 is both a control dependent and a data dependent successor. So P() = {n_1, n_7, n_11, n_10}.
Step 2: from P(), construct a thread set ST()

Construction of a thread set is rather straightforward. From P(), we identify sets of successively connected nodes and statically order (topologically sort) the nodes in each set. Priority is assigned to each thread based on its dependency relations and the length of the thread. We give high priority to a thread which has long latency. This strategy is based on the fact that scheduling long-latency threads is more efficient when the delay due to hardware is relatively long. However, other priority criteria can be considered based on system requirements. In Figure 1, ST() = {T_1, T_2, T_5}, with T_1 = {n_7, n_11}, T_2 = {n_10}, and T_5 = {n_1}, is identified.

Step 3: construct threads starting from the interface operation node

Once ST() is established, we start construction of a thread from the interface operation node. The thread T() is cut when we visit a node which is a successor of any node of P(), or when there are no more elements to add to the thread. This is necessary so that T() does not break its non-dependency relation with ST().

Step 4: construct threads with the remaining nodes

Construction of a thread starts from each successor of the n_s nodes. Note that a thread cannot cross a subgraph boundary. Traversing along the edges, we append nodes to the thread until no nodes remain that have not yet been included in another thread. In Figure 2(a), we show the results of the above four steps run on the example in Figure 1.

Figure 2: Constructed thread set. (a) initial thread construction (b) threads after clustering

3.2 Thread clustering

Initial construction of threads can be improved through thread clustering, by which we can reduce the number of context switches. A clustering rule has to be well established to ensure that the resultant threads are deadlock-free. We modified and used the clustering rule defined in [7] as follows. Clustering is performed exclusively between ST() and a set of threads not in ST().

Rule 1: Threads T_1 and T_2 of the same type can be clustered into one thread if (a) all output arcs from T_1 go to T_2 or (b) T_1 and T_2 have no input arcs from other threads.

Figure 2(b) shows the results after thread clustering. We omit some data dependency relations for the purpose of clarity.

3.3 Variable assignment

We perform static analysis for variable assignment. Variables defined and used only within a single thread scope can be accessed through registers. Variables used in inter-thread scope need variable lifetime analysis for an efficient assignment to registers. However, the dynamic scheduling used in our system makes this analysis difficult, and it is not considered in this paper.

4 Thread Scheduling

We generate a mixed static and dynamic scheduler which statically schedules threads that can be given a static order, then dynamically schedules threads which are in dynamic control flows or which can be executed in parallel with unbounded delay operations. Currently, we do not take timing constraints into account during scheduling, but focus on enhancing the performance of a mixed system by inserting some threads into the idle interval caused by the execution of hardware. Our future work includes software synthesis under timing constraints.

For the dynamic scheduling we distinguish three types of threads: threads that consist of condition evaluation nodes, threads that consist of nodes in conditional branches, and thread T() together with the threads in ST(). The schedules of all other threads are statically determined. Before scheduling T() and the threads in ST(), we schedule all threads that are not in ST() but are predecessors of T() or predecessors of threads in ST(). Because those threads are not in ST(), by Lemma 1, they are not successors of threads in ST() and therefore are guaranteed to be fired before any thread in ST(). This implies that they can be scheduled statically.
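The static step described above, firing every predecessor of T() and of the threads in ST() before the dynamic phase starts, can be sketched as follows. This is a hedged illustration in Python rather than the authors' implementation: the function name `static_prefix`, the dependency-map representation, and the thread names in the example are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to obtain one valid static order.

```python
from graphlib import TopologicalSorter


def static_prefix(deps, roots):
    """Threads that must fire before `roots`, in a valid static order.

    deps maps each thread to the set of threads it depends on.
    roots is T() together with the threads in ST().
    """
    needed, stack = set(), list(roots)
    while stack:
        for p in deps.get(stack.pop(), ()):
            # Collect predecessors that are outside T() and ST().
            if p not in needed and p not in roots:
                needed.add(p)
                stack.append(p)
    # Restrict the dependency graph to the needed threads and sort it.
    sub = {t: deps.get(t, set()) & needed for t in needed}
    return list(TopologicalSorter(sub).static_order())


# Hypothetical example: T_io stands for T(); S1 and S2 stand for ST().
deps = {
    "A": set(),
    "B": {"A"},
    "T_io": {"B"},
    "S1": {"A"},
    "S2": {"S1"},
}
print(static_prefix(deps, {"T_io", "S1", "S2"}))  # ['A', 'B']
```

Here `A` and `B` are scheduled statically; only `T_io`, `S1`, and `S2` are left for the dynamic phase.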
With this strategy, we can maximize static scheduling, thereby minimizing scheduling overhead. This also allows us to reduce communication overhead by executing threads in ST() while T() is waiting for a completion signal from hardware. The scheduler maintains ST() for each T() in the form of a thread list sorted by the priority assigned to each thread. To schedule thread T(), we consider three different cases.

Case 1: T() is a T_s.

The scheduler selects and fires a thread from ST() based on its priority and then checks the handshake signal from hardware. If the signal is asserted, T() is fired and the remaining threads in ST() are all fired. When the signal is not asserted, another thread in ST() is selected and fired. This dynamic scheduling is repeated until the handshake signal is asserted or all the threads in ST() have been fired. In the latter case, we start polling the signal.

Case 2: T() is a T_b.

When T() is in a conditional branch, we distinguish T_b's and T_s's in ST(), because T_s's are executed once but T_b's are executed the same number of times as T(). Note that the T_b's and T() are in the same branch; we do not put T_b's in other branches into ST(). For the example of Figure 2, the static scheduling schedules T_11 → T_7. If the boolean value of T_7 is true, then the dynamic scheduling schedules T_6 and T_8. If the completion signal from hardware is not asserted, then the scheduler schedules T_5 → T_1 → T_2 in sequence until the signal is asserted. If the signal is not asserted even after the execution of the three threads, the scheduler starts polling. In this example, because T_5 is outside the branch containing the interface operation node, it can be fired only during the first iteration of the loop. That is, ST() = {T_1, T_2, T_5} for the first iteration and ST() = {T_1, T_2} for the second iteration.

Case 3: T() is a T_c.

When T() is in a condition evaluation clause, we distinguish T_c's and T_s's in ST() in the same way as in Case 2. In this case, T_b's cannot be included in ST() because of the control dependency between T() and the T_b's.
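Case 1 above can be sketched as a small loop. The following Python fragment is an illustration under stated assumptions, not the generated scheduler itself: threads are modelled as plain callables, `signal_asserted` is a hypothetical stand-in for reading the hardware handshake signal, and the function name `run_case1` is invented for this example.

```python
def run_case1(io_thread, st_threads, signal_asserted):
    """Fire threads from ST() until the hardware handshake is asserted.

    st_threads: list of (priority, callable) pairs; higher fires first.
    signal_asserted: callable returning True once the hardware is done.
    Returns the number of times the handshake signal was checked.
    """
    pending = sorted(st_threads, key=lambda t: t[0], reverse=True)
    polls = 0
    while pending:
        _, thread = pending.pop(0)   # highest remaining priority
        thread()
        polls += 1
        if signal_asserted():
            break
    else:
        # ST() is exhausted without the signal: fall back to polling.
        while True:
            polls += 1
            if signal_asserted():
                break
    io_thread()                      # T() fires once the signal is seen
    for _, thread in pending:        # remaining ST() threads, if any
        thread()
    return polls


# Hypothetical run: the hardware finishes after the second check.
log = []
checks = iter([False, True])
polls = run_case1(
    lambda: log.append("io"),
    [(1, lambda: log.append("c")),
     (3, lambda: log.append("a")),
     (2, lambda: log.append("b"))],
    lambda: next(checks),
)
print(log, polls)  # ['a', 'b', 'io', 'c'] 2
```

The two highest-priority threads overlap with the hardware delay, so the signal is checked only twice instead of being polled continuously.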
5 Experimental Results

We implemented our software synthesis algorithm in the C programming language on a SUN Sparc workstation. Multithreading is achieved via the SunOS lightweight process library. However, our software synthesis techniques can be easily extended to embedded real-time systems because our methodology is general enough. We have experimented with some examples in our codesign environment [6], which consists of a SUN Sparc processor, SBus, and an FPGA prototyping board. Hardware components, which are synthesized and prototyped with an FPGA, communicate with software components via SBus.

Figure 3 shows experimental results for an elliptical wave filter in which a multiplication operation with delay elements is implemented with the FPGA. Delays have been intentionally inserted into the hardware part to mimic a more complicated system.

Figure 3: Performance comparison. (a) relative execution time versus number of polling operations (b) relative communication overhead versus number of polling operations

Figure 3 compares the performance of a straight line code implementation and the code generated by our synthesis algorithm. The straight line code assumes mutual exclusion between software components and hardware components, as in [3]. Therefore, there exists no overlap between software and hardware execution. The execution time and the communication overhead of the generated code are measured relative to the straight line code while changing the delay of the hardware component. The number of polling operations in Figure 3 indicates the number of calls issued by the code generated by our algorithm to check the completion signal from the hardware. The number of calls issued by the straight line code is much larger, because the straight line code performs simple polling during the hardware execution.

We define communication overhead as the time devoted to i/o operations over the total execution time. Figure 3 shows the relative communication overhead of the synthesized code compared to the straight line code. If the completion signal arrives before the first polling operation, our approach exhibits slightly worse performance because of the overhead of context switching. However, if the hardware delay is long enough that more polling operations occur, then the relative execution time and the relative communication overhead are drastically reduced. Figure 3(a) shows saturated reduction of the relative execution time. This is because the example is small. We expect a much greater reduction when the application is large, so that P() and the hardware delay are also large.
For the example of an MPEG2 decoder, if the IDCT block is implemented in hardware and synchronization is performed by busy-wait polling, then the performance becomes worse than an all-software implementation because the communication overhead is too large. In such situations, we expect our algorithm to be of great help. Currently, we are experimenting with this example.

6 Conclusions

In this paper, we have presented a software synthesis technique which generates code based on threads. Our methodology tries to execute as many operations as possible before the completion of unbounded delay operations of hardware or environment, thereby reducing the total execution time. The execution of operations is scheduled efficiently through thread partitioning and thread scheduling. It has been experimentally shown that the total execution time can be effectively reduced. We are currently experimenting with several embedded system examples. We plan to extend our work to hardware-software codesign where a system is specified with mixed VHDL and Ptolemy, timing constraints are given, and/or an interrupt-driven i/o protocol is used. We are also exploring generation of some operating system kernel code, including the scheduler and device drivers.

References

[1] J. Buck, S. Ha, E. A. Lee, and D. G. Messerschmitt, "Ptolemy: a framework for simulating and prototyping heterogeneous systems," International Journal of Computer Simulation, Vol. 4, Apr. 1994.
[2] Massimiliano Chiodo et al., "Synthesis of Software Programs for Embedded Control Applications," Proc. of the 32nd DAC, June 1995.
[3] R. Ernst, J. Henkel, and T. Benner, "Hardware-Software Cosynthesis for Microcontrollers," IEEE Design & Test of Computers, December 1993.
[4] Daniel D. Gajski et al., "Specification and Design of Embedded Systems," Prentice-Hall, Inc.
[5] Rajesh K. Gupta, Giovanni De Micheli, "System Synthesis via Hardware-Software Co-Design," CSL Technical Report CSL-TR-92-548, Stanford University, October.
[6] Y. Kim, Y. Shin, K. Kim, J. Won, and K. Choi, "Efficient prototyping system based on incremental design and module-by-module verification," Proc. of 1995 International Symposium on Circuits and Systems, May 1995.
[7] K. E. Schauser et al., "Compiler-Controlled Multithreading for Lenient Parallel Languages," Proc. Fifth ACM Conf. Functional Programming Languages and Computer Architecture, ACM, New York, 1991, pp. 50-72.


Super-Key Classes for Updating. Materialized Derived Classes in Object Bases Super-Key Classes for Updating Materialized Derived Classes in Object Bases Shin'ichi KONOMI 1, Tetsuya FURUKAWA 1 and Yahiko KAMBAYASHI 2 1 Comper Center, Kyushu University, Higashi, Fukuoka 812, Japan

More information

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139

Enumeration of Full Graphs: Onset of the Asymptotic Region. Department of Mathematics. Massachusetts Institute of Technology. Cambridge, MA 02139 Enumeration of Full Graphs: Onset of the Asymptotic Region L. J. Cowen D. J. Kleitman y F. Lasaga D. E. Sussman Department of Mathematics Massachusetts Institute of Technology Cambridge, MA 02139 Abstract

More information

EL6483: Basic Concepts of Embedded System ModelingSpring and Hardware-In-The-Loo

EL6483: Basic Concepts of Embedded System ModelingSpring and Hardware-In-The-Loo : Basic Concepts of Embedded System Modeling and Hardware-In-The-Loop Simulation Spring 2016 : Basic Concepts of Embedded System ModelingSpring and Hardware-In-The-Loo 2016 1 / 26 Overall system : Basic

More information

An Algorithm for the Allocation of Functional Units from. Realistic RT Component Libraries. Department of Information and Computer Science

An Algorithm for the Allocation of Functional Units from. Realistic RT Component Libraries. Department of Information and Computer Science An Algorithm for the Allocation of Functional Units from Realistic RT Component Libraries Roger Ang rang@ics.uci.edu Nikil Dutt dutt@ics.uci.edu Department of Information and Computer Science University

More information

A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment

A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment A New Approach to Execution Time Estimations in a Hardware/Software Codesign Environment JAVIER RESANO, ELENA PEREZ, DANIEL MOZOS, HORTENSIA MECHA, JULIO SEPTIÉN Departamento de Arquitectura de Computadores

More information

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD

Localization in Graphs. Richardson, TX Azriel Rosenfeld. Center for Automation Research. College Park, MD CAR-TR-728 CS-TR-3326 UMIACS-TR-94-92 Samir Khuller Department of Computer Science Institute for Advanced Computer Studies University of Maryland College Park, MD 20742-3255 Localization in Graphs Azriel

More information

An Integrated Cosimulation Environment for Heterogeneous Systems Prototyping

An Integrated Cosimulation Environment for Heterogeneous Systems Prototyping An Integrated Cosimulation Environment for Heterogeneous Systems Prototyping Yongjoo Kim, Kyuseok Kim, Youngsoo Shin, Taekyoon Ahn, and Kiyoung Choi School of Electrical Engineering Seoul National University

More information

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi

Incorporating the Controller Eects During Register Transfer Level. Synthesis. Champaka Ramachandran and Fadi J. Kurdahi Incorporating the Controller Eects During Register Transfer Level Synthesis Champaka Ramachandran and Fadi J. Kurdahi Department of Electrical & Computer Engineering, University of California, Irvine,

More information

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Politecnico di Milano & EPFL A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs Vincenzo Rana, Ivan Beretta, Donatella Sciuto Donatella Sciuto sciuto@elet.polimi.it Introduction

More information

requests or displaying activities, hence they usually have soft deadlines, or no deadlines at all. Aperiodic tasks with hard deadlines are called spor

requests or displaying activities, hence they usually have soft deadlines, or no deadlines at all. Aperiodic tasks with hard deadlines are called spor Scheduling Aperiodic Tasks in Dynamic Priority Systems Marco Spuri and Giorgio Buttazzo Scuola Superiore S.Anna, via Carducci 4, 561 Pisa, Italy Email: spuri@fastnet.it, giorgio@sssup.it Abstract In this

More information

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines

Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines Ecient Implementation of Sorting Algorithms on Asynchronous Distributed-Memory Machines B. B. Zhou, R. P. Brent and A. Tridgell Computer Sciences Laboratory The Australian National University Canberra,

More information

Greedy Algorithms. T. M. Murali. January 28, Interval Scheduling Interval Partitioning Minimising Lateness

Greedy Algorithms. T. M. Murali. January 28, Interval Scheduling Interval Partitioning Minimising Lateness Greedy Algorithms T. M. Murali January 28, 2008 Algorithm Design Start discussion of dierent ways of designing algorithms. Greedy algorithms, divide and conquer, dynamic programming. Discuss principles

More information

Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang. Department of Electrical & Electronic Engineering

Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang. Department of Electrical & Electronic Engineering Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang Department of Electrical & Electronic Engineering Hong Kong University of Science & Technology Clear Water Bay, Kowloon, Hong Kong

More information

ABSTRACT Finding a cut or nding a matching in a graph are so simple problems that hardly are considered problems at all. Finding a cut whose split edg

ABSTRACT Finding a cut or nding a matching in a graph are so simple problems that hardly are considered problems at all. Finding a cut whose split edg R O M A TRE DIA Universita degli Studi di Roma Tre Dipartimento di Informatica e Automazione Via della Vasca Navale, 79 { 00146 Roma, Italy The Complexity of the Matching-Cut Problem Maurizio Patrignani

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information

Adaptive Migratory Scheme for Distributed Shared Memory 1. Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University

Adaptive Migratory Scheme for Distributed Shared Memory 1. Jai-Hoon Kim Nitin H. Vaidya. Department of Computer Science. Texas A&M University Adaptive Migratory Scheme for Distributed Shared Memory 1 Jai-Hoon Kim Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 E-mail: fjhkim,vaidyag@cs.tamu.edu

More information

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song

CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS. Xiaodong Zhang and Yongsheng Song CHAPTER 4 AN INTEGRATED APPROACH OF PERFORMANCE PREDICTION ON NETWORKS OF WORKSTATIONS Xiaodong Zhang and Yongsheng Song 1. INTRODUCTION Networks of Workstations (NOW) have become important distributed

More information

Frank Mueller. Dept. of Computer Science. Florida State University. Tallahassee, FL phone: (904)

Frank Mueller. Dept. of Computer Science. Florida State University. Tallahassee, FL phone: (904) Static Cache Simulation and its Applications by Frank Mueller Dept. of Computer Science Florida State University Tallahassee, FL 32306-4019 e-mail: mueller@cs.fsu.edu phone: (904) 644-3441 July 12, 1994

More information

Scheduling on clusters and grids

Scheduling on clusters and grids Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling

More information

Politecnico di Milano

Politecnico di Milano Politecnico di Milano Automatic parallelization of sequential specifications for symmetric MPSoCs [Full text is available at https://re.public.polimi.it/retrieve/handle/11311/240811/92308/iess.pdf] Fabrizio

More information

Telecommunication and Informatics University of North Carolina, Technical University of Gdansk Charlotte, NC 28223, USA

Telecommunication and Informatics University of North Carolina, Technical University of Gdansk Charlotte, NC 28223, USA A Decoder-based Evolutionary Algorithm for Constrained Parameter Optimization Problems S lawomir Kozie l 1 and Zbigniew Michalewicz 2 1 Department of Electronics, 2 Department of Computer Science, Telecommunication

More information

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica

1 Introduction Data format converters (DFCs) are used to permute the data from one format to another in signal processing and image processing applica A New Register Allocation Scheme for Low Power Data Format Converters Kala Srivatsan, Chaitali Chakrabarti Lori E. Lucke Department of Electrical Engineering Minnetronix, Inc. Arizona State University

More information

Reliability-Aware Co-synthesis for Embedded Systems

Reliability-Aware Co-synthesis for Embedded Systems Reliability-Aware Co-synthesis for Embedded Systems Y. Xie, L. Li, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin Department of Computer Science and Engineering Pennsylvania State University {yuanxie,

More information

However, no results are published that indicate the applicability for cycle-accurate simulation purposes. The language RADL [12] is derived from earli

However, no results are published that indicate the applicability for cycle-accurate simulation purposes. The language RADL [12] is derived from earli Retargeting of Compiled Simulators for Digital Signal Processors Using a Machine Description Language Stefan Pees, Andreas Homann, Heinrich Meyr Integrated Signal Processing Systems, RWTH Aachen pees[homann,meyr]@ert.rwth-aachen.de

More information

Copyright (C) 1997, 1998 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for

Copyright (C) 1997, 1998 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for Copyright (C) 1997, 1998 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided

More information

8ns. 8ns. 16ns. 10ns COUT S3 COUT S3 A3 B3 A2 B2 A1 B1 B0 2 B0 CIN CIN COUT S3 A3 B3 A2 B2 A1 B1 A0 B0 CIN S0 S1 S2 S3 COUT CIN 2 A0 B0 A2 _ A1 B1

8ns. 8ns. 16ns. 10ns COUT S3 COUT S3 A3 B3 A2 B2 A1 B1 B0 2 B0 CIN CIN COUT S3 A3 B3 A2 B2 A1 B1 A0 B0 CIN S0 S1 S2 S3 COUT CIN 2 A0 B0 A2 _ A1 B1 Delay Abstraction in Combinational Logic Circuits Noriya Kobayashi Sharad Malik C&C Research Laboratories Department of Electrical Engineering NEC Corp. Princeton University Miyamae-ku, Kawasaki Japan

More information

Abstract. provide substantial improvements in performance on a per application basis. We have used architectural customization

Abstract. provide substantial improvements in performance on a per application basis. We have used architectural customization Architectural Adaptation in MORPH Rajesh K. Gupta a Andrew Chien b a Information and Computer Science, University of California, Irvine, CA 92697. b Computer Science and Engg., University of California,

More information

A FAST AND EFFICIENT HARDWARE TECHNIQUE FOR MEMORY ALLOCATION

A FAST AND EFFICIENT HARDWARE TECHNIQUE FOR MEMORY ALLOCATION A FAST AND EFFICIENT HARDWARE TECHNIQUE FOR MEMORY ALLOCATION Fethullah Karabiber 1 Ahmet Sertbaş 1 Hasan Cam 2 1 Computer Engineering Department Engineering Faculty, Istanbul University 34320, Avcilar,

More information

Networks for Control. California Institute of Technology. Pasadena, CA Abstract

Networks for Control. California Institute of Technology. Pasadena, CA Abstract Learning Fuzzy Rule-Based Neural Networks for Control Charles M. Higgins and Rodney M. Goodman Department of Electrical Engineering, 116-81 California Institute of Technology Pasadena, CA 91125 Abstract

More information

A technique for adding range restrictions to. August 30, Abstract. In a generalized searching problem, a set S of n colored geometric objects

A technique for adding range restrictions to. August 30, Abstract. In a generalized searching problem, a set S of n colored geometric objects A technique for adding range restrictions to generalized searching problems Prosenjit Gupta Ravi Janardan y Michiel Smid z August 30, 1996 Abstract In a generalized searching problem, a set S of n colored

More information

Rate-Controlled Static-Priority. Hui Zhang. Domenico Ferrari. hzhang, Computer Science Division

Rate-Controlled Static-Priority. Hui Zhang. Domenico Ferrari. hzhang, Computer Science Division Rate-Controlled Static-Priority Queueing Hui Zhang Domenico Ferrari hzhang, ferrari@tenet.berkeley.edu Computer Science Division University of California at Berkeley Berkeley, CA 94720 TR-92-003 February

More information

DRAFT for FINAL VERSION. Accepted for CACSD'97, Gent, Belgium, April 1997 IMPLEMENTATION ASPECTS OF THE PLC STANDARD IEC

DRAFT for FINAL VERSION. Accepted for CACSD'97, Gent, Belgium, April 1997 IMPLEMENTATION ASPECTS OF THE PLC STANDARD IEC DRAFT for FINAL VERSION. Accepted for CACSD'97, Gent, Belgium, 28-3 April 1997 IMPLEMENTATION ASPECTS OF THE PLC STANDARD IEC 1131-3 Martin hman Stefan Johansson Karl-Erik rzen Department of Automatic

More information

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu

to automatically generate parallel code for many applications that periodically update shared data structures using commuting operations and/or manipu Semantic Foundations of Commutativity Analysis Martin C. Rinard y and Pedro C. Diniz z Department of Computer Science University of California, Santa Barbara Santa Barbara, CA 93106 fmartin,pedrog@cs.ucsb.edu

More information

Two Problems - Two Solutions: One System - ECLiPSe. Mark Wallace and Andre Veron. April 1993

Two Problems - Two Solutions: One System - ECLiPSe. Mark Wallace and Andre Veron. April 1993 Two Problems - Two Solutions: One System - ECLiPSe Mark Wallace and Andre Veron April 1993 1 Introduction The constraint logic programming system ECL i PS e [4] is the successor to the CHIP system [1].

More information

System Modeling and Presynthesis Using Timed Decision Tables. Urbana, Illinois Irvine, CA March 20, Abstract

System Modeling and Presynthesis Using Timed Decision Tables. Urbana, Illinois Irvine, CA March 20, Abstract System Modeling and Presynthesis Using Timed Decision Tables y Jian Li and z Rajesh K. Gupta y Department of Computer Science z Information & Computer Science University of Illinois at Urbana-Champaign

More information

OUT. + * * + + * * + c1 c2. c4 c5 D2 OUT

OUT. + * * + + * * + c1 c2. c4 c5 D2  OUT Techniques for Functional Test Pattern Execution Inki Hong and Miodrag Potkonjak UCLA Computer Science Department Los Angeles, CA 90095-596 USA Abstract Functional debugging of application specic integrated

More information

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE

[HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE [HaKa92] L. Hagen and A. B. Kahng, A new approach to eective circuit clustering, Proc. IEEE International Conference on Computer-Aided Design, pp. 422-427, November 1992. [HaKa92b] L. Hagen and A. B.Kahng,

More information

The temporal explorer who returns to the base 1

The temporal explorer who returns to the base 1 The temporal explorer who returns to the base 1 Eleni C. Akrida, George B. Mertzios, and Paul G. Spirakis, Department of Computer Science, University of Liverpool, UK Department of Computer Science, Durham

More information

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time

More information

Program Implementation Schemes for Hardware-Software Systems

Program Implementation Schemes for Hardware-Software Systems Program Implementation Schemes for Hardware-Software Systems Rajesh K. Gupta Claudionor N. Coelho, Jr. Giovanni De Micheli Computer Systems Laboratory Departments of Electrical Engineering and Computer

More information

Concurrent Programming Lecture 3

Concurrent Programming Lecture 3 Concurrent Programming Lecture 3 3rd September 2003 Atomic Actions Fine grain atomic action We assume that all machine instructions are executed atomically: observers (including instructions in other threads)

More information

Uncontrollable. High Priority. Users. Multiplexer. Server. Low Priority. Controllable. Users. Queue

Uncontrollable. High Priority. Users. Multiplexer. Server. Low Priority. Controllable. Users. Queue Global Max-Min Fairness Guarantee for ABR Flow Control Qingyang Hu, David W. Petr Information and Telecommunication Technology Center Department of Electrical Engineering & Computer Science The University

More information

Eliminating False Loops Caused by Sharing in Control Path

Eliminating False Loops Caused by Sharing in Control Path Eliminating False Loops Caused by Sharing in Control Path ALAN SU and YU-CHIN HSU University of California Riverside and TA-YUNG LIU and MIKE TIEN-CHIEN LEE Avant! Corporation In high-level synthesis,

More information

Solve the Data Flow Problem

Solve the Data Flow Problem Gaining Condence in Distributed Systems Gleb Naumovich, Lori A. Clarke, and Leon J. Osterweil University of Massachusetts, Amherst Computer Science Department University of Massachusetts Amherst, Massachusetts

More information

Centre for Parallel Computing, University of Westminster, London, W1M 8JS

Centre for Parallel Computing, University of Westminster, London, W1M 8JS Graphical Construction of Parallel Programs G. R. Ribeiro Justo Centre for Parallel Computing, University of Westminster, London, WM 8JS e-mail: justog@wmin.ac.uk, Abstract Parallel programming is not

More information

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie

where is a constant, 0 < <. In other words, the ratio between the shortest and longest paths from a node to a leaf is at least. An BB-tree allows ecie Maintaining -balanced Trees by Partial Rebuilding Arne Andersson Department of Computer Science Lund University Box 8 S-22 00 Lund Sweden Abstract The balance criterion dening the class of -balanced trees

More information

Ptolemy Seamlessly Supports Heterogeneous Design 5 of 5

Ptolemy Seamlessly Supports Heterogeneous Design 5 of 5 In summary, the key idea in the Ptolemy project is to mix models of computation, rather than trying to develop one, all-encompassing model. The rationale is that specialized models of computation are (1)

More information

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh

Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented wh Adaptive Estimation of Distributions using Exponential Sub-Families Alan Gous Stanford University December 1996 Abstract: An algorithm is presented which, for a large-dimensional exponential family G,

More information

Hardware/Software Partitioning using Integer Programming. Ralf Niemann, Peter Marwedel. University of Dortmund. D Dortmund, Germany

Hardware/Software Partitioning using Integer Programming. Ralf Niemann, Peter Marwedel. University of Dortmund. D Dortmund, Germany Hardware/Software using Integer Programming Ralf Niemann, Peter Marwedel Dept. of Computer Science II University of Dortmund D-44221 Dortmund, Germany Abstract One of the key problems in hardware/software

More information

Lecture. DM510 - Operating Systems, Weekly Notes, Week 11/12, 2018

Lecture. DM510 - Operating Systems, Weekly Notes, Week 11/12, 2018 Lecture In the lecture on March 13 we will mainly discuss Chapter 6 (Process Scheduling). Examples will be be shown for the simulation of the Dining Philosopher problem, a solution with monitors will also

More information

Neuro-Remodeling via Backpropagation of Utility. ABSTRACT Backpropagation of utility is one of the many methods for neuro-control.

Neuro-Remodeling via Backpropagation of Utility. ABSTRACT Backpropagation of utility is one of the many methods for neuro-control. Neuro-Remodeling via Backpropagation of Utility K. Wendy Tang and Girish Pingle 1 Department of Electrical Engineering SUNY at Stony Brook, Stony Brook, NY 11794-2350. ABSTRACT Backpropagation of utility

More information

A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems

A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems A Modified Genetic Algorithm for Task Scheduling in Multiprocessor Systems Yi-Hsuan Lee and Cheng Chen Department of Computer Science and Information Engineering National Chiao Tung University, Hsinchu,

More information

SPARK: A Parallelizing High-Level Synthesis Framework

SPARK: A Parallelizing High-Level Synthesis Framework SPARK: A Parallelizing High-Level Synthesis Framework Sumit Gupta Rajesh Gupta, Nikil Dutt, Alex Nicolau Center for Embedded Computer Systems University of California, Irvine and San Diego http://www.cecs.uci.edu/~spark

More information

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano

THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL. Jun Sun, Yasushi Shinjo and Kozo Itano THE IMPLEMENTATION OF A DISTRIBUTED FILE SYSTEM SUPPORTING THE PARALLEL WORLD MODEL Jun Sun, Yasushi Shinjo and Kozo Itano Institute of Information Sciences and Electronics University of Tsukuba Tsukuba,

More information

CAD with use of Designers' Intention. Osaka University. Suita, Osaka , Japan. Abstract

CAD with use of Designers' Intention. Osaka University. Suita, Osaka , Japan. Abstract CAD with use of Designers' Intention Eiji Arai, Keiichi Shirase, and Hidefumi Wakamatsu Dept. of Manufacturing Science Graduate School of Engineering Osaka University Suita, Osaka 565-0871, Japan Abstract

More information

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract

A New Theory of Deadlock-Free Adaptive. Routing in Wormhole Networks. Jose Duato. Abstract A New Theory of Deadlock-Free Adaptive Routing in Wormhole Networks Jose Duato Abstract Second generation multicomputers use wormhole routing, allowing a very low channel set-up time and drastically reducing

More information

Andreas Kuehlmann. validates properties conrmed on one (preferably abstract) synthesized by the Cathedral system with the original. input specication.

Andreas Kuehlmann. validates properties conrmed on one (preferably abstract) synthesized by the Cathedral system with the original. input specication. Formal Verication of a PowerPC TM Microprocessor David P. Appenzeller IBM Microelectronic Burlington Essex Junction, VT, U.S.A. Andreas Kuehlmann IBM Thomas J. Watson Research Center Yorktown Heights,

More information

DSC: Scheduling Parallel Tasks on an Unbounded Number of. Processors 3. Tao Yang and Apostolos Gerasoulis. Department of Computer Science

DSC: Scheduling Parallel Tasks on an Unbounded Number of. Processors 3. Tao Yang and Apostolos Gerasoulis. Department of Computer Science DSC: Scheduling Parallel Tasks on an Unbounded Number of Processors 3 Tao Yang and Apostolos Gerasoulis Department of Computer Science Rutgers University New Brunswick, NJ 08903 Email: ftyang, gerasoulisg@cs.rutgers.edu

More information

Hardware Software Partitioning of Multifunction Systems

Hardware Software Partitioning of Multifunction Systems Hardware Software Partitioning of Multifunction Systems Abhijit Prasad Wangqi Qiu Rabi Mahapatra Department of Computer Science Texas A&M University College Station, TX 77843-3112 Email: {abhijitp,wangqiq,rabi}@cs.tamu.edu

More information

Compilation Issues for High Performance Computers: A Comparative. Overview of a General Model and the Unied Model. Brian J.

Compilation Issues for High Performance Computers: A Comparative. Overview of a General Model and the Unied Model. Brian J. Compilation Issues for High Performance Computers: A Comparative Overview of a General Model and the Unied Model Abstract This paper presents a comparison of two models suitable for use in a compiler for

More information

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems

On Object Orientation as a Paradigm for General Purpose. Distributed Operating Systems On Object Orientation as a Paradigm for General Purpose Distributed Operating Systems Vinny Cahill, Sean Baker, Brendan Tangney, Chris Horn and Neville Harris Distributed Systems Group, Dept. of Computer

More information

Design For High Performance Flexray Protocol For Fpga Based System

Design For High Performance Flexray Protocol For Fpga Based System IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) e-issn: 2319 4200, p-issn No. : 2319 4197 PP 83-88 www.iosrjournals.org Design For High Performance Flexray Protocol For Fpga Based System E. Singaravelan

More information

Optimizing Closures in O(0) time

Optimizing Closures in O(0) time Optimizing Closures in O(0 time Andrew W. Keep Cisco Systems, Inc. Indiana Univeristy akeep@cisco.com Alex Hearn Indiana University adhearn@cs.indiana.edu R. Kent Dybvig Cisco Systems, Inc. Indiana University

More information

Department of Computer Science. a vertex can communicate with a particular neighbor. succeeds if it shares no edge with other calls during

Department of Computer Science. a vertex can communicate with a particular neighbor. succeeds if it shares no edge with other calls during Sparse Hypercube A Minimal k-line Broadcast Graph Satoshi Fujita Department of Electrical Engineering Hiroshima University Email: fujita@se.hiroshima-u.ac.jp Arthur M. Farley Department of Computer Science

More information

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s]

Cluster quality 15. Running time 0.7. Distance between estimated and true means Running time [s] Fast, single-pass K-means algorithms Fredrik Farnstrom Computer Science and Engineering Lund Institute of Technology, Sweden arnstrom@ucsd.edu James Lewis Computer Science and Engineering University of

More information

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract

Algorithms for an FPGA Switch Module Routing Problem with. Application to Global Routing. Abstract Algorithms for an FPGA Switch Module Routing Problem with Application to Global Routing Shashidhar Thakur y Yao-Wen Chang y D. F. Wong y S. Muthukrishnan z Abstract We consider a switch-module-routing

More information

WaveScalar. Winter 2006 CSE WaveScalar 1

WaveScalar. Winter 2006 CSE WaveScalar 1 WaveScalar Dataflow machine good at exploiting ILP dataflow parallelism traditional coarser-grain parallelism cheap thread management memory ordering enforced through wave-ordered memory Winter 2006 CSE

More information