Hardware Accelerators for Real-Time Scheduling in Packet Processing Systems


Abstract

Fast job scheduling is vital for real-time systems in which task execution times are short. This paper focuses on packet processing systems, e.g. multimedia streaming applications, in which it is critical that scheduling delays are no more than a few clock cycles. To meet this challenge, we propose hardware accelerators which implement scheduling algorithms directly in custom hardware, thereby achieving very low scheduling latencies. We first present a generic framework for such scheduling accelerators, then instantiate several designs by implementing different scheduling algorithms as plug-ins. Our proposed hardware implementation uses an asynchronous, or clockless, circuit style because of its significant benefits over clocked circuits, particularly in the area of modularity. Simulations of the resulting scheduler hardware indicate scheduling latencies as low as 14 processor cycles for the highest-priority tasks.

1. Introduction

This paper targets a particular class of real-time systems, called packet processors, in which quick job turnover is a critical benchmark for performance. Such systems operate on a stream of packets, each packet typically executing for only a limited number of clock cycles. For packet scheduling overhead to be acceptable, it is vital that scheduling occurs rapidly, with delays of no more than a few clock cycles. This scenario differs from traditional real-time operating systems, where task execution times are typically orders of magnitude greater (and therefore scheduling speed is less critical). Our work introduces a scheduling approach that uses custom hardware accelerators to provide the scheduling speeds required by packet processing systems. Building schedulers directly in hardware has two benefits: significantly shortening scheduling delays, and freeing the processor from the burden of scheduling.

While these proposed hardware accelerators are applicable to any real-time packet processing system, an application of particular interest is real-time multimedia processing. Devices with multimedia components have become commonplace in consumer electronics, reflecting a widespread interest in interactive audio and video applications. To continue this trend, the breadth of technology offered by individual products is expanding: e.g., cellular phones now record video, play music, and access the internet, in addition to handling voice.

Figure 1. Different scheduler implementations: (a) scheduling in software, (b) scheduling on a hardware accelerator, (c) scheduling on hardware in parallel with job execution.

For such streaming multimedia systems, it is often a challenge to efficiently manage the scheduling of several tasks of different priorities. First, to maintain usability, certain real-time requirements must be met. In audio and video communication, for example, a maximum two-way delay of 300 ms [8] can be tolerated. Therefore, in addition to the requirement that the processing of each multimedia packet be fast, there is also a requirement that the scheduling be fast and effective, so that deadlines are maintained. Second, it is typically desirable that the overhead of task management itself does not impose a significant burden on the processor; the greater the amount of useful work performed by the processor, the higher the quality of the multimedia experience.
In this paper, we propose dedicated hardware accelerators for real-time scheduling, with the twin objectives of (i) increasing the scheduling speed, and (ii) removing the burden of scheduling from the processor. Figure 1 illustrates the benefit of our proposed approach. Figure 1(a) shows a timeline corresponding to conventional software-based scheduling: the scheduler imposes significant overhead in systems where task execution times are short. In Figure 1(b), a hardware accelerator is introduced to substantially shorten scheduling delays; in this scheme, scheduling decisions are made after the previous job has completed execution. Finally, Figure 1(c) shows a more concurrent scheduling approach, where the scheduler decides which task to execute next before the current task has finished execution. While (c) has the benefit of removing scheduling delays from the critical path, (b) may provide better scheduling quality for late-arriving high-priority jobs.
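As a rough, back-of-the-envelope illustration of the three schemes in Figure 1, the following Python sketch compares per-job scheduling overheads; the cycle counts used here are purely illustrative assumptions, not measurements from this work.

# Back-of-the-envelope model of the three schemes in Figure 1.
# The job length and scheduling delays below are illustrative assumptions,
# not measurements from this paper.

JOB_CYCLES = 100        # assumed per-packet execution time
SW_SCHED_CYCLES = 20    # assumed software scheduling decision time
HW_SCHED_CYCLES = 2     # assumed hardware scheduling decision time

def cycles_per_job(sched_cycles: int, overlapped: bool) -> float:
    """Average cycles consumed per job, including scheduling overhead."""
    if overlapped:
        # Figure 1(c): the next decision is made while the current job runs,
        # so scheduling adds nothing to the critical path.
        return JOB_CYCLES
    return JOB_CYCLES + sched_cycles

for name, sched, overlapped in [("software, Fig. 1(a)", SW_SCHED_CYCLES, False),
                                ("hardware, Fig. 1(b)", HW_SCHED_CYCLES, False),
                                ("hardware overlapped, Fig. 1(c)", HW_SCHED_CYCLES, True)]:
    total = cycles_per_job(sched, overlapped)
    overhead = 100.0 * (total - JOB_CYCLES) / total
    print(f"{name}: {total:.0f} cycles/job, {overhead:.1f}% scheduling overhead")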

The performance benefits of hardware accelerators are most obvious when the cost of scheduling jobs is a non-negligible portion of a processor's execution time. However, even in systems where scheduling overheads are negligible compared to processing costs, using a hardware scheduler is still advantageous. In particular, hardware-based schedulers allow reduced complexity and overhead in software. They also provide a more rapid response to dynamic behavior in the system or environment, e.g., job scheduling in a bursty environment.

For the design of the scheduling hardware, an asynchronous or clockless circuit paradigm was chosen [5]. Asynchronous circuits dispense with centralized clocking, and instead use local handshakes to coordinate communication. As a result, they offer the potential benefits of lower power consumption, higher speed, and greater modularity and ease of design [3]. This work exploits their design modularity to enable the rapid and low-effort construction of custom scheduler hardware from high-level algorithmic descriptions. In particular, the lack of centralized clocking frees the designer from low-level timing considerations, allowing the design to be easily composed from reusable building blocks. This modularity allows us to propose a general framework, or hardware template, for a generic scheduling accelerator. Different scheduling policies have been designed as distinct hardware components (i.e., plug-ins), any of which can be selected during physical design for insertion into the generic template, for either silicon fabrication or FPGA implementation.

A wide range of scheduling algorithms were implemented in hardware, down to the gate level, using our approach. The schedulers implemented include well-known static and dynamic schedulers, including rate-monotonic (RM) and earliest-deadline-first (EDF) [10]. The resulting hardware implementations were simulated at the gate level, and their performance evaluated assuming they were part of a system containing a typical embedded compute processor running at 40 MHz. Simulations indicate promising scheduling speeds for our dynamic schedulers: latencies as low as 14 processor cycles for the highest-priority jobs to be scheduled. On the other hand, if scheduling is performed in software, the scheduling latencies are several orders of magnitude greater.

The remainder of this paper is organized as follows. Section 2 provides background on asynchronous design, including a brief description of the specific synthesis flow used in our approach. Section 3 then presents the design and implementation of the proposed scheduling accelerator; several distinct scheduling policies are covered. Section 4 presents synthesis and simulation results for each of the schedulers implemented, including the scheduling speeds obtained, as well as the throughput and area overheads incurred. Finally, Section 5 gives conclusions and future research directions.

2. Previous Work and Background

This section discusses previous work in the area of hardware accelerators for real-time scheduling, and provides a brief introduction to asynchronous design, including an overview of the synthesis flow used to implement the scheduler hardware proposed in this paper.

Figure 2. Clocked vs. clockless FIFOs: (a) clocked implementation, (b) asynchronous implementation with request/acknowledge handshake signals.

2.1 Previous Work on Scheduler Accelerators

Several hardware-based approaches to real-time scheduling have been reported in the literature, but each has significant limitations. Most of these approaches are based on hardware implementations using binary comparator trees [12], shift registers [14], and systolic arrays [9].
Each of these approaches has significant drawbacks, including high design complexity, lack of scalability and flexibility, and limited scheduling performance [11]. The approach of [4], however, is both fast and scalable. A novel pipelined heap architecture is introduced, which is capable of high-speed enqueueing and dequeueing. Further, the architecture can be scaled to arbitrary priority levels without performance degradation. Our approach, however, has several advantages over that of [4]. First, our approach provides a generalized framework for scheduling accelerators, along with modular scheduling plug-in components. In contrast, their approach provides only one specific scheduling strategy. Second, their implementation is memory-based and relies on efficient manipulation of data in memory. In contrast, our approach is data-flow in style, which avoids bottlenecks due to centralized memory accesses. Finally, our approach allows much more concurrent job insertion: jobs are inserted at the leaf nodes of the structure, thereby allowing the concurrent insertion of as many jobs as there are leaf nodes. In contrast, their approach inserts jobs at the root node, and therefore only a single queueing operation can be initiated at a time.

2.2 Background: Asynchronous Design

The current practice of synchronous hardware design is facing increasing difficulties as clock speeds approach 10 GHz, chip complexity approaches a billion transistors, and the demands for low power consumption and modular design become paramount [2].

Asynchronous or clockless design is emerging as an attractive alternative, with the promise of alleviating these challenges by dispensing with centralized clocking altogether [5, 3]. Instead, events inside an asynchronous system are coordinated in a much more distributed fashion, using local handshakes between communicating components. Figure 2 illustrates this difference between synchronous and asynchronous design for the simple example of a FIFO. In the synchronous implementation, data is computed in one stage and transferred to the next on every clock tick; thus, coordination between communicating stages is governed implicitly by the clock. In contrast, the asynchronous implementation of the FIFO replaces the global clock signal by a pair of request-acknowledge signals for each pair of communicating stages. (Note that in asynchronous real-time systems the presence of a clock may still be necessary to verify that all real-time deadlines are met; however, the purpose of such a clock is to provide a timing reference, rather than to govern the pace of all logic activity.) A stage initiates computation only when it receives new data along with a request from its left neighbor. Once the data has been processed by the stage, the left neighbor is acknowledged, and the data along with a request is relayed to the right neighbor.

Advantages of Asynchronous Design. Asynchronous circuits have several key advantages over synchronous circuits (see [5, 3]).

1) Greater Energy Efficiency. In synchronous systems, idle components generally process garbage unless a clock gating protocol is in place. Asynchronous circuits inherently avoid unnecessary computation: components are only activated upon receipt of a handshake, thereby consuming little energy when idle [3].

2) Better Electromagnetic Compatibility. While synchronous circuits produce noise spikes at the clock frequency and its harmonics, asynchronous circuits not only produce less total noise energy, but that energy is also spread across the spectrum [3]. The benefit of lower noise emissions was the key motivation for Philips to develop fully asynchronous microprocessors, which have been used in tens of millions of commercial pagers, cell phones and smartcards throughout Europe [7].

3) Higher Speed. An asynchronous system can exploit data-dependent computation times: when a result in a component is produced early, it can be immediately communicated to the next stage. In contrast, in a synchronous system, that component would have to wait for the clock cycle to finish before the result can be communicated. Thus, while synchronous implementations are limited by worst-case clocking, asynchronous designs can potentially obtain average-case behavior [3].

4) More Robust Arbitration. Arbitration and mutual exclusion, which are fundamental to real-time systems, can often lead to metastability in hardware. While metastability can have drastic effects on clocked designs, its impact is less drastic in asynchronous designs. In particular, should a metastable state arise in a synchronous circuit, it must be resolved before the next clock edge to ensure correct circuit-level behavior. In asynchronous systems, on the other hand, the metastable state is allowed to persist as long as necessary for its resolution; because there is no clock deadline to meet, the rest of the circuitry simply waits for the state to be resolved.

Figure 3. Haste example: the specification of a single FIFO stage (top) is compiled into a network of handshake components (bottom). The specification reads approximately:

    &fifo = proc (IN? chan packet & OUT! chan packet).
    begin
      x : var packet
    | forever do
        IN?x ; OUT!x
      od
    end
5) Greater Modularity and Ease of Design. Asynchronous handshake protocols promote modularity, allowing components to be developed and reused in multiple designs [3]. Moreover, the time-consuming task of designing a low-skew, high-speed clock distribution network is no longer necessary. As long as the communication protocol at a module's interface is met, the module will operate correctly regardless of the environment it is embedded within. This greater modularity of asynchronous components is key to design reuse, which is likely to become critical as chip complexities increase to a billion transistors over the next few years.

2.3 High-Level Asynchronous Synthesis Flow

The designs discussed in Section 3 were synthesized and simulated using the Haste/TiDE design flow (formerly Tangram), a product of Philips/Handshake Solutions [1]. Haste is one of a few mature asynchronous design flows currently available; the toolset focuses on rapid design of custom asynchronous hardware. It targets medium-speed, low-power implementations running at or below 400 MHz (in 0.13 µm technology). The Haste toolset is essentially a silicon compiler: it accepts specifications written in a high-level hardware description language and compiles them, via syntax-driven translation, into a gate-level circuit.

The high-level language is a close variant of the CSP behavioral modeling language [6], which allows complex behaviors to be easily specified in a few lines of code. Figure 3 shows the Haste specification of a single stage of a FIFO. The stage has an input channel IN, through which it receives a packet from its left neighbor, and an output channel OUT, through which it transmits the packet to its right neighbor. Each channel consists of a pair of request-acknowledge wires along with the data wires. In the specification, x is a storage variable, which corresponds to the storage element (i.e., latch or flip-flop) of the FIFO stage. Finally, the main construct in the body of the specification is a forever do loop, which specifies the following: (i) read a packet from channel IN and store it in variable x; (ii) write the value stored in x to the output channel OUT; and (iii) perform this sequence of actions repeatedly, forever.

The compiler parses the specification and syntactically maps each construct onto a predefined library component to generate the hardware implementation, as shown at the bottom of Figure 3. In particular, there is a predefined component that implements the forever do construct: it repeatedly initiates handshakes with its target. Similarly, there is a predefined component that implements sequencing, called the sequencer and denoted by ";". The sequencer, upon receiving a handshake from its parent, performs a handshake with its left child followed by a handshake with its right child. The variable x maps to a storage element. Finally, the read and write operations (read from channel IN and write to x, and subsequently, read from x and write to channel OUT) map to predefined components called transferrers. In summary, the compilation approach is quite simple but very powerful: fairly complex algorithms can be easily mapped to hardware. Gate-level implementations for fairly complex designs, such as a complete microcontroller, can be generated in as little as a few minutes. Specifications of large systems are naturally decomposed into subsystems or smaller components (i.e., individual procedures). This was a key motivation for this paper's use of the Haste/TiDE synthesis flow: individual scheduling policies can be independently implemented as separate components, which can then be plugged into a generic hardware template to rapidly implement a scheduling accelerator.
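As an informal illustration of this handshake style, the following Python sketch models the FIFO of Figures 2 and 3 in software, using blocking queues to stand in for request/acknowledge channels. It is a behavioral analogy only, not the Haste code or the hardware the compiler generates, and all names in it are our own.

# Behavioral software analogy of the asynchronous FIFO of Figures 2 and 3.
# Illustrative sketch only; it models the request/acknowledge discipline with
# blocking queues rather than real handshake circuits.
import threading
import queue

def fifo_stage(left: queue.Queue, right: queue.Queue) -> None:
    """One FIFO stage: wait for data plus a request, latch it, pass it on."""
    while True:
        packet = left.get()          # receive data along with an implicit request
        if packet is None:           # sentinel: shut the pipeline down
            right.put(None)
            return
        # left.get() returning models acknowledging the left neighbor;
        # putting on 'right' models relaying the data plus a request rightward.
        right.put(packet)

# Three-stage pipeline built from identical, reusable stages.
channels = [queue.Queue(maxsize=1) for _ in range(4)]
stages = [threading.Thread(target=fifo_stage, args=(channels[i], channels[i + 1]))
          for i in range(3)]
for s in stages:
    s.start()

for p in ["pkt0", "pkt1", "pkt2"]:
    channels[0].put(p)
channels[0].put(None)

while (out := channels[-1].get()) is not None:
    print("delivered", out)
for s in stages:
    s.join()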
3. Design and Implementation

This section introduces several designs of a scheduling accelerator for a real-time packet processing system. A simple FIFO queue and our generalized replacement are described in subsection 3.1. A static scheduler is discussed in subsection 3.2. Dynamic schedulers are presented in subsection 3.3; both rate-monotonic (RM) and earliest-deadline-first (EDF) designs have been implemented. Finally, a multiprocessor configuration using our architecture is proposed in subsection 3.4.

Figure 4. General structure: distributor, priority queues, scheduler, and processor.

3.1 Overview

In the following subsections, we explore the design space of asynchronous scheduling by introducing several potential schedulers. Many tradeoffs exist in this domain: algorithm complexity versus ease of design, latency versus throughput, correctness versus effectiveness; any one design may be optimized for one metric but suffer in another. Quantitative results such as latency and throughput are presented in Section 4, but the discussion in this section includes a qualitative comparison of the different designs in terms of their implementation complexity.

In order to analyze the effectiveness of the proposed scheduler hardware, a base case is necessary for comparison. Here we use a simple first-in first-out (FIFO) queue as a reference. A FIFO queue essentially models a packet processing system that ignores job prioritization. The length of the queue must be sufficient for bursts of activity to be absorbed without packet loss. Any incoming job may be blocked by up to n jobs, where n is the number of stages in the pipeline. High-priority jobs are likely to miss deadlines in such a system.

In contrast to the FIFO queue, which ignores job priorities, our implementations allow packet prioritization by breaking the single queue into several priority queues and attaching a scheduler between the queues and the processor. The generalized design is shown in Figure 4. On the left is a distributor that quickly analyzes incoming packets and places them in the appropriate queue. After being routed and traveling through its assigned queue, a job is visible to the scheduler and may be selected for processing. If selected, it is nonpreemptively executed by the processor. The designs in the following subsections implement this general structure; a small software model of the structure is sketched below.
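The following Python sketch models this generic template in software: a distributor routes packets into per-priority queues, and any scheduling policy can be plugged in as a selection function. The class and function names, and the stand-in baseline policy, are our own illustrative choices rather than part of the hardware design.

# Software model of the general structure of Figure 4: a distributor routes
# incoming packets into per-priority queues, and a pluggable scheduler picks
# the next job for the processor. Illustrative sketch only; the actual design
# is asynchronous hardware.
from collections import deque
from typing import Callable, Optional

NUM_PRIORITIES = 8  # eight priority levels, matching the experiments of Section 4

class PriorityQueues:
    """Distributor plus per-priority queues (software model of Figure 4)."""
    def __init__(self) -> None:
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]  # index 0 = highest priority
        self._arrivals = 0

    def distribute(self, priority: int) -> None:
        """Distributor: examine an incoming packet and place it in its queue."""
        job = {"id": self._arrivals, "priority": priority, "arrival": self._arrivals}
        self._arrivals += 1
        self.queues[priority].append(job)

# A scheduler plug-in is any function that inspects the queues and returns the
# next job to execute, or None when every queue is empty.
SchedulerPlugin = Callable[[PriorityQueues], Optional[dict]]

def fifo_baseline(pq: PriorityQueues) -> Optional[dict]:
    """Stand-in plug-in mimicking the FIFO baseline: oldest arrival first,
    ignoring priority. The policies of Sections 3.2 and 3.3 drop in here."""
    heads = [q[0] for q in pq.queues if q]
    if not heads:
        return None
    oldest = min(heads, key=lambda j: j["arrival"])
    return pq.queues[oldest["priority"]].popleft()

def run_processor(pq: PriorityQueues, schedule: SchedulerPlugin) -> None:
    """Nonpreemptively execute whatever the plugged-in scheduler selects."""
    while (job := schedule(pq)) is not None:
        print(f"executing job {job['id']} (priority {job['priority']})")

if __name__ == "__main__":
    pq = PriorityQueues()
    for prio in [3, 0, 7, 0, 5]:
        pq.distribute(prio)
    run_processor(pq, fifo_baseline)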

3.2 Static Scheduling

The behavior of a static scheduler is determined prior to execution. For a packet processing system where incoming jobs may enter at arbitrary times, static schedulers such as cyclic executives are a poor choice. Here we have modified the cyclic executive to allow more dynamic behavior. In our design, all priority levels are assigned a share of the processor's time. A schedule is generated that maintains these shares and spaces processor access evenly. The schedule is written to a fast on-chip ROM. At runtime, the scheduler cycles through the ROM, reading from the queue indicated by the schedule. The behavior of this implementation differs from a cyclic executive when a queue scheduled to be read is empty. Rather than block or no-op, the schedule advances until a non-empty queue is read. Note that shares are then no longer accurately maintained; this can be remedied in an even more dynamic implementation. In environments where priority queues are generally nonempty, this implementation can be effective. The flaws of the scheduler are apparent when traffic occurs in widely-spaced bursts or when priority assignments have been poorly mapped. Scaling the scheduler to a large priority set has a detrimental effect on performance: as the number of priority levels increases, queues are more likely to be vacant, reducing scheduler quality.

3.3 Dynamic Scheduling

Dynamic schedulers react more appropriately to the behavior of a packet processing system. Here we discuss two simple RM schedulers and a more advanced scheduler which handles both RM and EDF.

3.3.1 RM: Sequential Selection

The simple RM scheme mimics a basic software approach to scheduling. When the processor becomes idle, it queries the scheduler for the next available job. The scheduler then checks whether any jobs are available, starting with the highest-priority queue. If a job is available, it is scheduled for execution. If not, the next-highest-priority queue is checked and the process repeats. The general structure becomes a large series of if-then-else statements. The highest-priority job can be prevented from executing for the maximum execution time of any lower-priority process. Choosing the highest-priority job to execute is not a constant-time decision; the time taken by the scheduler depends on the priority of the job chosen. However, jobs of high priority take the least time to be selected, leaving jobs of low priority with the greatest overhead. For systems with many priority levels, low-priority jobs can see significant overheads, affecting the throughput of the scheduler. A notable property of this scheduler is its lack of arbitration; all other dynamic implementations rely on mutual exclusion elements to make decisions.

3.3.2 RM: Parallel Selection

One downside of the simple RM scheme just described is its non-constant decision time. To remedy this, we introduce a parallel approach to queue probing, which is analogous to the case construct in software. Instead of sequentially checking each priority queue, all queues can be checked in parallel, yielding a constant-time decision. Parallel decision is performed in Haste by the use of the select (sel) construct. Each guard is evaluated concurrently; if a guard is true, it is executed. When several guards are true, we select the highest-priority job for execution. Compared to the previous scheduler, parallel RM improves both throughput and latency for almost all jobs. As the number of priority levels increases, this scheduler maintains a good response time. However, jobs using sequential selection will see quicker response for the top several priority levels in large-scale systems. Both selection policies are sketched in software form below.
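For illustration, the two RM selection policies can be modeled in software roughly as follows. In the actual accelerator, sequential selection is a chain of if-then-else handshake components and parallel selection probes all queues concurrently through the sel construct; a sequential Python loop can only emulate that constant-time probe.

# Software models of the two RM selection policies (illustrative only).
from collections import deque
from typing import Optional

def sequential_rm_select(queues: list[deque]) -> Optional[dict]:
    """Sequential selection: probe queues from highest priority downward.
    Decision time grows with the priority level finally chosen."""
    for level, q in enumerate(queues):           # queues[0] = highest priority
        if q:
            return q.popleft()
    return None                                   # every queue was empty

def parallel_rm_select(queues: list[deque]) -> Optional[dict]:
    """Parallel selection: conceptually, all queues are examined at once
    (the Haste sel construct); among the non-empty ones, the highest
    priority wins."""
    ready = [level for level, q in enumerate(queues) if q]
    if not ready:
        return None
    return queues[min(ready)].popleft()           # highest priority = lowest index

queues = [deque() for _ in range(8)]
queues[2].append({"id": 7, "priority": 2})
queues[5].append({"id": 8, "priority": 5})
print(sequential_rm_select(queues))   # probes levels 0 and 1, then finds the job at level 2
print(parallel_rm_select(queues))     # a single probe then selects the job at level 5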
3.3.3 RM: Early Selection

The previous scheduler can be further parallelized by removing the scheduling decision from the critical path. Instead of the processor querying the scheduler for the next available job after completion, this decision can be made in parallel with execution. Should a higher-priority job enter the system after the scheduling decision is made, it will be unable to execute in the next time slot. Using early selection, a job can therefore be blocked by a maximum of two other jobs, rather than just one when the scheduler is on the critical path. Gains in overall throughput and in the latency of low-priority jobs are achieved at the expense of high-priority jobs.

3.3.4 EDF and RM: Binary Heap

The basic design is shown in Figure 5. Each priority queue empties into a child node of the heap structure. Pairs of nodes are connected by internal nodes recursively up to the root. The root node interacts with the processor, providing the next job scheduled for execution.

Figure 5. Heap scheduler: distributor, heap, and processor.

Operation of the structure is as follows. First, jobs enter the system and are routed to the appropriate queue. For an RM scheme, jobs are placed in the queue corresponding to their priority level. In EDF, the jobs are distributed evenly across the queues to balance the heap. Some time after entering the queue, a job will arrive at a child node of the heap. If the node is ready to accept a new job, it reads the job from the queue. Each node keeps track of its own job and of the priority/deadline of its parent's job. Should the new job arriving at a node have a higher priority (or earlier deadline) than the parent's, the jobs are swapped. In this way, the highest-priority job in the system bubbles up to the root node. Since several jobs may be advancing through the system simultaneously, arbitration is necessary at internal nodes. A parent node will arbitrate between two children requesting swaps concurrently. When one swap is accepted, the parent updates each child with the priority of its new job. This design aims for high throughput and high schedulability. Of the designs presented, it is the only one capable of an EDF scheme. Furthermore, the system can easily be designed to select between EDF and RM scheduling at runtime with little degradation in performance. A simplified software model of the node behavior is sketched below.
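A much-simplified software model of this behavior is given below. It treats the heap as a sequential array, whereas the actual design is a network of asynchronous nodes with local arbitration; the leaf-index mapping and all names are our own illustrative choices.

# Simplified software model of the heap scheduler's node behavior
# (illustrative only: the real design is a network of asynchronous nodes with
# local arbitration, not a sequential data structure).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Job:
    ident: int
    key: int          # RM: priority level; EDF: deadline (lower = more urgent)

class HeapScheduler:
    """Binary heap stored in an array; index 0 is the root seen by the processor."""
    def __init__(self, leaves: int) -> None:
        self.size = 2 * leaves - 1
        self.nodes: list[Optional[Job]] = [None] * self.size

    def insert_at_leaf(self, leaf_index: int, job: Job) -> None:
        """A leaf node accepts a job from its priority queue, then the job is
        repeatedly swapped with its parent while it is more urgent."""
        i = self.size - 1 - leaf_index        # leaves occupy the tail of the array
        self.nodes[i] = job
        while i > 0:
            parent = (i - 1) // 2
            p = self.nodes[parent]
            if p is not None and p.key <= self.nodes[i].key:
                break                          # parent is at least as urgent: stop
            # swap with parent (in hardware, the parent arbitrates between children)
            self.nodes[parent], self.nodes[i] = self.nodes[i], self.nodes[parent]
            i = parent

    def root(self) -> Optional[Job]:
        """The job the processor would execute next."""
        return self.nodes[0]

h = HeapScheduler(leaves=4)
h.insert_at_leaf(0, Job(ident=1, key=5))
h.insert_at_leaf(2, Job(ident=2, key=1))   # more urgent, so it bubbles up to the root
print(h.root())                            # Job(ident=2, key=1)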

3.4 Multiprocessor Configurations

Adaptation of software schedulers to a multiprocessor environment can be a complex process. A simple approach is to map certain priority levels to each processor, e.g., even priorities map to one processor and odd priorities to another. The process can be made easier if the job distribution is known beforehand, which may not be the case in a packet processor. In contrast, every hardware scheduler discussed here can easily be mapped to a multiprocessor configuration. Between the scheduler and the processors, a distributor can be introduced. When a processor is empty, it queries the distributor for the next available job. This design prevents processors from idling when jobs are available.

4. Results

4.1 Experimental Setup

Each scheduler was designed, implemented, tested, and simulated using the Haste/TiDE toolflow (formerly Tangram) from Philips/Handshake Solutions [1], described earlier in subsection 2.3. Jobs were generated to simulate a bursty environment; a simplified software job generator is sketched below. Latencies were measured as the time between the introduction of a job to the system and the time the job becomes available for execution at the processor. The properties of the jobs are listed in Table 2. The execution cost of a job was randomized within a priority-dependent range of clock cycles. Over 2500 jobs were executed during a 0.5 ms interval for this simulation. For this example, eight priority levels were specified. For packet processing, a typical embedded multimedia processor running at 40 MHz was assumed, which implies a processor cycle of 25 ns duration. All latencies reported are in units of this cycle time.

Table 2. Packet distribution and execution times in cycles: for each job priority level, the minimum execution time, maximum execution time, and percentage of all jobs.

The following schedulers were implemented: (i) the baseline, a simple FIFO queue; (ii) Static: the static scheduler of Section 3.2; (iii) Sequential: a dynamic RM scheduler using a series of sequential if-then-else constructs; (iv) Parallel: a dynamic RM scheduler using a select construct (a parallel construct similar to a case statement); (v) Early: the same as Parallel, but the scheduling of a job is done in parallel with the execution of the current job; and (vi) Heap: the heap scheduler of Section 3.3.4. In addition, for comparison, a software scheduler was also simulated. The decision time for the software scheduler was assumed to be constant at 20 cycles. This assumption is actually quite favorable to the software scheduler; scheduling decisions in real-world implementations can take hundreds to thousands of cycles. Therefore, the relative gains obtained by our hardware schedulers will in practice be substantially higher than reported here.
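The following sketch illustrates the kind of bursty job generator that could drive such a simulation; all numeric parameters are illustrative assumptions rather than the distributions of Table 2.

# Illustrative bursty job generator in the spirit of Section 4.1.
# All numeric parameters below are assumptions for illustration; they are not
# the distributions used in Table 2.
import random

NUM_PRIORITIES = 8
CYCLE_NS = 25            # 40 MHz embedded processor -> 25 ns per cycle

def generate_bursty_jobs(sim_ns: int = 500_000, seed: int = 0) -> list[dict]:
    """Produce jobs in bursts: quiet gaps followed by clusters of arrivals."""
    rng = random.Random(seed)
    jobs, now_ns, ident = [], 0, 0
    while now_ns < sim_ns:
        now_ns += rng.randint(1_000, 20_000)          # idle gap before a burst
        for _ in range(rng.randint(1, 12)):           # burst of closely spaced jobs
            prio = rng.randint(0, NUM_PRIORITIES - 1) # 0 = highest priority
            cost = rng.randint(20, 60) * (prio + 1)   # lower priority -> longer jobs (assumed)
            jobs.append({"id": ident, "arrival_ns": now_ns,
                         "priority": prio, "cost_cycles": cost})
            ident += 1
            now_ns += rng.randint(0, 5) * CYCLE_NS    # jobs arrive almost back to back
    return jobs

trace = generate_bursty_jobs()
print(len(trace), "jobs generated over a 0.5 ms interval")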
4.2 Performance: Scheduling Latencies

Table 1 shows the average latency of jobs at each priority level. A lower number in the table indicates a higher priority level (i.e., level 1 is the highest priority). Latencies listed are in units of the embedded processor's cycle time, and indicate the average time from the moment a job is enqueued to the instant it reaches the processor (but before it is executed). The basic queue, which was used as the baseline, has a latency of around 107 cycles per job. The software scheduler has a much higher job latency: 1363 cycles per job.

Table 1. Job latency to processor for each scheduler (FIFO queue, Static, Sequential, Parallel, Early, Heap, and Software) at each priority level, with per-scheduler averages, in processor cycles.

The table demonstrates the effectiveness of our scheduler hardware in prioritizing jobs. The highest-priority jobs experience significantly shortened latencies: from 14 to 31 cycles for the dynamic schedulers, compared with 52 cycles for the static scheduler and 104 cycles for the simple queue. Since these times include a blocking delay due to nonpreemptive scheduling (i.e., the time for which the processor is tied up with the previous job), the actual scheduler latencies are even shorter than these cycle counts. A key observation is that all schedulers see a decrease in latency for jobs down through priority level 6. With the exception of Early, every dynamic scheduler achieved an average latency reduction of 86% or more for its highest-priority jobs. The Early scheduler showed a latency reduction of 70%. This was at the expense of lower-priority jobs, which saw up to a 3x increase in latency.

The rightmost column in Table 1 shows the average latency of all tasks in the system. Among our schedulers, the worst performer in this category is the static scheduler. Queries to empty queues slowed down the scheduler, giving it an average latency worse than that of the dynamic schedulers. Performing best in this category are the Heap and Parallel schedulers.

4.3 Area and Throughput Overheads

The cost of using a hardware scheduler manifests itself primarily in area and throughput. Table 3 shows the throughput degradation and area increase due to the additional scheduling logic in hardware. The static scheduler and the sequential scheduler saw a 6% reduction in overall throughput. These schedulers are both limited by the probing of empty queues. Because the scheduling time of Parallel occurs between the processing of jobs, its throughput is also degraded. Heap saw a less significant decline in throughput because most scheduling occurs in the background, although individual jobs have a longer path from entrance to execution than in the basic queue. Early saw no change in throughput, as scheduling occurred during execution.

Table 3. Maximum throughput (items/µs) and total area (µm²) for each scheduler (Static, Sequential, Parallel, Early, and Heap).

The final column in the table lists the chip area consumed by each of the hardware accelerator implementations. Compared to the size of most embedded processors (e.g., 5-20 mm²) [13], this area overhead (less than 0.03 mm²) is quite negligible.

5. Conclusion

This paper proposed several asynchronous hardware schedulers as a replacement for software scheduling in real-time packet processing systems. Our results showed improved response times in all cases for high-priority tasks. Each scheduler occupies a niche in the design space, achieving high performance in one area at the expense of another (area, throughput, latency). In ongoing and future work, we plan to conduct further analysis and refinement of the scheduler designs. In addition, we are developing performance bounds and heuristics to help choose the optimal scheduler. We also plan to expand our simulations to include multiprocessor implementations. An intriguing extension to this work is to explore the potential for dynamic voltage scaling in a hardware scheduling system. We will extend our scheduling approach to take this new dimension into account.

References

[1] Handshake Solutions, a Philips subsidiary.
[2] Int. Technology Roadmap for Semiconductors. Overall Roadmap Technology Characteristics. itrs.net.
[3] C. H. K. van Berkel, M. B. Josephs, and S. M. Nowick. Scanning the technology: Applications of asynchronous circuits. Proceedings of the IEEE, 87(2), Feb. 1999.
[4] R. Bhagwan and B. Lin. Fast and scalable priority queue architecture for high-speed network switches. In Proc. INFOCOM.
[5] A. Davis and S. M. Nowick. An introduction to asynchronous circuit design. In A. Kent and J. G. Williams, editors, The Encyclopedia of Computer Science and Technology, volume 38. Marcel Dekker, New York.
[6] C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985.
[7] J. Kessels, T. Kramer, G. den Besten, A. Peeters, and V. Timm. Applying asynchronous circuits in contactless smart cards. In Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems. IEEE Computer Society Press.
[8] T. Kurita, S. Iai, and N. Kitawaki. Effects of transmission delay in audiovisual communication. Electronics and Communications in Japan, 77(3):63-74.
[9] P. Lavoie and Y. Savaria. A systolic architecture for fast stack sequential decoders. IEEE Trans. on Communications, 42(2-4), Feb.-Apr.
[10] C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 20(1):46-61, 1973.
[11] S.-W. Moon, J. Rexford, and K. G. Shin. Scalable hardware priority queue architectures for high-speed packet switches. IEEE Transactions on Computers, 49(11), 2000.
[12] D. Picker and R. Fellman. A VLSI priority packet queue with inheritance and overwrite. IEEE Transactions on VLSI Systems, 3(2).
[13] S. Segars. The ARM9 family: high performance microprocessors for embedded applications. In Proc. Intl. Conference on Computer Design.
[14] K. Toda, K. Nishida, E. Takahashi, N. Michell, and Y. Yamaguchi. Design and implementation of a priority forwarding router chip for real-time interconnection networks. International Journal on Mini and Microcomputers, 17(1):42-51.


More information

VHDL for Synthesis. Course Description. Course Duration. Goals

VHDL for Synthesis. Course Description. Course Duration. Goals VHDL for Synthesis Course Description This course provides all necessary theoretical and practical know how to write an efficient synthesizable HDL code through VHDL standard language. The course goes

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill

A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE. (Extended Abstract) Gyula A. Mag6. University of North Carolina at Chapel Hill 447 A CELLULAR, LANGUAGE DIRECTED COMPUTER ARCHITECTURE (Extended Abstract) Gyula A. Mag6 University of North Carolina at Chapel Hill Abstract If a VLSI computer architecture is to influence the field

More information

Real-Time Mixed-Criticality Wormhole Networks

Real-Time Mixed-Criticality Wormhole Networks eal-time Mixed-Criticality Wormhole Networks Leandro Soares Indrusiak eal-time Systems Group Department of Computer Science University of York United Kingdom eal-time Systems Group 1 Outline Wormhole Networks

More information

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing

Linköping University Post Print. epuma: a novel embedded parallel DSP platform for predictable computing Linköping University Post Print epuma: a novel embedded parallel DSP platform for predictable computing Jian Wang, Joar Sohl, Olof Kraigher and Dake Liu N.B.: When citing this work, cite the original article.

More information

Revision: August 30, Overview

Revision: August 30, Overview Module 5: Introduction to VHDL Revision: August 30, 2007 Overview Since the first widespread use of CAD tools in the early 1970 s, circuit designers have used both picture-based schematic tools and text-based

More information

Clockless IC Design using Handshake Technology. Ad Peeters

Clockless IC Design using Handshake Technology. Ad Peeters Clockless IC Design using Handshake Technology Ad Peeters Handshake Solutions Philips Electronics Philips Semiconductors Philips Corporate Technologies Philips Medical Systems Lighting,... Philips Research

More information

VLSI Design Automation

VLSI Design Automation VLSI Design Automation IC Products Processors CPU, DSP, Controllers Memory chips RAM, ROM, EEPROM Analog Mobile communication, audio/video processing Programmable PLA, FPGA Embedded systems Used in cars,

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study Bradley F. Dutton, Graduate Student Member, IEEE, and Charles E. Stroud, Fellow, IEEE Dept. of Electrical and Computer Engineering

More information

CSE 332: Data Structures & Parallelism Lecture 3: Priority Queues. Ruth Anderson Winter 2019

CSE 332: Data Structures & Parallelism Lecture 3: Priority Queues. Ruth Anderson Winter 2019 CSE 332: Data Structures & Parallelism Lecture 3: Priority Queues Ruth Anderson Winter 201 Today Finish up Intro to Asymptotic Analysis New ADT! Priority Queues 1/11/201 2 Scenario What is the difference

More information

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo

Real-Time Scalability of Nested Spin Locks. Hiroaki Takada and Ken Sakamura. Faculty of Science, University of Tokyo Real-Time Scalability of Nested Spin Locks Hiroaki Takada and Ken Sakamura Department of Information Science, Faculty of Science, University of Tokyo 7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan Abstract

More information

Event-based tasks give Logix5000 controllers a more effective way of gaining high-speed processing without compromising CPU performance.

Event-based tasks give Logix5000 controllers a more effective way of gaining high-speed processing without compromising CPU performance. Event-based tasks give Logix5000 controllers a more effective way of gaining high-speed processing without compromising CPU performance. Event Tasks Take Controllers to the Next Level Whether it is material

More information

Network Control and Signalling

Network Control and Signalling Network Control and Signalling 1. Introduction 2. Fundamentals and design principles 3. Network architecture and topology 4. Network control and signalling 5. Network components 5.1 links 5.2 switches

More information

The design of a simple asynchronous processor

The design of a simple asynchronous processor The design of a simple asynchronous processor SUN-YEN TAN 1, WEN-TZENG HUANG 2 1 Department of Electronic Engineering National Taipei University of Technology No. 1, Sec. 3, Chung-hsiao E. Rd., Taipei,10608,

More information

CPU scheduling. Alternating sequence of CPU and I/O bursts. P a g e 31

CPU scheduling. Alternating sequence of CPU and I/O bursts. P a g e 31 CPU scheduling CPU scheduling is the basis of multiprogrammed operating systems. By switching the CPU among processes, the operating system can make the computer more productive. In a single-processor

More information

1993 Paper 3 Question 6

1993 Paper 3 Question 6 993 Paper 3 Question 6 Describe the functionality you would expect to find in the file system directory service of a multi-user operating system. [0 marks] Describe two ways in which multiple names for

More information

Synchronization In Digital Systems

Synchronization In Digital Systems 2011 International Conference on Information and Network Technology IPCSIT vol.4 (2011) (2011) IACSIT Press, Singapore Synchronization In Digital Systems Ranjani.M. Narasimhamurthy Lecturer, Dr. Ambedkar

More information

An Efficient Method for Constructing a Distributed Depth-First Search Tree

An Efficient Method for Constructing a Distributed Depth-First Search Tree An Efficient Method for Constructing a Distributed Depth-First Search Tree S. A. M. Makki and George Havas School of Information Technology The University of Queensland Queensland 4072 Australia sam@it.uq.oz.au

More information

P2FS: supporting atomic writes for reliable file system design in PCM storage

P2FS: supporting atomic writes for reliable file system design in PCM storage LETTER IEICE Electronics Express, Vol.11, No.13, 1 6 P2FS: supporting atomic writes for reliable file system design in PCM storage Eunji Lee 1, Kern Koh 2, and Hyokyung Bahn 2a) 1 Department of Software,

More information

Testing Techniques for Ada 95

Testing Techniques for Ada 95 SOFTWARE QUALITY ASSURANCE TOOLS & TECHNOLOGY PROFESSIONAL SERVICES ACADEMY P a g e 1 White Paper Testing Techniques for Ada 95 The Ada language is widely accepted as the language of choice for the implementation

More information

Introduction to Real-Time Communications. Real-Time and Embedded Systems (M) Lecture 15

Introduction to Real-Time Communications. Real-Time and Embedded Systems (M) Lecture 15 Introduction to Real-Time Communications Real-Time and Embedded Systems (M) Lecture 15 Lecture Outline Modelling real-time communications Traffic and network models Properties of networks Throughput, delay

More information

ECE 551 System on Chip Design

ECE 551 System on Chip Design ECE 551 System on Chip Design Introducing Bus Communications Garrett S. Rose Fall 2018 Emerging Applications Requirements Data Flow vs. Processing µp µp Mem Bus DRAMC Core 2 Core N Main Bus µp Core 1 SoCs

More information

IMPLEMENTATION OF A FAST MPEG-2 COMPLIANT HUFFMAN DECODER

IMPLEMENTATION OF A FAST MPEG-2 COMPLIANT HUFFMAN DECODER IMPLEMENTATION OF A FAST MPEG-2 COMPLIANT HUFFMAN ECOER Mikael Karlsson Rudberg (mikaelr@isy.liu.se) and Lars Wanhammar (larsw@isy.liu.se) epartment of Electrical Engineering, Linköping University, S-581

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Fault tolerant scheduling in real time systems

Fault tolerant scheduling in real time systems tolerant scheduling in real time systems Afrin Shafiuddin Department of Electrical and Computer Engineering University of Wisconsin-Madison shafiuddin@wisc.edu Swetha Srinivasan Department of Electrical

More information

Scheduling Algorithms to Minimize Session Delays

Scheduling Algorithms to Minimize Session Delays Scheduling Algorithms to Minimize Session Delays Nandita Dukkipati and David Gutierrez A Motivation I INTRODUCTION TCP flows constitute the majority of the traffic volume in the Internet today Most of

More information

Real-Time (Paradigms) (47)

Real-Time (Paradigms) (47) Real-Time (Paradigms) (47) Memory: Memory Access Protocols Tasks competing for exclusive memory access (critical sections, semaphores) become interdependent, a common phenomenon especially in distributed

More information