A Scalable, Cache-Based Queue Management Subsystem for Network Processors
Sailesh Kumar and Patrick Crowley
Applied Research Laboratory, Department of Computer Science and Engineering
Washington University in St. Louis, MO

Abstract

Queues are a fundamental data structure in packet processing systems. In this short paper, we propose and discuss a scalable queue management (QM) building block for network processors (NPs). We make two main contributions: 1) we argue qualitatively and quantitatively that caching can be used to improve both best- and worst-case queuing performance, and 2) we describe and discuss our proposal and show that it avoids the failings of existing approaches. We show that our cache-based approach improves worst-case queue operation throughput by a factor of 4 compared to a cache-less system. We also argue that our approach is more scalable and more efficient than the cache-based mechanism used in Intel's second-generation network processors.

I. INTRODUCTION

When packets enter or prepare to leave a router, they are typically placed in a queue depending on their source, destination, or type (i.e., protocol or application). The use of packet queues enables systematic allocation of resources such as buffer space and link and switch bandwidth. Queues allow packets to be buffered together according to various criteria. Queuing applications are numerous, and include: per-flow queuing to rate-limit unresponsive flows that do not participate in congestion control [5]; hierarchical queues and scheduling algorithms that provide integrated link sharing, real-time traffic management, and best-effort traffic management [2]; and the use of multiple queues to approximate fair interconnect scheduling [3]. Since packets must be enqueued on arrival and dequeued upon departure, these operations must occur at line rates.
This is a challenge, since packet transmission is limited by speed-of-light constraints while queue operations are typically constrained by external memory speeds. In this paper, we: describe packet queuing in NP-based line cards (Section II); discuss a scalable, cache-based queue management building block that is compatible with existing NP-based SRAM and DRAM architectures (Section III); present a case for using cache-based approaches in queue management (Section IV); and contrast our proposal with the mechanism used in the second generation of Intel IXP [1] network processors (Section V). While our proposal is not the first cache-based queue management approach (e.g., the Intel IXP uses one), it is to the best of our knowledge the first to be discussed in the research literature. This work makes two main contributions: 1) we argue qualitatively and quantitatively that caching can be used to improve both best- and worst-case queuing performance, and 2) we describe and discuss our proposal and show that it avoids the failings of existing approaches.

II. PACKET QUEUES IN NETWORK PROCESSORS

The organization of an NP-based router line card is shown in Figure 1. There are, of course, variations among line cards, including those that do not use NPs [4], but we do not consider these in this short paper. In both the ingress and egress directions, an NP sits between the switch fabric and the physical interface. NPs are typically organized as highly integrated chip multiprocessors. For concreteness, we will use the IXP2800 as an example throughout this paper. It features 16 pipelined processors, called micro-engines (MEs), each of which supports 8 thread contexts and zero-cycle context switches in hardware. The chip also integrates 4 QDR SRAM controllers and 3 Rambus DRAM controllers, along with many other hardware units unrelated to queuing. In line cards like this, both SRAM and DRAM are used to implement packet queues.
Queues and their descriptors are kept in SRAM, while the packets are kept in DRAM. The scheduling discipline is implemented in software on an ME. The framer can be viewed as an integrated device that interfaces one high-bandwidth port (e.g., 10 Gb Ethernet) or many lower-bandwidth ones (e.g., 10x1 Gb Ethernet). The switch fabric is the interconnection path between line cards and, hence, router ports.

A. Queue Hierarchies

Queue hierarchies are often used to provide packet scheduling and QoS. Many routers use a three-level hierarchy, where the first level represents physical ports, the second represents classes of traffic, and the last consists of virtual (output) queues. Note that both the port and class levels are meta-queues in the sense that they contain queues (only the virtual queues contain packets). Each ingress NP maintains a queue for each output port (this avoids head-of-line blocking); each of these output port queues has a number of class queues associated with it (this enables QoS); each of these class queues consists of per-flow virtual output queues (this
allows individual flows to be shaped, e.g., unresponsive ones causing congestion). Each incoming packet is enqueued into some virtual queue, and the status of the corresponding class and physical queues is updated to record the activity. A similar sequence occurs when a packet is dequeued from a virtual queue by the scheduler. Scheduling is typically carried out from root to leaf; i.e., first the port is selected according to the port selection policy, then a class from the selected port is chosen, followed by a virtual queue selection. It is important to note that one enqueue and one dequeue are expected in each packet arrival/departure period. Moreover, since both involve updates to shared queues, serialization can occur.

Virtual queues are generally kept in a linked list data structure because packets are generally enqueued and dequeued sequentially from the virtual queue. Port and class queues, however, are kept in a linked list data structure only if the selection policy for classes and virtual queues is ring based. Round robin and deficit weighted round robin are examples of ring-based selection policies, where the next selection is the next link in the ring of active queues.

A queue's status needs to be updated for every incoming and outgoing packet, so that scheduling can be carried out efficiently (e.g., a queue's occupancy can influence the schedule). In some architectures, every enqueue and dequeue command, along with the respective queue addresses, is passed to the scheduler, which manages its own local queue status database. This keeps the scheduler from either using stale information or making frequent queue descriptor read requests.

[Figure 1: NP-based router line card, showing the framer, the ingress and egress NPs with their SRAM and DRAM, the fabric interface, and the switch fabric.]

B. A Packet Processing Pipeline

NPs incorporate multiple processors on a single chip to allow parallel packet processing.
Packet processing is typically implemented as a pipeline consisting of multiple processor stages. Whenever a stage makes heavy use of memory (e.g., queue operations), multiple threads are used to hide the latency. The processing pipeline generally consists of the following tasks.

- Packet assembly: several interfaces deliver packets in multiplexed frames or cells across different physical ports.
- Packet classification: incoming packets are mapped to a queue in each hierarchy.
- Admission control: based on the QoS attributes of the queues, such as a maximum size, packets are either admitted or dropped.
- Packet enqueue: upon admission, the packet is buffered in DRAM, and the packet pointers are enqueued to the associated queues. Most architectures buffer the packet in DRAM at the first stage and then deallocate the buffer later if the packet is not admitted.
- Scheduling and dequeue: the scheduler selects the queues based on the QoS configuration, then a packet is dequeued and transmitted.
- Data manipulation and statistics: a module may perform statistics collection and data manipulation based on the configuration. Packet reordering, segmentation, and reassembly may also be performed.

C. Queue Operations and Parallelism

Both the queue descriptors, consisting of head and tail pointers and the queue length, and the linked lists (implementing the queues) are stored in SRAM. SRAM and DRAM buffers are allocated in pairs, so that the address of a linked list node in SRAM indicates the packet address in DRAM. Thus, the linked lists in SRAM only contain next-pointer addresses. With this structure, every enqueue and dequeue operation involves an external memory read followed by a write operation. Recall that since the access time of external memory requires many processor cycles, multiple threads are used to hide the memory latency. A system can be balanced in this way by adding processors and threads, so long as each thread accesses a different queue.
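As an illustration, the paired SRAM/DRAM buffer layout described above can be sketched as follows. This is a hypothetical software model, not the paper's implementation; each buffer index names both a DRAM packet slot and an SRAM next-pointer slot, so the SRAM linked list holds only next pointers.

```python
NIL = -1

class QueueDescriptor:
    """Head pointer, tail pointer, and queue length: stored in SRAM."""
    def __init__(self):
        self.head = NIL
        self.tail = NIL
        self.count = 0

class QueueSubsystem:
    def __init__(self, num_buffers, num_queues):
        self.next_ptr = [NIL] * num_buffers   # SRAM linked-list nodes
        self.dram = [None] * num_buffers      # DRAM packet payloads (paired slot)
        self.qd = [QueueDescriptor() for _ in range(num_queues)]

    def enqueue(self, qid, buf, packet):
        self.dram[buf] = packet               # buffer the packet in DRAM
        self.next_ptr[buf] = NIL
        q = self.qd[qid]                      # read descriptor (external SRAM read)
        if q.count == 0:
            q.head = buf
        else:
            self.next_ptr[q.tail] = buf       # old tail points to new tail (SRAM write)
        q.tail = buf                          # write back the updated descriptor
        q.count += 1

    def dequeue(self, qid):
        q = self.qd[qid]                      # read descriptor
        if q.count == 0:
            return None
        buf = q.head
        q.head = self.next_ptr[buf]           # read the head node's next pointer
        q.count -= 1                          # write back the updated descriptor
        return buf, self.dram[buf]
```

Note that, as in the text, every operation is a descriptor read followed by a write, which is what serializes threads that share a queue.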
As soon as threads start accessing the same queue, the parallelism is lost, since every queuing operation involves a read followed by a write, and the write back is always based on the data that was read. In the worst case, all threads compete for the same queue and progress is serialized, regardless of the number of processors or threads. As we will show, using an on-chip cache for queue descriptors can improve this worst-case performance.

III. PROPOSED QUEUING ARCHITECTURE

We propose a queuing cache architecture for NPs that scales well with increased numbers of threads and processors. The proposed architecture uses a shared on-chip cache of queue descriptors accessible by every ME. The cache ensures that every thread operates on the most recent copy of the data, and internally manages queue descriptors and the read and write-back operations to and from external memory. We will later show that our scheme also simplifies the programming of network processors. Our proposal shares two important features with the Intel IXP: 1) it presumes hardware support in the SRAM
controller for queue operations, and 2) it uses an on-chip cache to hold in-use queue descriptors. In Section V, we describe Intel's approach in detail and compare it to our own. First we describe our proposal and how it handles packet enqueue and dequeue operations. Then, we discuss the benefits of caching in this context and the instructions needed to implement it.

A. Managing Queues with the Queuing Cache

Due to the speed gap between the NP and its external memory, multiple processors and thread contexts are used to provide higher throughput despite long-latency memory operations. In our architecture, groups of m threads are used on each of n processors. Thus, a total of m*n threads are used to handle enqueues and dequeues. For a given application, these parameters can be chosen as follows.

1. The number of threads on a processor, m, is determined by the ratio of the time the program spends waiting for memory references to the time spent using the ALU. For example, a ratio of one implies that two threads could completely hide the latency.

2. The number of processors, n, is determined by the aggregate throughput requirement and the throughput supported by a single processor. However, once the external memory interface bandwidth saturates, adding further processors has no effect.

When threads access different queues, multithreading automatically pipelines the queuing operations at the memory interface and, hence, throughput is constrained only by memory bandwidth and the number of threads. Each thread increases the throughput linearly until the bandwidth at the memory interface saturates. However, when threads access the same queue, throughput drops dramatically because a) the queue accesses are all serialized and no longer pipelined, and b) queue descriptors are stored in external memory and hence every enqueue/dequeue incurs the latency of multiple memory references.
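The sizing rules for m and n can be captured in a small helper. This is our reading of the ratio argument above, not a formula given in the paper: with a memory-wait to ALU-time ratio of k, one running thread can cover the waits of k others, so k+1 threads suffice.

```python
import math

def threads_per_processor(mem_wait_ns, alu_ns):
    """m: enough threads that useful compute covers each memory wait.
    A wait/compute ratio of one gives m = 2, matching the example in the text.
    The ceil-plus-one form is an assumed generalization."""
    return math.ceil(mem_wait_ns / alu_ns) + 1

def processors_needed(target_ops_per_s, ops_per_s_per_processor):
    """n: aggregate requirement divided by single-processor throughput.
    Valid only below the external-memory bandwidth ceiling."""
    return math.ceil(target_ops_per_s / ops_per_s_per_processor)
```

For example, an 80 ns memory wait against 80 ns of ALU work gives m = 2, and a 40 M ops/s target on 12 M ops/s processors gives n = 4.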
Our proposed queuing cache eliminates this worst-case problem. We use an on-chip cache which moves the most recent copies of queue descriptors to and from external memory. Every enqueue/dequeue command for a queue passes through the cache, which sends a request to external memory only if the required queue descriptor is not present in the cache (a miss). Upon a hit, the queue descriptor in the cache is updated and the associated links of the queue are updated in external memory. Queue descriptors are evicted only if a miss occurs while the cache is full. The mapping of a queue ID (i.e., the address of the queue descriptor) to a cache location is handled internally by the queuing cache. A fully associative mapping is feasible in this context, since request arrival is rate-limited by the chip interconnect (i.e., rather than by a processor pipeline stage). The top-level architecture using such a scheme is shown in Figure 2. With a queuing cache, queue descriptors need not be accessed from external memory multiple times when many threads access few queues in a small window of time. Thus, in a scenario of multiple threads accessing a queue, the queuing operation itself is accelerated. Therefore, even if queue operations from multiple threads are serialized at one queue, the throughput is improved, since the serialized operations (queue descriptor read and write) happen on-chip.

B. Benefits of Caching

The worst-case condition now occurs when an operation misses in the cache and an eviction must be carried out (due either to a conflict or a full cache). Thus the queue descriptor needs to be read, along with an eviction and a write back. To illustrate these ideas, Figure 3 shows a series of queue operations from multiple threads on a single processor (with a single ALU), both with and without a queuing cache.
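The hit/miss and evict-only-on-full-miss policy described above can be sketched as a fully associative LRU cache. This is an illustrative software model (the real queuing cache is hardware at the memory interface); the memory-reference counters make the on-chip/off-chip distinction visible.

```python
from collections import OrderedDict

class QueuingCache:
    """Fully associative, LRU-evicting cache of queue descriptors (a sketch)."""
    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.mem = backing_store        # qid -> descriptor, the "external SRAM"
        self.lines = OrderedDict()      # qid -> descriptor, in LRU order
        self.mem_reads = 0
        self.mem_writes = 0

    def access(self, qid):
        """Return the descriptor for qid, touching external memory only on a miss."""
        if qid in self.lines:                    # hit: the operation stays on-chip
            self.lines.move_to_end(qid)
            return self.lines[qid]
        if len(self.lines) >= self.capacity:     # miss while full:
            victim, desc = self.lines.popitem(last=False)
            self.mem[victim] = desc              # evict and write back the LRU entry
            self.mem_writes += 1
        self.lines[qid] = self.mem[qid]          # fetch the descriptor
        self.mem_reads += 1
        return self.lines[qid]
```

N back-to-back accesses to one queue cost a single external read, which is exactly why serialized same-queue operations are accelerated.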
Two scenarios are shown: a) one in which the queuing operations are performed across distinct queues, and b) another in which all threads contend for a single queue. It is clear from the figure that, as threads hit a single queue, the performance without a queuing cache degrades rapidly. With a queuing cache, performance remains the same whether queuing is done on a single queue or on distinct queues.

[Figure 2: Top-level queuing cache architecture, showing the queuing cache (tag and head/tail/count entries) in front of the external memory holding the queue descriptors and links.]

C. Queuing Instructions

Given a queue ID (the address of the queue descriptor) and a packet ID (the head and tail address of the packet to be enqueued), the queuing and linking operations can be performed entirely by the queuing cache. If the ISA provides enqueue and dequeue operations, then the queuing cache can hide the implementation of these operations from the software. This provides a simple programming interface, and saves cycles on the processor. Our proposed enqueue and dequeue
instructions, with their arguments and return values, are shown in Table 1. It may also be desirable to add an instruction that pins a queue descriptor in the cache.

Table 1: Enqueue and Dequeue instructions

  Instruction | Arguments                   | Returns
  Enqueue     | Queue ID, Head, Tail, Count | Null
  Dequeue     | Queue ID                    | Buffer/cell ID

IV. EFFECT OF QUEUING CACHE

Some of the benefits of the queuing cache have been discussed qualitatively in previous sections. In this section, we consider these benefits in greater detail, beginning with a quantitative analysis of performance.

A. Significantly Higher Throughput

We now construct an analytical model of throughput. In addition to our previous notation of m threads on each of n processors, we assume the following notation.

  Total number of threads = T = m*n
  Read and write latency to cache = c
  Read latency of external memory = r
  Burst read throughput = 1/br
  Write latency to external memory = w
  Burst write throughput = 1/bw
  Burst read/write throughput = 1/bwr

Thus a single write takes time w, and x parallel writes take w+(x-1)*bw units of time. Replace w with r for reads, and wr for alternating reads and writes.

1. Throughput without Queuing Cache

Consider a multithreaded packet queuing system without any caching. An enqueue operation involves the following steps.

1. Read the queue descriptor (head, tail, and count).
2. Update the tail and increment the count.
3. Post a write to external memory to make the old tail point to the new tail.
4. Write back the updated queue descriptor.

Step 1 takes r units of time, and assume that the processing time (step 2) is P. Steps 3 and 4 can be carried out in parallel, and therefore take time (w+bw). Thus, the throughput of a single thread will be 1/(r+P+w+bw). The best case occurs when every thread accesses a distinct queue: throughput increases linearly with the number of threads and will be T/(r+P+w+bw). The worst case arises when all threads operate on the same queue.
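The cost accounting above can be written down directly. This is a sketch of the model, not a simulation; times are supplied by the caller in any consistent unit, and throughput comes back in operations per that unit.

```python
def enqueue_tput_no_cache(T, same_queue, r, w, bw, P):
    """Enqueue throughput of T threads without a queuing cache.
    Step 1 costs r, step 2 costs P, and the two writes overlap for w + bw."""
    per_op_time = r + P + w + bw
    threads_in_parallel = 1 if same_queue else T   # a shared queue serializes
    return threads_in_parallel / per_op_time

def dequeue_tput_no_cache(T, same_queue, r, w, P):
    """Dequeue throughput without a cache; no steps overlap, so r + r + P + w."""
    per_op_time = r + r + P + w
    return (1 if same_queue else T) / per_op_time
```

The two limiting behaviors fall out immediately: throughput scales linearly with T for distinct queues, and is independent of T when all threads share one queue.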
In this scenario, threads need to operate sequentially, and they also need to use some inter-thread mechanism to ensure coherent sequential access. Here the throughput of a single thread is the total throughput of the system, and hence will be 1/(r+P+w+bw) regardless of the number of threads.

A dequeue operation involves the following steps.

1. Read the queue descriptor (head, tail, and count).
2. Post a read to external memory to get the head node.
3. Update the head and decrement the count.
4. Write back the queue descriptor to memory.

Note that here none of the above steps can be parallelized; hence, the worst-case throughput when all threads operate on the same queue is 1/(r+r+P+w). This holds regardless of the number of threads employed. Again, the best-case throughput will be T/(r+r+P+w).

2. Throughput with Queuing Cache

When a queuing cache is used, an enqueue involves the following steps.

1. Check whether the queue descriptor is present in the cache. Consider the worst case, i.e., a miss while the cache is full.
2. Select an entry (LRU) in the cache, and post a write to memory (this entry is being evicted).
3. Post a read request from the cache to memory to get the queue descriptor into the cache.
4. Update the tail and increment the count in the cache.
5. Post a write to external memory to make the old tail point to the new tail.

Step 1 takes time c. Steps 2 and 3 can be carried out in parallel, and therefore take time Max(w,r)+bwr. Step 4 takes time P and Step 5 takes time w. Note that when N threads access the same queue, Steps 2 and 3 can be skipped N-1 times, because the queue descriptor stays in the cache after the first queue access. Hence the throughput when N threads access the same queue is X(N) = N/(N*c+Max(w,r)+bwr+N*P+N*bw). Note that bw appears instead of w because every thread can perform Step 5 independently and thus achieves the burst access throughput.
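The cached-enqueue expressions, and the minimization over thread-to-queue assignments discussed later, can be checked numerically. This sketch uses the parameter values given in the text; the ALU time P is not legible in our copy, so P = 10 ns is an assumed value, chosen because it reproduces the reported 5.6 and 23.3 million enqueues per second.

```python
def X(N, c=20.0, r=80.0, w=80.0, P=10.0, bw=7.5, bwr=7.5):
    """Cached enqueue throughput (ops/ns) of N threads sharing one queue.
    The miss penalty max(w, r) + bwr is paid once; c, P, bw are per thread.
    Defaults are the text's values, with P = 10 ns assumed."""
    if N == 0:
        return 0.0
    return N / (N * c + max(w, r) + bwr + N * P + N * bw)

def Y(M, c=20.0, r=80.0, w=80.0, P=10.0, bw=7.5, bwr=7.5):
    """Cached enqueue throughput of M threads on M distinct queues
    (valid while M does not saturate the memory and cache bus)."""
    return M / (c + max(w, r) + bwr + P + bw)

def partitions(T, largest=None):
    """Yield every multiset <t_1, ..., t_k> of positive integers summing to T,
    i.e., every way of distributing T threads across distinct queues."""
    if T == 0:
        yield []
        return
    if largest is None:
        largest = T
    for first in range(min(T, largest), 0, -1):
        for rest in partitions(T - first, first):
            yield [first] + rest

def worst_case_vector(T):
    """The thread distribution minimizing aggregate throughput sum_i X(t_i)."""
    return min(partitions(T), key=lambda vec: sum(X(t) for t in vec))

no_cache_worst = 1.0 / (80.0 + 10.0 + 80.0 + 7.5)   # 1/(r+P+w+bw), ops/ns
cache_worst = X(16)                                  # all 16 threads on one queue
```

With these values, cache_worst / no_cache_worst comes out slightly above 4, matching the factor-of-4 claim, and the brute-force search returns the all-threads-on-one-queue vector as the minimizer, consistent with the <0, ..., T, ..., 0> result in the text.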
Now, when threads access distinct queues, the throughput of a single thread will be 1/(c+Max(w,r)+bwr+P+bw), and since each thread operates independently, for M threads the throughput will be Y(M) = M/(c+Max(w,r)+bwr+P+bw), assuming that M is not enough to saturate the memory and cache bus.

A dequeue involves the following steps.

1. Check whether the queue descriptor is present in the cache. Consider the worst case, i.e., a miss while the cache is full.
2. Select an entry in the cache, and post a write to memory.
3. Post a read request from the cache to memory to get the queue descriptor.
4. Post a read to memory to get the next pointer of the head.
5. Update the head and decrement the count in the cache.
[Figure 3: Timeline of queuing operations, with and without a queuing cache, for threads accessing the same queue and for threads accessing distinct queues.]

Step 1 takes time c. Steps 2 and 3 can be carried out in parallel and take Max(w,r)+bwr units of time. Step 4 takes r time units. Step 5 takes time P. With N threads accessing the same queue, the throughput will be X(N) = N/(N*c+Max(w,r)+bwr+N*r+N*P).
N*r is used instead of N*br because Steps 4 and 5 are carried out sequentially, and burst read performance cannot be achieved because the threads are parsing the same queue (linked list). When threads access distinct queues, the throughput of a single thread will be 1/(c+Max(w,r)+bwr+br+P), and for M threads the throughput will be Y(M) = M/(c+Max(w,r)+bwr+br+P), assuming that M is not enough to saturate the memory.

For a total of T threads, consider the vector <t_1, t_2, ..., t_T>, where t_i is the number of threads operating on the i-th distinct queue, with the constraint that the t_i sum to T. A vector like <1, 1, ..., 1> means that all threads are operating on distinct queues. The aggregate throughput for any given vector is sum_{i=1}^{T} X(t_i). The minimum of this function gives the worst-case throughput. We have plotted this function and found that its minimum is achieved for any vector of the form <0, ..., T, ..., 0> and is equal to X(T). Y(T) gives the upper bound on throughput. Consider the following realistic values for the parameters: w = r = 80 ns, bwr = bw = br = 7.5 ns, c = 20 ns, and P = … ns.
Figure 4 graphs best- and worst-case enqueue throughput for these parameters, both with and without caching, over a range of thread counts. The results in the figure assume that memory bandwidth is not exhausted. When T=16 (i.e., 16 threads), the worst-case enqueue throughput without any cache is 5.6 million enqueues per second, while the worst-case enqueue throughput with the queuing cache is 23.3 million enqueues per second: an increase by a factor of more than 4.

It should be noted that this analysis is based on a static view of the system. Precise analysis of a cache-based system requires benchmarking with traffic over long time periods, since the history of the cache state affects performance. However, our analysis considers the worst possible scenario, so we believe that our worst-case results are pessimistic.

[Figure 4: Best- and worst-case throughput of a processor, with and without the queuing cache, versus the number of threads (in million enqueues per second).]

B. Improved Efficiency

In addition to the throughput benefits discussed, our queuing cache proposal improves efficiency in several ways.

1. Reduced On-Chip Communication

Since the queuing cache handles linking, the processors never fetch the queue descriptors. Without this support at the memory interface, all active queuing threads would require substantially more on-chip bandwidth to access queue descriptors and fetch link nodes.

2. Reduced Instruction Count on Each Processor

Since the cache implements the queue operations, the processors do not; this is a direct reduction in the time spent utilizing the ALU. As discussed in Section III.A, reducing the ALU utilization, and hence the ratio of compute to I/O time, increases the utility of additional threads.
This leads to better utilization of the processor and, perhaps, fewer processors needed to handle queue management.

3. Centralized Arbitration for Shared Queues

With multithreading, protections must be applied to keep multiple threads from corrupting shared queue state. A queuing cache is a logical and efficient mechanism for providing this coherency; far more efficient than using shared locks in memory.

V. COMPARISON TO INTEL'S IXP NP

Intel's second-generation IXP network processors have support for queuing both in the SRAM controller, which holds queue descriptors and implements queue operations, and in the MEs, which support enqueue and dequeue operations. In this section, we describe how queuing works in the Intel IXP2X00 architecture and compare it with our scheme. The IXP architecture ensures high-performance queuing in the following ways.

1. It implements enqueue and dequeue operations for cell/packet queues in the SRAM controller, and the ME ISA provides enqueue and dequeue I/O instructions. The SRAM controller internally manages the links in the queues and updates the queue length counters.

2. The IXP SRAM controller provides on-chip memory to store queue descriptors, called the Q-array. Descriptors are loaded and evicted by threads; that is, the controller provides the memory, but cache management is left to the threads. Individual threads manage evictions and write backs.

3. To keep track of which queue descriptors are stored in the memory controller, and where, each ME is equipped with a 16-entry content-addressable memory (CAM). The queue descriptor address (QID) is used as the key under which a 4-bit Q-array entry tag is stored in the cache. A thread performs a lookup into the CAM using the QID as the key. A hit returns the desired Q-array entry, while a miss returns the LRU entry, which can then be evicted to make room for the desired queue descriptor. Thus, the IXP mechanism keeps the data entries in the SRAM controller and the tag entries in the ME.
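The per-ME CAM lookup just described can be sketched as follows. This is an illustrative model of the tag side only (the Q-array data lives in the SRAM controller), and the class and method names are ours.

```python
from collections import OrderedDict

class MECam:
    """Sketch of the per-ME 16-entry CAM mapping a queue ID (QID) to a
    Q-array slot in the SRAM controller; management is left to software."""
    def __init__(self, entries=16):
        self.entries = entries
        self.tags = OrderedDict()          # qid -> q_array_slot, in LRU order

    def lookup(self, qid):
        """Hit: (True, slot). Miss: (False, slot), where the slot is either an
        unused entry or the LRU entry, which software must evict and reuse."""
        if qid in self.tags:
            self.tags.move_to_end(qid)
            return True, self.tags[qid]
        if len(self.tags) < self.entries:
            slot = len(self.tags)          # an unused Q-array slot
        else:
            victim, slot = next(iter(self.tags.items()))
            del self.tags[victim]          # software evicts the LRU descriptor
        self.tags[qid] = slot
        return False, slot
```

Note what the model makes visible: the tag state is private to one ME, so a second ME with its own CAM could not see which descriptors are already cached, which is the coherence problem discussed below in Section V.C.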
Since the CAM has 16 entries and each processor has 8 threads, a given ME can support two queue operations at a time in all threads, yielding 16 concurrently active queues. The IXP queuing architecture and the steps involved in an optimized queuing operation are shown in Figure 5. Thus, three mechanisms in the IXP (the CAM in every ME, the SRAM Q-array, and the queuing-specific instructions) together provide effective queuing with good worst-case throughput (when all threads access the same queue). This architecture can support enqueues and dequeues at an OC-192 rate for a single hierarchy of queues in a single ME (all 8 threads manage two queues each). However, this approach has the following drawbacks, not shared by our proposed architecture.

A. Unnecessary Instructions and Software Complexity
In our scheme, cache management is implemented at the memory interface. The IXP requires software on the ME to implement the cache. Transparent cache management is desirable, since it simplifies the program and reduces the number of instructions executed when queuing. Software management holds the promise of flexibility, but a fully associative cache, plus the ability to pin queue descriptors to avoid eviction, is ideal in most cases. In our view, it is better to reduce the ALU resources for each thread and improve throughput via additional multithreading.

B. Unnecessary Communication between Processors and the Memory Controller

For every enqueue and dequeue operation, three distinct commands, at three distinct times, need to be sent from the ME to the memory controller: a) a cache command with the queue ID, b) an enqueue/dequeue command, and c) a write-back command. Since the interconnect between the memory controller and the MEs is a bus shared by many units across the chip, reducing such communication is always a benefit. Our proposed scheme, as explained earlier, achieves such a reduction by requiring only one command for each enqueue and dequeue operation.

[Figure 5: IXP queuing architecture. Step 1: select a cache ID (the queue descriptor's location in the memory controller) and store the cache ID and queue ID in the CAM. Step 2: cache the queue descriptor (specific instructions are provided). Step 3: perform the enqueue/dequeue (specific instructions are provided); the cached queue descriptor (head/tail/count) is updated internally, and links are either made (for enqueue) or parsed (for dequeue). Step 4: write back the queue descriptor to memory (specific instructions are provided).]

C. Lack of Scalability

One consequence of using an ME CAM to implement the cache index is that only one ME can access the queues.
There is no way to keep distinct CAMs coherent, so on the IXP all queue operations must be generated by a single ME. This limits the number of threads that can be used for queuing and, hence, the ultimate throughput of the system. The IXP mechanism can only scale to faster line rates by increasing both the number of threads per ME and the size of the CAM. However, as noted earlier, increasing the number of threads is problematic, since the IXP offloads queue management to the MEs, keeping the ratio of instructions to I/O cycles high. Our scheme, on the other hand, scales naturally by increasing the cache size and utilizing more threads, from any number of processors. Our scheme would, in fact, remove the main need for a CAM in each ME. While the CAM can be used for other cache-related purposes, we suspect it would be dropped from the ME microarchitecture if it were not needed for queuing. As the number of MEs per IXP increases, there may well be increasing pressure to economize.

VI. CONCLUSION AND FUTURE WORK

In this short paper, we have proposed a cache-based queue management system for network processors. We described the role of queues in packet processing systems, argued qualitatively and quantitatively that caching queue descriptors can provide a significant benefit to queue management, and contrasted our proposal with the queuing mechanism used in Intel's second-generation IXP network processors. In future work, we plan to explore this idea and a few variations in greater detail. For example, in order to support queuing in line cards supporting many tens of gigabits per second or more, we plan to investigate a hierarchy of queuing caches, in which each cluster of MEs shares a first-level queuing cache that is backed by a shared second-level cache at the memory controller. Additionally, we plan to explore the benefits of similar queue management techniques in high-performance end-hosts.

References

[1] M. Adiletta, et al.,
"The Next Generation of Intel IXP Network Processors," Intel Technology Journal, vol. 6, no. 3, pp. 6-18, Aug.
[2] J. Bennett and H. Zhang, "Hierarchical packet fair queueing algorithms," IEEE/ACM Transactions on Networking, vol. 5, no. 5, Oct.
[3] S. N. Bhatti and J. Crowcroft, "QoS-Sensitive Flows: Issues in IP Packet Handling," IEEE Internet Computing, vol. 4, no. 4, July.
[4] J-G. Chen, et al., "Implementing High-Performance, High-Value Traffic Management Using Agere Network Processor Solutions," Chapter 14 in Network Processor Design, volume 2, Morgan Kaufmann.
[5] B. Suter, et al., "Buffer management schemes for supporting TCP in gigabit routers with per-flow queueing," IEEE Journal on Selected Areas in Communications, vol. 17, June 1999.
Scheduling algorithms Scheduling Andrea Bianco Telecommunication Network Group firstname.lastname@polito.it http://www.telematica.polito.it/ Scheduling: choose a packet to transmit over a link among all
More informationConfiguring QoS. Understanding QoS CHAPTER
29 CHAPTER This chapter describes how to configure quality of service (QoS) by using automatic QoS (auto-qos) commands or by using standard QoS commands on the Catalyst 3750 switch. With QoS, you can provide
More informationConfiguring QoS CHAPTER
CHAPTER 37 This chapter describes how to configure quality of service (QoS) by using automatic QoS (auto-qos) commands or by using standard QoS commands on the Catalyst 3750-E or 3560-E switch. With QoS,
More informationA 400Gbps Multi-Core Network Processor
A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,
More informationImplementation of Adaptive Buffer in Video Receivers Using Network Processor IXP 2400
The International Arab Journal of Information Technology, Vol. 6, No. 3, July 2009 289 Implementation of Adaptive Buffer in Video Receivers Using Network Processor IXP 2400 Kandasamy Anusuya, Karupagouder
More informationFIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues
FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues D.N. Serpanos and P.I. Antoniadis Department of Computer Science University of Crete Knossos Avenue
More information6.9. Communicating to the Outside World: Cluster Networking
6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and
More informationConfiguring QoS. Finding Feature Information. Prerequisites for QoS
Finding Feature Information, page 1 Prerequisites for QoS, page 1 Restrictions for QoS, page 3 Information About QoS, page 4 How to Configure QoS, page 28 Monitoring Standard QoS, page 80 Configuration
More informationMemory Hierarchy. Slides contents from:
Memory Hierarchy Slides contents from: Hennessy & Patterson, 5ed Appendix B and Chapter 2 David Wentzlaff, ELE 475 Computer Architecture MJT, High Performance Computing, NPTEL Memory Performance Gap Memory
More informationEP2210 Scheduling. Lecture material:
EP2210 Scheduling Lecture material: Bertsekas, Gallager, 6.1.2. MIT OpenCourseWare, 6.829 A. Parekh, R. Gallager, A generalized Processor Sharing Approach to Flow Control - The Single Node Case, IEEE Infocom
More informationCS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007
CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007 Question 344 Points 444 Points Score 1 10 10 2 10 10 3 20 20 4 20 10 5 20 20 6 20 10 7-20 Total: 100 100 Instructions: 1. Question
More informationConfiguring QoS CHAPTER
CHAPTER 34 This chapter describes how to use different methods to configure quality of service (QoS) on the Catalyst 3750 Metro switch. With QoS, you can provide preferential treatment to certain types
More informationAbsolute QoS Differentiation in Optical Burst-Switched Networks
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 22, NO. 9, NOVEMBER 2004 1781 Absolute QoS Differentiation in Optical Burst-Switched Networks Qiong Zhang, Student Member, IEEE, Vinod M. Vokkarane,
More informationRouter Architectures
Router Architectures Venkat Padmanabhan Microsoft Research 13 April 2001 Venkat Padmanabhan 1 Outline Router architecture overview 50 Gbps multi-gigabit router (Partridge et al.) Technology trends Venkat
More informationCaches. Hiding Memory Access Times
Caches Hiding Memory Access Times PC Instruction Memory 4 M U X Registers Sign Ext M U X Sh L 2 Data Memory M U X C O N T R O L ALU CTL INSTRUCTION FETCH INSTR DECODE REG FETCH EXECUTE/ ADDRESS CALC MEMORY
More informationBasic Low Level Concepts
Course Outline Basic Low Level Concepts Case Studies Operation through multiple switches: Topologies & Routing v Direct, indirect, regular, irregular Formal models and analysis for deadlock and livelock
More informationMemory. Objectives. Introduction. 6.2 Types of Memory
Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts
More informationModule 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth
Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012
More informationConfiguring QoS. Finding Feature Information. Prerequisites for QoS. General QoS Guidelines
Finding Feature Information, on page 1 Prerequisites for QoS, on page 1 Restrictions for QoS, on page 2 Information About QoS, on page 2 How to Configure QoS, on page 10 Monitoring Standard QoS, on page
More informationCHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS
28 CHAPTER 3 EFFECTIVE ADMISSION CONTROL MECHANISM IN WIRELESS MESH NETWORKS Introduction Measurement-based scheme, that constantly monitors the network, will incorporate the current network state in the
More informationQuality of Service (QoS)
Quality of Service (QoS) The Internet was originally designed for best-effort service without guarantee of predictable performance. Best-effort service is often sufficient for a traffic that is not sensitive
More informationECE/CS 757: Homework 1
ECE/CS 757: Homework 1 Cores and Multithreading 1. A CPU designer has to decide whether or not to add a new micoarchitecture enhancement to improve performance (ignoring power costs) of a block (coarse-grain)
More informationCSE398: Network Systems Design
CSE398: Network Systems Design Instructor: Dr. Liang Cheng Department of Computer Science and Engineering P.C. Rossin College of Engineering & Applied Science Lehigh University April 04, 2005 Outline Recap
More informationMQC Hierarchical Queuing with 3 Level Scheduler
MQC Hierarchical Queuing with 3 Level Scheduler The MQC Hierarchical Queuing with 3 Level Scheduler feature provides a flexible packet scheduling and queuing system in which you can specify how excess
More informationOptimizing Memory Bandwidth of a Multi-Channel Packet Buffer
Optimizing Memory Bandwidth of a Multi-Channel Packet Buffer Sarang Dharmapurikar Sailesh Kumar John Lockwood Patrick Crowley Dept. of Computer Science and Engg. Washington University in St. Louis, USA.
More informationHierarchically Aggregated Fair Queueing (HAFQ) for Per-flow Fair Bandwidth Allocation in High Speed Networks
Hierarchically Aggregated Fair Queueing () for Per-flow Fair Bandwidth Allocation in High Speed Networks Ichinoshin Maki, Hideyuki Shimonishi, Tutomu Murase, Masayuki Murata, Hideo Miyahara Graduate School
More informationReal-Time Protocol (RTP)
Real-Time Protocol (RTP) Provides standard packet format for real-time application Typically runs over UDP Specifies header fields below Payload Type: 7 bits, providing 128 possible different types of
More informationAn Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks
An Analysis of Blocking vs Non-Blocking Flow Control in On-Chip Networks ABSTRACT High end System-on-Chip (SoC) architectures consist of tens of processing engines. These processing engines have varied
More informationChapter 7 The Potential of Special-Purpose Hardware
Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture
More informationLS Example 5 3 C 5 A 1 D
Lecture 10 LS Example 5 2 B 3 C 5 1 A 1 D 2 3 1 1 E 2 F G Itrn M B Path C Path D Path E Path F Path G Path 1 {A} 2 A-B 5 A-C 1 A-D Inf. Inf. 1 A-G 2 {A,D} 2 A-B 4 A-D-C 1 A-D 2 A-D-E Inf. 1 A-G 3 {A,D,G}
More informationPage 1. Multilevel Memories (Improving performance using a little cash )
Page 1 Multilevel Memories (Improving performance using a little cash ) 1 Page 2 CPU-Memory Bottleneck CPU Memory Performance of high-speed computers is usually limited by memory bandwidth & latency Latency
More informationNetwork Processors. Nevin Heintze Agere Systems
Network Processors Nevin Heintze Agere Systems Network Processors What are the packaging challenges for NPs? Caveat: I know very little about packaging. Network Processors What are the packaging challenges
More informationComputer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM
Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic
More informationCSE 123A Computer Networks
CSE 123A Computer Networks Winter 2005 Lecture 8: IP Router Design Many portions courtesy Nick McKeown Overview Router basics Interconnection architecture Input Queuing Output Queuing Virtual output Queuing
More informationChapter 6 Memory 11/3/2015. Chapter 6 Objectives. 6.2 Types of Memory. 6.1 Introduction
Chapter 6 Objectives Chapter 6 Memory Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured.
More informationCongestion in Data Networks. Congestion in Data Networks
Congestion in Data Networks CS420/520 Axel Krings 1 Congestion in Data Networks What is Congestion? Congestion occurs when the number of packets being transmitted through the network approaches the packet
More informationCommercial Network Processors
Commercial Network Processors ECE 697J December 5 th, 2002 ECE 697J 1 AMCC np7250 Network Processor Presenter: Jinghua Hu ECE 697J 2 AMCC np7250 Released in April 2001 Packet and cell processing Full-duplex
More informationComputer Science 432/563 Operating Systems The College of Saint Rose Spring Topic Notes: Memory Hierarchy
Computer Science 432/563 Operating Systems The College of Saint Rose Spring 2016 Topic Notes: Memory Hierarchy We will revisit a topic now that cuts across systems classes: memory hierarchies. We often
More informationNetwork Design Considerations for Grid Computing
Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom
More informationFPX Architecture for a Dynamically Extensible Router
FPX Architecture for a Dynamically Extensible Router Alex Chandra, Yuhua Chen, John Lockwood, Sarang Dharmapurikar, Wenjing Tang, David Taylor, Jon Turner http://www.arl.wustl.edu/arl Dynamically Extensible
More informationCS 152 Computer Architecture and Engineering. Lecture 7 - Memory Hierarchy-II
CS 152 Computer Architecture and Engineering Lecture 7 - Memory Hierarchy-II Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste
More informationDesign and Evaluation of Diffserv Functionalities in the MPLS Edge Router Architecture
Design and Evaluation of Diffserv Functionalities in the MPLS Edge Router Architecture Wei-Chu Lai, Kuo-Ching Wu, and Ting-Chao Hou* Center for Telecommunication Research and Department of Electrical Engineering
More informationBefore configuring standard QoS, you must have a thorough understanding of these items: Standard QoS concepts.
Prerequisites for Quality of Service, on page 1 QoS Components, on page 2 QoS Terminology, on page 2 Information About QoS, on page 3 QoS Implementation, on page 4 QoS Wired Model, on page 8 Classification,
More informationMemory hierarchy review. ECE 154B Dmitri Strukov
Memory hierarchy review ECE 154B Dmitri Strukov Outline Cache motivation Cache basics Six basic optimizations Virtual memory Cache performance Opteron example Processor-DRAM gap in latency Q1. How to deal
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationNetwork Interface Architecture and Prototyping for Chip and Cluster Multiprocessors
University of Crete School of Sciences & Engineering Computer Science Department Master Thesis by Michael Papamichael Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors
More informationIV. PACKET SWITCH ARCHITECTURES
IV. PACKET SWITCH ARCHITECTURES (a) General Concept - as packet arrives at switch, destination (and possibly source) field in packet header is used as index into routing tables specifying next switch in
More informationLecture 5: Performance Analysis I
CS 6323 : Modeling and Inference Lecture 5: Performance Analysis I Prof. Gregory Provan Department of Computer Science University College Cork Slides: Based on M. Yin (Performability Analysis) Overview
More informationThe Network Layer and Routers
The Network Layer and Routers Daniel Zappala CS 460 Computer Networking Brigham Young University 2/18 Network Layer deliver packets from sending host to receiving host must be on every host, router in
More informationLecture 9. Quality of Service in ad hoc wireless networks
Lecture 9 Quality of Service in ad hoc wireless networks Yevgeni Koucheryavy Department of Communications Engineering Tampere University of Technology yk@cs.tut.fi Lectured by Jakub Jakubiak QoS statement
More informationBefore configuring standard QoS, you must have a thorough understanding of these items:
Finding Feature Information, page 1 Prerequisites for QoS, page 1 QoS Components, page 2 QoS Terminology, page 3 Information About QoS, page 3 Restrictions for QoS on Wired Targets, page 41 Restrictions
More informationPERFORMANCE ANALYSIS OF AF IN CONSIDERING LINK UTILISATION BY SIMULATION WITH DROP-TAIL
I.J.E.M.S., VOL.2 (4) 2011: 221-228 ISSN 2229-600X PERFORMANCE ANALYSIS OF AF IN CONSIDERING LINK UTILISATION BY SIMULATION WITH DROP-TAIL Jai Kumar, Jaiswal Umesh Chandra Department of Computer Science
More informationMemory Systems IRAM. Principle of IRAM
Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several
More informationShow Me the $... Performance And Caches
Show Me the $... Performance And Caches 1 CPU-Cache Interaction (5-stage pipeline) PCen 0x4 Add bubble PC addr inst hit? Primary Instruction Cache IR D To Memory Control Decode, Register Fetch E A B MD1
More informationAN 831: Intel FPGA SDK for OpenCL
AN 831: Intel FPGA SDK for OpenCL Host Pipelined Multithread Subscribe Send Feedback Latest document on the web: PDF HTML Contents Contents 1 Intel FPGA SDK for OpenCL Host Pipelined Multithread...3 1.1
More informationReview on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors. By: Anvesh Polepalli Raj Muchhala
Review on ichat: Inter Cache Hardware Assistant Data Transfer for Heterogeneous Chip Multiprocessors By: Anvesh Polepalli Raj Muchhala Introduction Integrating CPU and GPU into a single chip for performance
More informationOverview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services
Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling
More informationAddendum to Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches
Addendum to Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches Gabriel H. Loh Mark D. Hill AMD Research Department of Computer Sciences Advanced Micro Devices, Inc. gabe.loh@amd.com
More informationNetwork Model for Delay-Sensitive Traffic
Traffic Scheduling Network Model for Delay-Sensitive Traffic Source Switch Switch Destination Flow Shaper Policer (optional) Scheduler + optional shaper Policer (optional) Scheduler + optional shaper cfla.
More informationOptimizing Performance: Intel Network Adapters User Guide
Optimizing Performance: Intel Network Adapters User Guide Network Optimization Types When optimizing network adapter parameters (NIC), the user typically considers one of the following three conditions
More informationPerformance of Multihop Communications Using Logical Topologies on Optical Torus Networks
Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,
More informationEECS 570 Final Exam - SOLUTIONS Winter 2015
EECS 570 Final Exam - SOLUTIONS Winter 2015 Name: unique name: Sign the honor code: I have neither given nor received aid on this exam nor observed anyone else doing so. Scores: # Points 1 / 21 2 / 32
More informationAdvanced Computer Networks
Advanced Computer Networks QoS in IP networks Prof. Andrzej Duda duda@imag.fr Contents QoS principles Traffic shaping leaky bucket token bucket Scheduling FIFO Fair queueing RED IntServ DiffServ http://duda.imag.fr
More informationToward a Reliable Data Transport Architecture for Optical Burst-Switched Networks
Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks Dr. Vinod Vokkarane Assistant Professor, Computer and Information Science Co-Director, Advanced Computer Networks Lab University
More informationLecture 17: Router Design
Lecture 17: Router Design CSE 123: Computer Networks Alex C. Snoeren Eample courtesy Mike Freedman Lecture 17 Overview Finish up BGP relationships Router internals Buffering Scheduling 2 Peer-to-Peer Relationship
More informationInterconnection Networks
Lecture 17: Interconnection Networks Parallel Computer Architecture and Programming A comment on web site comments It is okay to make a comment on a slide/topic that has already been commented on. In fact
More informationLecture 13: Interconnection Networks. Topics: lots of background, recent innovations for power and performance
Lecture 13: Interconnection Networks Topics: lots of background, recent innovations for power and performance 1 Interconnection Networks Recall: fully connected network, arrays/rings, meshes/tori, trees,
More informationWireless Networks (CSC-7602) Lecture 8 (15 Oct. 2007)
Wireless Networks (CSC-7602) Lecture 8 (15 Oct. 2007) Seung-Jong Park (Jay) http://www.csc.lsu.edu/~sjpark 1 Today Wireline Fair Schedulling Why? Ideal algorithm Practical algorithms Wireless Fair Scheduling
More informationWhat Is Congestion? Computer Networks. Ideal Network Utilization. Interaction of Queues
168 430 Computer Networks Chapter 13 Congestion in Data Networks What Is Congestion? Congestion occurs when the number of packets being transmitted through the network approaches the packet handling capacity
More informationRouters with a Single Stage of Buffering * Sigcomm Paper Number: 342, Total Pages: 14
Routers with a Single Stage of Buffering * Sigcomm Paper Number: 342, Total Pages: 14 Abstract -- Most high performance routers today use combined input and output queueing (CIOQ). The CIOQ router is also
More information440GX Application Note
Overview of TCP/IP Acceleration Hardware January 22, 2008 Introduction Modern interconnect technology offers Gigabit/second (Gb/s) speed that has shifted the bottleneck in communication from the physical
More informationMulti-gigabit Switching and Routing
Multi-gigabit Switching and Routing Gignet 97 Europe: June 12, 1997. Nick McKeown Assistant Professor of Electrical Engineering and Computer Science nickm@ee.stanford.edu http://ee.stanford.edu/~nickm
More informationTopic 4b: QoS Principles. Chapter 9 Multimedia Networking. Computer Networking: A Top Down Approach
Topic 4b: QoS Principles Chapter 9 Computer Networking: A Top Down Approach 7 th edition Jim Kurose, Keith Ross Pearson/Addison Wesley April 2016 9-1 Providing multiple classes of service thus far: making
More informationAssignment 7: TCP and Congestion Control Due the week of October 29/30, 2015
Assignment 7: TCP and Congestion Control Due the week of October 29/30, 2015 I d like to complete our exploration of TCP by taking a close look at the topic of congestion control in TCP. To prepare for
More informationSlide Set 9. for ENCM 369 Winter 2018 Section 01. Steve Norman, PhD, PEng
Slide Set 9 for ENCM 369 Winter 2018 Section 01 Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary March 2018 ENCM 369 Winter 2018 Section 01
More informationDiffServ Architecture: Impact of scheduling on QoS
DiffServ Architecture: Impact of scheduling on QoS Abstract: Scheduling is one of the most important components in providing a differentiated service at the routers. Due to the varying traffic characteristics
More informationQuality of Service in the Internet
Quality of Service in the Internet Problem today: IP is packet switched, therefore no guarantees on a transmission is given (throughput, transmission delay, ): the Internet transmits data Best Effort But:
More informationQuality of Service Mechanism for MANET using Linux Semra Gulder, Mathieu Déziel
Quality of Service Mechanism for MANET using Linux Semra Gulder, Mathieu Déziel Semra.gulder@crc.ca, mathieu.deziel@crc.ca Abstract: This paper describes a QoS mechanism suitable for Mobile Ad Hoc Networks
More information,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics
,e-pg PATHSHALA- Computer Science Computer Architecture Module 25 Memory Hierarchy Design - Basics The objectives of this module are to discuss about the need for a hierarchical memory system and also
More informationFair Adaptive Bandwidth Allocation: A Rate Control Based Active Queue Management Discipline
Fair Adaptive Bandwidth Allocation: A Rate Control Based Active Queue Management Discipline Abhinav Kamra, Huzur Saran, Sandeep Sen, and Rajeev Shorey Department of Computer Science and Engineering, Indian
More informationCisco Series Internet Router Architecture: Packet Switching
Cisco 12000 Series Internet Router Architecture: Packet Switching Document ID: 47320 Contents Introduction Prerequisites Requirements Components Used Conventions Background Information Packet Switching:
More informationMultilevel Memories. Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology
1 Multilevel Memories Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind CPU-Memory Bottleneck 6.823
More informationRD-TCP: Reorder Detecting TCP
RD-TCP: Reorder Detecting TCP Arjuna Sathiaseelan and Tomasz Radzik Department of Computer Science, King s College London, Strand, London WC2R 2LS {arjuna,radzik}@dcs.kcl.ac.uk Abstract. Numerous studies
More informationComputer Architecture A Quantitative Approach, Fifth Edition. Chapter 2. Memory Hierarchy Design. Copyright 2012, Elsevier Inc. All rights reserved.
Computer Architecture A Quantitative Approach, Fifth Edition Chapter 2 Memory Hierarchy Design 1 Introduction Programmers want unlimited amounts of memory with low latency Fast memory technology is more
More informationWorst-case Ethernet Network Latency for Shaped Sources
Worst-case Ethernet Network Latency for Shaped Sources Max Azarov, SMSC 7th October 2005 Contents For 802.3 ResE study group 1 Worst-case latency theorem 1 1.1 Assumptions.............................
More informationAn introduction to SDRAM and memory controllers. 5kk73
An introduction to SDRAM and memory controllers 5kk73 Presentation Outline (part 1) Introduction to SDRAM Basic SDRAM operation Memory efficiency SDRAM controller architecture Conclusions Followed by part
More informationUnit 2 Packet Switching Networks - II
Unit 2 Packet Switching Networks - II Dijkstra Algorithm: Finding shortest path Algorithm for finding shortest paths N: set of nodes for which shortest path already found Initialization: (Start with source
More informationLecture 24: Scheduling and QoS
Lecture 24: Scheduling and QoS CSE 123: Computer Networks Alex C. Snoeren HW 4 due Wednesday Lecture 24 Overview Scheduling (Weighted) Fair Queuing Quality of Service basics Integrated Services Differentiated
More informationMulti-Processor / Parallel Processing
Parallel Processing: Multi-Processor / Parallel Processing Originally, the computer has been viewed as a sequential machine. Most computer programming languages require the programmer to specify algorithms
More informationQueuing. Congestion Control and Resource Allocation. Resource Allocation Evaluation Criteria. Resource allocation Drop disciplines Queuing disciplines
Resource allocation Drop disciplines Queuing disciplines Queuing 1 Congestion Control and Resource Allocation Handle congestion if and when it happens TCP Congestion Control Allocate resources to avoid
More informationInternet QoS 1. Integrated Service 2. Differentiated Service 3. Linux Traffic Control
Internet QoS 1. Integrated Service 2. Differentiated Service 3. Linux Traffic Control weafon 2001/9/27 Concept of IntServ Network A flow is the basic management unit Supporting accurate quality control.
More information