Suez: A Cluster-Based Scalable Real-Time Packet Router

Tzi-cker Chiueh, Prashant Pradhan

Computer Science Division, EECS, University of California at Berkeley, Berkeley, CA
Computer Science Department, State University of New York at Stony Brook, Stony Brook, NY
chiueh@cs.berkeley.edu

Abstract

Suez is a high-performance real-time packet router that supports fast best-effort packet routing and scalable QoS-guaranteed packet scheduling, and is built on a hardware platform consisting of a cluster of commodity PCs connected by a gigabit/sec system area network. The major goal of the Suez project is to demonstrate that the PC cluster architecture can be as cost-effective a platform for high-performance network packet routing as it is for parallel computing. Suez features a cache-conscious routing-table search algorithm that exploits CPU caching hardware for fast lookup by treating IP addresses directly as virtual addresses. To scale the number of supportable real-time connections with the link speed, Suez implements a fixed-granularity fluid fair queuing (FGFFQ) algorithm that completely eliminates the per-packet sorting overhead associated with conventional weighted fair queuing algorithms. With a general-purpose CPU as the main processing engine, Suez is inherently a programmable network device that can be dynamically extended with functionality important to end-user applications or to the efficiency of the entire network. This paper describes in detail the architectural features of Suez, and reports performance measurements of the initial Suez prototype, which is built on four Pentium-II 400MHz machines and Myrinet, running the Linux operating system.

Keywords: Routing Table Lookup, Packet Scheduling, Efficient Data Movement, Cluster Computing, Extensible Systems, IP Router

Technical Areas: Clustered Architecture, Internet Computing, Distributed Operating Systems

1 Introduction

High-performance packet routers that support guaranteed Quality of Service (QoS) are a crucial building block for the next-generation Internet. Given the exponentially growing Internet traffic, the increasing number of distributed multimedia applications, and the constant demand for new protocols to accommodate emerging network applications, next-generation Internet routers should satisfy the following requirements:

- Non-real-time network packets should be delivered at wire speed.
- QoS reservations made by real-time connections should be honored regardless of the traffic load.
- A flexible service extension mechanism should be provided to extend the core router services in an efficient and safe way.

Although a large amount of research effort from industry has been invested in building high-performance real-time packet routers, all of these projects, when it comes to actual implementations, are based on proprietary, custom-designed hardware. The goal of the Suez project is to demonstrate that, just as PC clusters are a cost-effective platform for parallel computing, it is technically feasible to build high-performance real-time packet routers with optimized software running on commodity PC hardware.

A packet router typically consists of four components: link controllers, buffer memory, a control processor, and the switching backplane. A Suez router comprises multiple Pentium-II PCs, connected to a scalable Myrinet switch through full-duplex 1.2 Gbits/sec links. One of the PCs is a dedicated control processor that handles all administrative functions, such as real-time connection establishment through RSVP, routing-table updates, and network monitoring and management. Each of the remaining PCs, in addition to being connected to the Myrinet switch, also connects to the outside world, and thus plays the role of an input/output link controller: it buffers incoming packets, performs routing-table lookup or real-time packet classification, forwards packets to the corresponding output ports, and schedules packets so that their respective QoS requirements are met. The main memory on the PCs buffers input traffic before forwarding decisions are made, typically in the form of per-connection output queues. Suez's routing backplane is the Myrinet switch, which boasts a 10 Gbits/sec switching bus.

When a packet arrives at one of the input ports of a network router, it is first buffered, then forwarded to the corresponding output port across the router backplane, buffered at the output queue, and eventually scheduled onto the output link for transmission. Accordingly, Suez features two novel technologies to streamline this process, both designed to exploit to the fullest extent the raw hardware capabilities of the underlying general-purpose CPU and I/O subsystem. To speed up routing-table lookup, which maps to a longest-prefix-match problem, Suez takes advantage of the cache memory hierarchy in modern PC hardware for aggressive host address caching, and recasts the routing-table search algorithm into a series of direct table-indexing steps, again exploiting the processor cache to trade memory for performance. To improve the scalability of real-time packet scheduling with increasing link speeds, Suez employs a new scheduling algorithm called Fixed Granularity Fluid Fair Queuing (FGFFQ), which relies on a special link-layer multiplexing/demultiplexing protocol to emulate fluid fair queuing at a fixed granularity. FGFFQ eliminates the

overhead of packetized weighted fair queuing (WFQ), in particular virtual/physical time mapping and priority sorting. In addition, FGFFQ provides a unified framework for both work-conserving and non-work-conserving disciplines, and supports decoupling of bandwidth and delay guarantees.

As new distributed applications emerge, more and more intelligence needs to be embedded into the network. Instead of hard-coding these advanced functionalities directly into network devices, a more general solution is to provide an open platform on which new protocols can be easily developed and quickly deployed. Suez's underlying operating system, in addition to being optimized for a high-performance real-time packet router, also supports a flexible programming interface that allows extension of the core protocols in a safe and efficient manner. Specifically, Suez exposes the protocol structure of the network subsystem so that user-level extensions can be attached to the existing protocol graph. A protection mechanism based on segmentation hardware ensures that user-level extensions are sandboxed within an area separate from the kernel, without resorting to separate processes or in-line checks on every store instruction. Finally, Suez employs a unified data-buffer management mechanism for the different modules within the kernel and for user extensions, so that no memory-to-memory copying of the packet payload is ever required. Due to space constraints, the special OS support of Suez is not covered in this paper.

The rest of this paper is organized as follows. In Section 2, we review previous and current efforts in building gigabit routers, and compare their approaches with ours. In Sections 3 and 4, we describe the CPU-cache-conscious routing-table lookup and caching algorithm, and the FGFFQ packet scheduling algorithm, respectively. Sections 5 and 6 detail the current Suez prototype implementation and the results of a performance study. Section 7 concludes this paper with the current implementation status of the Suez router and the on-going plan for further improvements.

2 Related Work

BBN's MGR project [1] produced a high-performance router that achieves a 32 million packets per second IP forwarding rate using a 50 Gbits/sec backplane over 16 ports. Although MGR's forwarding engine is based on a general-purpose processor, the DEC Alpha 21164, its backplane and QoS processors are still based on proprietary designs. Washington University's IP over ATM project [4] takes an approach similar to IP switching [3], setting up an ATM connection for long-lived IP connections. That work is similar to Suez in that link-layer transmissions are based on fixed-sized data units, which greatly simplifies real-time packet scheduling. The scalable IP router project at Michigan State University [2] is based on the same hardware/software platform as Suez. However, that project's focus is on parallelizing route computation to address route-instability problems, rather than fast routing-table lookup or QoS provisioning. Similar to IP switching, Suez employs off-the-shelf high-performance interconnects for packet forwarding, except that the switching platform is a system area network rather than an ATM switch. In essence, the Suez project pushes the notion of a software-based router to the extreme by completely avoiding custom hardware, and aims to demonstrate that the proposed approach can gain the best of both worlds: excellent cost-effectiveness in performance and flexible extensibility.

In the fast routing-table lookup area, Waldvogel et al.
[7] developed a lookup scheme using multiple hash tables, each based on a distinct prefix length. The worst-case lookup time is shown to be log2(number of address bits). This work also introduced a Mutating Binary Search technique to

further reduce the required number of hash-table lookups. Degermark et al. [8] developed a compact routing-table representation to ensure that the entire representation fits within a typical L2 cache. They estimated that each lookup can be completed within 100 instructions, using 8 memory references. Gupta et al. [9] proposed a similar routing-table lookup scheme, which assumes two instead of three levels, and focused more on efficient route updates than on lookup. Compared to previous schemes, Suez's routing-table organization is much simpler and the lookup operation is thus more efficient. None of the previous works exploits the CPU cache available in modern processors, as Suez does. In addition, we believe Suez's routing-table lookup scheme is the first that integrates routing-table caching and searching in a single algorithm.

In the real-time packet scheduling area, WFQ was discussed in [10], and an implementation and simulation study was reported in [11], which discussed the performance cost of the priority sorting required in WFQ but mainly considered insertion costs in the underlying data structures. Virtual Clock [13] assigns virtual service times independent of the system load by attempting to emulate a static TDM system; hence the priority calculation in Virtual Clock is simple. However, as shown in [12], Virtual Clock can punish packets for over-using the link during underload, even if they do not affect the guarantees of other active connections later. The work in [14] attempts to do away with the virtual-time computation overhead by using the finish time of the packet currently in service to approximate the current virtual time, at the expense of looser delay bounds. [15] uses a similar idea, taking the start tag of the packet currently in service to approximate the current virtual time. This does away with the large delay bounds of SCFQ; however, the sorting overhead remains. None of these works, except perhaps [11], focused on how to scale packet scheduling techniques to terabit/sec link speeds or a million connections per output link, which is one of the major research focuses of the Suez project.

3 Cache-Conscious Routing-Table Lookup Algorithm

Because Suez is built on general-purpose CPUs, its routing-table lookup algorithm [26] heavily leverages the cache memory hierarchy. The algorithm is based on two data structures: a destination host address cache (HAC), and a destination network address routing table (NART). Both are designed to use the CPU cache efficiently. The algorithm first looks up the HAC to check whether the given IP address is cached there because it has been seen recently. If so, the lookup succeeds and the corresponding output port is used to route the associated packet. If not, the algorithm consults the NART to complete the lookup.

3.1 Host Address Caching

To explain how the HAC lookup algorithm works, we make the following assumptions. Assume the L1 data cache is direct-mapped and physically addressed, with 16 KBytes total capacity and 32-byte blocks. Also assume that the virtual memory page size is 4 KBytes, and that each HAC entry is 4 bytes wide, containing a 23-bit tag, one unused bit, and an 8-bit output port identifier. Each cache block therefore contains at most 8 HAC entries. Finally, assume one quarter of the L1 cache, i.e., 128 out of 512 cache sets, is reserved for the HAC.
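As an illustration of this entry layout, the following minimal C sketch encodes one 4-byte HAC entry under the assumptions above. The field packing (tag in the upper 23 bits, output port in the lowest 8, one unused bit between) follows the lookup description below, where the tag is compared against bits 9-31 of a cache word; the helper names are ours, not Suez's.

    #include <stdint.h>

    /* One 4-byte HAC entry: bits 31..9 hold the 23-bit tag, bit 8 is
     * unused, bits 7..0 hold the output port identifier. */
    typedef uint32_t hac_entry_t;

    static inline hac_entry_t hac_make(uint32_t tag23, uint8_t port)
    {
        return ((tag23 & 0x7FFFFFu) << 9) | port;
    }

    static inline uint32_t hac_tag(hac_entry_t e)  { return e >> 9; }
    static inline uint8_t  hac_port(hac_entry_t e) { return (uint8_t)(e & 0xFFu); }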

Figure 1: The data flow of HAC lookup. Here we assume that the L1 cache is direct-mapped with 512 32-byte cache sets, among which 128 cache sets are reserved for the HAC. Each HAC entry is 4 bytes wide. The search through the HAC entries within a cache set is performed by software.

Given a destination IP address DA, the DA[5,11] portion (we use the notation X[n,m] to denote the bit subsequence of X ranging from the n-th bit to the m-th bit) is extracted, with 0's padded at both ends, to form a 32-bit virtual address VA, which is then used to access the cache. VA always falls within the 0-th virtual page, and will eventually access the 0-th physical page and thus the first 4 KBytes of the L1 cache, assuming that the 0-th virtual page is mapped to the 0-th physical page. The VA thus formed fetches the first word, CW, of the cache set specified by DA[5,11]. Meanwhile, a 23-bit tag, DAT, is constructed by concatenating DA[12,31] and DA[2,4]. The first word of each HAC cache set is a duplicated HAC entry that corresponds to the HAC entry last accessed in the previous visit to the cache set. If DAT matches CW[9,31], the lookup is complete and CW[0,7] is the output port identifier. Otherwise, the software search continues with the second word of the cache set, and so on, until a hit or the end of the set.

The way VA is formed from DA means that only one virtual page is allocated to hold the entire HAC. As a result, only one page table entry and thus one TLB entry is required. This setup guarantees that the performance overhead in HAC lookup due to TLB misses is zero. After the lookup is completed, a new HAC entry for the destination IP address is constructed, and a word from the corresponding cache set is allocated for it if such an HAC entry does not exist yet. Finally, this HAC entry is copied to the first word of the cache set if necessary. When a cache word is chosen for replacement, the associated HAC entry is simply lost. Because the HAC contents are generated dynamically from looking up the NART, it is safe to overwrite HAC entries without keeping a copy of them. Figure 1 illustrates the flow of the HAC lookup process.
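A minimal sketch of this lookup path, assuming the geometry above (128 reserved sets, 8 entries per set) and a hypothetical hac_base pointer at which the single HAC virtual page is mapped; the duplicated-first-word optimization is omitted, and the 0xFF miss sentinel is our assumption, so this illustrates the indexing scheme rather than reproducing the Suez source:

    #include <stdint.h>

    #define HAC_SETS       128    /* cache sets reserved for the HAC  */
    #define WORDS_PER_SET    8    /* 32-byte set / 4-byte HAC entries */
    #define HAC_MISS      0xFF    /* sentinel: fall back to the NART  */

    /* hac_base: hypothetical pointer to the one virtual page holding
     * the HAC, assumed mapped into the reserved cache sets. */
    extern uint32_t *hac_base;

    uint8_t hac_lookup(uint32_t da)
    {
        uint32_t set = (da >> 5) & 0x7F;          /* DA[5,11] picks the set   */
        uint32_t tag = ((da >> 12) << 3)          /* DA[12,31] ...            */
                     | ((da >> 2) & 0x7);         /* ... concatenated DA[2,4] */
        uint32_t *cw = hac_base + set * WORDS_PER_SET;

        for (int i = 0; i < WORDS_PER_SET; i++)   /* software set search      */
            if ((cw[i] >> 9) == tag)              /* tag match on CW[9,31]    */
                return (uint8_t)(cw[i] & 0xFF);   /* port from CW[0,7]        */

        return HAC_MISS;                          /* consult the NART         */
    }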

The HAC algorithm gives a 5-cycle best-case latency for an IP address lookup, and takes 3 cycles for each additional HAC entry accessed in the cache set, so in the worst case it takes 27 cycles to detect a HAC miss. By pipelining the lookups of consecutive IP packets, i.e., overlapping the 5-cycle work of successive lookups, the best-case throughput can be improved to 3 cycles per lookup. Because the HAC is guaranteed to be in the L1 cache, the 3-cycle-per-lookup estimate is exact, as long as the HAC lookup itself is a hit.

3.2 Network Address Routing Table

If the HAC access results in a miss, a full-scale network address routing table lookup is required. The guiding principle of Suez's NART design is to trade table size for lookup performance. This principle is rooted in the observation that the L2 cache of modern microprocessors is comparatively large for storing routing tables, and is expected to keep growing over time. For example, one MByte of L2 cache is the norm in mid-range PCs today. On the other hand, the routing table of an Internet backbone router has about 100,000 entries. Assuming each entry takes 8 bytes, the entire routing table takes only 800 KBytes, i.e., it fits comfortably into a PC's L2 cache. Of course, such a comparison is over-simplified, because the 800-KByte estimate assumes a sequential search procedure and thus does not include any index data structure to speed up the lookup process.

NART lookup is difficult to speed up because, given an IP address, it is not possible to determine a priori its corresponding network address, which is the most significant N bits of the IP address, where N is unknown beforehand. With the increasing threat of IP address depletion, a technique called Classless InterDomain Routing (CIDR) is currently in use that allows more efficient allocation of contiguous address chunks in the IP address space. In contrast to the original Class A/B/C scheme, in which N can take only one of three possible values (8, 16, and 24), CIDR allows N to range from 1 to 29. This generality complicates routing-table lookup because it significantly increases the number of possible network addresses.

NART Construction

The network addresses in an IP routing table are classified into three types according to their length: smaller than or equal to 16 bits (called Class AB), between 17 and 24 bits (called Class BC), and between 25 and 30 bits (called Class CD). To represent a given routing table, Suez's NART includes three levels of tables: one Level-1 table, one Level-3 table, and a variable number of Level-2 tables. Entries in these tables are denoted in this paper as Level1-Table[.], Level2-Table[.], and Level3-Table[.], respectively. The Level-1 table has 64K entries, each of which is 2 bytes wide and contains either an 8-bit output port identifier or an offset pointer to a Level-2 table; these two cases are distinguished by the entry's most significant bit. Each Level-2 table has 256 entries, each of which is 1 byte wide and contains either a special indicator (written ⊥ below) or an 8-bit output port identifier. The Level-3 table has a variable number of entries, each of which contains a 4-byte network mask field, a 4-byte network address, and an 8-bit output port identifier; the number of entries depends on the given IP routing table and is typically small.

To convert an IP routing table to the NART representation, Class AB addresses are processed first, then Class BC addresses, followed by Class CD addresses. Each Class AB network address NA of length L in the routing table takes 2^(16-L) entries of the Level-1 table, starting from Level1-Table[NA[0,L-1] * 2^(16-L)], with each of these entries filled with NA's output port identifier, as specified in the IP routing table.

Figure 2: The data flow of the NART lookup. It starts with the Level-1 table and, if necessary, looks up the Level-2 table pointed to by Level1-Table[DA[16,31]]. If the chosen Level-2 table entry still cannot resolve DA, a sequential search through the Level-3 table is needed. Note that the NART has only one Level-1 table and one Level-3 table, but many Level-2 tables.

After all Class AB addresses are processed, the unfilled Level-1 table entries are filled with the default output port identifier.

For each Class BC address NA of length L in the IP routing table, if Level1-Table[NA[L-16,L-1]] contains an 8-bit output port identifier, a Level-2 table is created: Level1-Table[NA[L-16,L-1]] is put aside as OPI and changed to an offset from which the base of the newly created Level-2 table can be computed, and every entry in the new Level-2 table is initialized to OPI. If Level1-Table[NA[L-16,L-1]] already contains an offset to a Level-2 table, no table is created. In both cases, 2^(24-L) entries of the Level-2 table, starting from Level2-Table[NA[0,L-17] * 2^(24-L)], are filled with NA's output port identifier, as specified in the IP routing table.

For each Class CD address NA of length L, if Level1-Table[NA[L-16,L-1]] contains an 8-bit output port identifier, a Level-2 table is created: Level1-Table[NA[L-16,L-1]] is put aside as OPI1 and changed to an offset from which the base of the newly created Level-2 table can be computed, and every entry in the new Level-2 table is initialized to OPI1. If Level1-Table[NA[L-16,L-1]] already contains an offset to a Level-2 table, this Level-2 table is used in the next step. If Level2-Table[NA[L-24,L-17]] is not ⊥, Level2-Table[NA[L-24,L-17]] is put aside as OPI2 and changed to ⊥, and a new Level-3 table entry with a 24-bit network mask, a network address of NA[L-24,L-1], and an output port identifier of OPI2 is created. Regardless of the content of Level2-Table[NA[L-24,L-17]], a Level-3 table entry with NA's network mask, network address, and output port identifier is added to the global Level-3 table.
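To make the three-level organization concrete, here is a hedged C sketch of the table declarations and the Class AB fill step. The entry encodings (the top bit as the port/offset discriminator, a zero byte standing in for the ⊥ indicator, a fixed pool of Level-2 tables) are illustrative assumptions, not the exact Suez encoding:

    #include <stdint.h>

    #define PORT_ENTRY(p)  ((uint16_t)(p))              /* MSB 0: output port     */
    #define L2_PTR(off)    ((uint16_t)(0x8000 | (off))) /* MSB 1: Level-2 offset  */
    #define BOT            0x00                         /* stand-in for ⊥         */

    static uint16_t level1[1 << 16];     /* one Level-1 table, 64K x 2 bytes      */
    static uint8_t  level2[256][256];    /* Level-2 tables (fixed pool here only
                                          * for illustration; Suez allocates a
                                          * variable number on demand)            */

    /* Fill step for a Class AB prefix: a prefix `na` of length L (<= 16)
     * occupies 2^(16-L) consecutive Level-1 entries starting at
     * Level1-Table[NA[0,L-1] * 2^(16-L)], each holding the prefix's port. */
    void fill_class_ab(uint32_t na, int len, uint8_t port)
    {
        uint32_t span  = 1u << (16 - len);
        uint32_t start = na * span;
        for (uint32_t i = 0; i < span; i++)
            level1[start + i] = PORT_ENTRY(port);
    }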

NART Lookup

Given a destination IP address DA, the most significant 16 bits, DA[16,31], are used to index into the Level-1 table. The exact address used to index the Level-1 table is Base1 + 2 * DA[16,31], where Base1 is the base address of the Level-1 table. If DA has a Class AB network address, one lookup into the Level-1 table is sufficient to complete the routing-table search. If DA has a Class BC address, Level1-Table[DA[16,31]] contains an offset from which the base of the Level-2 table is computed: the corresponding Level-2 table's base is Base2 + 256 * Level1-Table[DA[16,31]], where Base2 is the base of an address region specifically reserved for all Level-2 tables. If Level2-Table[DA[8,15]] is not ⊥, the lookup is complete. Otherwise the network address of DA is a Class CD address, and a sequential search through the Level-3 table is required. We chose to use a single global Level-3 table, rather than multiple Level-3 tables in the style of the Level-2 tables, because it reduces the storage-space requirement and because it allows Level-2 table entries to specify up to 255 output port identifiers. The search through the Level-3 table is based on DA[0,31] and is done sequentially. Because the Level-3 table entries are sorted by decreasing network-address length, the linear search can stop at the first match. As will be shown in Section 5, the performance overhead of this sequential search is negligible. Figure 2 illustrates the data flow of the NART lookup process.

If DA's network address is a Class AB address, the lookup takes 7 cycles to complete. If it is a Class BC address, the lookup takes 12 cycles. If it is a Class CD address, the lookup takes 20+4K cycles, where K is the number of Level-3 table entries the algorithm has to examine before the first match. Unlike the HAC lookup, these cycle estimates are not precise, because there is no guarantee that the memory accesses in an NART lookup always result in cache hits.

4 Fixed-Granularity Fluid Fair Queuing

The most well-known real-time packet scheduling technique is Weighted Fair Queuing (WFQ), which allocates the output-link bandwidth among competing network connections in proportion to their bandwidth reservations. Each connection that goes through an output link is assumed to have a per-connection queue associated with that output. For each incoming packet, WFQ computes the virtual finish time at which it should be sent out on the corresponding output link, and puts it into the corresponding queue. Whenever an output link is free, WFQ selects for transmission the packet with the earliest virtual finish time among the per-connection queues at the output. Specifically, the virtual finish time of the i-th packet of a real-time connection whose bandwidth reservation is W bits per virtual time unit is calculated by the following formula:

    VirtualFinishTime(i) = MAX(VirtualArrivalTime(i), VirtualFinishTime(i-1)) + PacketSize(i)/W    (1)

where VirtualArrivalTime(i) and PacketSize(i) are the virtual arrival time and the size of the i-th packet, respectively.
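As a small illustration of formula (1) (a hedged sketch; the names are ours, not from any Suez or WFQ code), the per-packet update is just a max and an add; the cost WFQ pays is in ordering the resulting timestamps, not in computing them:

    /* Per-connection WFQ state: last assigned virtual finish time and
     * the reservation W in bits per virtual time unit. */
    struct wfq_conn {
        double last_finish;   /* VirtualFinishTime(i-1) */
        double w;             /* reservation W          */
    };

    /* Formula (1): stamp the i-th packet of a connection. */
    double wfq_stamp(struct wfq_conn *c, double virt_arrival, double pkt_bits)
    {
        double start = virt_arrival > c->last_finish ? virt_arrival
                                                     : c->last_finish;
        c->last_finish = start + pkt_bits / c->w;
        return c->last_finish;   /* packets are then sorted by this value */
    }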

The scalability of WFQ is limited by two operations that must be performed for each packet. First, mapping a packet's physical arrival time to its virtual arrival time requires constant tracking of the set of active network connections, i.e., those whose queues are non-empty. Second, ordering the head packets of an output link's per-connection queues incurs an O(log N) sorting overhead, where N is the number of active network connections. To route one million packets per second, the total delay of both operations has to be smaller than 1 microsecond.

Suez takes a different scheduling approach, called the Fixed-Granularity Fluid Fair Queuing (FGFFQ) packet scheduling algorithm, which completely eliminates the need for virtual-to-physical time mapping and packet sorting, and thus can be shown to be more scalable than WFQ. Moreover, FGFFQ is more suitable for hardware implementation than WFQ. The key observation behind FGFFQ is that the overheads of current WFQ implementations result from the assumption that a network-layer packet has to be transferred from one router to another as an indivisible unit. FGFFQ does away with this assumption by introducing a multiplexing/demultiplexing mechanism above the link-layer protocol. As a result, FGFFQ can physically transfer partial network-layer packets from router to router, and rely on the multiplexing/demultiplexing mechanism to logically forward entire network-layer packets. This flexibility allows FGFFQ to emulate the fluid fair queuing model directly by sending a data quantum from each active connection in round-robin fashion, with the quantum sizes determined according to the connections' bandwidth reservations.

4.1 The Scheduling Model

In addition to addressing the implementation overhead problem of WFQ, FGFFQ is designed to support two additional features not available in WFQ. First, FGFFQ supports both work-conserving and non-work-conserving scheduling disciplines. It is known that, at the expense of link under-utilization in some cases, a non-work-conserving scheduling discipline reduces the burstiness of packet sequences, and thus the router buffer requirement. Second, FGFFQ allows real-time applications to decouple their bandwidth and delay requirements. That is, applications can minimize the end-to-end packet latency without reserving a much larger bandwidth than needed, as they must in WFQ.

The performance guarantee requirement of a real-time connection is specified as a triple (H, W, C), which means the packet scheduler transmits H bits from this connection in each of W scheduling periods within a cycle of C periods, where C should be larger than or equal to W. Assuming each scheduling period is D seconds long, the bandwidth requirement of this connection is therefore HW/(CD) bits/sec, and the link, at speed B bits/sec, can transmit at most D*B bits within one scheduling period. Assume that the i-th active connection of an output link has a reservation of (H_i, W_i, C_i), and two counter variables, WCounter_i and CCounter_i, initialized to W_i and C_i, respectively. In each scheduling period, the packet scheduler visits each active connection on the output link, and transmits H_i bits if WCounter_i is greater than 0. After WCounter_i reaches 0, the packet scheduler skips the i-th connection until CCounter_i reaches zero.
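In implementation terms, the per-connection scheduler state amounts to the reservation triple plus the two counters; a minimal C sketch, with field names of our choosing:

    /* Per-connection FGFFQ state: the (H, W, C) reservation plus the
     * two counters the scheduler decrements, as described above. */
    struct fgffq_conn {
        unsigned h;          /* quantum size: bits sent per eligible period */
        unsigned w, c;       /* W periods of service per cycle of C periods */
        unsigned wcounter;   /* initialized to W                            */
        unsigned ccounter;   /* initialized to C                            */
    };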
The FGFFQ scheduling algorithm is as follows.

    while (1) {
        /* In each scheduling period the packet scheduler visits
           each active real-time connection */
        for (i = 1; i <= N; i++) {
            if (WCounter_i > 0) {
                if (Queue_i is not empty) {
                    Transmit H_i bits;
                    WCounter_i--;
                }
            }
            CCounter_i--;
            if (CCounter_i <= 0) {
                if (Queue_i is not empty) {
                    WCounter_i = W_i;
                    CCounter_i = C_i;
                }
            }
        }
        Transmit non-real-time data or loop idly
            until the current scheduling period ends;
    }

where N is again the number of active real-time network connections, and Queue_i is the per-connection queue for the i-th connection. There is a separate best-effort connection queue for all non-real-time packets. Typically, packets from the i-th connection are of size H_i * W_i bits. In the above algorithm, a real-time connection can send out at most H_i * W_i bits within a cycle of C_i scheduling periods; unused bandwidth in one cycle cannot be carried over to the following cycles. After servicing all the active real-time network connections in each scheduling period, FGFFQ's packet scheduler spends the rest of the period either transmitting non-real-time (best-effort) traffic, or simply looping idly if there is no non-real-time data to send. Because the packet scheduler chooses to service non-real-time data or sit idle even when there is real-time data awaiting transmission, FGFFQ is non-work-conserving. FGFFQ is non-work-conserving also in the sense that when WCounter_i is zero, the packet scheduler does not consider the i-th connection for transmission, even if Queue_i is not empty, until CCounter_i reaches zero.

If a real-time connection is not concerned with packet latency, then C is usually chosen to be equal to W. If packet latency needs to be minimized, W is set smaller than C. In this case, the connection distinguishes between its instantaneous bandwidth requirement, which is set for reduced latency, and its average bandwidth requirement, which corresponds to the long-term bandwidth need. Given a performance specification (H, W, C), the instantaneous bandwidth is H/D, whereas the average bandwidth is HW/(CD): packets are physically transmitted at the rate H/D, although the logical long-term bandwidth consumption is HW/(CD).

A major advantage of the bandwidth/delay requirement decoupling supported in FGFFQ is that, unlike in WFQ, low-delay, low-bandwidth real-time connections can be serviced at high instantaneous rates without over-reserving the link bandwidth.
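As a worked illustration with hypothetical numbers: take D = 1 msec and a reservation (H, W, C) = (1000 bits, 5, 10). The connection may send 1000 bits in each of 5 periods per 10-period cycle, so its instantaneous bandwidth is H/D = 1 Mbit/sec while its average bandwidth is HW/(CD) = 0.5 Mbit/sec; a latency-sensitive connection thus gets short bursts at twice its long-term rate without permanently reserving 1 Mbit/sec.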

Figure 3: There are three real-time connections (1, 2, 3) and one non-real-time connection (4). At the time the packet scheduler visits Connection 3's queue, that queue is still empty, so the link-layer packet contains data transmission units, or quanta, from Connections 1, 2, and 4. For real-time connections, the transmission units are forwarded to the connections' corresponding output queues by consulting the Connection-ID/Output-Queue map. For non-real-time traffic, however, whole network-layer packets need to be assembled at the input queue before being forwarded to the output queues, which are determined by looking up the non-real-time routing table.

The difference between a connection's instantaneous and average bandwidth requirements can be used by another real-time connection; in that case, however, the delay bound of the latter connection may suffer. Section 4.3 presents the delay bounds of real-time connections under FGFFQ.

4.2 Transparent Multiplexing/Demultiplexing

Traditionally, network-layer packets are transferred from one network node to another as atomic units. Because of this assumption, existing real-time packet schedulers are forced to schedule data transmission on the output links packet by packet. FGFFQ instead uses a scheduling granularity smaller than individual packets, so that each active connection receives its reserved share of the link bandwidth within each scheduling period. To support FGFFQ, a multiplexing/demultiplexing (MUDEM) layer is transparently added above the link-layer protocol. This subsection discusses the design of this MUDEM scheme in detail.

Every time the packet scheduler visits a non-empty connection queue, say that of the i-th connection, a quantum or data transmission unit of H_i bits is sent to the MUDEM layer, which merges multiple quanta into one link-layer packet and sends it to the next-hop router. The MUDEM layer at the other end receives the composite packet and directs each quantum in the packet to its respective per-connection output queue. Figure 3 shows the data structures used at the input and output sides of the MUDEM layer. In the most general case, an output link of a router may be connected to multiple next-hop routers; therefore, the data structures shown in Figure 3 are set up for each pair of directly connected routers, whether connected through a point-to-point link or through a shared medium.

The MUDEM layers at the two ends of a link are synchronized, on every link-layer packet, as to which connections' data transmission units the packet contains. At the beginning of a link-layer packet is a 4-byte MUDEM header that includes the number of connections that have put a quantum in the packet and the connection ID of the first such connection. These connection IDs are unique on a link, are established at connection setup time, and are agreed upon by both sides of the link. The number of participating connections in a packet is needed for the receiver to determine when to stop decoding the packet. There are two reasons for including the starting connection ID. First, when link-layer packets are corrupted, the errors are restricted to those connections that transmitted quanta in those packets, and do not propagate to other connections. Second, the starting connection ID makes it possible for the first quantum in a link-layer packet to come from the next connection that has any data to send, i.e., inactive connections can be skipped.

The sender side of the MUDEM layer also inserts a header for each connection onward from the first connection, regardless of whether a connection is empty or not. The first bit of this header byte indicates whether the corresponding connection is empty. If it is not, the remaining bits represent the length of the quantum, which follows the header. The length of the per-connection header is fixed and known at the other end of the link. The quantum length is specified explicitly because each connection may use a different H_i, and because the tail transmission unit of a network-layer packet may be shorter than its corresponding H_i. If a connection queue is empty, one header byte is inserted into the bit stream without any payload. The MUDEM layer composes link-layer packets by including as many connections as possible, until the total size of headers and payloads reaches the MTU (Maximum Transmission Unit).

At the receiver end, the MUDEM layer first decodes the starting connection ID and, for each non-empty real-time connection, enqueues its transmission unit into the corresponding per-connection queue at the associated output link. For real-time connections, the mapping from connection IDs to per-connection queues is constructed at connection establishment time. For non-real-time traffic, the header of the network-layer packet is used to look up the IP routing table. However, data transmission units of non-real-time packets cannot be immediately forwarded to the corresponding per-connection output queue, because multiple best-effort packets that are to be routed through the same output link may arrive at the same time; cut-throughs of best-effort packets could therefore lead to packet interleaving in the buffer. Instead, data transmission units of non-real-time traffic are buffered at the input queue to form a complete network-layer packet before being pushed to the per-connection output queue. Forming network-layer packets is relatively straightforward, since the network-layer header contains the length information. In summary, real-time packets can be forwarded inside a router in a cut-through fashion because each real-time connection has a dedicated queue in each router, whereas non-real-time packets must be serviced in a store-and-forward fashion because multiple non-real-time connections may share the same output queue.
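The wire format described above might look as follows in C. The exact field widths of the 4-byte MUDEM header are not spelled out in the paper, so the split shown here (a 16-bit count and a 16-bit starting connection ID), and the use of the low 7 header-byte bits for the quantum length, are assumptions for illustration:

    #include <stdint.h>

    /* 4-byte MUDEM header at the start of each link-layer packet.
     * Field widths are assumed: the paper only says the header carries
     * the number of participating connections and the first connection ID. */
    struct mudem_header {
        uint16_t num_conns;   /* connections with a quantum in this packet */
        uint16_t start_conn;  /* link-unique ID of the first connection    */
    };

    /* One-byte per-connection header preceding each quantum.
     * Bit 7: connection queue empty (no payload follows).
     * Bits 6..0: quantum length (unit assumed here). */
    #define CONN_EMPTY      0x80
    #define CONN_LEN(hdr)   ((hdr) & 0x7F)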
4.3 Performance Bounds

Given a real-time connection's performance requirement specification (H, W, C), the per-hop packet latency bound provided by FGFFQ is (1 + W) * D, which is independent of the maximum packet size of other connections and of the number of connections sharing an output link. The corresponding bandwidth guarantee is HW/(CD). The choice of D cannot be arbitrarily small, for implementation efficiency, especially when the number of connections sharing a link is large.
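Continuing the hypothetical numbers used earlier (H = 1000 bits, W = 5, C = 10, D = 1 msec), the per-hop latency bound works out to (1 + 5) * 1 msec = 6 msec, and the bandwidth guarantee to 1000 * 5 / (10 * 0.001) = 0.5 Mbit/sec.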

The latency bound breaks down into a queuing component, D, and a transmission component, W * D. The worst-case queuing delay occurs when packets from all active connections arrive simultaneously; the packet delayed the longest then experiences a queuing delay of D.

The per-hop delay bound of WFQ for a real-time connection with a bandwidth reservation of H/D is Pmax/(H/D) + Pmax/B, where Pmax is the maximum packet size and B is again the link bit rate. Assume that B can maximally support Nmax such connections, so that H = B*D/Nmax. Then the delay bound of WFQ is Pmax*(Nmax+1)/B. On the other hand, the delay bound of FGFFQ is (1+W)*D = (1+W)*H*Nmax/B. So the delay-bound ratio between FGFFQ and WFQ is Nmax*H*(1+W) / (Pmax*(Nmax+1)), which is equal to (Nmax + Nmax/W) / (Nmax+1) when Pmax = H*W. As long as the quantum size H remains much smaller than Pmax, the delay-bound ratio remains close to 1. Assume that Pmax = 1500 bytes, Nmax = 1000, B = 10^9 bits/sec, and D = 1 msec. Then the ratio is 1.082, i.e., an 8.2% increase in the delay bound compared to WFQ. One desirable property of FGFFQ is that its delay bound remains unchanged as the maximum number of network connections sharing an output link increases with the link speed. That is, assuming H is fixed, then as Nmax increases in proportion to the increase in B, D and thus the delay bound stay the same, because H*Nmax = B*D.

5 Prototype Implementation

5.1 Overview

The current Suez prototype consists of four Pentium-II 400MHz processors interconnected through a Myrinet system area network. A 1.28 Gbps full-duplex link connects each Pentium node to the Myrinet switch. Each host interface consists of an on-board LANai 4.3 processor, 512 KBytes of memory, and a DMA engine. The on-board processor executes a control program loaded into the board memory during system start-up. Both the bridging and the external interfaces on a host are Myrinet interfaces running tailored control programs to perform such operations as peer-to-peer DMAs, interface-to-memory DMAs, and shared-memory communication with the host processor. The interfaces reside on a 32-bit 33MHz PCI bus. Myrinet interface cards allow data transfer from the wire to the interface card to proceed concurrently with a DMA transaction from the card memory to the host memory.

The software of the Suez prototype is based on Linux, and includes a routing-table lookup engine, a real-time output link scheduler, and a data buffering and movement module. To reserve a portion of the L1 cache for the HAC, the page-table-entry creation routines in the Linux kernel have been modified. Essentially, unless the allocation is for the HAC, any page table entry created for an allocated physical page whose frame number is a multiple of 4 is created with the page-cache-disable bit set. This is done during dynamic memory allocation as well as during the static initial setup of the first 4 MBytes of the kernel's address space. Code pages in the kernel page tables are not cache-disabled, since they reside in a separate instruction cache and hence cannot conflict with the HAC, which resides in the data cache. Also, the pages allocated to the NART search data structure are always mapped to physical pages whose frame numbers are not multiples of 4, so that they are not cache-disabled. Given an incoming IP packet, its destination address is extracted and used for the lookup according to the algorithm described earlier.
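A hedged sketch of that page-table rule (function and macro names are ours; the real change lives in Linux's page-table-entry creation paths): because the 16-KByte direct-mapped L1 cache spans four 4-KByte pages, every physical page whose frame number is congruent to 0 mod 4 maps onto the same quarter of the cache as the HAC page, so caching is disabled for all of them except the HAC page itself.

    #include <stdbool.h>
    #include <stdint.h>

    #define PTE_CACHE_DISABLE  (1u << 4)   /* x86 PCD bit in a page-table entry */

    /* Decide whether a newly mapped physical page must be cache-disabled.
     * With a 16-KB direct-mapped L1 and 4-KB pages, pages with
     * pfn % 4 == 0 fall into the cache quarter reserved for the HAC. */
    bool must_cache_disable(uint32_t pfn, bool is_hac_page)
    {
        return ((pfn & 0x3) == 0) && !is_hac_page;
    }

    uint32_t make_pte(uint32_t pfn, uint32_t flags, bool is_hac_page)
    {
        if (must_cache_disable(pfn, is_hac_page))
            flags |= PTE_CACHE_DISABLE;
        return (pfn << 12) | flags;       /* 4-KB pages: frame in bits 31..12 */
    }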

Each output interface has a set of per-connection real-time queues and a best-effort queue. The FGFFQ algorithm is used to schedule and transmit the packets buffered in these queues. Under FGFFQ, the data received on an input interface is a set of quanta with an associated bitmap. Real-time quanta are demultiplexed to the appropriate output queues based on a real-time connection-ID table, which is set up during connection establishment. This demultiplexing does not involve data copying: it is accomplished simply by appending to the output queue a descriptor that contains a pointer to the quantum and its length. Because a single arriving packet consists of multiple quanta, multiple enqueue operations may be required.

5.2 Data Buffering and Movement

A key implementation challenge for the Suez prototype is to develop an efficient data buffering and movement scheme that reduces the latency a packet experiences when it travels from the input node to the output node of the Suez cluster, and maximizes the aggregate packet throughput of the entire router. When an interface card receives a packet, it first performs routing-table lookup for non-real-time quanta or packet classification for real-time quanta. Because the memory capacity on the interface card is limited, the card can hold only a real-time connection-ID table for packet classification; it is the CPU that performs routing-table lookup for non-real-time packets, using the HAC and NART algorithms. To eliminate unnecessary interrupts, the interface card places a lookup request in a per-interface shared memory region. The CPU actively polls this shared memory region, services the lookup requests with high priority, and places the lookup results back in the corresponding per-interface shared memory region. An incoming non-real-time packet can be transferred out of the input interface only after the lookup result becomes available. In the case that the CPU cannot perform lookups fast enough, the interface card is forced to DMA the delayed packets to a per-interface buffer queue. This additional DMA overhead is the cost of eliminating interrupts.

Once the output queue for a given packet/quantum is determined, there are two cases to consider. If the output interface of a packet/quantum resides on a different Suez node than its input interface, the packet/quantum is first transferred across the Myrinet switch to the output node, through a peer-to-peer DMA mechanism over PCI. Peer-to-peer DMA is possible because the memory on the interface cards is memory-mapped and can thus serve as the source and destination addresses of DMA transactions. Peer-to-peer DMA transactions are blocking, since the interface cards may not have enough buffer space to maintain queues for flow control. The interface card on a Suez node that is connected to the Myrinet switch (called the bridging interface) provides a well-known set of memory locations, one for each interface on the same node, to hold requests for peer-to-peer DMA transactions. The host sends these memory locations to the interfaces at startup, using the shared memory regions between the host and the interfaces. The i-th interface card initiates an inter-node data transfer by writing a small request message, through peer-to-peer DMA, to the request area allocated to the i-th interface on the bridging interface, which cycles through the set of data-transfer requests in its memory and services each of them by pulling data from the requesting interface, again through peer-to-peer DMA.
Then, the bridging interface attempts to batch all data items destined for the same output node into one transfer unit, and ships them through the switching fabric to the appropriate nodes in the cluster. The results of routing-table lookup and packet classification are sent along with the packets/quanta, to avoid redundant lookups or classification at the output nodes.

Figure 4: The life of a data packet through the cluster-based router when its input interface and output interface reside on separate nodes. (Steps in the figure: 1. classifier invocation for non-real-time traffic, or map-table lookup for real-time traffic; 2. peer DMA request; 3. peer DMA data; 4. peer DMA ack; 5. DMA data; 6. DMA descriptor; 7. schedule.) The packet is transferred to the output cluster node through a combination of a peer-to-peer DMA transaction on the input node, a transport over the backplane switch, and an interface-to-host-memory DMA transaction on the output node. The output link scheduler eventually puts the data on the wire through a gather DMA transaction.

Once a packet reaches its output node, the subsequent processing is the same as when the input and output interfaces reside on the same node. The interface through which a packet arrives at its output node DMAs the packet to its corresponding output queue in host memory. Each interface on a Suez node is pre-allocated a buffer pool and a descriptor pool residing in host memory. A descriptor, to a first approximation, consists of a memory-location pointer and a length field. To put a packet into its output queue, the receiving interface card allocates a buffer to hold the packet and creates one descriptor for each quantum in the packet. The interface card then logically inserts each packet/quantum into its corresponding output queue by appending the descriptor to that queue, which is physically implemented as a list of descriptors. The advantage of this approach is that a data packet never gets copied from one memory location to another. The disadvantage is that the DMA required to put a data packet on the output link is less efficient, because the quanta making up a packet do not necessarily reside in a contiguous memory region.

The CPU on a Suez node executes the FGFFQ scheduler, which scans across the output queues on all of the node's output interfaces. Every time a packet is put on the output link, the scheduler returns the associated buffer and descriptor to the pools of the interface through which the packet arrived at the node. Logically, therefore, there could be one consumer and multiple producers for each per-interface buffer pool and descriptor pool. However, because the FGFFQ scheduler is non-preemptive, at any point in time there is only one consumer and one producer for each per-interface buffer/descriptor pool. By building a "guard" entry into each per-interface buffer/descriptor pool, one can guarantee that accesses to these pools are free of race conditions without using locks. Figure 4 illustrates the processing of a data packet within the Suez router. The input interface performs packet classification for real-time packets, while leaving routing-table lookup for non-real-time packets to the CPU.
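As an illustration of the descriptor queue and the single-producer/single-consumer "guard" idea described above, here is a hedged C sketch; the names and the ring-buffer realization are ours, since the paper only specifies a descriptor as roughly a pointer plus a length, and a guard entry that keeps producer and consumer from colliding without locks:

    #include <stdint.h>

    /* A descriptor, to a first approximation: where a quantum lives
     * and how long it is. */
    struct desc {
        void    *addr;   /* pointer into the per-interface buffer pool */
        uint32_t len;    /* quantum length in bytes                    */
    };

    #define POOL_SIZE 256   /* illustrative; one slot is the "guard" */

    /* Single-producer/single-consumer descriptor ring.  Keeping one
     * slot permanently unused (the guard) lets the producer and the
     * consumer test fullness/emptiness with head/tail alone, so no
     * lock is needed as long as there is exactly one of each. */
    struct desc_ring {
        struct desc slots[POOL_SIZE];
        volatile uint32_t head;   /* next slot the producer fills  */
        volatile uint32_t tail;   /* next slot the consumer drains */
    };

    static int ring_put(struct desc_ring *r, struct desc d)
    {
        uint32_t next = (r->head + 1) % POOL_SIZE;
        if (next == r->tail)      /* guard slot would be consumed: full */
            return -1;
        r->slots[r->head] = d;
        r->head = next;
        return 0;
    }

    static int ring_get(struct desc_ring *r, struct desc *out)
    {
        if (r->tail == r->head)   /* empty */
            return -1;
        *out = r->slots[r->tail];
        r->tail = (r->tail + 1) % POOL_SIZE;
        return 0;
    }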

Once a data packet reaches its output cluster node, it is DMAed into host memory and waits for the output link scheduler to ship it out at the appropriate time. Buffering data packets in the output node's host memory is essential, because the arrival order and departure order of data packets can be very different under real-time packet scheduling. Without packet scheduling, peer-to-peer DMA could be used on the output node as well.

6 Performance Evaluation

6.1 Routing Table Lookup

We have performed a simulation study as well as a measurement study of the initial Suez prototype, based on an IP trace collected from the main router of Brookhaven National Laboratory from 9AM on 3/3/98 to 5PM on 3/6/98. The performance metric used is millions of lookups per second (MLPS). We built a detailed simulator that takes into account the HAC and NART lookup algorithms, as well as the effects of the L1 cache, L2 cache, and main memory; the results of this simulation study were reported in [26]. In this paper, we focus only on the lookup performance measured by running the same packet trace through the Suez prototype, which runs on a Pentium-II 400MHz machine with a 16-KByte 2-way set-associative L1 cache and a 512-KByte 4-way set-associative L2 cache. The HAC hit ratios and lookup performance results for different HAC associativities are shown in Table 1. The timing measurements were made using the cycle counter of the Pentium architecture.

The best-case measured performance is comparable to state-of-the-art high-end routers. We believe better performance is possible on a RISC architecture, for the following reasons. The Pentium instruction set architecture provides a limited set of registers and thus prohibits aggressive exploitation of instruction-level parallelism; it also has a very high branch misprediction penalty. As a result, the best-case scenario, in which the HAC hit occurs at the very first entry of the cache set, takes 23 cycles rather than the 5 cycles we estimated originally.

Because the first HAC entry in each cache set is replicated for performance optimization, 7-way and 3-way logical set associativities actually correspond to 8-way and 4-way physical set associativities. For 1-way and 2-way set associativities, however, this optimization is not applied. Although larger degrees of associativity increase the HAC hit ratio, the best performance occurs when the degree of associativity is 1. The main reason is that set associativity is implemented in software in this algorithm, which entails two performance overheads. First, with a highly set-associative cache, the access time for HAC hits near the end of a cache set is not necessarily shorter than the NART lookup time; that is, the time to match an IP address at the 8th HAC entry of an 8-way HAC cache set may be longer than looking it up in the NART directly. Second, large set associativities increase the HAC miss-detection time, adding a higher fixed overhead to every NART lookup. In the BNL trace, because the HAC hit ratio is already high (95.22%) when the HAC is direct-mapped, the additional gain in HAC hit ratio is not sufficient to offset the additional overhead of software-implemented set associativity. The reason the overall performance of the 3-way case is slightly better than that of the 2-way case, despite a lower HAC hit ratio and a higher HAC miss-detection time, is that the average hit-access time for the 3-way case is shorter, owing to the duplication optimization.
When the HAC is completely turned off, the measured performance is 8.93 MLPS, which shows


Topic 4a Router Operation and Scheduling. Ch4: Network Layer: The Data Plane. Computer Networking: A Top Down Approach Topic 4a Router Operation and Scheduling Ch4: Network Layer: The Data Plane Computer Networking: A Top Down Approach 7 th edition Jim Kurose, Keith Ross Pearson/Addison Wesley April 2016 4-1 Chapter 4:

More information

Chapter 4 Network Layer: The Data Plane

Chapter 4 Network Layer: The Data Plane Chapter 4 Network Layer: The Data Plane A note on the use of these Powerpoint slides: We re making these slides freely available to all (faculty, students, readers). They re in PowerPoint form so you see

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

NETWORK LAYER DATA PLANE

NETWORK LAYER DATA PLANE NETWORK LAYER DATA PLANE 1 GOALS Understand principles behind network layer services, focusing on the data plane: Network layer service models Forwarding versus routing How a router works Generalized forwarding

More information

6.9. Communicating to the Outside World: Cluster Networking

6.9. Communicating to the Outside World: Cluster Networking 6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and

More information

Process size is independent of the main memory present in the system.

Process size is independent of the main memory present in the system. Hardware control structure Two characteristics are key to paging and segmentation: 1. All memory references are logical addresses within a process which are dynamically converted into physical at run time.

More information

Router Design: Table Lookups and Packet Scheduling EECS 122: Lecture 13

Router Design: Table Lookups and Packet Scheduling EECS 122: Lecture 13 Router Design: Table Lookups and Packet Scheduling EECS 122: Lecture 13 Department of Electrical Engineering and Computer Sciences University of California Berkeley Review: Switch Architectures Input Queued

More information

SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS IAN RAMSAY PHILP. B.S., University of North Carolina at Chapel Hill, 1988

SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS IAN RAMSAY PHILP. B.S., University of North Carolina at Chapel Hill, 1988 SCHEDULING REAL-TIME MESSAGES IN PACKET-SWITCHED NETWORKS BY IAN RAMSAY PHILP B.S., University of North Carolina at Chapel Hill, 1988 M.S., University of Florida, 1990 THESIS Submitted in partial fulllment

More information

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network

FB(9,3) Figure 1(a). A 4-by-4 Benes network. Figure 1(b). An FB(4, 2) network. Figure 2. An FB(27, 3) network Congestion-free Routing of Streaming Multimedia Content in BMIN-based Parallel Systems Harish Sethu Department of Electrical and Computer Engineering Drexel University Philadelphia, PA 19104, USA sethu@ece.drexel.edu

More information

Cisco Series Internet Router Architecture: Packet Switching

Cisco Series Internet Router Architecture: Packet Switching Cisco 12000 Series Internet Router Architecture: Packet Switching Document ID: 47320 Contents Introduction Prerequisites Requirements Components Used Conventions Background Information Packet Switching:

More information

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview

Chapter 4: network layer. Network service model. Two key network-layer functions. Network layer. Input port functions. Router architecture overview Chapter 4: chapter goals: understand principles behind services service models forwarding versus routing how a router works generalized forwarding instantiation, implementation in the Internet 4- Network

More information

1. Memory technology & Hierarchy

1. Memory technology & Hierarchy 1 Memory technology & Hierarchy Caching and Virtual Memory Parallel System Architectures Andy D Pimentel Caches and their design cf Henessy & Patterson, Chap 5 Caching - summary Caches are small fast memories

More information

1 Connectionless Routing

1 Connectionless Routing UCSD DEPARTMENT OF COMPUTER SCIENCE CS123a Computer Networking, IP Addressing and Neighbor Routing In these we quickly give an overview of IP addressing and Neighbor Routing. Routing consists of: IP addressing

More information

Memory hierarchy. 1. Module structure. 2. Basic cache memory. J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas

Memory hierarchy. 1. Module structure. 2. Basic cache memory. J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Memory hierarchy J. Daniel García Sánchez (coordinator) David Expósito Singh Javier García Blas Computer Architecture ARCOS Group Computer Science and Engineering Department University Carlos III of Madrid

More information

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including

Introduction. Introduction. Router Architectures. Introduction. Recent advances in routing architecture including Router Architectures By the end of this lecture, you should be able to. Explain the different generations of router architectures Describe the route lookup process Explain the operation of PATRICIA algorithm

More information

DiffServ Architecture: Impact of scheduling on QoS

DiffServ Architecture: Impact of scheduling on QoS DiffServ Architecture: Impact of scheduling on QoS Abstract: Scheduling is one of the most important components in providing a differentiated service at the routers. Due to the varying traffic characteristics

More information

Process- Concept &Process Scheduling OPERATING SYSTEMS

Process- Concept &Process Scheduling OPERATING SYSTEMS OPERATING SYSTEMS Prescribed Text Book Operating System Principles, Seventh Edition By Abraham Silberschatz, Peter Baer Galvin and Greg Gagne PROCESS MANAGEMENT Current day computer systems allow multiple

More information

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science

MDP Routing in ATM Networks. Using the Virtual Path Concept 1. Department of Computer Science Department of Computer Science MDP Routing in ATM Networks Using the Virtual Path Concept 1 Ren-Hung Hwang, James F. Kurose, and Don Towsley Department of Computer Science Department of Computer Science & Information Engineering University

More information

CS 326: Operating Systems. CPU Scheduling. Lecture 6

CS 326: Operating Systems. CPU Scheduling. Lecture 6 CS 326: Operating Systems CPU Scheduling Lecture 6 Today s Schedule Agenda? Context Switches and Interrupts Basic Scheduling Algorithms Scheduling with I/O Symmetric multiprocessing 2/7/18 CS 326: Operating

More information

Rate-Controlled Static-Priority. Hui Zhang. Domenico Ferrari. hzhang, Computer Science Division

Rate-Controlled Static-Priority. Hui Zhang. Domenico Ferrari. hzhang, Computer Science Division Rate-Controlled Static-Priority Queueing Hui Zhang Domenico Ferrari hzhang, ferrari@tenet.berkeley.edu Computer Science Division University of California at Berkeley Berkeley, CA 94720 TR-92-003 February

More information

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services

Overview Computer Networking What is QoS? Queuing discipline and scheduling. Traffic Enforcement. Integrated services Overview 15-441 15-441 Computer Networking 15-641 Lecture 19 Queue Management and Quality of Service Peter Steenkiste Fall 2016 www.cs.cmu.edu/~prs/15-441-f16 What is QoS? Queuing discipline and scheduling

More information

CS 856 Latency in Communication Systems

CS 856 Latency in Communication Systems CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency

More information

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses.

Addresses in the source program are generally symbolic. A compiler will typically bind these symbolic addresses to re-locatable addresses. 1 Memory Management Address Binding The normal procedures is to select one of the processes in the input queue and to load that process into memory. As the process executed, it accesses instructions and

More information

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including

Introduction. Router Architectures. Introduction. Introduction. Recent advances in routing architecture including Introduction Router Architectures Recent advances in routing architecture including specialized hardware switching fabrics efficient and faster lookup algorithms have created routers that are capable of

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page

More information

NoC Test-Chip Project: Working Document

NoC Test-Chip Project: Working Document NoC Test-Chip Project: Working Document Michele Petracca, Omar Ahmad, Young Jin Yoon, Frank Zovko, Luca Carloni and Kenneth Shepard I. INTRODUCTION This document describes the low-power high-performance

More information

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1

Main Memory. Electrical and Computer Engineering Stephen Kim ECE/IUPUI RTOS & APPS 1 Main Memory Electrical and Computer Engineering Stephen Kim (dskim@iupui.edu) ECE/IUPUI RTOS & APPS 1 Main Memory Background Swapping Contiguous allocation Paging Segmentation Segmentation with paging

More information

DISTRIBUTED EMBEDDED ARCHITECTURES

DISTRIBUTED EMBEDDED ARCHITECTURES DISTRIBUTED EMBEDDED ARCHITECTURES A distributed embedded system can be organized in many different ways, but its basic units are the Processing Elements (PE) and the network as illustrated in Figure.

More information

Operating Systems. Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring Paul Krzyzanowski. Rutgers University.

Operating Systems. Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring Paul Krzyzanowski. Rutgers University. Operating Systems Week 9 Recitation: Exam 2 Preview Review of Exam 2, Spring 2014 Paul Krzyzanowski Rutgers University Spring 2015 March 27, 2015 2015 Paul Krzyzanowski 1 Exam 2 2012 Question 2a One of

More information

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T

is developed which describe the mean values of various system parameters. These equations have circular dependencies and must be solved iteratively. T A Mean Value Analysis Multiprocessor Model Incorporating Superscalar Processors and Latency Tolerating Techniques 1 David H. Albonesi Israel Koren Department of Electrical and Computer Engineering University

More information

CSC 401 Data and Computer Communications Networks

CSC 401 Data and Computer Communications Networks CSC 401 Data and Computer Communications Networks Network Layer Overview, Router Design, IP Sec 4.1. 4.2 and 4.3 Prof. Lina Battestilli Fall 2017 Chapter 4: Network Layer, Data Plane chapter goals: understand

More information

Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar. Computer Science Department and UMIACS

Numerical Evaluation of Hierarchical QoS Routing. Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar. Computer Science Department and UMIACS Numerical Evaluation of Hierarchical QoS Routing Sungjoon Ahn, Gayathri Chittiappa, A. Udaya Shankar Computer Science Department and UMIACS University of Maryland, College Park CS-TR-395 April 3, 1998

More information

Communication Networks

Communication Networks Communication Networks Chapter 3 Multiplexing Frequency Division Multiplexing (FDM) Useful bandwidth of medium exceeds required bandwidth of channel Each signal is modulated to a different carrier frequency

More information

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES

CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES OBJECTIVES Detailed description of various ways of organizing memory hardware Various memory-management techniques, including paging and segmentation To provide

More information

Configuring QoS. Finding Feature Information. Prerequisites for QoS

Configuring QoS. Finding Feature Information. Prerequisites for QoS Finding Feature Information, page 1 Prerequisites for QoS, page 1 Restrictions for QoS, page 3 Information About QoS, page 4 How to Configure QoS, page 28 Monitoring Standard QoS, page 80 Configuration

More information

MEMORY MANAGEMENT/1 CS 409, FALL 2013

MEMORY MANAGEMENT/1 CS 409, FALL 2013 MEMORY MANAGEMENT Requirements: Relocation (to different memory areas) Protection (run time, usually implemented together with relocation) Sharing (and also protection) Logical organization Physical organization

More information

Chapter 8: Memory-Management Strategies

Chapter 8: Memory-Management Strategies Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and

More information

Lecture (04 & 05) Packet switching & Frame Relay techniques Dr. Ahmed ElShafee

Lecture (04 & 05) Packet switching & Frame Relay techniques Dr. Ahmed ElShafee Agenda Lecture (04 & 05) Packet switching & Frame Relay techniques Dr. Ahmed ElShafee Packet switching technique Packet switching protocol layers (X.25) Frame Relay ١ Dr. Ahmed ElShafee, ACU Fall 2011,

More information

Lecture (04 & 05) Packet switching & Frame Relay techniques

Lecture (04 & 05) Packet switching & Frame Relay techniques Lecture (04 & 05) Packet switching & Frame Relay techniques Dr. Ahmed ElShafee ١ Dr. Ahmed ElShafee, ACU Fall 2011, Networks I Agenda Packet switching technique Packet switching protocol layers (X.25)

More information

Accelerating Dynamic Binary Translation with GPUs

Accelerating Dynamic Binary Translation with GPUs Accelerating Dynamic Binary Translation with GPUs Chung Hwan Kim, Srikanth Manikarnike, Vaibhav Sharma, Eric Eide, Robert Ricci School of Computing, University of Utah {chunghwn,smanikar,vaibhavs,eeide,ricci}@utah.edu

More information

Chapter 8: Main Memory. Operating System Concepts 9 th Edition

Chapter 8: Main Memory. Operating System Concepts 9 th Edition Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel

More information

Configuring QoS. Understanding QoS CHAPTER

Configuring QoS. Understanding QoS CHAPTER 29 CHAPTER This chapter describes how to configure quality of service (QoS) by using automatic QoS (auto-qos) commands or by using standard QoS commands on the Catalyst 3750 switch. With QoS, you can provide

More information

Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors

Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors University of Crete School of Sciences & Engineering Computer Science Department Master Thesis by Michael Papamichael Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors

More information

Operating System Support

Operating System Support Operating System Support Objectives and Functions Convenience Making the computer easier to use Efficiency Allowing better use of computer resources Layers and Views of a Computer System Operating System

More information

Mark Sandstrom ThroughPuter, Inc.

Mark Sandstrom ThroughPuter, Inc. Hardware Implemented Scheduler, Placer, Inter-Task Communications and IO System Functions for Many Processors Dynamically Shared among Multiple Applications Mark Sandstrom ThroughPuter, Inc mark@throughputercom

More information

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

More information

Review. Preview. Three Level Scheduler. Scheduler. Process behavior. Effective CPU Scheduler is essential. Process Scheduling

Review. Preview. Three Level Scheduler. Scheduler. Process behavior. Effective CPU Scheduler is essential. Process Scheduling Review Preview Mutual Exclusion Solutions with Busy Waiting Test and Set Lock Priority Inversion problem with busy waiting Mutual Exclusion with Sleep and Wakeup The Producer-Consumer Problem Race Condition

More information

1995 Paper 10 Question 7

1995 Paper 10 Question 7 995 Paper 0 Question 7 Why are multiple buffers often used between producing and consuming processes? Describe the operation of a semaphore. What is the difference between a counting semaphore and a binary

More information

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System

Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Distributed Scheduling for the Sombrero Single Address Space Distributed Operating System Donald S. Miller Department of Computer Science and Engineering Arizona State University Tempe, AZ, USA Alan C.

More information

Network Layer Enhancements

Network Layer Enhancements Network Layer Enhancements EECS 122: Lecture 14 Department of Electrical Engineering and Computer Sciences University of California Berkeley Today We have studied the network layer mechanisms that enable

More information

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition

Chapter 7: Main Memory. Operating System Concepts Essentials 8 th Edition Chapter 7: Main Memory Operating System Concepts Essentials 8 th Edition Silberschatz, Galvin and Gagne 2011 Chapter 7: Memory Management Background Swapping Contiguous Memory Allocation Paging Structure

More information

Medium Access Protocols

Medium Access Protocols Medium Access Protocols Summary of MAC protocols What do you do with a shared media? Channel Partitioning, by time, frequency or code Time Division,Code Division, Frequency Division Random partitioning

More information

Wireless Networks (CSC-7602) Lecture 8 (15 Oct. 2007)

Wireless Networks (CSC-7602) Lecture 8 (15 Oct. 2007) Wireless Networks (CSC-7602) Lecture 8 (15 Oct. 2007) Seung-Jong Park (Jay) http://www.csc.lsu.edu/~sjpark 1 Today Wireline Fair Schedulling Why? Ideal algorithm Practical algorithms Wireless Fair Scheduling

More information

Chapter 8: Virtual Memory. Operating System Concepts

Chapter 8: Virtual Memory. Operating System Concepts Chapter 8: Virtual Memory Silberschatz, Galvin and Gagne 2009 Chapter 8: Virtual Memory Background Demand Paging Copy-on-Write Page Replacement Allocation of Frames Thrashing Memory-Mapped Files Allocating

More information

Chapter 8: Main Memory

Chapter 8: Main Memory Chapter 8: Main Memory Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and 64-bit Architectures Example:

More information

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS

ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS ADVANCED COMPUTER ARCHITECTURE TWO MARKS WITH ANSWERS 1.Define Computer Architecture Computer Architecture Is Defined As The Functional Operation Of The Individual H/W Unit In A Computer System And The

More information

Using Time Division Multiplexing to support Real-time Networking on Ethernet

Using Time Division Multiplexing to support Real-time Networking on Ethernet Using Time Division Multiplexing to support Real-time Networking on Ethernet Hariprasad Sampathkumar 25 th January 2005 Master s Thesis Defense Committee Dr. Douglas Niehaus, Chair Dr. Jeremiah James,

More information

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel

Chapter-6. SUBJECT:- Operating System TOPICS:- I/O Management. Created by : - Sanjay Patel Chapter-6 SUBJECT:- Operating System TOPICS:- I/O Management Created by : - Sanjay Patel Disk Scheduling Algorithm 1) First-In-First-Out (FIFO) 2) Shortest Service Time First (SSTF) 3) SCAN 4) Circular-SCAN

More information

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1,

Topics for Today. Network Layer. Readings. Introduction Addressing Address Resolution. Sections 5.1, Topics for Today Network Layer Introduction Addressing Address Resolution Readings Sections 5.1, 5.6.1-5.6.2 1 Network Layer: Introduction A network-wide concern! Transport layer Between two end hosts

More information

Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang. Department of Electrical & Electronic Engineering

Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang. Department of Electrical & Electronic Engineering Credit-Based Fair Queueing (CBFQ) K. T. Chan, B. Bensaou and D.H.K. Tsang Department of Electrical & Electronic Engineering Hong Kong University of Science & Technology Clear Water Bay, Kowloon, Hong Kong

More information

Top-Level View of Computer Organization

Top-Level View of Computer Organization Top-Level View of Computer Organization Bởi: Hoang Lan Nguyen Computer Component Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Studies

More information

Module 17: "Interconnection Networks" Lecture 37: "Introduction to Routers" Interconnection Networks. Fundamentals. Latency and bandwidth

Module 17: Interconnection Networks Lecture 37: Introduction to Routers Interconnection Networks. Fundamentals. Latency and bandwidth Interconnection Networks Fundamentals Latency and bandwidth Router architecture Coherence protocol and routing [From Chapter 10 of Culler, Singh, Gupta] file:///e /parallel_com_arch/lecture37/37_1.htm[6/13/2012

More information

TDDD82 Secure Mobile Systems Lecture 6: Quality of Service

TDDD82 Secure Mobile Systems Lecture 6: Quality of Service TDDD82 Secure Mobile Systems Lecture 6: Quality of Service Mikael Asplund Real-time Systems Laboratory Department of Computer and Information Science Linköping University Based on slides by Simin Nadjm-Tehrani

More information

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247

More information

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading

TDT Coarse-Grained Multithreading. Review on ILP. Multi-threaded execution. Contents. Fine-Grained Multithreading Review on ILP TDT 4260 Chap 5 TLP & Hierarchy What is ILP? Let the compiler find the ILP Advantages? Disadvantages? Let the HW find the ILP Advantages? Disadvantages? Contents Multi-threading Chap 3.5

More information

Routers. Session 12 INST 346 Technologies, Infrastructure and Architecture

Routers. Session 12 INST 346 Technologies, Infrastructure and Architecture Routers Session 12 INST 346 Technologies, Infrastructure and Architecture Goals for Today Finish up TCP Flow control, timeout selection, close connection Network layer overview Structure of a router Getahead:

More information

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS

PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS THE UNIVERSITY OF NAIROBI DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING FINAL YEAR PROJECT. PROJECT NO. 60 PARALLEL ALGORITHMS FOR IP SWITCHERS/ROUTERS OMARI JAPHETH N. F17/2157/2004 SUPERVISOR:

More information

Objectives and Functions Convenience. William Stallings Computer Organization and Architecture 7 th Edition. Efficiency

Objectives and Functions Convenience. William Stallings Computer Organization and Architecture 7 th Edition. Efficiency William Stallings Computer Organization and Architecture 7 th Edition Chapter 8 Operating System Support Objectives and Functions Convenience Making the computer easier to use Efficiency Allowing better

More information

Quality of Service (QoS)

Quality of Service (QoS) Quality of Service (QoS) The Internet was originally designed for best-effort service without guarantee of predictable performance. Best-effort service is often sufficient for a traffic that is not sensitive

More information

UNIT- 2 Physical Layer and Overview of PL Switching

UNIT- 2 Physical Layer and Overview of PL Switching UNIT- 2 Physical Layer and Overview of PL Switching 2.1 MULTIPLEXING Multiplexing is the set of techniques that allows the simultaneous transmission of multiple signals across a single data link. Figure

More information

Configuring QoS CHAPTER

Configuring QoS CHAPTER CHAPTER 34 This chapter describes how to use different methods to configure quality of service (QoS) on the Catalyst 3750 Metro switch. With QoS, you can provide preferential treatment to certain types

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (8 th Week) (Advanced) Operating Systems 8. Virtual Memory 8. Outline Hardware and Control Structures Operating

More information