A Conflict-Free Memory Banking Architecture for Fast Packet Buffers.


Jorge García, Llorenç Cerdà, and Jesús Corbal
Polytechnic University of Catalonia - Computer Architecture Dept.
{jorge,llorenc,jcorbal}@ac.upc.es
July.

Abstract

Routers and switches operating at high link rates require large and fast packet buffers. In many cases the buffers store cells from different queues which are requested by an arbiter. Such buffers can be built with slow but low cost DRAM coupled with fast but expensive SRAM. The heads and tails of the queues reside in SRAM, while the rest of the queues are stored in DRAM. Periodically, a Memory Management Algorithm requests the transfer of a group of cells of a given queue between DRAM and SRAM. In this paper we propose a memory architecture that exploits memory bank organization in order to achieve conflict-free access. This leads to a reduction of the granularity of DRAM accesses, resulting in a decrease of the storage capacity needed by the SRAM. We carry out an analysis giving the values of the system parameters that guarantee zero miss conditions. Moreover, a technological study of the system implementation shows that, compared with previous designs [], [], our organization gives better results for area, access times, latency, and maximum number of queues.

Index Terms: Memory buffers, high-speed routers and switches, DRAM-SRAM memory organization, memory management algorithms, lookahead buffers.

I. INTRODUCTION

One problem that arises when designing high-speed routers or switches is how to build high-bandwidth packet buffers able to store a large amount of packets. For a router or switch consisting of n_I interfaces with line rate R b/s, an input-buffer architecture would require buffers with a bandwidth of 2R, while an output-buffer architecture would require buffers with a bandwidth of (n_I + 1)R. As a rule of thumb, packet buffers are dimensioned to store the bandwidth-round-trip-time (RTT) product of the network.
For example, for a port operating at 40 Gb/s (OC-768) and a RTT of 0.25 s, the required buffer size would be of 1.25 Gbytes, and the required bandwidth for an input-buffer architecture would be of 80 Gb/s. (This work was supported by the Ministry of Education of Spain under grants TIC-956-C and TIC98-5C.) In many cases, packet buffers consist of Q queues that are accessed in the order determined by an arbiter. For example, this arbiter can be a scheduler of an interconnection network or a scheduler that guarantees the fair share of an output link. A memory architecture for building these buffers using slow but low cost DRAM coupled with fast but costly SRAM is analyzed in []. In this architecture the line rate can be sustained using several DRAM memory devices in parallel, such that b cells of length L_C of the same queue are accessed every random access time of the DRAM. For example, in the case of an input-buffer design, if the random access time of the DRAM is T seconds and the line rate is R b/s, the relation b L_C = 2 T R must hold. Figure 1 shows the memory architecture described in []. In this paper we shall refer to this memory architecture as Random Access DRAM System (RADS).

Fig. 1. RADS memory architecture of the packet buffer (t-SRAM, t-MMA, DRAM, h-MMA, h-SRAM, arbiter).

The tail of each queue is stored in the SRAM indicated as t-SRAM, the head is stored in the h-SRAM and the rest is stored in DRAM. (We assume that the router or switch works internally with fixed-length cells. We shall refer to a slot as the transmission time of a cell.) Every b slots the subsystem indicated as the tail Memory Management Algorithm (t-MMA) selects the queue of the t-SRAM from which b cells are to be transferred to DRAM. Similarly,

the h-MMA decides which queue from the h-SRAM has to be replenished with cells from DRAM. In this paper we will focus on the SRAM size and latency dimensioning. We will assume that all the queues have been initialized and always have cells in the DRAM memory. Furthermore, we shall suppose that the t-SRAM and h-SRAM memories are shared by all the queues, i.e. the cells of a queue can occupy any position in the t-SRAM and h-SRAM. This can be implemented e.g. using a single linked list for the cells of every queue. The t-MMA has to guarantee that cells from the t-SRAM are transferred to DRAM so that we have space available for incoming cells. Using a shared t-SRAM the t-MMA is simple: choose any queue with an occupancy counter higher than or equal to b. Furthermore, the required t-SRAM size is Q (b − 1) + 1. Note that this size guarantees that there are one or more queues having at least b cells. At every slot the arbiter selects the queue that is going to be read. The cell of this queue is taken from the h-SRAM. The h-MMA has to guarantee that cells transferred between DRAM and h-SRAM can accommodate the sequence of cells requested by the arbiter. Otherwise it may happen that the cell requested by the arbiter is not present in the h-SRAM because it has not yet been transferred from the DRAM. We shall refer to this condition as a miss. We will dedicate the rest of the paper to analyzing the h-MMA and h-SRAM, and we shall refer to them simply as MMA and SRAM. There are previous studies related to the analysis carried out in this paper. In [] the authors investigate a scheme that leads to the minimum SRAM size needed for the RADS memory architecture formerly described. In this scheme the requests issued by the arbiter are delayed such that the MMA knows the sequence of requests before deciding which queue has to be replenished. The number of arbiter requests known by the MMA is called the lookahead. The special case when the lookahead is zero is analyzed in [].
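As a minimal illustration of the shared t-SRAM rule just described, the t-MMA can be sketched as a scan for any queue holding a full block of b cells (the code and names below are ours, not the paper's):

```c
/* Sketch of the t-MMA selection rule (illustrative names, not the paper's):
 * with a shared t-SRAM of Q*(b-1)+1 cells, the pigeonhole principle
 * guarantees that, when the SRAM is full, some queue holds at least b
 * cells, so a block of b cells can always be sent to DRAM. */
int tmma_select(const int *count, int Q, int b) {
    for (int q = 0; q < Q; q++)
        if (count[q] >= b)
            return q;   /* first queue with a transferable block */
    return -1;          /* t-SRAM not full enough: nothing to transfer */
}
```

For instance, with Q = 4 and b = 2 the counters {1, 1, 2, 1} sum to Q (b − 1) + 1 = 5 cells, and queue 2 is selectable.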
The main contribution of our work is a conflict-free memory banking architecture that allows further reducing the SRAM size and latency with respect to the RADS architecture studied in []. We shall refer to our memory architecture as Conflict Free DRAM System (CFDS). We analyze CFDS giving the values of the system parameters that guarantee zero miss conditions. Furthermore, by means of a technological study of the system implementation, we show that for the same performance parameters, CFDS achieves considerably better access times and smaller SRAM area than RADS. (If a queue has no cells in the DRAM, cells can be directly granted to the arbiter from t-SRAM or h-SRAM.) There are many proposals dealing with mechanisms to alleviate the problem of memory conflicts [] or to provide conflict-free access [], [5], [6]. This is especially true in the vector processor domain. The novelty of our technique resides in the application of these mechanisms to the context of fast packet buffering. In packet buffering no bank collision may be produced, as packets have to be granted within a bounded delay.

II. RANDOM ACCESS DRAM SYSTEM

In this section the SRAM dimensioning is analyzed when b cells from the same queue are transferred between the SRAM and the DRAM every random access time of the DRAM. Figure 2 shows the MMA scheme considered in this section. In this scheme, the arbiter requests are delayed in the lookahead shift register of l positions. At every slot one cell from the queue demanded by the head of the lookahead is read from the SRAM and granted to the arbiter. Then the request issued by the arbiter is stored in the tail of the lookahead. Every b slots the queue previously selected by the MMA is replenished with b cells. The first of these b cells can be granted to the arbiter if they belong to the requested queue and the queue has no cells in the SRAM. Then, the MMA selects a new queue to be replenished. The decisions of the MMA take into account (i) the SRAM occupancy counters (i.e.
the number of cells of every queue present in the SRAM), and (ii) the arbiter requests stored in the lookahead. The lookahead allows the MMA to select the queue to be replenished knowing the l requests that are going to be issued in the future. Armed with this information, the MMA can take better decisions and hence, the SRAM size can be reduced while still guaranteeing zero miss probability. For example, suppose that the parameters of the system shown in Figure 2 are: Q = 4, b = 2, l = 5. Suppose also that the MMA is called with the SRAM occupancy counters and the lookahead values shown in the figure. The MMA should select the critical queue: this queue would be replenished with b cells after b slots, and would still hold cells after 5 slots. If the MMA had selected another queue, a miss would occur after 5 slots. In [] it is shown that the minimum SRAM size necessary to have zero miss probability is SRAM_min = Q (b − 1) and the required lookahead is l = Q (b − 1) + 1. Furthermore, the MMA necessary to have zero miss probability is the so-called Earliest Critical Queue First (ECQF-MMA). This algorithm works as follows: read the lookahead from the head (slot 0) to the tail (slot l − 1). For every request read from the lookahead, decrease the occupancy counter of the corresponding queue (we shall refer to the resulting value as the excess of the queue). If the

Fig. 2. RADS memory architecture: SRAM, MMA, occupancy counters, lookahead, arbiter.

excess is < 0 then the queue is said to be critical. The first queue found to be critical following this algorithm is the queue selected by the ECQF-MMA. Note that a lookahead l = Q (b − 1) + 1 guarantees that there is always at least one critical queue. The special case of l = 0 is analyzed in []. In this case, the MMA consists of selecting the queue having the lowest occupancy counter. The bound SRAM_max ≤ Q b (2 + ln Q) is given for the maximum SRAM size necessary to have zero miss probability. In the following we analyze the intermediate case: 0 < l < Q (b − 1) + 1. We approach the analysis as follows: let the system have all the queues with I cells in the SRAM. We shall refer to the deficit of a queue as I minus the occupancy counter of the queue. Let D be the maximum deficit that a queue of the system can reach having zero misses. Then, since any queue can reach this deficit, we have to dimension the SRAM size to Q D cells and I = D. We shall use the following MMA:

• Earliest Critical Queue First (ECQF): First look for the first critical queue (as formerly explained). If no critical queue is found, then use the following principle.
• Maximum Deficit Queue First (MDQF): Take the queue with the maximum deficit at the end of the lookahead, i.e. the queue for which the occupancy counter minus the number of requests pending in the lookahead is the lowest. If several queues have the same deficit at the end of the lookahead, choose one randomly.

This MMA tries to minimize the probability of unnecessarily refilling a queue, i.e. a queue for which no requests will arrive in the near future. We now compute the maximum deficit. We compute this deficit by finding out the worst case pattern (WCP) of arbiter requests, i.e. the pattern that most empties the occupancy of any of the queues. Let us refer to the queues as q_i, i ∈ [0, Q − 1].
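A compact sketch of this two-step ECQF/MDQF decision follows (our own code, not the paper's; the counters and lookahead are passed as plain arrays, and Q ≤ 64 is assumed only to avoid dynamic allocation):

```c
/* ECQF/MDQF sketch (illustrative, not the paper's implementation).
 * lookahead[0..l-1] holds the queue index requested at each future slot.
 * ECQF: scan the lookahead, decrementing a copy of the occupancy
 * counters; the first queue whose excess drops below zero is critical.
 * MDQF fallback: no critical queue, so pick the queue with the maximum
 * deficit (lowest excess) at the end of the lookahead. */
int mma_select(const int *count, int Q, const int *lookahead, int l) {
    int excess[64];                      /* sketch assumes Q <= 64 */
    for (int q = 0; q < Q; q++) excess[q] = count[q];
    for (int i = 0; i < l; i++) {
        int q = lookahead[i];
        if (--excess[q] < 0) return q;   /* ECQF: earliest critical queue */
    }
    int best = 0;                        /* MDQF: maximum deficit at end */
    for (int q = 1; q < Q; q++)
        if (excess[q] < excess[best]) best = q;
    return best;
}
```

For counters {1, 1, 1} and lookahead {0, 0}, queue 0 becomes critical at its second request; for counters {2, 1, 1} and lookahead {0, 1}, no queue is critical and queue 1 has the largest deficit at the end of the lookahead.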
For the sake of simplicity, let us assume that when the MDQF principle is used and several queues have the same deficit, the MMA chooses the queue of lowest index (instead of doing a random selection). Note that this assumption implies only a queue reordering and has no influence on the result. Let q_{Q−1} be the queue that reaches the maximum deficit. Without loss of generality, let the system start with all the queues having I cells in the SRAM. Clearly, the WCP consists of requests that empty as much as possible the queue q_{Q−1}, but forcing the MMA to replenish all the other queues before replenishing q_{Q−1}. We claim that the WCP of arbiter requests is given by round robins of the queues q_i, i ∈ [0, Q − 1], such that the MMA is forced to choose the sequence of replenishments q_0, q_1, ..., q_{Q−1}. Every round robin starts with the queue of maximum index such that this sequence is fulfilled and ends with q_{Q−1}. Note that the queue q_{Q−1} reaches the largest deficit since its queue length is the only one decreased in all round robins. Furthermore, the queue is reduced to the maximum value since the queue replenishment is delayed as much as possible. Based on this idea, Figure 3 gives an algorithm in C code that computes the exact maximum deficit for all values of the lookahead.

int maximum_deficit(int L, int Q, int b)
{
  int nextround, maxdeficit, slot;

  if (L > Q * (b - 1)) {
    return -1;
  }
  if (L > 0) {
    maxdeficit = (L - 1) / Q + 1;
    slot = maxdeficit * Q;
  } else {
    maxdeficit = 0;
    slot = 0;
  }
  while (1) {
    nextround = (slot - L) / b + 1;
    if (nextround >= Q - 1) {
      /* There is a RR of only one queue. */
      return maxdeficit + Q * b - 1 - slot;
    }
    slot += Q - nextround;
    if (slot >= Q * b) {
      return maxdeficit;
    }
    ++maxdeficit;
  }
}

Fig. 3. Algorithm to compute the maximum deficit. The parameters of the function are: lookahead size (L), number of queues (Q) and number of cells transferred from DRAM (b).

A closed formula giving a bound for the maximum deficit can be derived based on the WCP.
Assume a fluid model where all the queues can be equally emptied. Figure 4 shows the evolution of the deficit of q_{Q−1} in this situation. Note that during the first l slots, the arbiter requests are round robins of all the queues. This is done to force the MMA to choose q_0 as the first queue to be replenished (this would occur at slot 0). Therefore, during

Fig. 4. Evolution of the deficit of q_{Q−1} in the fluid model of the worst case pattern (slopes 1/Q, ..., 1/(Q − l/b); slots l, ..., Qb).

the first l slots all the queues are emptied with a slope 1/Q. After the slot l, the round robin contains only the Q − l/b queues not yet replenished, and thus the slope grows to 1/(Q − l/b), and so on, until the slot Qb where the queue q_{Q−1} would be replenished. The maximum deficit (D̂) reached by q_{Q−1} in this fluid model would be:

    D̂ = l/Q + Σ_{i=1}^{Q − l/b} b/i,   0 < l ≤ Qb.   (1)

The maximum deficit in this fluid model is an upper bound to the maximum deficit in the discrete model we are approximating (D). Furthermore, since Σ_{i=1}^{Q − l/b} b/i < b (1 + ln Q), we have:

    D < l/Q + b (1 + ln Q),   0 < l ≤ Qb.   (2)

Figure 5 compares the exact maximum deficit computed with the algorithm of Figure 3 and the bound given by equation (2). Remember that the SRAM size in cells is Q times the maximum deficit.

Fig. 5. Tradeoff between maximum deficit (cells) and lookahead (slots): fluid approximation vs. exact (Q = 64, b = 8).

III. POTENTIAL OF BANK INTERLEAVING

In response to the growing gap between processor and memory, DRAM manufacturers have created several new architectures that address the problem of latency, bandwidth and cycle time (e.g. DDR-SDRAM [7] or RAMBUS DRDRAM [8]). All these commercial DRAM solutions implement several memory banks that can be addressed independently. If a conflict-free mechanism can be implemented to avoid memory bank contention, the cycle time of a random DRAM access can be significantly reduced. This translates into the potential for reducing the required data granularity of any DRAM cell access, hence alleviating the requirements of the associated SRAM size. These features are exploited in the memory architecture proposed in section IV. In this section the concept of memory banking is introduced.

Fig. 6. Organization of a DRAM main memory system in banks: (a) internal structure of a memory bank, (b) configuration of a DRDRAM-style memory system.
Figure 6 illustrates the concept of memory bank and the concept of an interleaved memory system. A memory bank is a set of memory modules that are always accessed in parallel with the same memory address. The number of memory modules grouped in parallel is dictated by the size of the data element we want to address. This size is known as data granularity. For instance, in [], a data granularity of b = 8 cells is used for DRAM memories with a random access time of 48 ns. This achieves an effective input/output rate of one cell every 6.4 ns (corresponding to OC768). Figure 6 also shows a possible memory bank configuration (similar to that of a DRDRAM-like memory system [8]). In a conventional DRAM memory system, the data is interleaved across all memory banks using a certain policy, and the memory controller is simply responsible for broadcasting the addresses to all of them. Each memory bank has special logic that determines whether or not the address identifies a data item that the bank contains. Given an array of M memory banks and a random cycle time of T seconds per bank, it is theoretically possible to initiate a new memory access every T/M seconds. Therefore, the data granularity of the packet buffering system (b cells accessed per memory access) can potentially be reduced by a factor of M, hence significantly reducing the need for large and expensive SRAM buffers.
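The arithmetic of this argument can be written down directly (a sketch of ours, with example numbers of our choosing):

```c
/* Interleaving arithmetic sketch: with M independently addressable banks
 * of random cycle time T, a new access can be initiated every T/M, so
 * the b-cell granularity needed to sustain the line rate shrinks by up
 * to a factor of M (rounded up to a whole number of cells). */
double initiation_interval_ns(double T_ns, int M) {
    return T_ns / M;
}
int reduced_granularity(int b, int M) {
    return (b + M - 1) / M;   /* ceil(b / M) */
}
```

For example, a granularity of b = 8 cells with M = 4 conflict-free banks would need only 2 cells per access.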

There are two fundamental limits to the bank interleaving technique. First, the bus address speed: that is, the cycle time required to broadcast an address to all memory banks. Second, the problem of bank collisions. In order to fully exploit the potential bandwidth of an interleaved memory system, we need to guarantee that the same bank is not accessed twice within its random access time (T). The implementation of conflict-free mechanisms is especially sensitive in the context of fast packet buffering, as we need to enforce that no bank collision is ever produced; otherwise, a packet would be lost as a result.

IV. CONFLICT FREE DRAM SYSTEM

In this section we describe a novel DRAM memory system that grants conflict-free access with affordable cost. The basis of this system is a special memory bank organization that minimizes the likelihood of memory conflicts, coupled with a reordering mechanism that schedules the different MMA requests. We shall refer to our memory architecture as Conflict Free DRAM System (CFDS). Figure 7 summarizes the CFDS memory architecture. Four items stand out in CFDS:

• CFDS exploits the DRAM bank organization.
• The Virtual SRAM Subsystem works exactly as the RADS memory architecture described in section II, but using a granularity of b' < b when cells are transferred between DRAM and SRAM.
• The DRAM Scheduler Subsystem hides the implications of the DRAM bank organization from the former Virtual SRAM Subsystem.
• The DRAM bank organization introduces a cost in terms of latency and SRAM size.

These items are discussed in the following subsections.

Fig. 7. CFDS memory architecture of the packet buffer: DRAM Scheduler Subsystem (DSA, Requests Register (RR), Ongoing Requests Register (ORR)) and Virtual SRAM Subsystem (SRAM, latency shift register, V-MMA, virtual occupancy counters, lookahead, arbiter).

A. DRAM Bank Organization

Let M be the number of DRAM banks. We organize these banks in N groups of b/b' banks per group (see Figure 8). Thus, we have N = M/(b/b') groups. Each group stores the cells of Q/N queues. Banks are accessed transferring b' cells of the same queue. Furthermore, in order to avoid bank conflicts, the cells of every queue are stored in round robin fashion across all the banks of the group (low-order interleaving). In this way, we can perform b/b' consecutive accesses to the same queue (transferring b cells overall) without bank conflicts. The distribution of the queues among the maximum number of groups maximizes the likelihood of finding independent accesses, reducing the hardware requirements to provide conflict-free access.

Fig. 8. Proposed memory bank interleaving: the group index uses log N bits and the bank index log (b/b') bits.

In Figure 8 we can observe the mapping function used to obtain the bank and group indexes. A given request address contains two bit fields, one determining the virtual queue and one determining the relative order. The group index is obtained using the low-order bits of the virtual queue field, while the bank inside that group is obtained using the low-order bits of the ordinal field. The rest of the bits are used to determine the row and column addresses of the specified DRAM bank.

B. Virtual SRAM Subsystem

The Virtual SRAM Subsystem shown in Figure 7 works in the same way as the RADS memory architecture described in section II, but assuming that b' < b cells are transferred every DRAM memory access. The requests issued by the arbiter are shifted into a lookahead register. Every b' slots the Virtual MMA (V-MMA) decides the queue to be replenished and releases the request to the DRAM Scheduler Subsystem. The arbiter requests leaving the lookahead register and the V-MMA replenish requests update accordingly the virtual occupancy counters.
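The mapping of Figure 8 can be sketched as follows (field widths and names are ours; powers of two are assumed so that the modulo operations correspond to taking low-order bits):

```c
/* Sketch of the Figure 8 address mapping (our naming). The group index
 * comes from the low-order bits of the virtual queue field and the bank
 * index within the group from the low-order bits of the ordinal (cell
 * sequence number), i.e. low-order interleaving inside each group. */
typedef struct { unsigned group, bank; } bank_addr;

bank_addr map_cell(unsigned queue, unsigned ordinal,
                   unsigned N, unsigned banks_per_group) {
    bank_addr a;
    a.group = queue % N;                  /* low bits of the queue field */
    a.bank  = ordinal % banks_per_group;  /* round robin across the group */
    return a;
}
```

Consecutive cells of one queue (ordinal 0, 1, 2, ...) thus walk round robin over the b/b' banks of the queue's group, so b/b' consecutive accesses to the same queue never collide.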

C. DRAM Scheduler Subsystem

The DRAM Scheduler Subsystem shown in Figure 7 manages the transfers between the DRAM and SRAM. The purpose of this subsystem is to guarantee that the replenish requests issued by the V-MMA are fulfilled. To do so, we have introduced the DRAM Scheduler Algorithm (DSA), responsible for solving bank conflicts that could prevent a replenish request. In the following, we give a detailed description of this algorithm. As explained in the former subsection, every b' slots the V-MMA delivers a replenish request to the DRAM Scheduler Subsystem which, in turn, initiates a transfer of b' cells of a queue chosen by the DSA between DRAM and SRAM. In order to avoid transfers having bank conflicts, the DSA makes use of two registers (see Figure 7): the Requests Register (RR) and the Ongoing Requests Register (ORR). The ORR is a shift register that stores the banks that are being accessed. Hence, the banks stored in the ORR are locked and a new transfer with these banks cannot be initiated. Taking into account that a bank is locked during b slots, we need to consider the latest b/b' − 1 ongoing requests. The size of the ORR is hence b/b' − 1. Every b' slots the DSA stores the replenish request delivered by the V-MMA in the RR and removes one of the pending requests of the RR. This request is passed to the ORR. Obviously, the request removed from the RR must be to a bank different from those stored in the ORR. To do so, the DSA seeks the RR, starting from the oldest request (the head of the RR). In the appendix we demonstrate that a bank satisfying the former condition can always be found if the RR has a size of:

    L = 2 (Q/N) (b/b' − 1) + 1.   (3)

The factor 2 is due to the fact that we have Q incoming streams from the t-SRAM and Q outgoing streams for the h-SRAM. Note that in case the DSA always chooses the head of the RR, the replenish requests delivered by the V-MMA would suffer a delay of (L − 1) b' slots until they are passed to the ORR.
If the head of the RR is not always eligible, some requests will have a delay higher and some lower than (L − 1) b' slots. In the appendix we demonstrate that the maximum number of times a request can be delayed is:

    R_max = 2 (Q/N) (b/b' − 1).   (4)

This delay in slots is b' R_max.

D. Cost of the DRAM Bank Organization

The drawback of our proposed conflict-free access mechanism is that the DRAM may deliver the cells out of order (even those from the same queue that are not located in the same bank). Therefore, we have to introduce some mechanisms to cope with that. The replenish requests issued by the V-MMA have to be already provided to the SRAM before the cells are granted to the arbiter. Therefore, an additional latency equal to the maximum delay that a replenish request can suffer (due to the DSA reordering) has to be added to the lookahead of the V-MMA. This delay is introduced by the latency shift register shown in Figure 7. From the results formerly given, we have that the size of this register is equal to:

    latency (in slots) = ((L − 1) + R_max) b' = 4 (Q/N) (b/b' − 1) b'.   (5)

Finally, note that the replenish requests of the SRAM queues are delivered to DRAM when they leave the RR register, but cells are removed from SRAM when they leave the latency shift register. The mismatch between these two events requires increasing the SRAM size (in order to store the cells downloaded to the SRAM before they are granted to the arbiter). We conclude that two factors contribute to the SRAM size dimensioning: (i) the maximum deficit (D_(L,Q,b')) of the queues in the Virtual SRAM Subsystem (computed as explained in section II), and (ii) the additional SRAM size to cope with the mismatch described above. Summing both terms we have:

    SRAM size (cells) = Q D_(L,Q,b') + b' R_max.   (6)

As we will see, Q D_(L,Q,b') + b' R_max is in general much lower than Q D_(L,Q,b). The issue of out of order insertion in the SRAM buffer will be addressed in section VI.
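The selection step of the DSA can be sketched as a linear scan of the RR against the ORR (our code, with the registers modeled as plain arrays of bank indexes):

```c
/* DSA selection sketch (illustrative): return the index of the oldest
 * request in the Requests Register whose bank does not collide with any
 * bank currently held in the Ongoing Requests Register. With the RR
 * sized as in equation (3), an eligible request always exists. */
int dsa_select(const int *rr_bank, int rr_len,
               const int *orr_bank, int orr_len) {
    for (int i = 0; i < rr_len; i++) {        /* head of RR = oldest */
        int busy = 0;
        for (int j = 0; j < orr_len; j++)
            if (rr_bank[i] == orr_bank[j]) { busy = 1; break; }
        if (!busy) return i;                   /* conflict-free request */
    }
    return -1;                                 /* unreachable if RR sized per (3) */
}
```

For instance, with pending requests to banks {3, 5, 3, 7} and bank 3 locked in the ORR, the request to bank 5 (the oldest eligible one) is selected.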
V. EVALUATION OF RADS

In this section we will perform an evaluation of different implementation issues concerning the design of the RADS memory architecture described in section II. We pose the problem of implementing an SRAM structure able to handle several queues and we propose some design alternatives. We finally estimate the area and access time and determine the viability of the proposed SRAM designs. Throughout this section, we shall assume OC768 and OC3072 links for an input-buffer architecture. For the OC768, we will assume Q = 128. This could correspond to 16 interfaces with 8 service classes. For OC3072 we have assumed a more aggressive design with Q = 512. On the other hand, taking into account the link rates of OC768 and OC3072, we will set the RADS data granularity (b) to 8 for the former system and 32 for the latter.

A. Design of SRAM buffers

In order to show how significant the benefits of our proposed CFDS system are, we can evaluate whether a RADS SRAM buffer will be technologically hindered in the near future. We have used CACTI [9] to estimate the access time (in ns) and the area (in cm²) of different implementations of the t-SRAM and h-SRAM buffers using a 0.13 μm technological process. CACTI is an integrated cache access time, cycle time, area, aspect ratio, and power analytical model. The main advantage of CACTI is that, as all these models are integrated together, tradeoffs between the different parameters are all based on the same assumptions and hence are mutually consistent. As mentioned in section I, we have assumed that the t-SRAM and h-SRAM are shared by all the queues. The design of a unified (shared) SRAM buffer is not as trivial as the design of a distributed (isolated) SRAM buffer, where each queue has its own partition of the available memory. The second kind of SRAM buffer could be easily implemented as a set of circular queues built with simple direct-mapped SRAM structures. On the other hand, in the shared SRAM buffer, we need a mechanism to identify where the n-th element of a given queue q_i is exactly placed. Intuitively, this is analogous to the design of Q linked lists, where the next cell to be accessed by the arbiter is located at the head of the corresponding list, and the next cell to store coming from the DRAM is placed at the tail of the corresponding list. Figure 9 shows different alternatives to implement a unified SRAM buffer. The global CAM consists of a full content-addressable memory where all the cells are stored together. Each cell has a tag that identifies the queue relative to that cell, and the relative order inside the list of cells of that same queue. When the address (queue identifier and order) of the cell is set, the CAM searches across all entries for the related cell.
This implementation requires two ports (one for reading and one for writing) or one read/write port if the operation is going to be time-multiplexed (i.e. the operations of reading and writing from the CAM are serialized). The time-multiplexed alternative is slower but requires less area. Note that we assume that the refreshes from the DRAM are serialized along b time slots at a rate of one cell per slot. The pointer CAM is an alternative to the previous proposal. As a cell is relatively big (64 bytes), for certain configurations it may pay off to store pointers into the CAM instead of whole cells. These pointers are actually the position where the real cell is stored, in a simpler direct-mapped structure. This alternative has the potential to reduce the overall area, but at the cost of increasing the access time (as the CAM has to be accessed to obtain the pointer to the direct-mapped SRAM structure). Again, this implementation requires one read port and one write port (or one single read/write port for a time-multiplexed configuration) for both the CAM and the direct-mapped memories. Finally, the unified linked-list proposal is a straightforward implementation of Q linked lists onto a direct-mapped memory structure. Each entry of that direct-mapped SRAM contains one cell and a pointer to the next cell (another entry of the same structure). In order to be able to identify the head and the tail of each linked list, we have another direct-mapped structure that stores the head and tail pointers for each of the queues. The advantage of this implementation is that it does not require a complex CAM implementation. However, it requires one additional write operation to store the position of the new tail onto the pointer field of the old tail. As a result, the structure needs one read port and two write ports per structure (or one single read/write port for a time-multiplexed configuration).

Fig. 9.
SRAM design alternatives: (a) global CAM, (b) pointer CAM and (c) unified linked-list.

B. RADS SRAM buffer performance

Figure 10 shows the access time and the area of the different SRAM implementations (regular and time-multiplexed) for OC768 and OC3072 as a function of the number of slots of the lookahead. The OC768 system SRAM size ranges between several hundred Kbytes (for the minimum lookahead) and some tens of Kbytes (for the maximum lookahead). The OC3072 system SRAM size ranges between more than 6 Mbytes (for the minimum lookahead) and about 1 Mbyte (for the maximum). For an OC768 system, we need to access a new cell every 12.8 ns (assuming 64-byte wide cells). We can observe

Fig. 10. h-SRAM area and access time as a function of the lookahead for the RADS scheme (Q = 128, b = 8 for OC768 and Q = 512, b = 32 for OC3072; curves: global CAM, pointer CAM and unified linked list, regular and time-multiplexed).

from Figure 10 that the access times of all SRAM alternatives are well under that number, even for the shortest lookahead. Therefore, as access time is not a concern, we shall focus on those implementations with minimum area. For instance, the regular global CAM and unified linked-list implementations have an overall area of more than 0.2 cm² for short lookaheads, which could represent a significant fraction of the overall transistor budget of a medium cost system. On the other hand, time-multiplexed approaches such as the time-mux unified linked list exhibit a much smaller overall area even with very small lookaheads. For the rest of the paper, we will use the time-mux unified linked list SRAM for all OC768 systems. For an OC3072 system, we need to access a new cell every 3.2 ns, which is a significantly harder constraint to meet, taking into account that the SRAM buffers are now larger. Indeed, if we look at Figure 10, we can clearly appreciate the fact that no SRAM implementation is able to comply with the 3.2 ns target (not even for the longest lookaheads). The fastest implementation is the global CAM, which has an access time slightly higher than 7 ns.
Furthermore, if we look at the area results for the different alternatives, we can observe that only the time-multiplexed SRAMs have areas smaller than 1 cm², which is already a significant fraction of the overall transistor budget of a high-end system (as we need both h-SRAM and t-SRAM buffers). As the access time is clearly the constraint for OC3072, for the rest of the paper we will use the global CAM implementation (the fastest) for evaluation purposes.

VI. EVALUATION OF CFDS

In this section we will perform an extensive evaluation of different implementation issues concerning the design of the CFDS memory architecture described in section IV. We will discuss the design of the request register scheduling logic, as well as the required modifications for the SRAM buffers. We finally compare the performance (in terms of area and access time) of the RADS and CFDS memory architectures.

A. Implementation issues of the DRAM Scheduler Subsystem

As already described in section IV-C, the Requests Register (RR) is a special lookahead of requests to the DRAM memory system. The main function of this register is to

store the requests so that the DSA can schedule them in order to grant conflict-free access to the DRAM memory banks. The scheduling algorithm used to select the request to be serviced is very simple. The Ongoing Requests Register (ORR) contains the banks that are currently accessing a block of cells. The scheduler is responsible for finding the oldest request in the RR that does not access any of the banks contained in the ORR. This logic problem is actually equivalent to the design of instruction issue windows in out-of-order superscalar processors [10], [11], where a set of instructions free of dependences must be selected each cycle, based on the broadcast of the results that have already become available.

Fig.: Implementation of the request register: (a) wake-up logic, (b) selection logic.

The figure shows a possible design for the RR scheduling logic, based on a conventional issue-window implementation. The logic is based on sending the tags from the ORR (that is, the indexes of the banks being accessed) across all the entries of the Requests Register. Each entry is responsible for determining whether the bank index corresponding to its request is different from all the indexes coming from the ORR. In this case, the entry is marked as ready. This stage is known as wake-up, and it is performed every time we want to select a new request candidate. After the wake-up stage, the logic needs to select the oldest entry that is in the ready state (i.e., that can access the DRAM array with no bank conflicts). In order to do so, we can use a hierarchical selection logic that first propagates the ready signals up a tree, and then sends the selection signal back down using priority encoders. This stage is known as the selection stage. Finally, once the request has been selected, some mechanism to perform compaction of the requests is required in order to maintain the ordering of the requests by age.
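As a software illustration of the wake-up and selection stages described above, the sketch below models the RR as an age-ordered list. The class and method names are ours (not part of the actual hardware design), and the hierarchical priority encoders are abstracted into a simple linear scan:

```python
# Sketch (not the paper's RTL): software model of the RR wake-up/selection
# logic. Names (RequestScheduler, orr_banks, ...) are illustrative only.
from collections import deque

class RequestScheduler:
    """Selects the oldest pending request whose DRAM bank is not busy."""

    def __init__(self):
        self.rr = deque()  # Requests Register, ordered oldest-first

    def add_request(self, queue_id, bank):
        self.rr.append((queue_id, bank))

    def select(self, orr_banks):
        """Wake-up + selection: return the oldest request whose bank is not
        in the Ongoing Requests Register (ORR). Compaction is implicit in
        removing the chosen entry while preserving the order of the rest."""
        for i, (queue_id, bank) in enumerate(self.rr):  # oldest first
            if bank not in orr_banks:                   # wake-up check
                del self.rr[i]                          # compaction
                return (queue_id, bank)
        return None                                     # all banks busy
```

For example, with pending requests to banks 3, 5, and 3 (oldest first) and bank 3 currently in the ORR, `select({3})` skips the oldest request and returns the one addressed to bank 5, exactly the oldest-ready policy described above.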
Table I shows the RR size for different values of b, and the time available to perform the scheduling of one single request.

TABLE I: REQUESTS REGISTER SIZE AND TIME TO PERFORM THE SCHEDULING OF A NEW REQUEST.

In order to figure out the technological hurdles of those implementations, we may look at a current commercial processor: the Alpha 21264 implements, in a 0.35 µm CMOS process, a 20-entry issue queue able to select up to four instructions in less than 1 ns [12], using .5 cm² of the overall die area. From this base result, we can conclude that the implementation of the RR logic for OC768 is fairly trivial, since even in the most demanding configuration we have .8 ns to search an RR of length 6. For OC3072, the design is attainable for the larger values of b, and possible (yet extremely aggressive) for the next smaller value. The implementation of the RR scheduling logic for OC3072 with the smallest b is certainly of difficult viability.

B. Design of CFDS SRAM Buffers

In order to implement an SRAM buffer for any CFDS configuration, we have to observe two main issues. First, the cells of a given queue coming from the DRAM memory system may arrive out of order. Second, the SRAM must contain R_max additional entries to be able to hold elements before they are scheduled by the MMA. The first problem can be easily overcome by implementing some basic changes in our proposed SRAM structures to allow them to insert cells from a queue out of their natural order:

• global CAM: implementing write operations out of order is trivial in this configuration, as only more bits are required in the ordinal tag field.
• pointer CAM: again, implementing write operations out of order is very simple, by increasing the number of tag bits used to identify the order of a cell inside its queue.
• unified linked list: performing write operations out of order is complex inside a linked list.
However, an easy solution is to implement one linked list per queue and per bank of a group, instead of one per queue, since two operations over the same bank are always performed in strict order.

1) Performance comparison: The corresponding figure allows us to demonstrate the performance benefits of using CFDS instead of a basic RADS approach. The figure shows the

area (of both h-SRAM and t-SRAM) and the most restrictive access time for OC768 and OC3072 as a function of the lookahead delay (measured in µs). Again, the number of queues Q is 8 for the OC768 system and 5 for the OC3072 system. The curves with a data granularity of b = 8 for OC768 correspond to the RADS implementation. The rest of the curves correspond to different CFDS configurations varying the value of b. We assume the number of banks M to be 256.

There are two main conclusions that can be inferred from the results in the figure. First, the advantages of CFDS relative to RADS are evident. For OC768 (where area is the main figure of merit), a CFDS system achieves an area .8× smaller than RADS, with a shorter lookahead delay ( µs instead of 8 µs). The especially significant gains, however, come from the OC3072 system. A CFDS configuration is compliant with the requirements of buffering packets at 160 Gb/s, as its access time is lower than 3.2 ns. Moreover, this is accomplished with a modest lookahead delay ( µs) and an affordable area (.6 cm² overall). This contrasts heavily with its RADS counterpart, which is unable to access data in less than 7 ns, even with a delay of more than 5 µs and a non-negligible area (almost the size of the Pentium 4). The second important conclusion is that there is an optimal value of b for a given CFDS implementation. The reason is the trade-off between the SRAM size required to tolerate the unpredictability of arrivals from the arbiter, D(L,Q,b), which is proportional to b (see equation ()), and the SRAM size required to absorb the level of reordering of the accesses from the DRAM, R_max, which is inversely proportional to b (see equation ()).

Fig.
Maximum number of queues attainable for RADS and CFDS with different configurations (using maximum lookahead and complying with the OC768 and OC3072 time constraints).

2) Potential number of queues: An important parameter is the number of queues that can be supported by our system []. The figure shows the maximum number of queues that the different SRAM buffer approaches can afford, taking into account the access-time constraints (12.8 ns of maximum delay for OC768 and 3.2 ns for OC3072). The first column corresponds to a RADS implementation, while the rest of the columns correspond to CFDS implementations as we reduce the data granularity. As shown in the figure, CFDS allows .5 times more queues for OC768 and 6 times more queues for OC3072.

VII. CONCLUSIONS

In this paper we have proposed a novel architecture targeted at fast packet buffering. In order to overcome the bandwidth problems of current commodity DRAM memory systems, the use of SRAM coupled with DRAM has been proposed in the past. These SRAM memories act as ingress and egress buffers that allow wide transfers between the buffering system and its main DRAM memory. The main drawback of this organization is that the data granularity of the DRAM accesses has to be enlarged to sidestep the high cycle times of a single DRAM bank. As a result, the SRAM memories can become too large and slow for very high link rates. In our proposal, the memory bank organization is exploited in order to achieve conflict-free access to DRAM. This leads to a reduction of the granularity of the DRAM accesses, resulting in a decrease of the storage capacity needed by the SRAM. The main cost of the conflict-free organization scheme is that cells can be transferred from DRAM to SRAM out of order. We carry out an analysis showing that the reordering introduced by the access scheme is bounded and that zero-miss conditions can be guaranteed.
Moreover, a technological study of the system implementation shows that our organization gives better results for area, access time, latency, and maximum number of queues than the previously proposed designs. For instance, our proposed mechanism allows implementing considerably smaller SRAMs than those required by the baseline system for OC768. Moreover, our proposal allows SRAM buffers to be implemented fast enough to perform packet buffering at OC3072 link rates, with a reasonable area cost. Another contribution is the observation that the SRAM size has two components: the first one (Q·D(L,Q,b)) is proportional to the data granularity of the DRAM access; the second, the degree of reordering (R_max), is actually inversely proportional to the data granularity. As a result, there is an optimum value of b (generally greater than 1) that minimizes the SRAM size.
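A quick numerical sketch illustrates the convex trade-off just described. The constants standing in for the Q·D(L,Q,b) and R_max terms are hypothetical placeholders chosen for illustration only, not values taken from our analysis:

```python
# Toy model of the SRAM sizing trade-off: one component grows linearly with
# the DRAM access granularity b, the other (reordering) shrinks with b.
# c_lookahead and c_reorder are made-up illustrative constants.

def sram_size(b, c_lookahead=4.0, c_reorder=64.0):
    """Total SRAM entries as a function of the granularity b."""
    lookahead_part = c_lookahead * b   # stands in for Q * D(L, Q, b)
    reorder_part = c_reorder / b       # stands in for R_max
    return lookahead_part + reorder_part

# Scan power-of-two granularities for the minimum total size.
candidates = [1, 2, 4, 8, 16, 32]
best_b = min(candidates, key=sram_size)   # -> 4 for these constants
```

With these placeholder constants the minimum falls at an intermediate granularity, mirroring the observation that the optimum b is generally greater than the minimum value.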

Fig.: SRAM area (both h-SRAM and t-SRAM) and access time, as a function of the delay, for the RADS scheme and several CFDS configurations, for OC768 and OC3072. Delay for RADS = lookahead delay. Delay for CFDS = lookahead delay + latency.

APPENDIX

In this appendix we derive (i) the size L of the RR register that allows avoiding bank conflicts in the CFDS memory architecture, and (ii) the maximum number of times R_max that a request can be delayed in the RR register (see Section IV). Let us define A(j,n) to be the number of requests issued by the MMA to memory bank j during n consecutive b-slots, where a b-slot is defined as b time-slots. A consequence of the CFDS memory organization proposed in Section IV is the following:

Property 1: For the memory banks belonging to a group of memory banks we have:

max_j {A(j,n)} − min_j {A(j,n)} ≤ min(n, Q′/N),

where Q′ = 2Q, as we are taking into account requests from both the tail and the head V-MMA. We know that A(j,n) ≤ n; hence:

Σ_j A(j,n) ≤ min{n, (Q′/N)( ) + n}.   (7)

Note that this is equivalent to a leaky-bucket rate-controlled source with parameters α = (Q′/N)( ), ρ, and C (see [14], equation 7). A direct consequence of this property is:

Property 2: If A(j,n) = Q′/N + k, then for the other banks of the group (i ≠ j): A(i,n) ≥ k.

Let us now define R(j,n,m) to be the number of requests to memory bank j stored between positions n and m of the registers ORR and RR (we are considering the ORR to be attached to the head of the RR).
The first positions correspond to requests stored in the ORR, while the following positions correspond to requests stored in the RR. We are also assuming that the length of the RR is infinite.

Property 3: For the memory banks belonging to the same group we have:

max_j R(j,n,m) − min_j R(j,n,m) ≤ min(m − n, Q′/N).   (8)

Proof: Let us consider the initial state of the system. In this state the ORR is empty and the first request arrives at the head of the RR. At this instant the RR content is

identical to the sequence of requests issued by the MMA, and thus, using Property 1, it is easy to see that Property 3 holds. Let t be the first b-slot at which Property 3 does not hold. For some banks j and i belonging to the same group of memory banks, and some interval between positions n and m, the following relation holds at b-slot t:

R(j,n,m) − R(i,n,m) = Q′/N + 1.   (9)

Here we can assume that the requests placed at positions n and m are addressed to bank j. Note that if Property 3 was true at b-slot t − 1, then we should have issued a request for i that was placed between positions n and m + 1. However, this is only possible if a request to bank j was at a position p small enough to lie in the ORR. Otherwise we should have chosen the request for memory bank j placed at position n of ORR+RR, instead of the request for i located between positions n and m + 1. Consequently, at b-slot t − 1 we would have R(j, p + 1, m + 1) ≥ R(j, n, m + 1) + 1, and thus R(i, p + 1, n − 1) ≥ 1. However, this is a contradiction, since in this case the request to memory bank i placed between n and m + 1 at b-slot t − 1 would not be eligible. Using Σ_j R(j,n,m) ≤ m − n, we arrive at the following property:

Property 4: If R(j,n,m) = Q′/N + k, then for the other banks of the group (i ≠ j): R(i,n,m) ≥ k.

Now we can prove the following theorem:

Theorem 1: The minimum length L of the RR register that allows avoiding bank conflicts is given by:

L = (Q′/N)( ) + 1.   (10)

Proof: Note that the ORR holds requests to different banks. If there is no eligible request for the next b-slot, then the RR would only contain requests to these memory banks, and for at least one bank we would have R(j, 1, L) ≥ Q′/N + 1. From Property 4 we deduce that for the rest of the banks of the group there would be at least one request in ORR+RR, which is in contradiction with the fact that only the banks in the ORR have requests in ORR+RR.
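As an empirical companion to these bounds, the toy simulation below issues requests oldest-first while avoiding recently used banks, and measures the worst observed overtaking. The model (one issue per slot, a fixed busy window standing in for the ORR) and all names are simplifying assumptions of ours, not the exact CFDS system:

```python
# Toy simulator: oldest-first issue with a short bank "busy" window, used
# to observe that reordering stays small and bounded for a given stream.

def max_reorder(requests, busy_slots=1):
    """requests: bank index of each request, in MMA arrival order.
    One request is issued per slot: the oldest whose bank was not used in
    the previous `busy_slots` slots (a stand-in for the ORR). Returns the
    largest number of younger requests that overtake any single request."""
    pending = list(enumerate(requests))  # (arrival_order, bank)
    recent = []                          # banks issued recently
    issue = {}                           # arrival_order -> issue slot
    slot = 0
    while pending:
        pick = next((r for r in pending if r[1] not in recent), None)
        if pick is not None:
            pending.remove(pick)
            issue[pick[0]] = slot
        recent = (recent + [pick[1] if pick else None])[-busy_slots:]
        slot += 1
    return max(
        sum(1 for j in issue if j > k and issue[j] < issue[k])
        for k in issue
    )

# A burst to the same bank forces one overtake; a conflict-free stream
# is issued strictly in arrival order.
assert max_reorder([0, 0, 1, 2]) == 1
assert max_reorder([0, 1, 2, 3]) == 0
```

Such a simulation only checks particular streams, of course; the theorems above bound the worst case over all admissible MMA request sequences.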
Theorem 2: The maximum number of times R_max that a request can be delayed in the RR register is given by:

R_max = (Q′/N)( ).   (11)

Proof: Assume that in ORR+RR there are no requests to memory bank j. Assume further that the first request to memory bank j enters the RR at b-slot t1, and that the n-th request to bank j enters this register at b-slot t_n, with t1 < t2 < ⋯. The first request to bank j will not suffer any additional delay, and it will be issued to the ORR at most L b-slots after being requested by the MMA. If during (t1, t2) there are requests to k different memory banks, the second request to bank j will suffer a delay lower than ( − k) + 1 b-slots. The maximum delay is produced when t2 = t1 + 1. In this case the second request to memory bank j can suffer a delay of (Q′/N)( ) b-slots. In fact, if n < Q′/N, then the maximum delay occurs when t_n = t1 + n − 1, and this request can suffer a delay of n( ) ≤ (Q′/N)( ) b-slots. If n ≥ Q′/N, then from Property 2 we have at least n − Q′/N requests to different memory banks. Consequently, until request n − Q′/N to bank j is issued, this request cannot suffer any further delay. Hence, when request n − Q′/N is issued, the delay of request n can be at most (Q′/N)( ) b-slots.

REFERENCES

[1] S. Iyer, R. Kompella, and N. McKeown, Analysis of a memory architecture for fast packet buffers, in IEEE Workshop on High Performance Switching and Routing, Dallas, Texas, 2001.
[2] S. Iyer, R. Kompella, and N. McKeown, Techniques for fast packet buffers, in Gigabit Networking Workshop, Anchorage, Alaska, USA, 2001.
[3] B. R. Rau, M. S. Schlansker, and D. W. L. Yen, The Cydra 5 stride-insensitive memory system, ICPP, 1989.
[4] M. Valero, T. Lang, J. M. Llabería, M. Peiron, E. Ayguadé, and J. J. Navarro, Increasing the number of strides for conflict-free vector access, ISCA-19, May 1992.
[5] M. Valero, T. Lang, and E. Ayguadé, Conflict-free access of vectors with power-of-two strides, ICS, July 1992.
[6] D. T. Harper III and J. R.
Jump, Performance evaluation of vector accesses in parallel memories using a skewed storage scheme, ISCA-13, 1986.
[7] Hitachi, Hitachi 166 MHz SDRAM, Hitachi HM557XX series, modules.htm.
[8] R. Crisp, Direct Rambus technology: the new main memory standard, IEEE Micro, vol. 17, pp. 18–28, November/December 1997.
[9] P. Shivakumar and N. P. Jouppi, CACTI 3.0: an integrated cache timing, power and area model, Technical Report, Compaq Computer Corp., August 2001.
[10] S. Palacharla, N. P. Jouppi, and J. E. Smith, Complexity-effective superscalar processors, in ISCA-24, 1997.
[11] D. S. Henry, B. C. Kuszmaul, G. H. Loh, and R. Sami, Circuits for wide-window superscalar processors, ISCA-27, 2000.
[12] R. E. Kessler, The Alpha 21264 microprocessor, IEEE Micro, pp. 24–36, March–April 1999.
[13] W. Bux, W. E. Denzel, T. Engbersen, A. Herkersdorf, and R. P. Luijten, Technologies and building blocks for fast packet forwarding, IEEE Communications Magazine, January 2001.
[14] A. K. Parekh and R. G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the multiple-node case, IEEE/ACM Transactions on Networking, vol. 2, no. 2, pp. 137–150, 1994.

/06/$20.00 © 2006 IEEE


More information

THERE are a growing number of Internet-based applications

THERE are a growing number of Internet-based applications 1362 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 14, NO. 6, DECEMBER 2006 The Stratified Round Robin Scheduler: Design, Analysis and Implementation Sriram Ramabhadran and Joseph Pasquale Abstract Stratified

More information

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < : A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks Visvasuresh Victor Govindaswamy,

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Real-time targeted influence maximization for online advertisements

Real-time targeted influence maximization for online advertisements Singapore Management University Institutional nowledge at Singapore Management University Research Collection School Of Information Systems School of Information Systems 9-205 Real-time targeted influence

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III Subject Name: Operating System (OS) Subject Code: 630004 Unit-1: Computer System Overview, Operating System Overview, Processes

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Utility-Based Rate Control in the Internet for Elastic Traffic

Utility-Based Rate Control in the Internet for Elastic Traffic 272 IEEE TRANSACTIONS ON NETWORKING, VOL. 10, NO. 2, APRIL 2002 Utility-Based Rate Control in the Internet for Elastic Traffic Richard J. La and Venkat Anantharam, Fellow, IEEE Abstract In a communication

More information

RBFT: Redundant Byzantine Fault Tolerance

RBFT: Redundant Byzantine Fault Tolerance RBFT: Redundant Byzantine Fault Tolerance Pierre-Louis Aulin Grenole University Sonia Ben Mokhtar CNRS - LIRIS Vivien Quéma Grenole INP Astract Byzantine Fault Tolerant state machine replication (BFT)

More information

Switch Architecture for Efficient Transfer of High-Volume Data in Distributed Computing Environment

Switch Architecture for Efficient Transfer of High-Volume Data in Distributed Computing Environment Switch Architecture for Efficient Transfer of High-Volume Data in Distributed Computing Environment SANJEEV KUMAR, SENIOR MEMBER, IEEE AND ALVARO MUNOZ, STUDENT MEMBER, IEEE % Networking Research Lab,

More information

Unleashing the Power of Embedded DRAM

Unleashing the Power of Embedded DRAM Copyright 2005 Design And Reuse S.A. All rights reserved. Unleashing the Power of Embedded DRAM by Peter Gillingham, MOSAID Technologies Incorporated Ottawa, Canada Abstract Embedded DRAM technology offers

More information

Hybrids on Steroids: SGX-Based High Performance BFT

Hybrids on Steroids: SGX-Based High Performance BFT Hyrids on Steroids: SGX-Based High Performance BFT Johannes Behl TU Braunschweig ehl@ir.cs.tu-s.de Toias Distler FAU Erlangen-Nürnerg distler@cs.fau.de Rüdiger Kapitza TU Braunschweig rrkapitz@ir.cs.tu-s.de

More information

Application Layer Multicast Algorithm

Application Layer Multicast Algorithm Application Layer Multicast Algorithm Sergio Machado Universitat Politècnica de Catalunya Castelldefels Javier Ozón Universitat Politècnica de Catalunya Castelldefels Abstract This paper presents a multicast

More information

CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007

CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007 CS 344/444 Computer Network Fundamentals Final Exam Solutions Spring 2007 Question 344 Points 444 Points Score 1 10 10 2 10 10 3 20 20 4 20 10 5 20 20 6 20 10 7-20 Total: 100 100 Instructions: 1. Question

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

A Mechanism for Verifying Data Speculation

A Mechanism for Verifying Data Speculation A Mechanism for Verifying Data Speculation Enric Morancho, José María Llabería, and Àngel Olivé Computer Architecture Department, Universitat Politècnica de Catalunya (Spain), {enricm, llaberia, angel}@ac.upc.es

More information

Interleaving Schemes on Circulant Graphs with Two Offsets

Interleaving Schemes on Circulant Graphs with Two Offsets Interleaving Schemes on Circulant raphs with Two Offsets Aleksandrs Slivkins Department of Computer Science Cornell University Ithaca, NY 14853 slivkins@cs.cornell.edu Jehoshua Bruck Department of Electrical

More information

FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues

FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues FIRM: A Class of Distributed Scheduling Algorithms for High-speed ATM Switches with Multiple Input Queues D.N. Serpanos and P.I. Antoniadis Department of Computer Science University of Crete Knossos Avenue

More information

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013

18-447: Computer Architecture Lecture 25: Main Memory. Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 18-447: Computer Architecture Lecture 25: Main Memory Prof. Onur Mutlu Carnegie Mellon University Spring 2013, 4/3/2013 Reminder: Homework 5 (Today) Due April 3 (Wednesday!) Topics: Vector processing,

More information

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University

15-740/ Computer Architecture Lecture 19: Main Memory. Prof. Onur Mutlu Carnegie Mellon University 15-740/18-740 Computer Architecture Lecture 19: Main Memory Prof. Onur Mutlu Carnegie Mellon University Last Time Multi-core issues in caching OS-based cache partitioning (using page coloring) Handling

More information

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing

Heuristic Algorithms for Multiconstrained Quality-of-Service Routing 244 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL 10, NO 2, APRIL 2002 Heuristic Algorithms for Multiconstrained Quality-of-Service Routing Xin Yuan, Member, IEEE Abstract Multiconstrained quality-of-service

More information

Caching Policies for In-Network Caching

Caching Policies for In-Network Caching Caching Policies for In-Network Caching Zhe Li, Gwendal Simon, Annie Gravey Institut Mines Telecom - Telecom Bretagne, UMR CNRS 674 IRISA Université Européenne de Bretagne, France {firstname.lastname}@telecom-retagne.eu

More information

Network Model for Delay-Sensitive Traffic

Network Model for Delay-Sensitive Traffic Traffic Scheduling Network Model for Delay-Sensitive Traffic Source Switch Switch Destination Flow Shaper Policer (optional) Scheduler + optional shaper Policer (optional) Scheduler + optional shaper cfla.

More information

Appendix B. Standards-Track TCP Evaluation

Appendix B. Standards-Track TCP Evaluation 215 Appendix B Standards-Track TCP Evaluation In this appendix, I present the results of a study of standards-track TCP error recovery and queue management mechanisms. I consider standards-track TCP error

More information

An Approach to Task Attribute Assignment for Uniprocessor Systems

An Approach to Task Attribute Assignment for Uniprocessor Systems An Approach to ttribute Assignment for Uniprocessor Systems I. Bate and A. Burns Real-Time Systems Research Group Department of Computer Science University of York York, United Kingdom e-mail: fijb,burnsg@cs.york.ac.uk

More information

A Hierarchical Fair Service Curve Algorithm for Link-Sharing, Real-Time, and Priority Services

A Hierarchical Fair Service Curve Algorithm for Link-Sharing, Real-Time, and Priority Services IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 2, APRIL 2000 185 A Hierarchical Fair Service Curve Algorithm for Link-Sharing, Real-Time, and Priority Services Ion Stoica, Hui Zhang, Member, IEEE, and

More information

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents

E-Companion: On Styles in Product Design: An Analysis of US. Design Patents E-Companion: On Styles in Product Design: An Analysis of US Design Patents 1 PART A: FORMALIZING THE DEFINITION OF STYLES A.1 Styles as categories of designs of similar form Our task involves categorizing

More information

Achieve Significant Throughput Gains in Wireless Networks with Large Delay-Bandwidth Product

Achieve Significant Throughput Gains in Wireless Networks with Large Delay-Bandwidth Product Available online at www.sciencedirect.com ScienceDirect IERI Procedia 10 (2014 ) 153 159 2014 International Conference on Future Information Engineering Achieve Significant Throughput Gains in Wireless

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 2 Baochun Li Department of Electrical and Computer Engineering University of Toronto Outline What is scheduling? Why do we need it? Requirements of a scheduling

More information

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes Page 1/ 16 by Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 233 6.2 Types of Memory 233 6.3 The Memory Hierarchy 235 6.3.1 Locality of Reference 237 6.4 Cache Memory 237 6.4.1 Cache Mapping Schemes 239 6.4.2 Replacement Policies 247

More information

Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks

Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks Design of a Weighted Fair Queueing Cell Scheduler for ATM Networks Yuhua Chen Jonathan S. Turner Department of Electrical Engineering Department of Computer Science Washington University Washington University

More information

Using Hybrid Algorithm in Wireless Ad-Hoc Networks: Reducing the Number of Transmissions

Using Hybrid Algorithm in Wireless Ad-Hoc Networks: Reducing the Number of Transmissions Using Hybrid Algorithm in Wireless Ad-Hoc Networks: Reducing the Number of Transmissions R.Thamaraiselvan 1, S.Gopikrishnan 2, V.Pavithra Devi 3 PG Student, Computer Science & Engineering, Paavai College

More information

CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF

CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF CS244 Advanced Topics in Computer Networks Midterm Exam Monday, May 2, 2016 OPEN BOOK, OPEN NOTES, INTERNET OFF Your Name: Answers SUNet ID: root @stanford.edu In accordance with both the letter and the

More information

ABSTRACT I. INTRODUCTION

ABSTRACT I. INTRODUCTION International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 1 ISSN : 2456-3307 Distributed Packet Buffers for High Bandwidth Switches

More information

RED behavior with different packet sizes

RED behavior with different packet sizes RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,

More information

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy

Chapter 5B. Large and Fast: Exploiting Memory Hierarchy Chapter 5B Large and Fast: Exploiting Memory Hierarchy One Transistor Dynamic RAM 1-T DRAM Cell word access transistor V REF TiN top electrode (V REF ) Ta 2 O 5 dielectric bit Storage capacitor (FET gate,

More information

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1

CSE 431 Computer Architecture Fall Chapter 5A: Exploiting the Memory Hierarchy, Part 1 CSE 431 Computer Architecture Fall 2008 Chapter 5A: Exploiting the Memory Hierarchy, Part 1 Mary Jane Irwin ( www.cse.psu.edu/~mji ) [Adapted from Computer Organization and Design, 4 th Edition, Patterson

More information

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control,

Basic Processing Unit: Some Fundamental Concepts, Execution of a. Complete Instruction, Multiple Bus Organization, Hard-wired Control, UNIT - 7 Basic Processing Unit: Some Fundamental Concepts, Execution of a Complete Instruction, Multiple Bus Organization, Hard-wired Control, Microprogrammed Control Page 178 UNIT - 7 BASIC PROCESSING

More information

The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks

The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks The Design and Performance Analysis of QoS-Aware Edge-Router for High-Speed IP Optical Networks E. Kozlovski, M. Düser, R. I. Killey, and P. Bayvel Department of and Electrical Engineering, University

More information

36 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 1, MARCH 2008

36 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 1, MARCH 2008 36 IEEE TRANSACTIONS ON BROADCASTING, VOL. 54, NO. 1, MARCH 2008 Continuous-Time Collaborative Prefetching of Continuous Media Soohyun Oh, Beshan Kulapala, Andréa W. Richa, and Martin Reisslein Abstract

More information

Episode 5. Scheduling and Traffic Management

Episode 5. Scheduling and Traffic Management Episode 5. Scheduling and Traffic Management Part 3 Baochun Li Department of Electrical and Computer Engineering University of Toronto Outline What is scheduling? Why do we need it? Requirements of a scheduling

More information

QoS Provisioning Using IPv6 Flow Label In the Internet

QoS Provisioning Using IPv6 Flow Label In the Internet QoS Provisioning Using IPv6 Flow Label In the Internet Xiaohua Tang, Junhua Tang, Guang-in Huang and Chee-Kheong Siew Contact: Junhua Tang, lock S2, School of EEE Nanyang Technological University, Singapore,

More information

Binary Decision Tree Using Genetic Algorithm for Recognizing Defect Patterns of Cold Mill Strip

Binary Decision Tree Using Genetic Algorithm for Recognizing Defect Patterns of Cold Mill Strip Binary Decision Tree Using Genetic Algorithm for Recognizing Defect Patterns of Cold Mill Strip Kyoung Min Kim 1,4, Joong Jo Park, Myung Hyun Song 3, In Cheol Kim 1, and Ching Y. Suen 1 1 Centre for Pattern

More information

CS244a: An Introduction to Computer Networks

CS244a: An Introduction to Computer Networks Do not write in this box MCQ 9: /10 10: /10 11: /20 12: /20 13: /20 14: /20 Total: Name: Student ID #: CS244a Winter 2003 Professor McKeown Campus/SITN-Local/SITN-Remote? CS244a: An Introduction to Computer

More information

Proper colouring Painter-Builder game

Proper colouring Painter-Builder game Proper colouring Painter-Builder game Małgorzata Bednarska-Bzdęga Michael Krivelevich arxiv:1612.02156v3 [math.co] 7 Nov 2017 Viola Mészáros Clément Requilé Novemer 6, 2018 Astract We consider the following

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction In a packet-switched network, packets are buffered when they cannot be processed or transmitted at the rate they arrive. There are three main reasons that a router, with generic

More information

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang

CHAPTER 6 Memory. CMPS375 Class Notes (Chap06) Page 1 / 20 Dr. Kuo-pao Yang CHAPTER 6 Memory 6.1 Memory 341 6.2 Types of Memory 341 6.3 The Memory Hierarchy 343 6.3.1 Locality of Reference 346 6.4 Cache Memory 347 6.4.1 Cache Mapping Schemes 349 6.4.2 Replacement Policies 365

More information

CENG3420 Lecture 08: Memory Organization

CENG3420 Lecture 08: Memory Organization CENG3420 Lecture 08: Memory Organization Bei Yu byu@cse.cuhk.edu.hk (Latest update: February 22, 2018) Spring 2018 1 / 48 Overview Introduction Random Access Memory (RAM) Interleaving Secondary Memory

More information