Good Memories: Enhancing Memory Performance for Precise Flow Tracking


Good Memories: Enhancing Memory Performance for Precise Flow Tracking

Boris Grot, William Mangione-Smith
University of California, Los Angeles, Department of Electrical Engineering

Abstract

Flow tracking is an integral aspect of many network-related applications, including intrusion detection, network administration, per-flow billing, and network engineering. While some applications do not require precise tracking of every packet or flow and successfully employ statistical sampling or approximations to achieve an acceptable level of accuracy, many others, especially in the security domain, do not have such flexibility. Faced with a rapidly growing number of networked devices, increasing Internet usage, and rising network bandwidth, precise flow tracking has become a very difficult problem, with system memory as the main bottleneck. In this paper, we look at memory performance when tracking a large number of flows, show that finding an existing flow in memory often requires multiple key comparisons, and examine ways to reduce the number of comparisons for existing flows and avoid them altogether for the majority of new flows. Additionally, we propose a novel scheme called Predictive Placement that reduces the number of searches requiring three or more comparisons from 4.87% of all lookups with a linked list implementation to just 0.027%. These results are based on our simulations with widely used long packet traces from CAIDA. Our approach relies on a Bloom filter derivative that encodes each element's location using multiple bitmaps. The accuracy of the scheme depends on the number of bitmaps employed and on their load factor.

1. Introduction

Traffic measurement and monitoring is necessary for managing today's high-speed links. Tracking flows is one crucial aspect of such management, and has applications in network security, administration, engineering, and accounting (in the form of usage-based billing) [1]. A number of recent works have investigated approximate and statistical approaches to flow tracking at high link speeds, focusing on counting the number of flows [2], measuring flow volumes [6][8], identifying the elephants [1] and superspreaders [17], and classifying flows [7]. Unfortunately, while many network-related tasks can tolerate small errors and approximations, others cannot. Network intrusion detection is one such application which requires precise flow tracking, as certain attacks (e.g., stealthy port scans, TCP connection hijacking, evasion by fragmentation) can only be detected by keeping exact and complete per-flow state [3]. In fact, most, if not all, of today's Network Intrusion Detection Systems (NIDS) keep per-connection state [4]. The current crop of NIDS aims to protect enterprise and campus-wide networks from attacks. For instance, product literature for NetScreen-IDP [18], a top-of-the-line intrusion detection and prevention system from Juniper Networks, specifies a maximum throughput of 1 Gb/s and a maximum of 500,000 supported sessions with 4 GB of memory. However, our tests using traces of real traffic showed periods of over 700,000 flows simultaneously live at data rates well below 1 Gb/s, indicating that NetScreen might be overwhelmed by the number of connections, and an attack might go unnoticed. (It is not known whether the limitation in the number of simultaneous sessions is due to system scalability issues or memory capacity.) Another study [15] reported a day-time average (not peak) of over 450,000 flows on a very modest 155 Mb/s (OC-3) link. Thus, keeping track of a sufficient number of flows is a challenge even at moderate data rates.
This challenge will likely be exacerbated going forward, as NIDS are pushed deeper into the network, where they are exposed to more flows (allowing them to better identify, and react to, attacks) and higher link speeds [3]. In order for these systems to maintain real-time performance, flow identification and matching to tracked flows will have to scale with the growing demands.

In fact, McKeown et al. identify this classification as the most expensive and complex step of a flow monitoring system, requiring a one-to-one mapping between a fixed set of packet fields (the 5-tuple made up of the protocol number and the source and destination addresses and port numbers) and the memory portion that contains the flow record [16]. The contribution of this work is in investigating ways to reduce the number of comparisons that a packet entering the system must undergo before being mapped to the correct flow. After an overview of our experimental setup, we examine the conventional approach to tracking flows, consider the performance impact of inserting new flows at the head of their respective hash bucket versus appending them at the tail, evaluate the use of a Bloom filter for avoiding memory accesses for new flows, and present a novel scheme for matching packets to existing flows so as to significantly reduce the number of key comparisons that must be made.

2. Methodology

We conducted our study using three anonymized traces from an OC-48 link [19]. Each trace file is around 9 GB and represents a 60-minute snapshot of the network. Two of the traces were collected in August of 2002, while the third one was collected in January 2003. Table 1 lists some of the details for each trace. Since the goal of the study is to enable applications that rely on precise flow tracking to meet the demands of high-bandwidth networks, we chose to support a reasonably large number of flows. Assuming 512 MB of system memory and a 64-byte flow descriptor (large enough to store the 5-tuple key, a time stamp, and some application-specific data), we arrived at 8 million flows. This leaves plenty of room for scaling by adding more memory, as the effective bandwidth in our traces never exceeds 500 Mb/s.

One of the first steps in looking up a flow is producing a hash of its key (see Section 3); hence, an efficient hashing function is crucial. Such a function must be easy to implement in hardware, be sufficiently fast in operation, and approach a uniform random mapping. The ability to change the function over the lifetime of the hardware would also be very desirable, as it would make the system less vulnerable to exploits targeting specific memory locations or access patterns. The function we chose has all of the above properties and is defined [10] as

H_q(x) = (x_1 * q_1) ^ (x_2 * q_2) ^ ... ^ (x_i * q_i),

where * denotes a bit-wise AND and ^ a bit-wise XOR. Here, x_i is the i-th bit of the key, and q_i is a bit string stored in register i. Thus, to obtain a 32-bit hash value from a 64-bit key, we would need 64 registers, each 32 bits wide. It is worth pointing out that the hash function itself is not fixed in hardware, as changing it requires simply initializing the register array to a different value; a software sketch of this function appears at the end of this section.

Flow aging is another issue that has drastic implications for system performance. A flow that has not seen any packets for a certain time should be evicted from memory to make space for new flows and to reduce the length of the search for incoming packets. A long timeout value may unnecessarily fill the memory with a large number of dead flows. A short timeout, on the other hand, may result in premature evictions of long-lived flows, leading to inaccuracies in the collected data or, in intrusion detection applications, to undetected attacks.
Several studies [15][16] have noted that a timeout value of around 60 seconds offers a good balance between accuracy and memory requirements, leading us to adopt it for our experiments. In practice, the exact timeout value should be determined based on each application's needs and the underlying network's characteristics. Note that if a newly arrived packet maps to an expired flow that has not yet been evicted, the flow will be revived, as its last-accessed time stamp will be updated. On the other hand, if the flow has been deleted, a new flow will be allocated for the packet. In this case, the flow will effectively be counted twice. This effect is reflected in Table 1, which lists the number of flow allocation events over the duration of each trace, rather than a count of unique flows in the file. In our current simulation environment, the memory scrub task is modeled as a method call that takes zero simulation time; however, in a real system such a process could be implemented as a low-priority background task, gradually walking over the memory space when the memory and the bus are otherwise idle.

An important concern that must be addressed is whether the memory subsystem will have sufficient bandwidth to handle the expected traffic. For instance, the latest DDR2 DRAM chips are able to support 533 MT/s (millions of transfers per second) at 64 bits per transfer. Intel's 915G chipset paired with this type of memory achieves a single-channel peak bandwidth of 4.25 GB/s [22]. Assuming a 1 Gb/s link carrying a constant stream of 64-byte packets (a worst-case scenario which maximizes the memory workload) and 64-byte flow descriptors, the memory configuration described above has enough bandwidth to handle over 30 64-byte accesses per packet. The number drops to just 3.4 accesses on a 10 Gb/s link; however, memory throughput can be scaled by adding additional channels. Thus, a quad-channel configuration could support a 17 GB/s peak transfer rate, enough for over 13 64-byte transactions for every minimum-sized packet received. A recent study of memory controller performance in a server environment [23] found that an out-of-order controller subjected to a representative server workload was able to achieve sustained throughput of over 5 GB/s with a DDR2 memory subsystem rated at 6.4 GB/s peak bandwidth, which is only 22% lower than the theoretical peak rate. This leads us to conclude that a memory subsystem with sufficient bandwidth for flow tracking at high data rates is, indeed, practical.
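To make the hashing scheme concrete, the following is a minimal software model of the H3-style hash function defined in this section. It is a sketch under stated assumptions rather than the authors' implementation: the 104-bit 5-tuple key is assumed to be packed into 13 bytes, and the register array q is assumed to be filled once with random values.

    /* Software model of H_q(x) = (x_1 * q_1) ^ ... ^ (x_i * q_i).
     * One 32-bit register per key bit; the key layout is illustrative. */
    #include <stdint.h>
    #include <stdlib.h>

    #define KEY_BITS 104   /* 5-tuple: two 32-bit IPs, two 16-bit ports, 8-bit protocol */

    static uint32_t q[KEY_BITS];   /* per-bit registers; initialized once */

    void h3_init(unsigned seed)
    {
        srand(seed);
        for (int i = 0; i < KEY_BITS; i++)
            q[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    }

    /* XOR together q[i] for every key bit x_i that is set. */
    uint32_t h3_hash(const uint8_t key[KEY_BITS / 8])
    {
        uint32_t h = 0;
        for (int i = 0; i < KEY_BITS; i++)
            if (key[i / 8] & (1u << (i % 8)))
                h ^= q[i];
        return h;
    }

Changing the hash function amounts to calling h3_init with a new seed, mirroring the reinitialization of the register array described above.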

Table 1: Trace characteristics

Trace             Number of packets   Number of flows   Avg live flows   Max live flows
1 (anon.pcap)     272,55,99           6,94,78           427,...          ...
2 (anon.pcap)     236,556,354         4,63,7            45,478           74,...
3 (anon.pcap)     34,235,548          5,52,88           49,62            55,95

3. Evaluation

The common approach to tracking flows is to compute a hash of several fields from the packet header and use the resulting value to index a memory holding the full key and the desired data. Since multiple flows are likely to produce the same hash value, a linked list of descriptors is kept in memory. Such an approach is used by the FreeBSD TCP/IP stack, which maintains a hash table with each element (hash node) pointing to a linked list [2]. The hash key is usually taken to be a 5-tuple composed of the source and destination IP addresses, source and destination port numbers, and the protocol field. As discussed in Section 2, we simulated a memory with a maximum capacity of 8 million flows. Since one of the best ways to reduce the number of hash collisions is to increase the size of the hash space, we logically partitioned the memory into 512 K hash buckets with a maximum list length of 16 elements. (Although the maximum list length does not have to be limited, potentially leading to some very long lists, this is highly undesirable, as the worst-case search length becomes intolerable. The latter is simply disastrous in a latency-sensitive environment such as a core- or edge-based network element. On the other hand, fixing the list length to a very small value will likely lead to collisions and dropped flows, as new flows will compete with existing flows for list space.) Thus, in case a new flow must be added to a list which has reached its maximum length, our approach is to first try to locate an expired entry in the list and remove it; if no element in the list has timed out, the element closest to timing out is evicted instead.

3.1. List Insertion Position

The first set of experiments we conducted compared insertion of a new flow at the head of the linked list versus at the tail (traversals always start at the head). The choice is not obvious. On one hand, it is well known that the majority of flows are short-lived and carry only a small fraction of the total traffic on a link. For instance, one study found that nearly 80% of all flows are active for less than one second, after which they go completely silent [14]. This characteristic would suggest that inserting new flows at the tail would be a good strategy, as they are likely to be short-lived and should not increase the access latency for long-lived flows.
On the other hand, network traffic exhibits fairly good temporal locality; specifically, packet trains, defined as two or more back-to-back packets belonging to the same flow, have been observed for a number of server workloads [20], suggesting that inserting at the head of the linked list would better exploit this temporal locality. Figure 1 plots the number of key comparisons for the three traces using head and tail insertion. Since each comparison requires reading the flow descriptor from memory, the number of comparisons is effectively the number of independent memory accesses. The trend is clear: insertion at the head produces fewer comparisons over the duration of the traces than insertion at the tail. The savings produced by inserting at the head of the list are 14.89%, 12.83%, and 9.89% for Traces 1, 2, and 3, respectively, with an average of 12.42% fewer comparisons.
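For reference, a minimal sketch of this baseline organization with head-of-list insertion is shown below. The structure fields and hash function are assumptions for illustration; the bounded list length of 16 entries and the eviction path described above are omitted for brevity.

    /* Baseline flow table: 512 K buckets of chained flow descriptors,
     * with new flows inserted at the head of their bucket's list. */
    #include <stdint.h>
    #include <stdlib.h>

    #define NUM_BUCKETS (512 * 1024)

    struct flow_key { uint32_t saddr, daddr; uint16_t sport, dport; uint8_t proto; };

    struct flow {
        struct flow_key key;
        uint32_t        last_seen;   /* drives the 60-second timeout */
        struct flow    *next;
    };

    static struct flow *bucket[NUM_BUCKETS];

    extern uint32_t hash_key(const struct flow_key *k);  /* e.g., the H3 sketch above */

    static int keys_equal(const struct flow_key *a, const struct flow_key *b)
    {
        return a->saddr == b->saddr && a->daddr == b->daddr &&
               a->sport == b->sport && a->dport == b->dport && a->proto == b->proto;
    }

    struct flow *lookup_or_insert(const struct flow_key *k, uint32_t now)
    {
        uint32_t b = hash_key(k) % NUM_BUCKETS;
        /* Each loop iteration is one key comparison, i.e., one independent
         * memory access in the cost model used throughout this paper. */
        for (struct flow *f = bucket[b]; f != NULL; f = f->next) {
            if (keys_equal(&f->key, k)) { f->last_seen = now; return f; }
        }
        struct flow *f = calloc(1, sizeof *f);   /* new flow */
        f->key = *k;
        f->last_seen = now;
        f->next = bucket[b];                     /* head insertion */
        bucket[b] = f;
        return f;
    }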

Thus, we do not further consider end-of-list insertion and adopt a linked-list-based implementation with head-of-list insertion as our baseline.

[Figure 1: Head vs. tail insertion: number of key comparisons.]

3.2. Distribution of Search Lengths

Figure 2 plots the number of searches of a given length using head-of-list insertion; note that the y-axis is logarithmic.

[Figure 2: Number of linked list traversals of a given length.]

As expected, the great majority of all lookups require only one or two comparisons. However, over 40 million packets in the three traces combined, nearly 5% of the total packet count, result in three comparisons or more, with a few requiring up to 10 comparisons. Recall that our memory is partitioned into 512 K buckets, so given that we rarely see over 500 K flows simultaneously live in our system, such long comparison sequences might seem surprising at first. But with several hundred million packets processed per trace, and given occasional spikes in traffic, these events do in fact occur. As such, the rest of the study focuses not just on reducing the number of key comparisons required to match a packet to a flow, but specifically on eliminating these long search sequences.

3.3. Avoiding List Traversal for New Flows

One obvious source of maximum-length list traversals is new flows. Consider what happens when a packet from a new flow arrives: as usual, memory is indexed using the hash of the packet's key, the linked list pointed to by the hash node is reviewed from beginning to end, and only after failing to find a matching flow is a new flow descriptor allocated. In our traces, the ratio of packets without a matching flow to the total number of packets processed is fairly small; only 6% of packets require a new flow to be allocated. However, since each new flow is accompanied by a complete traversal of the associated list, we wanted to study the impact that new flows have on the number of key comparisons.

Identifying New Flows Using Oracle Knowledge

Table 2 shows the number of key comparisons that could be avoided using oracle knowledge of whether a given flow is in memory. Note that the actual reduction in the number of comparisons is somewhat small, around 3.3% on average. This is explained by the fact that we are using a fairly large and sparsely populated memory, exactly what one would like to have to reduce the number of hash collisions. In fact, most of the time, the number of tracked flows is just over 400,000 in a memory with capacity for 8 million flow descriptors, with the implication that a new flow often maps to an empty list. Hence, in many cases, no key comparisons need to be made when a new flow is inserted. The intuitive conclusion, supported by our simulations with a smaller and more highly utilized memory, is that greater memory utilization will significantly increase the number of comparisons caused by unmatched flows. So despite the fact that the potential savings in the number of memory accesses are quite modest under the presented setup, the fact that each new flow is a longest-sequence lookup with respect to its associated list length motivated us to try to reduce the impact of such worst-case offenders.

[Figure 3: An empty Bloom filter (a); a hit in the Bloom filter (b); a definite miss (c).]

Table 2: Reduction in key comparisons using oracle knowledge of whether a flow is in memory

Trace   Actual key comparisons   Avoidable with oracle knowledge   Longest search length
1       338,57,75                2,335,...                         10
2       ...25,359                ...522,...                        ...
3       ...49,...                2,328,...                         ...

Identifying New Flows Using a Bloom Filter

Bloom Filter Theory

A Bloom filter [5] uses an array of bits to represent a set of elements. Each member is represented by k bits, where each bit is indexed by a separate hash function. If one or more of the k bits corresponding to an element is 0, the element is guaranteed not to be a member of the set. If, on the other hand, all k bits are set, the element belongs to the set with probability 1 - f, where f is the false positive ratio. In effect, a Bloom filter is a compact representation of a set, allowing queries for membership that sometimes yield a false positive, but never a false negative. Figure 3 shows an example of a small Bloom filter that represents each element using three bits (k = 3). In (a), an empty Bloom filter is depicted. Figure 3 (b) shows a hit in the Bloom filter for some key X, after X has been inserted into the filter. X is hashed using three different hash functions, whose outputs are used to index the Bloom filter. Since the values at all three accessed locations are 1, the filter produces a hit. Note that a hit in the Bloom filter is not a definite yes but is more akin to a maybe. This is similar to conventional hashing, where two hash values that match might have been computed from the same key, but a comparison of the keys is required to confirm a match. Part (c) shows a miss in the Bloom filter. Even though two of the three bits are set, there is no chance that Y is an element of the given set.

Now, the relationship between the false positive rate f and the number of bits used for each element, k, is expressed by the following formula:

f = (1 - e^(-nk/m))^k    (1)

where m is the size of the Bloom filter (in bits) and n is the number of member elements. Thus, the product n*k is the number of bits occupied by the n members of the Bloom filter. It can be shown that f is minimized with respect to k when

k = (m/n) ln 2    (2)

which corresponds to the false positive ratio

f = (1/2)^k    (3)

In fact, comparing this result to (1), we see that the false positive ratio is minimized with respect to k when approximately half of the bits are set. Since in a network setting we have no control over how many elements are in the set (i.e., how many flows we are tracking), the filter should be sized so that it gives an acceptable false positive ratio in the worst possible case.

Our Implementation

In our setting, the worst case represents tracking 8 million flows, so the Bloom filter should have around 16 million bits for a reasonable false positive rate when the memory is at full capacity. Since the Bloom filter is supposed to identify new flows to avoid the long-latency search through the entire list, its accuracy is most important when the memory is highly utilized and lists are long.
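As a quick sanity check on these formulas, the short program below evaluates equation (1) for a filter of m = 16 million entries with k = 3 (the configuration adopted below); the flow counts are illustrative operating points, not measurements from the paper.

    /* Evaluate f = (1 - e^(-nk/m))^k from equation (1). */
    #include <math.h>
    #include <stdio.h>

    static double bloom_fp(double n, double m, double k)
    {
        return pow(1.0 - exp(-n * k / m), k);
    }

    int main(void)
    {
        double m = 16e6, k = 3;
        /* A typical load in our traces: roughly 500 K live flows. */
        printf("n = 500K: f = %.4f%%\n", 100.0 * bloom_fp(0.5e6, m, k));
        /* The load at which roughly half the bits are set, per equation (3). */
        printf("n = 3.7M: f = %.2f%%\n", 100.0 * bloom_fp(3.7e6, m, k));
        return 0;
    }

At a few hundred thousand tracked flows the predicted false positive ratio is well under 0.1%, which is consistent with the very high accuracy reported in Table 3.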

A value for f of around 10-20% seems to be appropriate, given that in our traces only 6% of all packets do not match an existing flow. This means that in practice, the Bloom filter should produce a false positive for just 0.6-1.2% of all packets when tracking the maximum number of flows, since only new flows can result in a false positive. Of course, a more lightly loaded Bloom filter would be expected to have an even higher accuracy. Thus, we use three hash functions (k = 3), with a theoretical f of 12.5%.

One problem with a conventional Bloom filter is that it is impossible to remove an element from the set, since doing so requires clearing all the bits associated with that element. Because bits can be shared among multiple set members (it is the combination of all k bits which uniquely identifies a member), clearing all bits associated with an element is not an option. For this reason, we employ a counting Bloom filter [9], which is implemented with a vector of counters rather than bits. Thus, adding an element requires incrementing all k counters, while evicting it results in the counters being decremented.

With the Bloom filter in place, each arriving packet generates the following sequence of events: the 5-tuple key is hashed using three different hash functions, whose outputs serve as indices into the Bloom filter array. The three counters at the indexed locations are read out and their values are compared to 0. If all counters are greater than 0, then the given flow is likely to be found in main memory, and a search through the appropriate linked list is initiated. However, if one or more counters are 0, the packet is immediately tagged as a new flow, its corresponding counters in the Bloom filter are incremented, and a flow descriptor is inserted at the head of the appropriate linked list.

For faster access, the Bloom filter array can be implemented in SRAM. Assuming 8-bit counters, a Bloom filter with 16 million entries would require 128 Mb. This is somewhat large for a single SRAM part; fortunately, the Bloom filter array can be divided into disjoint sub-arrays, so that each hash function addresses its own physical memory. An additional benefit of this approach is that all memories can be accessed in parallel.

Bloom Filter Performance Evaluation

Table 3 summarizes our findings.

Table 3: Bloom filter performance

Trace   False positive %   New flows correctly identified   Key comparisons eliminated   Realized % of oracle's potential
1       0.9%               99.96%                           2,328,...                    ...
2       2.22%              99.97%                           ...57...                     ...
3       3.6%               99.97%                           ...323,...                   ...

As expected, our Bloom filter has a very high accuracy rate, as evidenced by the percentage of new flows correctly identified. Equally important is the fact that nearly all of the savings offered by an oracle predictor are being realized, eliminating virtually all key comparisons for new flows. The false positive rate is extremely low due to three factors: appropriate Bloom filter sizing, its low occupancy ratio, and the relative infrequency of new flows. Unfortunately, despite its excellent ability to correctly identify new flows and eliminate the associated memory accesses, the Bloom filter offers little help in terms of reducing the total number of long list traversals. Figure 4 shows the effect of the Bloom filter on the number of key comparisons. The two spikes in Trace 1 correspond to a reduction in searches of length 9 from four occurrences to one, and the elimination of the sole occurrence of a 10-long search.
However, for the three traces, most searches of length greater than 1 see a very modest reduction of around 4% in the number of occurrences.

[Figure 4: Percent reduction in list traversals of a given length.]
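The per-packet sequence described above can be summarized in a few lines of code. This is an illustrative sketch, not the authors' implementation: the hash function is a software placeholder, and the counter array is shown in ordinary memory rather than the banked SRAM organization discussed above.

    /* Counting Bloom filter front end: k = 3 hashes over 16 M 8-bit counters. */
    #include <stdbool.h>
    #include <stdint.h>

    #define CBF_ENTRIES (16u * 1024 * 1024)
    #define K 3

    static uint8_t cbf[CBF_ENTRIES];

    /* Placeholder FNV-style hash, seeded per function; the real design would
     * use three independent hardware-friendly functions such as H3. */
    static uint32_t hash_i(int i, const void *key, int len)
    {
        const uint8_t *p = key;
        uint32_t h = 2166136261u ^ (uint32_t)(i * 0x9e3779b9u);
        while (len--) h = (h ^ *p++) * 16777619u;
        return h;
    }

    /* false = definite miss: tag the packet as a new flow and skip the list walk. */
    bool cbf_maybe_present(const void *key, int len)
    {
        for (int i = 0; i < K; i++)
            if (cbf[hash_i(i, key, len) % CBF_ENTRIES] == 0)
                return false;
        return true;   /* "maybe": search the linked list as usual */
    }

    void cbf_insert(const void *key, int len)    /* on new-flow allocation */
    {
        for (int i = 0; i < K; i++) {
            uint8_t *c = &cbf[hash_i(i, key, len) % CBF_ENTRIES];
            if (*c < UINT8_MAX) (*c)++;          /* saturate rather than wrap */
        }
    }

    void cbf_remove(const void *key, int len)    /* on flow eviction */
    {
        for (int i = 0; i < K; i++) {
            uint8_t *c = &cbf[hash_i(i, key, len) % CBF_ENTRIES];
            if (*c > 0) (*c)--;
        }
    }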

[Figure 5: Replacing a linked list with an array. (a) shows an empty hash bucket; since none of the bits are set, X is definitely not an element of the set. (b) shows a non-empty hash bucket with three potential candidates at offsets 1, 4, and 5.]

3.4. Predictive Placement

The biggest shortcoming of the Bloom filter is its limited utility. While it helps eliminate lookups for new flows that enter the system, all subsequent packets belonging to a flow must go through the conventional lookup process. Furthermore, the memory requirements of a Bloom filter are relatively high, especially in light of its limited usefulness. Additionally, the ability to blindly insert new flow descriptors has a price, as linked list lengths must be explicitly tracked to ensure that a given list has not reached its maximum length and a new flow may be added. Ideally, we would like a way to not only predict whether a given flow is already being tracked, but also to locate it in memory without doing all, or most, of the comparisons associated with a linked list traversal. A good solution should have a reasonable memory footprint with the ability to trade accuracy for memory overhead.

Predictive Placement Overview

Consider a memory partitioning scheme which allocates a fixed number of contiguous locations to each hash bucket, in effect forming a two-dimensional array. Hence, each hash value is effectively a pointer to a one-dimensional array in memory. Furthermore, consider a bitmap with the same number of bits as there are array elements, logically organized like the memory array and indexed with the same hash function. Thus, each allocated entry in the array has its corresponding bit set in the bitmap. A simple way to determine if a given element X is in memory is to compute a hash of its key and check the bits in its respective hash bucket in the bitmap. If none of the bits are set, X is definitely not in memory. Alternatively, if some of the bits are set, then each set bit represents a potential memory location where X may reside. In the former case, the bitmap obviates the need for a memory access, while in the latter it provides a list, in the form of a bit vector, of possible locations for X. If the bit vector has non-zero entries, then verifying X's existence requires checking the memory array at the indices corresponding to these entries. Figure 5 (a) and (b) depicts the two cases just described.

So what, if any, benefits are there to this organization compared to a conventional linked list approach? In reality, the two are quite similar, as the list of potential locations for X produced by the bitmap is hardly different from a linked list which must be traversed one entry at a time. One potential benefit of the bitmap approach is that fetches of multiple candidate locations can be overlapped, something that is normally not possible with a conventional linked list. This is because obtaining a pointer to a given element in a linked list first requires fetching its predecessor, whereas a bitmap can be used to determine the exact offsets at which memory must be read. However, we will not take this advantage into consideration in this study, although the potential benefit is not insignificant and is something we are currently exploring.

Now, instead of using one bitmap as described above, consider using two. The goal is to eliminate at least some of the bits in the candidate bit vector as potential locations of X.
As before, the bitmaps are logically organized like the main memory, with each hash bucket mapping to a fixed number of contiguous bits. One bitmap is still indexed with the same hash function as the memory, providing a one-to-one mapping between non-zero bits and allocated memory entries. We call this bitmap the primary filter. The second bitmap, which we will refer to as the secondary filter, is indexed by a completely different hash function.

Under the new organization, X might reside in memory if and only if both bitmaps contain a 1 in the same position. Thus, a membership query consists of a bitwise AND operation on the respective hash buckets in the primary and secondary filters, followed by an OR of the result vector. If the OR operation produces a 1, a potential hit, which we call a maybe, is registered, and the memory locations corresponding to non-zero entries in the result vector are fetched. Figure 6 shows an example of a membership query.

[Figure 6: Replacing a linked list with an array and multiple bitmaps. A potential hit (a maybe) occurs when both the primary and secondary arrays have 1s in the same position.]

Insertion of a new element into the memory is similarly easy and requires finding bits with a value of 0 in the same position in both bitmaps. An insert operation sets both bits to 1, while a remove operation clears the bits. Note that in the case of an insert, the selected memory location is guaranteed to be empty due to the one-to-one mapping between the primary filter and the memory.

Additional Considerations

One issue that must be considered in using Predictive Placement is that of a bucket in either the primary or the secondary filter becoming full. In the former case, a full bucket indicates that a given memory array is at its maximum capacity. As discussed in Section 3, dealing with this situation requires walking over the entries in the bucket and replacing either an expired entry or another element via some LRU-like approach. The case of a secondary filter filling up is a little trickier. Since inserting an element requires setting a bit and removing it results in the bit getting cleared, sharing bits among multiple elements is not an option. Of course, the situation is identical to that of a conventional Bloom filter. The solution there was to use counters instead of bits, and that is what we chose to do here. Thus, each hash bucket in the secondary filters is actually a vector of counters. We get excellent results using just two bits per counter. It is important to note that counters are only required for the secondary filters, since the primary filter is a one-to-one mapping with memory.

With this minor modification, let us consider the memory requirements of Predictive Placement. As before, memory is organized into 512 K hash buckets with a fixed number of contiguous memory locations per bucket. Since our lists are limited to 16 entries, each hash bucket has exactly this number of available slots. Primary and secondary filters mirror the memory organization; with 512 K 16-bit buckets, the primary filter requires only 8 Mb of storage. Due to the use of counters, secondary filters require 2 bits per entry, so their memory footprint is 16 Mb each. Thus, a primary filter with two secondary filters needs only 40 Mb of storage, quite reasonable given today's SRAM technology. Even more pleasant is the fact that each filter naturally lends itself to being placed in a separate physical device; in fact, increasing accuracy simply requires adding another secondary filter SRAM.

Another issue that must be discussed is the worst-case performance of our approach. The extreme case is that of the memory being completely full, with every bit in the primary filter set and every counter in every secondary filter greater than 0. Clearly, in this situation, the filters will return a maybe answer for every query.
Hence, the worst-case performance of Predictive Placement is comparable to that of a linked list approach, since every query will require an in-order traversal over the appropriate hash bucket. Indeed, like any other hashing-based scheme, including the Bloom filters, our approach requires data structures to be under-utilized for satisfactory performance. And like a Bloom filter, our design offers a trade-off between space requirements and accuracy.

Evaluation

We tested our approach using a primary filter with 1, 2, and 3 secondary filters, where each secondary filter is indexed with a different hash function. Generally, we anticipate that accuracy will improve with a higher number of secondary filters, as aliasing between different elements is reduced.
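A compact sketch of a Predictive Placement query and insert follows, for the configuration with one primary and two secondary filters over 512 K buckets of 16 slots. It is an illustration under assumptions: the hash function is a software placeholder, and the 2-bit secondary counters are stored one per byte for clarity, whereas the design above packs them into SRAM.

    #include <stdint.h>

    #define BUCKETS (512 * 1024)
    #define SLOTS   16
    #define NSEC    2    /* number of secondary filters */

    static uint16_t primary[BUCKETS];                 /* bit s = slot s occupied */
    static uint8_t  secondary[NSEC][BUCKETS][SLOTS];  /* 2-bit saturating counters */

    /* Placeholder FNV-style hash, seeded per filter; NOT the paper's function. */
    static uint32_t hash_n(int n, const void *key, int len)
    {
        const uint8_t *p = key;
        uint32_t h = 2166136261u ^ (uint32_t)(n * 0x01000193);
        while (len--) h = (h ^ *p++) * 16777619u;
        return h;
    }

    /* Returns a bit vector of candidate slots: 0 means a definite miss with no
     * memory access; otherwise each set bit is one memory location to fetch. */
    uint16_t pp_candidates(const void *key, int len)
    {
        uint16_t cand = primary[hash_n(0, key, len) % BUCKETS];
        for (int n = 0; n < NSEC && cand; n++) {
            uint32_t b = hash_n(n + 1, key, len) % BUCKETS;
            uint16_t mask = 0;
            for (int s = 0; s < SLOTS; s++)
                if (secondary[n][b][s] > 0) mask |= (uint16_t)1 << s;
            cand &= mask;             /* AND of primary and secondary buckets */
        }
        return cand;
    }

    /* Insert: claim a free slot in the primary bucket and increment the
     * counters at the same offset in each secondary filter. Returns the slot
     * index, or -1 when the bucket is full (evict as described in Section 3). */
    int pp_insert(const void *key, int len)
    {
        uint32_t b0 = hash_n(0, key, len) % BUCKETS;
        for (int s = 0; s < SLOTS; s++) {
            if (!(primary[b0] & ((uint16_t)1 << s))) {
                primary[b0] |= (uint16_t)1 << s;
                for (int n = 0; n < NSEC; n++) {
                    uint8_t *c = &secondary[n][hash_n(n + 1, key, len) % BUCKETS][s];
                    if (*c < 3) (*c)++;
                }
                return s;
            }
        }
        return -1;
    }

A remove operation mirrors pp_insert: clear the primary bit and decrement the corresponding secondary counters.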

[Figure 7: Number of list traversals of a given length for a linked list implementation and Predictive Placement with 2, 3, and 4 filters (primary + 1 or more secondary filters) for Traces 1, 2, and 3.]

Figure 7 compares the number of searches of a given length for a linked list using head-of-list insertion against our Predictive Placement approach. As expected, each additional filter used in the Predictive Placement scheme provides a noticeable improvement in prediction accuracy. Pred 2, the simplest implementation with just one secondary filter, eliminates all searches of length 7 and above and provides an order of magnitude reduction in the number of searches of length 3 and above. In fact, the number of occurrences of these searches is cut from over 40 million in a linked list implementation (4.87% of all lookups) to 6.7 million (0.8%) when using Pred 2. Adding an additional secondary filter (Pred 3) results in the elimination of length 6 traversals and drops the number of searches requiring 3 comparisons or more to approximately 460 thousand (0.06%). Pred 4, which uses three secondary filters, not only completely eliminates all searches of length 5 and above, but reduces the number of length 4 traversals from well over a million to under 100 for all three traces. The total number of searches of length 3 or more is just 220 thousand, or 0.027% of all lookups. Table 4 provides a detailed breakdown of the number of searches of each length for Trace 3 using the linked list and Predictive Placement schemes. Notice that with Predictive Placement, the number of traversals of length greater than 1 drops, while the number of traversals of length equal to 1 increases. This, of course, is the desired outcome, as it indicates that long searches are replaced by lookups that match on the first comparison.

[Figure 8: Number of key comparisons for Traces 1, 2, and 3 using a linked list and Predictive Placement with 2, 3, and 4 filters against an oracle predictor.]

We also compared the performance of our approach to an oracle predictor. Unlike the oracle introduced in Section 3.3, this predictor not only recognizes new flows and avoids the associated lookups, but always finds an existing flow on the first try.

Table 4: Number of searches of a given length for Trace 3

Search length   1             2           3            4          5         6        7 and above
Linked List     232,766,358   59,9..,64   ..,482,698   2,94,65    437,253   64,288   8,..
Pred 2          278,89,287    2,43,4..    ..99,86      46,4..     7,..
Pred 3          293,674,25    5,538,58    37,26        2,..
Pred 4          297,687,93    ..56,562    5,758        3

Results are presented in Figure 8. A separate calculation shows that Pred 4 is within 0.6% of the total number of key comparisons required by the oracle predictor for all three traces.

4. Related Work

A number of recent works have employed Bloom filters for network-related tasks. Several of these used Bloom filters or similar data structures for measuring flow volumes [1][6][8][21]. These approaches do not require precise flow tracking, have limited utility, and are generally not 100% accurate due to the false positives produced by the Bloom filter. Chang, Feng, and Li utilized a Bloom filter to assist in packet classification [7]. Classified packets were added to the Bloom filter to avoid future classification, while unclassified packets went through the regular classification process. This is similar to our use of a Bloom filter to avoid linked list traversals for new flows. Their work also explored encoding information into a Bloom filter for the purpose of packet routing. However, unlike flow tracking, routing requires a static mapping between a given flow and an output port. Our work uses a modified Bloom filter structure to efficiently find an empty memory location to which a new flow may be written and to supply a list of candidate locations in which an existing flow may be found.

One possible alternative to linked lists is closed hashing. Open hashing, sometimes referred to as chaining, resolves collisions by keeping a linked list pointed to by an entry in the hash table. This is the technique used in our study. Closed hashing, on the other hand, resolves hash collisions by probing additional locations in the table, often through the use of a separate collision resolution function. Double hashing is an example of this approach, and its use in flow monitoring applications has been suggested by McKeown et al. [16]. While closed hashing has good theoretical average-case performance for lightly loaded hash tables, it is unsuitable in an environment with frequent deletions, since these tend to increase the average search length [11]. Closed hashing also does not permit optimizations such as head-of-list insertion.

Another well-known technique for improving lookup performance is caching. Most cache-related studies in the networking domain have focused on improving the performance of route lookup and packet classification, with reported cache hit rates ranging from 60% to over 90% [12][13]. In fact, we consider our work orthogonal to caching for several reasons. In network intrusion detection, for instance, caching cannot be relied on to satisfy any fraction of memory requests, since it makes the system susceptible to attacks against the cache with the goal of degrading memory subsystem performance. Additionally, caching in the traditional sense implies a fixed tag and data arrangement, where the tag is a portion of the memory address at which the data resides. However, in a conventional flow tracking application, given a newly arrived packet, the memory address of its corresponding flow is typically not known. This implies that the key used to identify a flow should itself serve as the tag.
But different applications might require different definitions of a flow, leading to different key size requirements. For example, an application collecting per-flow statistics on a network node might use the five-tuple (104 bits) for flow identification, whereas an algorithm for identifying distributed port scans may very well use destination IP addresses (32 bits) as the flow id. Naturally, data size requirements for these applications would also be expected to vary widely, leading to difficulties implementing a conventional cache in an adaptable multi-purpose architecture. Our approach helps eliminate long probes to memory regardless of whether a cache is employed in the system, enabling a wide range of latency- and bandwidth-sensitive applications requiring flow tracking.

5. Conclusion

In this paper, we evaluated techniques for reducing the number of key comparisons required to find a matching flow in applications requiring precise flow tracking. We established that head-of-list insertion reduces the number of key comparisons required to find a matching flow by an average of 12.4% for the three traces when compared to insertion at the tail of a linked list. Using a memory 16 times larger than the average number of live flows in the system and head-of-list insertion, we witnessed frequent occurrences of searches involving more than one key comparison. Almost 5% of all packets required three key comparisons or more, with maximum observed search lengths exceeding eight comparisons in all three traces. We evaluated the use of a Bloom filter to eliminate lookups for new flows entering the system. The reduction in the number of key comparisons turned out to be a modest 3.3%, with most searches of length greater than 1 seeing a 4% decrease in the number of occurrences. We then introduced our Predictive Placement approach, which uses multiple filters to reduce the number of candidate locations that must be searched. The filters use a Bloom-like data structure to encode not only membership but also the location of every element. Using a primary filter and one secondary filter, we succeeded in removing all searches requiring more than six comparisons and reduced the number of searches of length 3 and above to approximately 0.8% of all lookups. A Predictive Placement design with three secondary filters completely eliminated searches with over four comparisons and reduced the number of probes of length 3 or more to just 0.027%. In fact, the number of key comparisons required by this scheme was within 0.6% of an oracle predictor which always finds an existing flow in exactly one lookup and avoids lookups altogether for new flows.

6. References

[1] C. Estan and G. Varghese. New Directions in Traffic Measurement and Accounting. Proc. of ACM SIGCOMM 2002, October 2002.
[2] C. Estan, G. Varghese and M. Fisk. Bitmap algorithms for counting active flows on high speed links. Proc. of the 2003 ACM SIGCOMM Conference on Internet Measurement (IMC-03), ACM Press, October 2003.
[3] K. Levchenko, R. Paturi, and G. Varghese. On the Difficulty of Scalably Detecting Network Attacks. Proc. of the Eleventh ACM Conference on Computer and Communication Security, October 2004.
[4] R. R. Kompella, S. Singh and G. Varghese. On Scalable Attack Detection in the Network. Proc. of the 4th ACM SIGCOMM Conference on Internet Measurement, 2004.
[5] B. H. Bloom. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM, 13(7), July 1970.
[6] A. Kumar, J. Xu, L. Li and J. Wang. Space-code Bloom filter for efficient traffic flow measurement. Proc. of the 3rd ACM SIGCOMM Conference on Internet Measurement, 2003.
[7] F. Chang, W. Feng, and K. Li. Approximate Caches for Packet Classification. Proc. of IEEE INFOCOM 2004, March 2004.
[8] S. Cohen and Y. Matias. Spectral Bloom filters. Proc. of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, California, June 2003, ACM Press.
[9] L. Fan, J. Almeida and A. Z. Broder. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. IEEE/ACM Transactions on Networking, 8(3), June 2000.
[10] M. V. Ramakrishna, E. Fu, and E. Bahcekapili. Efficient Hardware Hashing Functions for High Performance Computers.
IEEE Transactions on Computers, December 1997.
[11] M. A. Weiss. Data Structures and Algorithm Analysis in C.
[12] P. Newman, G. Minshall, and L. Huston. IP Switching and Gigabit Routers. IEEE Communications Magazine, January 1997.
[13] J. Xu, M. Singhal and J. Degroat. A novel cache architecture to support layer-four packet classification at memory access speeds. Proc. of INFOCOM 2000, March 2000.
[14] N. Brownlee. Some Observations of Internet Stream Lifetimes. Passive and Active Measurement Workshop, 2005.
[15] J. Apsidorf, K. C. Claffy, K. Thompson, and R. Wilder. OC3MON: Flexible, Affordable, High Performance Statistics Collection. Proc. of the 10th USENIX Conference on System Administration (LISA), September 1996.

[16] G. Iannaccone, C. Diot, I. Graham and N. McKeown. Monitoring very high speed links. ACM SIGCOMM Internet Measurement Workshop, November 2001.
[17] S. Venkataraman, D. Song, P. Gibbons, and A. Blum. New Streaming Algorithms for Fast Detection of Superspreaders. Network and Distributed System Security Symposium, February 2005.
[18] Juniper Networks NetScreen-IDP 10/100/500/1000 product literature.
[19] CAIDA. Anonymized OC48 traces.
[20] L. Zhao, S. Makineni, R. Illikkal, R. Iyer, L. Bhuyan. Efficient Caching Techniques for Server Network Acceleration. Advanced Networking and Communications Hardware Workshop (ANCHOR 2004), June 2004.
[21] D. Nguyen, J. Zambreno and G. Memik. Flow Monitoring in High-Speed Networks using Two Dimensional Hash Tables. Proc. of Field-Programmable Logic and its Applications (FPL), Aug.-Sep. 2004.
[22] Intel 915G/915GV/910GL Express Chipset Memory Configuration Guide. Intel White Paper, September 2004.
[23] C. Natarajan, B. Christenson, F. Briggs. A Study of Performance Impact of Memory Controller Features in Multi-Processor Server Environment. Proc. of the 3rd Workshop on Memory Performance Issues, in conjunction with the 31st International Symposium on Computer Architecture, 2004.


More information

On characterizing BGP routing table growth

On characterizing BGP routing table growth University of Massachusetts Amherst From the SelectedWorks of Lixin Gao 00 On characterizing BGP routing table growth T Bu LX Gao D Towsley Available at: https://works.bepress.com/lixin_gao/66/ On Characterizing

More information

A Firewall Architecture to Enhance Performance of Enterprise Network

A Firewall Architecture to Enhance Performance of Enterprise Network A Firewall Architecture to Enhance Performance of Enterprise Network Hailu Tegenaw HiLCoE, Computer Science Programme, Ethiopia Commercial Bank of Ethiopia, Ethiopia hailutegenaw@yahoo.com Mesfin Kifle

More information

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE

DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE WHITEPAPER DELL EMC DATA DOMAIN SISL SCALING ARCHITECTURE A Detailed Review ABSTRACT While tape has been the dominant storage medium for data protection for decades because of its low cost, it is steadily

More information

Bloom filters and their applications

Bloom filters and their applications Bloom filters and their applications Fedor Nikitin June 11, 2006 1 Introduction The bloom filters, as a new approach to hashing, were firstly presented by Burton Bloom [Blo70]. He considered the task of

More information

Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1

Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1 Multi-core Implementation of Decomposition-based Packet Classification Algorithms 1 Shijie Zhou, Yun R. Qu, and Viktor K. Prasanna Ming Hsieh Department of Electrical Engineering, University of Southern

More information

White paper ETERNUS Extreme Cache Performance and Use

White paper ETERNUS Extreme Cache Performance and Use White paper ETERNUS Extreme Cache Performance and Use The Extreme Cache feature provides the ETERNUS DX500 S3 and DX600 S3 Storage Arrays with an effective flash based performance accelerator for regions

More information

Hierarchically Aggregated Fair Queueing (HAFQ) for Per-flow Fair Bandwidth Allocation in High Speed Networks

Hierarchically Aggregated Fair Queueing (HAFQ) for Per-flow Fair Bandwidth Allocation in High Speed Networks Hierarchically Aggregated Fair Queueing () for Per-flow Fair Bandwidth Allocation in High Speed Networks Ichinoshin Maki, Hideyuki Shimonishi, Tutomu Murase, Masayuki Murata, Hideo Miyahara Graduate School

More information

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < : A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks Visvasuresh Victor Govindaswamy,

More information

CS162 Operating Systems and Systems Programming Lecture 14. Caching (Finished), Demand Paging

CS162 Operating Systems and Systems Programming Lecture 14. Caching (Finished), Demand Paging CS162 Operating Systems and Systems Programming Lecture 14 Caching (Finished), Demand Paging October 11 th, 2017 Neeraja J. Yadwadkar http://cs162.eecs.berkeley.edu Recall: Caching Concept Cache: a repository

More information

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi

Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Embedded Systems Dr. Santanu Chaudhury Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 13 Virtual memory and memory management unit In the last class, we had discussed

More information

Skewed-Associative Caches: CS752 Final Project

Skewed-Associative Caches: CS752 Final Project Skewed-Associative Caches: CS752 Final Project Professor Sohi Corey Halpin Scot Kronenfeld Johannes Zeppenfeld 13 December 2002 Abstract As the gap between microprocessor performance and memory performance

More information

FPGA Implementation of Lookup Algorithms

FPGA Implementation of Lookup Algorithms 2011 IEEE 12th International Conference on High Performance Switching and Routing FPGA Implementation of Lookup Algorithms Zoran Chicha, Luka Milinkovic, Aleksandra Smiljanic Department of Telecommunications

More information

indicates problems that have been selected for discussion in section, time permitting.

indicates problems that have been selected for discussion in section, time permitting. Page 1 of 17 Caches indicates problems that have been selected for discussion in section, time permitting. Problem 1. The diagram above illustrates a blocked, direct-mapped cache for a computer that uses

More information

Profile of CopperEye Indexing Technology. A CopperEye Technical White Paper

Profile of CopperEye Indexing Technology. A CopperEye Technical White Paper Profile of CopperEye Indexing Technology A CopperEye Technical White Paper September 2004 Introduction CopperEye s has developed a new general-purpose data indexing technology that out-performs conventional

More information

IP Mobility vs. Session Mobility

IP Mobility vs. Session Mobility IP Mobility vs. Session Mobility Securing wireless communication is a formidable task, something that many companies are rapidly learning the hard way. IP level solutions become extremely cumbersome when

More information

Memory Systems IRAM. Principle of IRAM

Memory Systems IRAM. Principle of IRAM Memory Systems 165 other devices of the module will be in the Standby state (which is the primary state of all RDRAM devices) or another state with low-power consumption. The RDRAM devices provide several

More information

Scalable Lookup Algorithms for IPv6

Scalable Lookup Algorithms for IPv6 Scalable Lookup Algorithms for IPv6 Aleksandra Smiljanić a*, Zoran Čiča a a School of Electrical Engineering, Belgrade University, Bul. Kralja Aleksandra 73, 11120 Belgrade, Serbia ABSTRACT IPv4 addresses

More information

Some Observations of Internet Stream Lifetimes

Some Observations of Internet Stream Lifetimes Some Observations of Internet Stream Lifetimes Nevil Brownlee CAIDA, UC San Diego, and The University of Auckland, New Zealand nevil@auckland.ac.nz Abstract. We present measurements of stream lifetimes

More information

Lightweight caching strategy for wireless content delivery networks

Lightweight caching strategy for wireless content delivery networks Lightweight caching strategy for wireless content delivery networks Jihoon Sung 1, June-Koo Kevin Rhee 1, and Sangsu Jung 2a) 1 Department of Electrical Engineering, KAIST 291 Daehak-ro, Yuseong-gu, Daejeon,

More information

Chapter Seven. Large & Fast: Exploring Memory Hierarchy

Chapter Seven. Large & Fast: Exploring Memory Hierarchy Chapter Seven Large & Fast: Exploring Memory Hierarchy 1 Memories: Review SRAM (Static Random Access Memory): value is stored on a pair of inverting gates very fast but takes up more space than DRAM DRAM

More information

Mo Money, No Problems: Caches #2...

Mo Money, No Problems: Caches #2... Mo Money, No Problems: Caches #2... 1 Reminder: Cache Terms... Cache: A small and fast memory used to increase the performance of accessing a big and slow memory Uses temporal locality: The tendency to

More information

Advanced Database Systems

Advanced Database Systems Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor

Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor -0- Improving the Database Logging Performance of the Snort Network Intrusion Detection Sensor Lambert Schaelicke, Matthew R. Geiger, Curt J. Freeland Department of Computer Science and Engineering University

More information

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 6, June 1994, pp. 573-584.. The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor David J.

More information

General Objective:To understand the basic memory management of operating system. Specific Objectives: At the end of the unit you should be able to:

General Objective:To understand the basic memory management of operating system. Specific Objectives: At the end of the unit you should be able to: F2007/Unit6/1 UNIT 6 OBJECTIVES General Objective:To understand the basic memory management of operating system Specific Objectives: At the end of the unit you should be able to: define the memory management

More information

9/24/ Hash functions

9/24/ Hash functions 11.3 Hash functions A good hash function satis es (approximately) the assumption of SUH: each key is equally likely to hash to any of the slots, independently of the other keys We typically have no way

More information

CHAPTER 4 BLOOM FILTER

CHAPTER 4 BLOOM FILTER 54 CHAPTER 4 BLOOM FILTER 4.1 INTRODUCTION Bloom filter was formulated by Bloom (1970) and is used widely today for different purposes including web caching, intrusion detection, content based routing,

More information

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg

Computer Architecture and System Software Lecture 09: Memory Hierarchy. Instructor: Rob Bergen Applied Computer Science University of Winnipeg Computer Architecture and System Software Lecture 09: Memory Hierarchy Instructor: Rob Bergen Applied Computer Science University of Winnipeg Announcements Midterm returned + solutions in class today SSD

More information

Virtual Memory. Today.! Virtual memory! Page replacement algorithms! Modeling page replacement algorithms

Virtual Memory. Today.! Virtual memory! Page replacement algorithms! Modeling page replacement algorithms Virtual Memory Today! Virtual memory! Page replacement algorithms! Modeling page replacement algorithms Reminder: virtual memory with paging! Hide the complexity let the OS do the job! Virtual address

More information

CS 268: Route Lookup and Packet Classification

CS 268: Route Lookup and Packet Classification Overview CS 268: Route Lookup and Packet Classification Packet Lookup Packet Classification Ion Stoica March 3, 24 istoica@cs.berkeley.edu 2 Lookup Problem Identify the output interface to forward an incoming

More information

CSE 565 Computer Security Fall 2018

CSE 565 Computer Security Fall 2018 CSE 565 Computer Security Fall 2018 Lecture 19: Intrusion Detection Department of Computer Science and Engineering University at Buffalo 1 Lecture Outline Intruders Intrusion detection host-based network-based

More information

Reducing The De-linearization of Data Placement to Improve Deduplication Performance

Reducing The De-linearization of Data Placement to Improve Deduplication Performance Reducing The De-linearization of Data Placement to Improve Deduplication Performance Yujuan Tan 1, Zhichao Yan 2, Dan Feng 2, E. H.-M. Sha 1,3 1 School of Computer Science & Technology, Chongqing University

More information

Homework 1 Solutions:

Homework 1 Solutions: Homework 1 Solutions: If we expand the square in the statistic, we get three terms that have to be summed for each i: (ExpectedFrequency[i]), (2ObservedFrequency[i]) and (ObservedFrequency[i])2 / Expected

More information

Removing Belady s Anomaly from Caches with Prefetch Data

Removing Belady s Anomaly from Caches with Prefetch Data Removing Belady s Anomaly from Caches with Prefetch Data Elizabeth Varki University of New Hampshire varki@cs.unh.edu Abstract Belady s anomaly occurs when a small cache gets more hits than a larger cache,

More information

Memory. Objectives. Introduction. 6.2 Types of Memory

Memory. Objectives. Introduction. 6.2 Types of Memory Memory Objectives Master the concepts of hierarchical memory organization. Understand how each level of memory contributes to system performance, and how the performance is measured. Master the concepts

More information

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline

Forwarding and Routers : Computer Networking. Original IP Route Lookup. Outline Forwarding and Routers 15-744: Computer Networking L-9 Router Algorithms IP lookup Longest prefix matching Classification Flow monitoring Readings [EVF3] Bitmap Algorithms for Active Flows on High Speed

More information

ID Bloom Filter: Achieving Faster Multi-Set Membership Query in Network Applications

ID Bloom Filter: Achieving Faster Multi-Set Membership Query in Network Applications ID Bloom Filter: Achieving Faster Multi-Set Membership Query in Network Applications Peng Liu 1, Hao Wang 1, Siang Gao 1, Tong Yang 1, Lei Zou 1, Lorna Uden 2, Xiaoming Li 1 Peking University, China 1

More information

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems Tiejian Luo tjluo@ucas.ac.cn Zhu Wang wangzhubj@gmail.com Xiang Wang wangxiang11@mails.ucas.ac.cn ABSTRACT The indexing

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality

Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Dynamically Provisioning Distributed Systems to Meet Target Levels of Performance, Availability, and Data Quality Amin Vahdat Department of Computer Science Duke University 1 Introduction Increasingly,

More information

Locality of Reference

Locality of Reference Locality of Reference 1 In view of the previous discussion of secondary storage, it makes sense to design programs so that data is read from and written to disk in relatively large chunks but there is

More information

Q.1 Explain Computer s Basic Elements

Q.1 Explain Computer s Basic Elements Q.1 Explain Computer s Basic Elements Ans. At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective. Part I: Operating system overview: Memory Management ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part I: Operating system overview: Memory Management 1 Hardware background The role of primary memory Program

More information

A Pipelined Memory Management Algorithm for Distributed Shared Memory Switches

A Pipelined Memory Management Algorithm for Distributed Shared Memory Switches A Pipelined Memory Management Algorithm for Distributed Shared Memory Switches Xike Li, Student Member, IEEE, Itamar Elhanany, Senior Member, IEEE* Abstract The distributed shared memory (DSM) packet switching

More information

Single Pass Connected Components Analysis

Single Pass Connected Components Analysis D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 007, pp. 8 87, Hamilton, New Zealand, December 007. Single Pass Connected

More information

Chapter 7 The Potential of Special-Purpose Hardware

Chapter 7 The Potential of Special-Purpose Hardware Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture

More information

Worst-case running time for RANDOMIZED-SELECT

Worst-case running time for RANDOMIZED-SELECT Worst-case running time for RANDOMIZED-SELECT is ), even to nd the minimum The algorithm has a linear expected running time, though, and because it is randomized, no particular input elicits the worst-case

More information

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3. 5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000

More information

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( )

Lecture 15: Caches and Optimization Computer Architecture and Systems Programming ( ) Systems Group Department of Computer Science ETH Zürich Lecture 15: Caches and Optimization Computer Architecture and Systems Programming (252-0061-00) Timothy Roscoe Herbstsemester 2012 Last time Program

More information

Supplementary File: Dynamic Resource Allocation using Virtual Machines for Cloud Computing Environment

Supplementary File: Dynamic Resource Allocation using Virtual Machines for Cloud Computing Environment IEEE TRANSACTION ON PARALLEL AND DISTRIBUTED SYSTEMS(TPDS), VOL. N, NO. N, MONTH YEAR 1 Supplementary File: Dynamic Resource Allocation using Virtual Machines for Cloud Computing Environment Zhen Xiao,

More information

Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol Li Fan, Pei Cao and Jussara Almeida University of Wisconsin-Madison Andrei Broder Compaq/DEC System Research Center Why Web Caching One of

More information

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions

Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions Dell PowerEdge R720xd with PERC H710P: A Balanced Configuration for Microsoft Exchange 2010 Solutions A comparative analysis with PowerEdge R510 and PERC H700 Global Solutions Engineering Dell Product

More information

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck

CS252 S05. Main memory management. Memory hardware. The scale of things. Memory hardware (cont.) Bottleneck Main memory management CMSC 411 Computer Systems Architecture Lecture 16 Memory Hierarchy 3 (Main Memory & Memory) Questions: How big should main memory be? How to handle reads and writes? How to find

More information