Demystifying Automata Processing: GPUs, FPGAs or Micron's AP?


Demystifying Automata Processing: GPUs, FPGAs or Micron's AP?

Marziyeh Nourian 1,3, Xiang Wang 1, Xiaodong Yu 2, Wu-chun Feng 2, Michela Becchi 1,3
1,3 Department of Electrical and Computer Engineering, 2 Department of Computer Science
1 University of Missouri, 2 Virginia Tech, 3 North Carolina State University
mnouria@ncsu.edu, xw7b4@mail.missouri.edu, xyu@vt.edu, feng@cs.vt.edu, mbecchi@ncsu.edu

ABSTRACT

Many established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. As a consequence, in recent years there has been a substantial amount of work on high-speed automata processing, which has led to a number of implementations targeting a variety of parallel platforms: CPUs, GPUs, FPGAs, ASICs, and Network Processors. More recently, Micron has announced its Automata Processor (AP), a DRAM-based accelerator of non-deterministic finite automata (NFA). Despite the abundance of work in this domain, the advantages and disadvantages of different automata processing accelerators and the innovation space in this area are still unclear. In this work we target this problem and propose a toolchain to allow an apples-to-apples comparison of NFA acceleration engines on three platforms: GPUs, FPGAs and Micron's AP. We discuss the automata optimizations that are applicable to these three platforms. We perform an evaluation on large-scale datasets: to this end, we propose an NFA partitioning algorithm that minimizes the number of state replications required to maintain functional equivalence with an unpartitioned NFA, and we evaluate the scalability of each implementation to both large NFAs and large numbers of input streams. Our experimental evaluation covers resource utilization, traversal throughput, and preprocessing overhead and shows that the FPGA provides the best traversal throughputs (on the order of Gbps) at the cost of significant preprocessing times (on the order of hours); GPUs deliver modest traversal throughputs (on the order of Mbps), but offer low preprocessing times (on the order of seconds or minutes) and good pattern densities (they can accommodate large datasets on a single device); Micron's AP delivers throughputs, pattern densities, and preprocessing times that are intermediate between those of FPGAs and GPUs, and it is most suited for applications that use datasets consisting of many small NFAs with a topology that is fixed and known a priori.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. ICS '17, June 14-16, 2017, Chicago, IL, USA. © 2017 Association for Computing Machinery.

1 INTRODUCTION

Many established and emerging applications perform at their core some form of pattern matching, a computation that maps naturally onto finite automata abstractions. In biology, for example, several genomics tasks, such as motif discovery, orthology inference, shotgun and de novo assembly, involve string-matching operations on genomics data.
In turn, advances in DNA sequencing technology have led to increasingly large volumes of data available for these applications, resulting in a significant increase in their computational requirements. In the networking domain, several applications such as network intrusion detection, content-based routing, and application-level filtering require inspecting network packets for potentially large sets of predefined patterns, and they typically must perform this operation at the rate of packet arrival on the router interface. Given the number and relevance of applications requiring efficient pattern matching, there has been a substantial amount of work on high-speed automata processing, and this work has originated from different communities: from networking to reconfigurable computing and computer architecture to parallel computing. These efforts have led to a number of algorithmic [1-9] and architectural solutions targeting different parallel platforms: from CPUs to GPUs [10-12] to FPGAs [13-16] to ASICs [17-19] to Network Processors [20]. More recently, Micron has announced their Automata Processor [21], a DRAM-based accelerator of non-deterministic finite automata (NFA) that has been showcased on a variety of applications: motif discovery in biological sequences [22], association rule mining [23], Brill tagging [24], high-speed regular expression matching for network intrusion detection [25], graph processing [26], and sequential pattern mining [27]. Despite this abundance of work on high-speed automata processing, there is still a lack of clarity as to how existing software and hardware solutions relate to and compare with each other. There are several reasons for this. First, existing solutions are based on different automata models: either non-deterministic or deterministic finite automata (NFAs and DFAs, respectively). While functionally equivalent, NFAs and DFAs have practical differences in terms of resource requirements and traversal behavior that are strongly dependent on the characteristics of the underlying pattern set. While there has been a substantial body of work proposing automata designs that trade off the advantages and disadvantages of NFAs and DFAs [1-9], no automata model is preferable on all datasets. This makes it hard to provide a fair comparison between automata processors relying on different automata models. Second, some automata processing architectures are designed to optimize the peak performance of a single input stream, while others offer better support for stream-level concurrency. Third, applications relying on finite automata must operate

in two steps: in the preprocessing step, the required automaton must be generated, optimized, compiled, and loaded onto the target accelerator (through memory configuration and/or place&route operations); in the traversal step, the application performs pattern matching by traversing the automaton guided by the content of the input text. Most of the existing automata processing engines have been designed to optimize automata traversal, often at the cost of a significant preprocessing cost. While the preprocessing time is unimportant for some categories of applications (for example, network intrusion detection systems can operate for days or weeks between reconfigurations of their pattern sets), its effect on performance can be significant for other applications with more dynamic pattern sets or traversal times on the order of a few seconds. Unfortunately, the majority of the previous studies on Micron's AP neglect to report the preprocessing overhead (or part of it) [21-24] or report substantial speedups over preexisting CPU tools (not necessarily based on automata) by comparing the full execution time of these tools to only the traversal time of the automata-based solution (on the order of seconds or milliseconds), omitting the preprocessing time of the AP design (on the order of minutes) in the speedup calculation [25]. This can lead to results that are misleading or of limited practical use. To target these problems and provide an apples-to-apples comparison, we select automata accelerator designs that rely on the same automata model: NFAs. Since NFAs do not suffer from state explosion, their use allows us to perform an evaluation on large-scale datasets without posing any restrictions on the kind of patterns supported. Specifically, we compare GPU- and FPGA-based NFA engines with Micron's AP. Micron's AP extends NFA functionality with counters and boolean elements. To ensure functional equivalence and the same degree of programmability across the considered platforms, we extend existing FPGA- and GPU-based designs to support these features, and we adopt the same programming interface for all platforms: namely, Micron's Automata Network Markup Language (ANML). Different platforms offer different automata density; to take this into account, we perform an analysis on non-trivial dataset sizes, which requires partitioning large NFAs across multiple devices. Besides considering peak performance on a single input stream, we evaluate the scalability of the considered automata processor designs to multiple concurrent inputs. Finally, we evaluate the costs of the different preprocessing steps required by the considered architectures, and we study how the size of the automaton and the density of its transitions affect some of the preprocessing stages (e.g., place&route on Micron's AP and FPGA).

Figure 1: (a) NFA and (b) DFA accepting regular expressions a+bc, bc+ and cde. Accepting states are bold. States active after processing text aabc are colored gray.

To summarize, we make the following contributions:
- We extend existing FPGA- and GPU-based automata processing designs to support Micron's AP counters and boolean elements, and we propose a compiler toolchain to automatically deploy extended NFAs (in ANML form) onto these three platforms.
- We propose an NFA partitioning scheme aimed at minimizing the amount of state replication required to handle large NFAs while preserving functional equivalence with a single unpartitioned NFA.
- For GPU deployment, we explore different state layouts and kernels suited to NFAs with varying characteristics.
- We perform an apples-to-apples comparison between Micron's AP and GPU- and FPGA-based NFA accelerator designs on large-scale datasets. Our evaluation covers resource utilization, throughput and preprocessing costs for real-world NFAs used in networking and bioinformatics applications, as well as synthetic datasets covering regular expressions with various characteristics.

2 BACKGROUND AND RELATED WORK

2.1 Background on Automata Processing

Regular expression matching has traditionally been implemented by representing the pattern set through finite automata (FA) [28]. The matching operation is equivalent to an FA traversal guided by the content of the input stream. Worst-case performance guarantees can be offered by bounding the amount of processing performed per input character. However, techniques to keep per-character processing low involve increasing the size of the finite automaton, the basic data structure in the regular expression matching engine. As the size of pattern sets and the expressiveness of individual patterns increase, limiting the size of the automaton to fit on reasonably provisioned hardware platforms becomes challenging. Thus, the exploration space is characterized by a trade-off between the size of the automaton and the worst-case bound on the amount of per-character processing. NFAs and DFAs are at the two extremes of this exploration space. NFAs have a limited size but can require expensive per-character processing, whereas DFAs offer limited per-character processing at the cost of a possibly large automaton. In Figure 1 we show the NFA and DFA accepting three simple patterns (a+bc, bc+ and cde). In the figure, states active after processing text aabc are colored gray. In the NFA, the number of states and transitions is limited by the number of symbols in the pattern set. In the DFA, every state presents one transition for each character in the alphabet. Each DFA state corresponds to a set of NFA states that can be simultaneously active [28]; therefore, the number of states in a DFA equivalent to an N-state NFA can potentially be 2^N. In practice, previous work [2, 5, 29] has shown that this so-called state explosion happens only in the presence of complex patterns (typically those containing repetitions of large character sets).
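To make the per-character processing difference concrete, the following sketch (ours, not part of the toolchain; the automaton is hand-built for the three example patterns and its state numbering does not necessarily match Figure 1) simulates an NFA traversal by maintaining an explicit set of active states:

```python
# Minimal NFA simulation with an explicit active-state set (illustrative sketch).
# The automaton accepts the three example patterns a+bc, bc+ and cde.
NFA = {
    # state: list of (symbol, next_state)
    0: [('a', 1), ('b', 4), ('c', 6)],   # entry state (always active: unanchored match)
    1: [('a', 1), ('b', 2)],             # a+ then b
    2: [('c', 3)],                       # a+b then c  -> accepting state 3
    4: [('c', 5)],                       # b then c+   -> accepting state 5
    5: [('c', 5)],
    6: [('d', 7)],                       # c then d
    7: [('e', 8)],                       # c d then e  -> accepting state 8
    3: [], 8: [],
}
ACCEPTING = {3, 5, 8}

def nfa_step(active, ch):
    """One traversal step: follow every transition on `ch` from every active state."""
    nxt = {0}                            # the entry state stays active
    for s in active:
        for sym, dst in NFA[s]:
            if sym == ch:
                nxt.add(dst)
    return nxt

def match_positions(text):
    active, hits = {0}, []
    for i, ch in enumerate(text):
        active = nfa_step(active, ch)
        if active & ACCEPTING:
            hits.append(i)
    return hits

print(match_positions("aabcde"))  # prints [3, 5]: matches end after "aabc" and after "cde"
```

The per-character cost of this traversal grows with the number of simultaneously active states, whereas an equivalent DFA would take exactly one transition per input character.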

Since each DFA state corresponds to a set of simultaneously active NFA states, DFAs ensure minimal per-character processing (only one state transition is taken for each input character).

From an implementation perspective, existing regular expression matching engines can be classified into two categories: memory-based [1-12, 17, 19] and logic-based [13-16]. Within the former, the FA is stored in memory; within the latter, it is stored in combinational and sequential logic. Memory-based implementations can be deployed on various platforms (GPUs, network processors, ASICs, FPGAs); logic-based implementations typically target FPGAs. In a memory-based implementation, the design goals are the minimization of the memory size needed to store the automaton and of the memory bandwidth needed to operate it. Similarly, in a logic-based implementation the design aims at minimizing the logic utilization while allowing fast operation (that is, a high clock frequency). Existing proposals targeting DFA-based, memory-centric solutions have focused on designing compression mechanisms to reduce the DFA memory footprint and novel automata to alleviate the state explosion problem [1-9]. Despite the complexity of their design, memory-centric solutions have three advantages: fast reconfigurability, low power consumption, and scalability in the number of input streams. On the other hand, logic-centric solutions easily achieve peak worst-case performance on a single input stream, at the expense of a lack of scalability in the number of concurrent inputs.

2.2 Micron's Automata Processor Overview

Micron's Automata Processor [21] is a DRAM-based, reconfigurable accelerator that simulates NFA traversal at high speed. The AP includes three kinds of programmable elements stored in SDRAM: State Transition Elements (STE), Counter Elements (CE) and Boolean Elements (BE), which implement states/transitions, counters and logical operators between states, respectively. Each STE includes a 256-bit mask (one bit per ASCII symbol), and the symbols triggering state transitions are associated with states (and encoded into STEs) rather than with transitions. Transitions between states are then implemented through a routing matrix consisting of programmable switches, buffers, routing lines, and cross-point connections. The routing capacity is limited by trade-offs between clock rate, propagation delays and power consumption, and these constraints influence the place&route of automata onto the AP hardware. Micron's current generation of AP board (AP-D480) includes 16 or 32 chips organized into two to four ranks (8 chips per rank), and its design can scale up to 48 chips. Each AP chip consists of two half-cores. There are no routes either between half-cores or across chips, which implies that NFA transitions across half-cores and chips are not possible. Programmable elements are organized in blocks: each block consists of 16 rows, where a row includes eight groups of two STEs and one special-purpose element (CE or BE). Each chip contains a total of 49,152 STEs, 768 CEs and 2,304 BEs, organized in 192 blocks and residing equally in both half-cores. Current boards allow up to 6,144 elements per chip to be configured as report elements. AP automata can be described in ANML (an XML-based language). Recently proposed high-level programming languages for the AP are mapped and compiled into ANML [30]. Micron's SDK includes a toolchain that parses ANML designs, compiles them into internal objects consisting of subgraphs, places and routes these subgraphs onto the AP hardware, and finally generates a binary image that can be used to program the AP memory and routing matrix.
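As a rough software model of what an STE computes during traversal (our sketch, not Micron's implementation; the "enabled on every input" behavior of start elements is an assumption of the model), each state carries a 256-entry symbol mask and, when enabled and matched, activates its successors:

```python
# Illustrative software model of STE-style (homogeneous) NFA stepping: match symbols
# live on states rather than on transitions.
class STE:
    def __init__(self, symbols, start=False, report=False):
        self.mask = [False] * 256            # 256-bit symbol mask, one bit per 8-bit code
        for s in symbols:
            self.mask[ord(s)] = True
        self.next = []                       # indices of STEs enabled when this one fires
        self.start = start                   # modeled as "enabled on every input symbol"
        self.report = report

def ap_step(states, enabled, ch):
    """Process one input symbol: every enabled STE whose mask contains `ch` fires."""
    code = ord(ch)
    fired = {i for i in enabled if states[i].mask[code]}
    nxt = {i for i, s in enumerate(states) if s.start}   # start STEs stay enabled
    for i in fired:
        nxt.update(states[i].next)
    reports = [i for i in fired if states[i].report]
    return nxt, reports
```

The AP evaluates all enabled elements in parallel, which is why it sustains one input character per clock cycle regardless of the number of active states; the sequential loop above is only a functional model.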
Once the AP has been programmed, it can simulate the NFA traversal. AP chips can be grouped into logical cores of 2, 4 or 8 chips, each processing a stream of 8-bit input characters [25]. The AP nominally operates at a 133 MHz frequency and, in the absence of matches, it processes one input character per clock cycle from all input streams (at 8 bits per character, this corresponds to a peak of roughly 1 Gbps per input stream). Once matches occur, the AP generates reporting events in vector format and stores them in an output buffer; reporting matches to the host system requires from 91 to 291 clock cycles.

3 TOOLCHAIN

3.1 Overall design

Figure 2: Our toolchain

Figure 2 shows the toolchain designed to deploy ANML specifications on GPU, FPGA and Micron's AP. In the figure, grey boxes represent the software components that we have designed and implemented. The last two modules leading to FPGA and AP are the Xilinx and Micron software development kits, used for the final synthesis/compilation, map, and place&route on these two devices. The input to the toolchain is an ANML file that contains one or more automata networks (each including one or more NFAs). We do not impose any constraints on these networks: in other words, they do not need to be designed to fit a particular device or be optimized for it. Once parsed, these networks are stored in our toolchain using an internal representation for later processing and optimization. We distinguish two categories of optimizations: automata-specific and platform-specific. Since the GPU, FPGA and AP are used as NFA traversal accelerators, optimizations to the automaton apply to all platforms. In our previous work [14], we described several NFA optimizations (state reduction, alphabet compression and software striding) and put them to practice on FPGA; these optimizations apply to GPUs and the AP as well. Automata-specific optimizations can be selectively enabled and disabled. Platform-specific optimizations are related to the way the NFA is encoded for the particular target device; these optimizations include compact and efficient memory encodings, logic utilization, and striding mechanisms that are specific to a particular hardware platform. Since the internals of the operation of the AP hardware and its software stack (including the compilation, map and place&route processes) are proprietary, AP-specific optimizations are

deferred to the AP SDK tools (the last phase of the toolchain). The partitioning step, which takes a potentially large network and breaks it into multiple NFA partitions so that each of them can fit the target hardware, is performed after the automata-specific optimization step. This allows partitioning to be done on an already optimized NFA. Our partitioning algorithm is platform-independent, but its configuration depends on the target platform. The code and configuration generation step produces the files required for the final deployment of the automata network on the hardware. For GPUs, all that is needed is a configuration file that includes the information necessary to load the NFA partitions into memory, and a header file with the definition of the boolean connectors in the ANML specification. FPGAs are configured through a Verilog file describing the NFA network and its interface. The AP is configured through an ANML file; this output file differs from the input file in that it contains a partitioned and optimized automata network.

3.2 GPU implementation

We reuse and extend iNFAnt [10], an NFA-traversal engine for GPUs. iNFAnt stores the NFA in device memory, and encodes the transition table as a set of (source, destination) pairs indexed by the input character. To allow efficient execution, iNFAnt stores the set of active states in shared memory in bit-vector form. For each input character, iNFAnt retrieves from memory all the transitions on that symbol and, if their source state is active, the engine updates the active state vector with the destination state information. In iNFAnt, each thread block is assigned an input stream, and threads within a block process the state transitions and update the state vector cooperatively. We extended iNFAnt with the following functionalities:
- Support for multiple NFA partitions. We map each NFA partition to a thread block, allowing multiple blocks to process the same input stream on different partitions. The transition lists corresponding to different partitions are laid out sequentially, and an indexing array maps each partition to the proper set of thread blocks, each operating on a different input stream.
- Traversal kernels based on a compressed sparse row (CSR) layout. We consider an alternative memory layout where transitions, represented as (input symbol, destination) pairs, are indexed by the source state. For each input symbol, this layout allows processing only the transitions that originate from active states. We store the identifiers of the active states in a queue in global memory. We consider two variants of this kernel: CSR-state and CSR-tx, the former mapping active states to threads, and the latter mapping outgoing transitions from active states to threads.
- Support for counters and boolean elements. We associate a special state with each counter and boolean element, and store these special states at the end of the state vector. The activation of special states triggers code implementing the operation of the particular counter or boolean element. Boolean operators are also associated with combinational code that is stored in an automatically generated header file.
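The layout difference between the iNFAnt kernel and the CSR kernels can be sketched as follows (illustrative Python rather than the actual CUDA kernels; the toy transition tables reuse the small example automaton sketched earlier):

```python
# Illustrative comparison of the two state-transition layouts (not the CUDA kernels).
# iNFAnt-style: transitions grouped by input symbol -> scan all transitions labeled `ch`.
by_symbol = {            # symbol -> list of (source, destination)
    'a': [(0, 1), (1, 1)],
    'b': [(0, 4), (1, 2)],
    'c': [(0, 6), (2, 3), (4, 5), (5, 5)],
    'd': [(6, 7)],
    'e': [(7, 8)],
}

def step_infant(active, ch):
    nxt = {0}
    for src, dst in by_symbol.get(ch, []):   # touches every transition on `ch`,
        if src in active:                    # even those leaving inactive states
            nxt.add(dst)
    return nxt

# CSR-style: transitions grouped by source state -> scan only the rows of active states.
by_state = {             # source -> list of (symbol, destination); one CSR row per state
    0: [('a', 1), ('b', 4), ('c', 6)],
    1: [('a', 1), ('b', 2)],
    2: [('c', 3)], 4: [('c', 5)], 5: [('c', 5)], 6: [('d', 7)], 7: [('e', 8)],
}

def step_csr(active, ch):
    nxt = {0}
    for src in active:                          # CSR-state maps each active state to a thread;
        for sym, dst in by_state.get(src, []):  # CSR-tx would map each such transition instead
            if sym == ch:
                nxt.add(dst)
    return nxt
```

When few states are active, the CSR layout avoids scanning the potentially long per-symbol transition lists, which is consistent with the behavior reported in Section 5.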
3.3 FPGA implementation

On FPGA, NFA processing can be realized in two ways: either by implementing a traversal engine that accesses the NFA stored in memory, or by directly encoding the NFA in logic. Most logic-based NFA implementations are based on the one-hot encoding scheme [13], in which states are represented as flip-flops while transitions are implemented by AND-ing and OR-ing the outputs of the flip-flops with the decoded input character: for instance, a state reachable from states p and q on character class C is implemented as a flip-flop whose next value is (p OR q) AND match_C(input). For example, Figure 3 shows the one-hot encoding representation of the NFA accepting regular expression ab+[cd]e.

Figure 3: NFA accepting regular expression ab+[cd]e and corresponding one-hot encoding representation.

The main advantage of this scheme is that it limits the traversal time to one clock cycle per input character, independent of the number of states that are active (a property shared by Micron's AP). On the other hand, this implementation suffers from two limitations: first, updating the NFA requires reprogramming the device; second, supporting multiple input streams requires logic replication. The pros and cons of a memory-based FPGA design are comparable to those of a GPU solution: easy support for multiple input streams at the cost of irregular and unpredictable memory access patterns, leading to dataset-dependent performance. In this paper we use the optimized logic-based implementation that we described in our previous work [14], and extend it to support counters and boolean elements (a trivial extension).

3.4 Automata-specific optimizations

Our toolchain includes three automata-specific optimizations: state reduction, alphabet compression and software striding [14]. Here, we briefly mention their effect on the considered platforms. State reduction (which merges duplicate NFA paths) reduces the memory requirements on GPU and AP, and the logic requirements on FPGA. In addition, it reduces the number of states that can be active in parallel, which on GPU is beneficial to throughput. Alphabet compression (which consolidates the alphabet based on the symbols appearing on the NFA transitions) reduces the wiring and LUT utilization on FPGA. However, because the AP stores a 256-bit mask in each STE, this optimization does not benefit the AP unless combined with software striding. Software striding (which allows processing multiple characters in one step) can be beneficial on all platforms if combined with alphabet reduction. This technique is applicable to the AP only if the alphabet generated by combining alphabet reduction and software striding does not exceed 256 symbols. GPUs and FPGAs also offer platform-specific striding schemes [6, 10, 15], which we have included in our toolchain.
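To illustrate the 256-symbol constraint, the following sketch (ours; the character classes are invented for illustration, whereas the real classes are derived from the symbols appearing on the optimized NFA's transitions) compresses the alphabet into equivalence classes and checks whether a 2-character stride still fits an STE's 256-entry mask:

```python
# Illustrative alphabet compression followed by 2-character striding (our sketch).
transition_classes = [{'a'}, {'b'}, {'c', 'd'}, {'e'}]   # character classes used by a toy NFA

def compress_alphabet(classes, alphabet_size=256):
    """Map every byte value to an equivalence-class id; unused symbols share class 0."""
    mapping = {}
    for cid, cls in enumerate(classes, start=1):
        for ch in cls:
            mapping[ord(ch)] = cid
    return {sym: mapping.get(sym, 0) for sym in range(alphabet_size)}

compressed = compress_alphabet(transition_classes)
num_classes = len(set(compressed.values()))     # 5: four used classes plus "everything else"

# With 2-striding, each new input symbol is a pair of compressed classes.
strided_alphabet = num_classes ** 2
print(num_classes, strided_alphabet, strided_alphabet <= 256)
# prints: 5 25 True, i.e. the strided alphabet still fits an STE's 256-bit mask, so the
# combined optimization remains applicable to the AP for this toy case.
```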

Figure 4: Example of application of our coloring scheme (Nmax = 8). (a) Reference NFA; (b) first initial coloring step; (c) second initial coloring step; (d) third initial coloring step; (e) replication reduction step; (f) final consolidation.

3.5 Partitioning criteria

An NFA must be partitioned if it exceeds the resources available on a particular device. Here, we indicate the platform-specific partitioning criteria we use; in Section 4, we describe our proposed partitioning algorithm.

GPU: GPU partitioning is required if the shared or global memory capacity is exceeded, or if the state identifier space is exhausted. In this paper, we use 16-bit state identifiers, leading to a maximum of 64k states per NFA partition. This constraint is more restrictive than those on the global and shared memory capacity (and, due to thread-block concurrency, is not a limiting factor on performance; see Section 5).

FPGA: The logic design used stores states in flip-flops and transitions in LUTs. We experimentally found flip-flops to be the bottleneck resource. Thus, we perform NFA partitioning when the number of NFA states exceeds that of the available flip-flops.

AP: The AP does not allow transitions across half-cores, and has a limited number of STEs, Counter Elements and Boolean Elements per half-core (see Section 2.2). Thus, the AP NFA partitioning criterion is based on these constraints.

4 NFA PARTITIONING ALGORITHM

In this section, we describe our NFA partitioning algorithm. For the sake of simplicity, we discuss the algorithm on traditional NFAs: its extension to counters and boolean elements is straightforward. In order to preserve functional equivalence, NFA partitioning requires state replication. For example, let us assume we break the NFA of Figure 4(a) into two partitions to be deployed and operated on two devices: one partition containing states from 0 to 16, and the other containing states from 17 to 24. In order to maintain functional equivalence with the original NFA, the entry state 0, which is shared by the patterns matched in both partitions, must be replicated into the second partition. In general, very large NFAs may require replication of sets of states shared by several patterns. The goal of our partitioning algorithm is to split the NFA into a small number of balanced partitions, while minimizing the required state replications. In particular, given a threshold Nmax on the number of states that can be accommodated on a particular device or hardware component, the algorithm must split the NFA into as few partitions as possible, each with size not exceeding Nmax. Balanced partitions allow load balancing within (for GPUs and Micron's AP) and across (for FPGAs) devices, which ultimately has a positive effect on throughput. It is worth noting that existing partitioning schemes for generic graphs [31] aim to minimize the size of the cut (the number of inter-partition transitions) but, when applied to NFAs, they do not necessarily minimize the number of state replications required to preserve functional equivalence with the unpartitioned NFA. Hence the need for a partitioning scheme tailored to NFAs. We propose an algorithm that colors the NFA so that each color represents a partition; states assigned multiple colors are shared across partitions and must be replicated in each of these partitions. In order to meet the requirements above, the algorithm must limit the number of colors and of states with multiple colors, while allowing each color to appear in up to Nmax states. In the following, we call color size the number of states assigned a particular color. Our algorithm operates in two phases: initial coloring and color consolidation.
In the initial coloring phase, the NFA is traversed from the entry state and recursively colored until the size of each color does not exceed Nmax. The color consolidation phase consolidates multiple colors into one while keeping their size below the given threshold. We note that sets of states connected by cyclic transitions (e.g., states 2 and 3 in Figure 4(a)) cannot be separated into multiple partitions. We recall that, for partitions to be independent, inter-partition activations (that is, cross-partition transitions) must be avoided. As a consequence, a state belonging to multiple partitions must be replicated along with all the states connected to it in a cyclic fashion. Thus, we group states that are cyclically interconnected into super-states, and we handle all the states in a super-state together. For example, states 2 and 3 of Figure 4(a) form super-state {2, 3} and are handled as a single state.
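Grouping cyclically interconnected states is equivalent to collapsing the strongly connected components of the NFA's transition graph; a compact sketch (ours, using Kosaraju's algorithm over a plain adjacency-list representation) is shown below:

```python
# Illustrative super-state construction: collapse strongly connected components (SCCs)
# of the NFA's transition graph, so each multi-state SCC becomes one super-state.
def super_states(adj):
    """adj: {state: iterable_of_successors}. Returns the list of SCCs (sets of states)."""
    visited, order = set(), []

    def dfs(u, graph, out):
        visited.add(u)
        stack = [(u, iter(graph.get(u, ())))]
        while stack:
            node, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(graph.get(v, ()))))
                    break
            else:                      # all successors explored: record finish order
                stack.pop()
                out.append(node)

    for u in adj:                      # first pass: compute finishing order
        if u not in visited:
            dfs(u, adj, order)

    radj = {}                          # reversed transition graph
    for u, succs in adj.items():
        radj.setdefault(u, set())
        for v in succs:
            radj.setdefault(v, set()).add(u)

    visited.clear()
    sccs = []
    for u in reversed(order):          # second pass on the reversed graph
        if u not in visited:
            comp = []
            dfs(u, radj, comp)
            sccs.append(set(comp))
    return sccs

# The cycle between states 2 and 3 collapses into super-state {2, 3}.
print(super_states({0: {1}, 1: {2}, 2: {3}, 3: {2, 4}, 4: set()}))
```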

Table 1: Dataset characteristics and traversal information (ranges correspond to traces with pforw = 0.5 and pforw = 0.9). Columns: dataset type and name; NFA characteristics (# states, # transitions, # ANML states); # partitions (GPU, FPGA, AP); traversal information (avg. active set, max. active set, # matches, % of inputs with matches).

In order to operate, the algorithm requires super-states to include fewer than Nmax states. If this is not the case, the NFA cannot be split into independent partitions. In the presence of dependent partitions, multiple NFA traversals are required to handle inter-partition activations. Fortunately, NFAs originated from regular expression datasets tend to have only a few super-states of small size. This is because backward-directed transitions in NFAs originate from sub-pattern repetitions within regular expressions (for example, sub-pattern (cd)* in Figure 4(a), where string cd can be repeated zero or more times). Sub-pattern repetitions are rare in real-world datasets, and are rarely shared by a large number of patterns. We now detail the operation of the two phases of the algorithm, and illustrate them in Figure 4. In the example, we assume that the threshold Nmax is equal to 8.

Initial coloring. The initial coloring procedure starts by assigning distinct colors to the states connected to the entry state (or to the super-state to which it belongs). This is illustrated in Figure 4(b), where the children of the entry state 0 are colored brown, green, yellow, pink, white, blue and orange. The colors are propagated to all the connected states following the transitions. The entry state is then assigned all the colors of its children. As can be seen, this leads to some states (states 11-12, 14-16, 19-21, besides state 0) being assigned multiple colors. This operation must be repeated recursively on all generated NFA partitions until their size does not exceed the threshold Nmax. As can be seen, after the first coloring step the brown color has size 12 (states 0-9 and 11-12). Therefore, the coloring procedure is repeated starting from state 1. This causes color brown to be split into colors red and violet, which are again propagated down to the terminal states of the NFA (Figure 4(c)). Since color violet has size 10 (including state 0), the algorithm invokes one additional recursive step on super-state {2,3}, causing color violet to be split into colors cyan and grey (Figure 4(d)). Since the largest color (grey) now has size 8 (equal to Nmax), the initial coloring phase is terminated.

Color consolidation. While respecting the constraint on the maximum partition size, the partitioning generated by the initial coloring step has two limitations: it includes small and unbalanced partitions, and it leads to significant state replication. In the example, states 2, 3, 11, 14-16 must be replicated once, states 1 and 12 must be replicated twice, and state 0 must be replicated 8 times. The color consolidation phase aims to combine different colors into one so as to increase the partition size, decrease the number of partitions and the number of state replications required, and achieve more balanced partitions. This phase is broken down into two steps: replication reduction and final consolidation.
The first step aims to reduce the number of state replications required by merging colors. To determine which colors to consolidate, we sort pairs of colors in descending order according to the number of state replications that their consolidation would save. In the example, cyan/grey, yellow/pink and white/blue would each save 3 state replications, green/red would save 2, and red/yellow, green/yellow, red/cyan and red/grey would save only 1. We then consider all pair-wise consolidation opportunities in order, and merge two colors only if their merging does not violate the partition size constraint. In the example, we consolidate yellow+pink into yellow, white+blue into white, and green+red into green. Figure 4(e) shows the result of the replication reduction step. In the final consolidation step, we look for opportunities to consolidate colors according to their size. To this end, we first sort the colors in descending order by size, and then traverse the list and consolidate each color with the next color in the list that does not lead to violating the partition size constraint (if such a color exists). In the example, colors orange and green are consolidated into orange. Figure 4(f) shows the final coloring, which leads to 5 partitions: two of size 8 (grey and orange) and three of size 6 (cyan, yellow and white).
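The consolidation phase can be sketched as the following greedy procedure (illustrative Python; the data structures and the savings computation are simplified with respect to the actual toolchain, and super-state and special-element constraints are omitted):

```python
# Illustrative sketch of the color consolidation phase (greedy, simplified).
# colors: {color_id: set_of_states}; a state replicated across colors appears in several sets.
def consolidate(colors, n_max):
    # Step 1: replication reduction. Sort color pairs by how many state replications their
    # merge would save (the states they share), and merge feasible pairs in that order.
    pairs = sorted(((a, b) for a in colors for b in colors if a < b),
                   key=lambda p: len(colors[p[0]] & colors[p[1]]), reverse=True)
    for a, b in pairs:
        if a in colors and b in colors and colors[a] & colors[b] \
                and len(colors[a] | colors[b]) <= n_max:
            colors[a] |= colors.pop(b)

    # Step 2: final consolidation. Sort the surviving colors by size and merge each one
    # with the next color in the list whose union still respects the size threshold.
    order = sorted(colors, key=lambda c: len(colors[c]), reverse=True)
    for i, a in enumerate(order):
        if a not in colors:
            continue
        for b in order[i + 1:]:
            if b in colors and len(colors[a] | colors[b]) <= n_max:
                colors[a] |= colors.pop(b)
                break
    return colors
```

The real implementation additionally respects super-states and the per-platform limits on counters, boolean and reporting elements, which this sketch omits.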

5 EXPERIMENTAL EVALUATION

5.1 Hardware platform

We conducted almost all our experiments on a machine equipped with a dual 6-core Intel Xeon 2.66GHz and 64GB of memory, running CentOS 6.4. Since some of our AP syntheses run out of memory on that machine, for our AP experiments we used a server with similar hardware settings but equipped with 256GB of memory. For our GPU experiments we used an Nvidia Titan X GPU (Maxwell architecture), equipped with 12GB of global memory and 24 streaming multiprocessors (SMs), each including 128 cores and 96KB of shared memory. We used CUDA 7.0. For our FPGA experiments we used a Xilinx XC6VLX130T device (Virtex-6 family), which includes 20,000 slices (for a total of 160,000 flip-flops and 80,000 LUTs). We used the Xilinx ISE Design Suite v13.2 to perform synthesis, mapping and place&route of our HDL designs. This FPGA device was chosen because it is in the same price range (~$1,200) as our GPU. For AP experiments, we refer to the architecture of a 32-chip AP-D480. Since Micron's AP hardware is not yet available on the market, we do not have pricing information for it. For the AP, we used the AP SDK to collect resource utilization and preprocessing data, and performed throughput projections using the nominal operating frequency (more details in Section 5.3).

Table 2: Resource utilization for GPU (ranges correspond to different numbers of streams) and FPGA (ranges correspond to the minimum and maximum values across partitions). Columns: dataset type and name; # GPU devices; GPU memory utilization (shared (KB) and global (MB), for the iNFAnt and CSR layouts); # FPGA devices; FPGA % utilization (FF, LUT, slice).

5.2 Datasets

We selected datasets that allow comparing the three platforms on different application domains and on NFAs with varying characteristics in terms of number of states and transitions, alphabet size, connectivity and depth. To this end, we used three types of datasets: small NIDS, bioinformatics, and synthetic. Recently proposed benchmark suites for automata processing [32] are not meant for large-scale analysis (they include NFAs with up to about 100k states). Table 1 (columns 3-5) summarizes the characteristics of the NFAs for the considered datasets.

Small NIDS datasets (snort538 and l7-filter) are small network intrusion detection datasets that include 538 and 116 regular expressions, respectively (see [10] for more details).

Bioinformatics datasets (ngene_kk) consist of a set of Hamming distance automata used to address a motif-finding problem [33]. The problem requires identifying all the substrings of length k that appear on multiple genes within Hamming distance d, and can be found in a region of the gene of length l. Due to space limitations, here we show only the results for n genes from a yeast genome of about 5000 genes, with n = {10, 100}, k = {8, 12, 16, 20}, l = 500, d = 2 and a 4-symbol alphabet (A, C, G, T). A Hamming distance NFA has (k+1)(d+1) - d(d+1)/2 states, and each gene region of length l leads to (l-k+1) of these NFAs. The NFAs in Table 1 (used on all three platforms) have been state-reduced. However, previous work [22, 30] has shown that, on the AP, preprocessing time can be significantly reduced if NFAs with a known structure are precompiled. Thus, on the AP we also use a non-state-reduced variant of these bioinformatics datasets (see Table 5), leading to networks of n(l-k+1) small NFAs (each with (2d+1)k - 2d STEs) with a fixed topology.
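As a quick sanity check of these sizing formulas (our arithmetic, based on the expressions above rather than on the table data), the snippet below evaluates them for the smallest bioinformatics configuration:

```python
# Worked example of the dataset sizing formulas above (our arithmetic, not table data).
k, d, l, n = 8, 2, 500, 10                               # smallest bioinformatics configuration

states_per_nfa = (k + 1) * (d + 1) - d * (d + 1) // 2    # Hamming-distance NFA states
nfas_total = n * (l - k + 1)                             # one NFA per window position per gene
stes_fixed_topology = (2 * d + 1) * k - 2 * d            # non-state-reduced AP variant

print(states_per_nfa)        # 24 states per (k=8, d=2) Hamming NFA
print(nfas_total)            # 4930 NFAs for the 10-gene dataset
print(stes_fixed_topology)   # 36 STEs per NFA in the fixed-topology variant
```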
Synthetic automata exhibit the structure of NFAs accepting sets of regular expressions with shared prefixes: large state out-degrees in the proximity of the entry state, and low state out-degrees as we move deeper into the NFA. Our synthetic NFAs have a configurable number of states, alphabet size, entry state out-degree, out-degree decrease factor γ (the out-degree decreases as γ^depth), and frequency of wildcards, character sets and their repetitions. We set these parameters so as to generate 800k-state NFAs with two alphabet sizes (64 and 256) and two structures (deep and shallow, about 180 and 16 levels deep, respectively). Whenever required, we partition these NFAs with the algorithm described in Section 4. We recall that the partitioning threshold is platform-specific (Section 3.5). This leads to the number of partitions shown in Table 1 (columns 6-8).

In order to simulate the NFA traversal, we use two kinds of input streams. For bioinformatics datasets, we generate traces of length 500,000 (for 1,000 genes) by randomly selecting symbols from the {A, C, G, T} alphabet. For NIDS and synthetic datasets, we generate 256k-character traces through our trace generator [34], setting the probability of moving deeper into the NFA (pforw) to 0.5 and 0.9. The traversal characteristics (average and maximum number of active states per input character, and number and frequency of matches) are reported in Table 1 (columns 9-12).
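The role of pforw can be illustrated with the following sketch of a trace generator (ours; it conveys the idea rather than reproducing the generator of [34]): at each step, with probability pforw the next character is drawn from the symbols that advance some currently active state, and otherwise it is drawn uniformly from the alphabet.

```python
import random

# Illustrative pforw-style trace generator (a sketch of the idea, not the tool from [34]).
def generate_trace(nfa, alphabet, length, p_forw, seed=0):
    rng = random.Random(seed)
    active, trace = {0}, []
    for _ in range(length):
        # symbols that would advance at least one currently active state
        forward = {sym for s in active for sym, _ in nfa.get(s, [])}
        if forward and rng.random() < p_forw:
            ch = rng.choice(sorted(forward))
        else:
            ch = rng.choice(alphabet)
        trace.append(ch)
        # advance the active set exactly as the traversal engine would
        nxt = {0}
        for s in active:
            for sym, dst in nfa.get(s, []):
                if sym == ch:
                    nxt.add(dst)
        active = nxt
    return ''.join(trace)

# Example with the toy automaton used earlier; higher p_forw yields deeper traversals and
# larger active sets, which is what the 0.5 / 0.9 traces are meant to exercise.
toy = {0: [('a', 1), ('b', 4), ('c', 6)], 1: [('a', 1), ('b', 2)], 2: [('c', 3)],
       4: [('c', 5)], 5: [('c', 5)], 6: [('d', 7)], 7: [('e', 8)]}
print(generate_trace(toy, list("abcde"), 32, p_forw=0.9))
```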

5.3 Results

Resource utilization is reported in Table 2 for GPU and FPGA, and in Table 5 (columns 7-12) for the AP. For GPU, we recall that the NFA is stored in global memory, while the active state information (encoded in a bit vector) is stored in shared memory. In the CSR case, the active state information is also stored (in queue format) in global memory, and therefore the global memory requirement increases with the number of thread blocks run. However, as can be seen in Table 2, the global memory utilization is very limited even for the CSR format, and even the largest dataset occupies only up to 133MB of the 12GB of global memory. We recall that NFA partitioning is driven by the use of 16-bit state identifiers, and shared memory stores two bitmaps indicating the states active at the beginning and at the end of each traversal step. Therefore, the use of partitions with at most 64k states limits the per-block shared memory utilization to 16KB in the worst case, allowing at least 6 blocks to reside on an SM and hide each other's memory latencies.

For FPGA, to facilitate the place&route process, we size the partitions so as to use up to 70% of the flip-flop capacity. Since the considered device has twice as many flip-flops as LUTs, on most experiments this setting leads to near-full slice utilization.

For the AP (Table 5), we report both the ideal utilization (the number of blocks and AP cores that a dataset would require based on the number of its STEs and reporting elements) and the utilization numbers reported by the AP's SDK (real utilization). The utilization efficiency in column 11 is the ratio between the ideal and real block utilization. As can be seen, due to the place&route constraints on the routing matrix, the real utilization is significantly higher than the ideal one. Note that shallow synthetic datasets have significantly lower utilization efficiency (~20%) than deep ones (>80%): this is because the node out-degree of non-terminal states is large for shallow and low for deep datasets, making the former much harder to route. Due to the generally low utilization efficiency, we partitioned all the state-reduced NFAs so that each partition would require 50% (rather than the whole) of the half-core capacity. This led to the number of AP partitions shown in Table 1 (column 8). In addition, we experienced that the AP SDK tools run out of memory when processing large datasets. To avoid this, for state-reduced NFAs we grouped partitions into batches per device (depending on the transition density of the dataset), and we ran the AP SDK on one batch at a time. In the table, for each state-reduced dataset we report the cumulative results over all batches. In the case of large fixed-topology datasets (100gene*), which consist of many small NFAs with the same topology, we size each batch so as to use all 32 cores on the AP. Since the place&route algorithm used by the SDK is proprietary, this was a trial-and-error process. For these datasets, we report the number of batches (which corresponds to the number of AP boards required) and the per-batch data. As can be seen, for small k (i.e., small Hamming distance NFAs) the place&route is easier and the number of STEs per batch and the utilization efficiency are higher. Since larger Hamming distance NFAs are harder to place, the utilization efficiency decreases as k increases.

Table 3: Traversal throughput (ranges correspond to different numbers of streams for GPU and to different partitions for FPGA). Columns: dataset type and name; GPU (# streams; iNFAnt, CSR-state and CSR-tx throughput (Mbps)); FPGA throughput (Gbps).

Traversal throughput is computed using the following formulas, which assume 8-bit inputs.
We assume that matches are reported every 64K inputs (the maximum IP packet length) for NIDS datasets, every 500 inputs (the length of the relevant portion of a gene) for bioinformatics datasets, and every 1000 inputs for synthetic datasets (Ninputs). For FPGA, we use the worst-case, post-place&route operating frequency reported by the Xilinx tools. The number of cycles required to report the matches (Noutput_processing_cycles) is equal to the ratio between the number of matching states in the NFA and the number of output pins on the FPGA device. For the AP, we performed estimates based on the 133 MHz nominal operating frequency and the 291 clock cycle output processing time.

Table 4: Preprocessing overhead (for large datasets, we show minimum and maximum per-partition data). Columns: dataset type and name; GPU (parsing (sec), memory layout generation (sec), loading to memory (ms)); FPGA (parsing (sec), Verilog generation (sec), synthesis + place&route (min)).
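The formulas themselves did not survive in this transcription; the model below is our reconstruction from the quantities defined above (not necessarily the exact expression used in the paper), together with the per-stream AP estimate it yields for a 500-input reporting interval.

```python
# Reconstructed throughput model (our reading of the description above, not necessarily
# the paper's exact formula): one 8-bit input is consumed per cycle, and after every
# N_inputs characters the device spends N_report cycles delivering the match report.
def throughput_bps(freq_hz, n_inputs, n_report_cycles):
    cycles_per_window = n_inputs + n_report_cycles
    return 8 * n_inputs * freq_hz / cycles_per_window

# AP estimate for the bioinformatics traces: 133 MHz, a report every 500 inputs,
# 291 cycles per report.
print(throughput_bps(133e6, 500, 291) / 1e6)   # roughly 670 Mbps per input stream
```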

Table 5: AP results. Columns: dataset type, variant (state-reduced or fixed-topology) and name; ANML-NFA characteristics (# batches, # states/STEs, # start states, # report states); ideal utilization (# cores, # blocks); resource utilization from SDK profiling (# cores, # blocks, % utilization efficiency); # AP boards; SDK preprocessing time (place&route (sec), compilation (sec), total (min)); throughput per device (Mbps).

Throughput data are shown in Table 3 for GPU and FPGA and in Table 5 for the AP. As can be seen, while able to fit even large datasets on a single device, the GPU reports the lowest throughput data. In the GPU experiments, we configured the thread-block size to 256 and 32 for bioinformatics and NIDS/synthetic datasets, respectively. This is because we expected bioinformatics datasets to have larger active sets (as confirmed in Table 1). We recall that the number of thread blocks run is equal to the product of the number of partitions and the number of input streams processed. To avoid idle SMs and ensure processing of all partitions, we set the number of blocks to be at least equal to the number of SMs and of partitions. We then increased the number of blocks (and, as a consequence, of streams) until noticeable throughput improvements could no longer be observed. We make two observations. First, GPU resources are better utilized when processing a large number of input streams, leading to better throughput. Second, while the iNFAnt kernel greatly outperforms the CSR kernels on small datasets, the CSR-state kernel reports better performance on bioinformatics datasets with large k. On datasets with a large number of partitions, iNFAnt is penalized by looping through a large number of transitions that originate from inactive states.

Since large datasets require multiple FPGAs and AP boards (or multiple iterations through the same board), for FPGAs and the AP we report the traversal throughput per device. Since for most partitions the slice capacity is fully utilized, the number of FPGA devices required is equal to the number of FPGA partitions (Table 2, column 7), while the number of AP boards required is reported in Table 5 (column 12). For small datasets requiring only a small portion of the device, both platforms can run multiple streams by replicating the NFA. In the case of FPGA, to utilize ~70% of the slice capacity, we run 6 and 4 streams for l7-filter and snort538, respectively. In the case of the AP, we consider that chips can be grouped into logical cores processing streams in parallel (Section 2.2). As can be seen, on large datasets (100gene* and synthetic) FPGAs outperform the AP by up to a factor of ~2.6x, while requiring 2-3x more devices than the AP.

Preprocessing cost: In this section, we focus on the platform-specific preprocessing time. The NFA optimization and partitioning steps, common to all platforms, take from 3 to 249 sec (smallest to largest dataset). After these two steps, we save the NFA to file. As can be seen from Table 4, the GPU preprocessing is mostly related to the parsing of the NFA partition files, and varies from 5 sec to about 4.5 min. For FPGA, synthesis and place&route account for most of the preprocessing time, and preprocessing a large partition may require up to 165 minutes (leading to several hours for the full datasets).
Similar preprocessing times are observed on the AP (for example, the preprocessing time for the shallow-256char dataset is about 12 hours). In addition, the preprocessing time increases with the transition density (deep datasets are preprocessed much faster than shallow ones), whereas the alphabet size has a lesser effect (since on the AP transition symbols are associated with STEs and stored in memory). As mentioned, the AP preprocessing time can be reduced for datasets with a known topology (i.e., fixed-topology datasets) by pre-compilation. However, finding a configuration that fully uses the AP is a trial-and-error process.

Overall comparison: Figure 5 summarizes the results (note that throughput and preprocessing time are on a logarithmic scale). As can be seen, FPGAs provide the best traversal throughputs (up to ~2.6x those of the AP) at the cost of significant preprocessing times (~hours); GPUs deliver modest traversal throughputs (~Mbps) but incur limited preprocessing time (~seconds to minutes) and can accommodate large datasets on a single device; Micron's AP is an intermediate choice between FPGAs and GPUs, and is most suited for applications that use datasets consisting of many small NFAs with a fixed topology.

Figure 5: Traversal throughput, device utilization (in terms of number of devices) and overall preprocessing time.

Power consumption: While the AP is not yet on the market, its design targets a worst-case power consumption of 4W per chip [23]. Due to lack of space, here we report power data only for a medium-size dataset (100genes_12k). Xilinx's Power Analyzer estimates the FPGA power consumption to be between 2.09W and 2.36W on different partitions. In contrast, GPU experiments on Texas State's Marcher system report an average GPU power consumption of W and W on the best and worst implementations/kernel configurations, respectively.

6 CONCLUSION

To summarize, large datasets with tens of thousands of states or more must be partitioned in order to be deployed on GPUs, FPGAs and Micron's AP. While for GPUs partitioning is required only to effectively use the GPU resources (e.g., on-chip memory), FPGAs and the AP require splitting large NFAs across multiple devices. On these large datasets, logic-based FPGA designs can outperform the AP by a factor of ~2x, while requiring 2-3x more devices to accommodate the dataset; GPUs underperform FPGAs by up to a factor of 900x. GPUs in general deliver low performance on a single input stream, but their cumulative throughput scales up to thousands of input streams. GPUs offer the advantage of limited preprocessing time (up to a few minutes on million-state NFAs), while FPGAs and the AP can take several hours to preprocess the same datasets. Precompiling the NFA can hide the AP's preprocessing time, but this is possible only if the topology of the NFA is known a priori (e.g., Hamming or Levenshtein distance NFAs). Finding an NFA configuration that uses all 32 AP cores is a trial-and-error process that can require about an hour per experiment. Finally, due to routing constraints, the AP's SDK can keep the utilization efficiency as low as 20%, while the FPGA utilization is more predictable given the NFA size.

ACKNOWLEDGMENTS

This work has been supported by NSF awards CNS and CCF, and by the Institute for Critical Technology and Applied Science (ICTAS).

REFERENCES

[1] S. Kumar et al., Algorithms to accelerate multiple regular expressions matching for deep packet inspection, in Proc. of SIGCOMM.
[2] S. Kumar et al., Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia, in Proc. of ANCS.
[3] S. Kumar et al., Advanced algorithms for fast and scalable deep packet inspection, in Proc. of ANCS.
[4] M. Becchi and P. Crowley, An improved algorithm to accelerate regular expression evaluation, in Proc. of ANCS.
[5] M. Becchi and P. Crowley, A hybrid finite automaton for practical deep packet inspection, in Proc. of CoNEXT.
[6] M. Becchi and P. Crowley, Extending finite automata to efficiently match Perl-compatible regular expressions, in Proc. of CoNEXT.
[7] R. Smith et al., Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, in Proc. of SIGCOMM.
[8] A. X. Liu and E. Torng, An overlay automata approach to regular expression matching, in Proc. of INFOCOM.
[9] X. Yu et al., Revisiting State Blow-up: Automatically Building Augmented-FA while Preserving Functional Equivalence, JSAC.
[10] N. Cascarano et al., iNFAnt: NFA pattern matching on GPGPU devices, SIGCOMM Comput. Commun. Rev., vol. 40, no. 5.
[11] Y. Zu et al., GPU-based NFA implementation for memory efficient high speed regular expression matching, in Proc. of PPoPP.
[12] X. Yu and M. Becchi, GPU acceleration of regular expression matching for large datasets: exploring the implementation space, in Proc. of CF.
[13] R. Sidhu and V. K. Prasanna, Fast Regular Expression Matching Using FPGAs, in Proc. of FCCM.
[14] M. Becchi and P. Crowley, Efficient regular expression evaluation: theory to practice, in Proc. of ANCS.
[15] Y.-H. E. Yang et al., Compact architecture for high-throughput regular expression matching on FPGA, in Proc. of ANCS.
[16] A. Mitra et al., Compiling PCRE to FPGA for accelerating SNORT IDS, in Proc. of ANCS.
[17] B. C. Brodie et al., A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching, in Proc. of ISCA.
[18] J. Van Lunteren et al., Designing a Programmable Wire-Speed Regular-Expression Matching Accelerator, in Proc. of MICRO.
[19] Y. Fang et al., Fast support for unstructured data processing: the unified automata processor, in Proc. of MICRO.
[20] M. Becchi et al., Evaluating regular expression matching engines on network and general purpose processors, in Proc. of ANCS.
[21] P. Dlugosch et al., An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing, TPDS, vol. PP, no. 99.
[22] I. Roy and S. Aluru, Finding Motifs in Biological Sequences Using the Micron Automata Processor, in Proc. of IPDPS.
[23] K. Wang et al., Association Rule Mining with the Micron Automata Processor, in Proc. of IPDPS.
[24] K. Zhou et al., Regular expression acceleration on the Micron automata processor: Brill tagging as a case study, in Proc. of Big Data.
[25] I. Roy et al., High Performance Pattern Matching Using the Automata Processor, in Proc. of IPDPS.
[26] I. Roy et al., Algorithmic Techniques for Solving Graph Problems on the Automata Processor, in Proc. of IPDPS.
[27] K. Wang et al., Sequential pattern mining with the Micron automata processor, in Proc. of CF.
[28] J. E. Hopcroft and J. Ullman, Introduction to automata theory, languages, and computation, Addison-Wesley, Reading, Massachusetts.
[29] F. Yu et al., Fast and memory-efficient regular expression matching for deep packet inspection, in Proc. of ANCS.
[30] K. Angstadt et al., RAPID Programming of Pattern-Recognition Processors, in Proc. of ASPLOS.
[31] G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM J. Sci. Comp., vol. 20, no. 1.
[32] J. Wadden et al., ANMLzoo: a benchmark suite for exploring bottlenecks in automata processing engines and architectures, in Proc. of IISWC.
[33] A. Todd et al., Parallel Gene Upstream Comparison via Multi-Level Hash Tables on GPU, in Proc. of ICPADS.
[34] M. Becchi et al., A workload for evaluating deep packet inspection architectures, in Proc. of IISWC 2008.


More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu Institute of Information Science Acaemia Sinica Taipei, Taiwan Da-wei Wang Jan-Jan Wu Institute of Information Science

More information

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions.

Overview. Operating Systems I. Simple Memory Management. Simple Memory Management. Multiprocessing w/fixed Partitions. Overview Operating Systems I Management Provie Services processes files Manage Devices processor memory isk Simple Management One process in memory, using it all each program nees I/O rivers until 96 I/O

More information

MODULE VII. Emerging Technologies

MODULE VII. Emerging Technologies MODULE VII Emerging Technologies Computer Networks an Internets -- Moule 7 1 Spring, 2014 Copyright 2014. All rights reserve. Topics Software Define Networking The Internet Of Things Other trens in networking

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 01 01 01 01 01 00 01 01 Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University

More information

Computer Organization

Computer Organization Computer Organization Douglas Comer Computer Science Department Purue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purue.eu/people/comer Copyright 2006. All rights reserve.

More information

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method

Transient analysis of wave propagation in 3D soil by using the scaled boundary finite element method Southern Cross University epublications@scu 23r Australasian Conference on the Mechanics of Structures an Materials 214 Transient analysis of wave propagation in 3D soil by using the scale bounary finite

More information

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH

SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH SURVIVABLE IP OVER WDM: GUARANTEEEING MINIMUM NETWORK BANDWIDTH Galen H Sasaki Dept Elec Engg, U Hawaii 2540 Dole Street Honolul HI 96822 USA Ching-Fong Su Fuitsu Laboratories of America 595 Lawrence Expressway

More information

Baring it all to Software: The Raw Machine

Baring it all to Software: The Raw Machine Baring it all to Software: The Raw Machine Elliot Waingol, Michael Taylor, Vivek Sarkar, Walter Lee, Victor Lee, Jang Kim, Matthew Frank, Peter Finch, Srikrishna Devabhaktuni, Rajeev Barua, Jonathan Babb,

More information

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Questions? Post on piazza, or  Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)! EE122 Fall 2013 HW3 Instructions Recor your answers in a file calle hw3.pf. Make sure to write your name an SID at the top of your assignment. For each problem, clearly inicate your final answer, bol an

More information

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources

An Algorithm for Building an Enterprise Network Topology Using Widespread Data Sources An Algorithm for Builing an Enterprise Network Topology Using Wiesprea Data Sources Anton Anreev, Iurii Bogoiavlenskii Petrozavosk State University Petrozavosk, Russia {anreev, ybgv}@cs.petrsu.ru Abstract

More information

Table-based division by small integer constants

Table-based division by small integer constants Table-base ivision by small integer constants Florent e Dinechin, Laurent-Stéphane Diier LIP, Université e Lyon (ENS-Lyon/CNRS/INRIA/UCBL) 46, allée Italie, 69364 Lyon Ceex 07 Florent.e.Dinechin@ens-lyon.fr

More information

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks

EDOVE: Energy and Depth Variance-Based Opportunistic Void Avoidance Scheme for Underwater Acoustic Sensor Networks sensors Article EDOVE: Energy an Depth Variance-Base Opportunistic Voi Avoiance Scheme for Unerwater Acoustic Sensor Networks Safar Hussain Bouk 1, *, Sye Hassan Ahme 2, Kyung-Joon Park 1 an Yongsoon Eun

More information

Comparison of Methods for Increasing the Performance of a DUA Computation

Comparison of Methods for Increasing the Performance of a DUA Computation Comparison of Methos for Increasing the Performance of a DUA Computation Michael Behrisch, Daniel Krajzewicz, Peter Wagner an Yun-Pang Wang Institute of Transportation Systems, German Aerospace Center,

More information

Improving Performance of Sparse Matrix-Vector Multiplication

Improving Performance of Sparse Matrix-Vector Multiplication Improving Performance of Sparse Matrix-Vector Multiplication Ali Pınar Michael T. Heath Department of Computer Science an Center of Simulation of Avance Rockets University of Illinois at Urbana-Champaign

More information

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients

Probabilistic Medium Access Control for. Full-Duplex Networks with Half-Duplex Clients Probabilistic Meium Access Control for 1 Full-Duplex Networks with Half-Duplex Clients arxiv:1608.08729v1 [cs.ni] 31 Aug 2016 Shih-Ying Chen, Ting-Feng Huang, Kate Ching-Ju Lin, Member, IEEE, Y.-W. Peter

More information

Overlap Interval Partition Join

Overlap Interval Partition Join Overlap Interval Partition Join Anton Dignös Department of Computer Science University of Zürich, Switzerlan aignoes@ifi.uzh.ch Michael H. Böhlen Department of Computer Science University of Zürich, Switzerlan

More information

Architecture Design of Mobile Access Coordinated Wireless Sensor Networks

Architecture Design of Mobile Access Coordinated Wireless Sensor Networks Architecture Design of Mobile Access Coorinate Wireless Sensor Networks Mai Abelhakim 1 Leonar E. Lightfoot Jian Ren 1 Tongtong Li 1 1 Department of Electrical & Computer Engineering, Michigan State University,

More information

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing

Indexing the Edges A simple and yet efficient approach to high-dimensional indexing Inexing the Eges A simple an yet efficient approach to high-imensional inexing Beng Chin Ooi Kian-Lee Tan Cui Yu Stephane Bressan Department of Computer Science National University of Singapore 3 Science

More information

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER

EFFICIENT ON-LINE TESTING METHOD FOR A FLOATING-POINT ADDER FFICINT ON-LIN TSTING MTHOD FOR A FLOATING-POINT ADDR A. Droz, M. Lobachev Department of Computer Systems, Oessa State Polytechnic University, Oessa, Ukraine Droz@ukr.net, Lobachev@ukr.net Abstract In

More information

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks

Robust PIM-SM Multicasting using Anycast RP in Wireless Ad Hoc Networks Robust PIM-SM Multicasting using Anycast RP in Wireless A Hoc Networks Jaewon Kang, John Sucec, Vikram Kaul, Sunil Samtani an Mariusz A. Fecko Applie Research, Telcoria Technologies One Telcoria Drive,

More information

Coordinating Distributed Algorithms for Feature Extraction Offloading in Multi-Camera Visual Sensor Networks

Coordinating Distributed Algorithms for Feature Extraction Offloading in Multi-Camera Visual Sensor Networks Coorinating Distribute Algorithms for Feature Extraction Offloaing in Multi-Camera Visual Sensor Networks Emil Eriksson, György Dán, Viktoria Foor School of Electrical Engineering, KTH Royal Institute

More information

Recitation Caches and Blocking. 4 March 2019

Recitation Caches and Blocking. 4 March 2019 15-213 Recitation Caches an Blocking 4 March 2019 Agena Reminers Revisiting Cache Lab Caching Review Blocking to reuce cache misses Cache alignment Reminers Due Dates Cache Lab (Thursay 3/7) Miterm Exam

More information

d 3 d 4 d d d d d d d d d d d 1 d d d d d d

d 3 d 4 d d d d d d d d d d d 1 d d d d d d Proceeings of the IASTED International Conference Software Engineering an Applications (SEA') October 6-, 1, Scottsale, Arizona, USA AN OBJECT-ORIENTED APPROACH FOR MANAGING A NETWORK OF DATABASES Shu-Ching

More information

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization

Offloading Cellular Traffic through Opportunistic Communications: Analysis and Optimization 1 Offloaing Cellular Traffic through Opportunistic Communications: Analysis an Optimization Vincenzo Sciancalepore, Domenico Giustiniano, Albert Banchs, Anreea Picu arxiv:1405.3548v1 [cs.ni] 14 May 24

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stolyar Abstract Backpressure-base aaptive routing

More information

Improving Spatial Reuse of IEEE Based Ad Hoc Networks

Improving Spatial Reuse of IEEE Based Ad Hoc Networks mproving Spatial Reuse of EEE 82.11 Base A Hoc Networks Fengji Ye, Su Yi an Biplab Sikar ECSE Department, Rensselaer Polytechnic nstitute Troy, NY 1218 Abstract n this paper, we evaluate an suggest methos

More information

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD.

Cloud Search Service Product Introduction. Issue 01 Date HUAWEI TECHNOLOGIES CO., LTD. 1.3.15 Issue 01 Date 2018-11-21 HUAWEI TECHNOLOGIES CO., LTD. Copyright Huawei Technologies Co., Lt. 2019. All rights reserve. No part of this ocument may be reprouce or transmitte in any form or by any

More information

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs

Distributed Line Graphs: A Universal Technique for Designing DHTs Based on Arbitrary Regular Graphs IEEE TRANSACTIONS ON KNOWLEDE AND DATA ENINEERIN, MANUSCRIPT ID Distribute Line raphs: A Universal Technique for Designing DHTs Base on Arbitrary Regular raphs Yiming Zhang an Ling Liu, Senior Member,

More information

Control of Scalable Wet SMA Actuator Arrays

Control of Scalable Wet SMA Actuator Arrays Proceeings of the 2005 IEEE International Conference on Robotics an Automation Barcelona, Spain, April 2005 Control of Scalable Wet SMA Actuator Arrays eslie Flemming orth Dakota State University Mechanical

More information

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises

Frequent Pattern Mining. Frequent Item Set Mining. Overview. Frequent Item Set Mining: Motivation. Frequent Pattern Mining comprises verview Frequent Pattern Mining comprises Frequent Pattern Mining hristian Borgelt School of omputer Science University of Konstanz Universitätsstraße, Konstanz, Germany christian.borgelt@uni-konstanz.e

More information

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes

Chapter 5 Proposed models for reconstituting/ adapting three stereoscopes Chapter 5 Propose moels for reconstituting/ aapting three stereoscopes - 89 - 5. Propose moels for reconstituting/aapting three stereoscopes This chapter offers three contributions in the Stereoscopy area,

More information

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition

A Neural Network Model Based on Graph Matching and Annealing :Application to Hand-Written Digits Recognition ITERATIOAL JOURAL OF MATHEMATICS AD COMPUTERS I SIMULATIO A eural etwork Moel Base on Graph Matching an Annealing :Application to Han-Written Digits Recognition Kyunghee Lee Abstract We present a neural

More information

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots

Adaptive Load Balancing based on IP Fast Reroute to Avoid Congestion Hot-spots Aaptive Loa Balancing base on IP Fast Reroute to Avoi Congestion Hot-spots Masaki Hara an Takuya Yoshihiro Faculty of Systems Engineering, Wakayama University 930 Sakaeani, Wakayama, 640-8510, Japan Email:

More information

k-nn Graph Construction: a Generic Online Approach

k-nn Graph Construction: a Generic Online Approach k-nn Graph Construction: a Generic Online Approach Wan-Lei Zhao arxiv:80.00v [cs.ir] Sep 08 Abstract Nearest neighbor search an k-nearest neighbor graph construction are two funamental issues arise from

More information

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA

Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA Implementation an Evaluation of AS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA Kazuya Matsumoto 1, orihisa Fujita 2, Toshihiro Hanawa 3, an Taisuke Boku 1,2 1 Center for Computational

More information

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance

Impact of FTP Application file size and TCP Variants on MANET Protocols Performance International Journal of Moern Communication Technologies & Research (IJMCTR) Impact of FTP Application file size an TCP Variants on MANET Protocols Performance Abelmuti Ahme Abbasher Ali, Dr.Amin Babkir

More information

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation

On Effectively Determining the Downlink-to-uplink Sub-frame Width Ratio for Mobile WiMAX Networks Using Spline Extrapolation On Effectively Determining the Downlink-to-uplink Sub-frame With Ratio for Mobile WiMAX Networks Using Spline Extrapolation Panagiotis Sarigianniis, Member, IEEE, Member Malamati Louta, Member, IEEE, Member

More information

On the Placement of Internet Taps in Wireless Neighborhood Networks

On the Placement of Internet Taps in Wireless Neighborhood Networks 1 On the Placement of Internet Taps in Wireless Neighborhoo Networks Lili Qiu, Ranveer Chanra, Kamal Jain, Mohamma Mahian Abstract Recently there has emerge a novel application of wireless technology that

More information

Enabling Rollback Support in IT Change Management Systems

Enabling Rollback Support in IT Change Management Systems Enabling Rollback Support in IT Change Management Systems Guilherme Sperb Machao, Fábio Fabian Daitx, Weverton Luis a Costa Coreiro, Cristiano Bonato Both, Luciano Paschoal Gaspary, Lisanro Zambeneetti

More information

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis

Multilevel Linear Dimensionality Reduction using Hypergraphs for Data Analysis Multilevel Linear Dimensionality Reuction using Hypergraphs for Data Analysis Haw-ren Fang Department of Computer Science an Engineering University of Minnesota; Minneapolis, MN 55455 hrfang@csumneu ABSTRACT

More information

Adjacency Matrix Based Full-Text Indexing Models

Adjacency Matrix Based Full-Text Indexing Models 1000-9825/2002/13(10)1933-10 2002 Journal of Software Vol.13, No.10 Ajacency Matrix Base Full-Text Inexing Moels ZHOU Shui-geng 1, HU Yun-fa 2, GUAN Ji-hong 3 1 (Department of Computer Science an Engineering,

More information

Research Article REALFLOW: Reliable Real-Time Flooding-Based Routing Protocol for Industrial Wireless Sensor Networks

Research Article REALFLOW: Reliable Real-Time Flooding-Based Routing Protocol for Industrial Wireless Sensor Networks Hinawi Publishing Corporation International Journal of Distribute Sensor Networks Volume 2014, Article ID 936379, 17 pages http://x.oi.org/10.1155/2014/936379 Research Article REALFLOW: Reliable Real-Time

More information

Considering bounds for approximation of 2 M to 3 N

Considering bounds for approximation of 2 M to 3 N Consiering bouns for approximation of to (version. Abstract: Estimating bouns of best approximations of to is iscusse. In the first part I evelop a powerseries, which shoul give practicable limits for

More information

Fast Fractal Image Compression using PSO Based Optimization Techniques

Fast Fractal Image Compression using PSO Based Optimization Techniques Fast Fractal Compression using PSO Base Optimization Techniques A.Krishnamoorthy Visiting faculty Department Of ECE University College of Engineering panruti rishpci89@gmail.com S.Buvaneswari Visiting

More information

PAPER. 1. Introduction

PAPER. 1. Introduction IEICE TRANS. COMMUN., VOL. E9x-B, No.8 AUGUST 2010 PAPER Integrating Overlay Protocols for Proviing Autonomic Services in Mobile A-hoc Networks Panagiotis Gouvas, IEICE Stuent member, Anastasios Zafeiropoulos,,

More information

Learning Subproblem Complexities in Distributed Branch and Bound

Learning Subproblem Complexities in Distributed Branch and Bound Learning Subproblem Complexities in Distribute Branch an Boun Lars Otten Department of Computer Science University of California, Irvine lotten@ics.uci.eu Rina Dechter Department of Computer Science University

More information

Finite Automata Implementations Considering CPU Cache J. Holub

Finite Automata Implementations Considering CPU Cache J. Holub Finite Automata Implementations Consiering CPU Cache J. Holub The finite automata are mathematical moels for finite state systems. More general finite automaton is the noneterministic finite automaton

More information

Generalized Edge Coloring for Channel Assignment in Wireless Networks

Generalized Edge Coloring for Channel Assignment in Wireless Networks TR-IIS-05-021 Generalize Ege Coloring for Channel Assignment in Wireless Networks Chun-Chen Hsu, Pangfeng Liu, Da-Wei Wang, Jan-Jan Wu December 2005 Technical Report No. TR-IIS-05-021 http://www.iis.sinica.eu.tw/lib/techreport/tr2005/tr05.html

More information

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems

On the Role of Multiply Sectioned Bayesian Networks to Cooperative Multiagent Systems On the Role of Multiply Sectione Bayesian Networks to Cooperative Multiagent Systems Y. Xiang University of Guelph, Canaa, yxiang@cis.uoguelph.ca V. Lesser University of Massachusetts at Amherst, USA,

More information

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks

Backpressure-based Packet-by-Packet Adaptive Routing in Communication Networks 1 Backpressure-base Packet-by-Packet Aaptive Routing in Communication Networks Eleftheria Athanasopoulou, Loc Bui, Tianxiong Ji, R. Srikant, an Alexaner Stoylar arxiv:15.4984v1 [cs.ni] 27 May 21 Abstract

More information

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem

Throughput Characterization of Node-based Scheduling in Multihop Wireless Networks: A Novel Application of the Gallai-Edmonds Structure Theorem Throughput Characterization of Noe-base Scheuling in Multihop Wireless Networks: A Novel Application of the Gallai-Emons Structure Theorem Bo Ji an Yu Sang Dept. of Computer an Information Sciences Temple

More information

Top-down Connectivity Policy Framework for Mobile Peer-to-Peer Applications

Top-down Connectivity Policy Framework for Mobile Peer-to-Peer Applications Top-own Connectivity Policy Framework for Mobile Peer-to-Peer Applications Otso Kassinen Mika Ylianttila Junzhao Sun Jussi Ala-Kurikka MeiaTeam Department of Electrical an Information Engineering University

More information

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks

Non-Uniform Sensor Deployment in Mobile Wireless Sensor Networks 0 0 0 0 0 0 0 0 on-uniform Sensor Deployment in Mobile Wireless Sensor etworks Mihaela Carei, Yinying Yang, an Jie Wu Department of Computer Science an Engineering Floria Atlantic University Boca Raton,

More information

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means

Classifying Facial Expression with Radial Basis Function Networks, using Gradient Descent and K-means Classifying Facial Expression with Raial Basis Function Networks, using Graient Descent an K-means Neil Allrin Department of Computer Science University of California, San Diego La Jolla, CA 9237 nallrin@cs.ucs.eu

More information

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna Automation Framework for Large-Scale Regular Expression Matching on FPGA Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna Ming-Hsieh Department of Electrical Engineering University of Southern California

More information

Divide-and-Conquer Algorithms

Divide-and-Conquer Algorithms Supplment to A Practical Guie to Data Structures an Algorithms Using Java Divie-an-Conquer Algorithms Sally A Golman an Kenneth J Golman Hanout Divie-an-conquer algorithms use the following three phases:

More information

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees

Disjoint Multipath Routing in Dual Homing Networks using Colored Trees Disjoint Multipath Routing in Dual Homing Networks using Colore Trees Preetha Thulasiraman, Srinivasan Ramasubramanian, an Marwan Krunz Department of Electrical an Computer Engineering University of Arizona,

More information

Two Dimensional-IP Routing

Two Dimensional-IP Routing Two Dimensional-IP Routing Mingwei Xu Shu Yang Dan Wang Hong Kong Polytechnic University Jianping Wu Abstract Traitional IP networks use single-path routing, an make forwaring ecisions base on estination

More information

Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations

Towards a Low-Power Accelerator of Many FPGAs for Stencil Computations 2012 Thir International Conference on Networking an Computing Towars a Low-Power Accelerator of Many FPGAs for Stencil Computations Ryohei Kobayashi Tokyo Institute of Technology, Japan E-mail: kobayashi@arch.cs.titech.ac.jp

More information

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks

Queueing Model and Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Queueing Moel an Optimization of Packet Dropping in Real-Time Wireless Sensor Networks Marc Aoun, Antonios Argyriou, Philips Research, Einhoven, 66AE, The Netherlans Department of Computer an Communication

More information

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links

Overview : Computer Networking. IEEE MAC Protocol: CSMA/CA Internet mobility TCP over noisy links Overview 15-441 15-441: Computer Networking 15-641 Lecture 24: Wireless Eric Anerson Fall 2014 www.cs.cmu.eu/~prs/15-441-f14 Internet mobility TCP over noisy links Link layer challenges an WiFi Cellular

More information

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem

Particle Swarm Optimization Based on Smoothing Approach for Solving a Class of Bi-Level Multiobjective Programming Problem BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 17, No 3 Sofia 017 Print ISSN: 1311-970; Online ISSN: 1314-4081 DOI: 10.1515/cait-017-0030 Particle Swarm Optimization Base

More information

Local Path Planning with Proximity Sensing for Robot Arm Manipulators. 1. Introduction

Local Path Planning with Proximity Sensing for Robot Arm Manipulators. 1. Introduction Local Path Planning with Proximity Sensing for Robot Arm Manipulators Ewar Cheung an Vlaimir Lumelsky Yale University, Center for Systems Science Department of Electrical Engineering New Haven, Connecticut

More information

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks

Provisioning Virtualized Cloud Services in IP/MPLS-over-EON Networks Provisioning Virtualize Clou Services in IP/MPLS-over-EON Networks Pan Yi an Byrav Ramamurthy Department of Computer Science an Engineering, University of Nebraska-Lincoln Lincoln, Nebraska 68588 USA Email:

More information

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques

Politehnica University of Timisoara Mobile Computing, Sensors Network and Embedded Systems Laboratory. Testing Techniques Politehnica University of Timisoara Mobile Computing, Sensors Network an Embee Systems Laboratory ing Techniques What is testing? ing is the process of emonstrating that errors are not present. The purpose

More information

NAND flash memory is widely used as a storage

NAND flash memory is widely used as a storage 1 : Buffer-Aware Garbage Collection for Flash-Base Storage Systems Sungjin Lee, Dongkun Shin Member, IEEE, an Jihong Kim Member, IEEE Abstract NAND flash-base storage evice is becoming a viable storage

More information

DATA PARALLEL FPGA WORKLOADS: SOFTWARE VERSUS HARDWARE. Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose

DATA PARALLEL FPGA WORKLOADS: SOFTWARE VERSUS HARDWARE. Peter Yiannacouras, J. Gregory Steffan, and Jonathan Rose DATA PARALLEL FPGA WORKLOADS: SOFTWARE VERSUS HARDWARE Peter Yiannacouras, J. Gregory Steffan, an Jonathan Rose Ewar S. Rogers Sr. Department of Electrical an Computer Engineering University of Toronto

More information

Wireless Sensing and Structural Control Strategies

Wireless Sensing and Structural Control Strategies Wireless Sensing an Structural Control Strategies Kincho H. Law 1, Anrew Swartz 2, Jerome P. Lynch 3, Yang Wang 4 1 Dept. of Civil an Env. Engineering, Stanfor University, Stanfor, CA 94305, USA 2 Dept.

More information

Shift-map Image Registration

Shift-map Image Registration Shift-map Image Registration Svärm, Linus; Stranmark, Petter Unpublishe: 2010-01-01 Link to publication Citation for publishe version (APA): Svärm, L., & Stranmark, P. (2010). Shift-map Image Registration.

More information

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11:

Pairwise alignment using shortest path algorithms, Gunnar Klau, November 29, 2005, 11: airwise alignment using shortest path algorithms, Gunnar Klau, November 9,, : 3 3 airwise alignment using shortest path algorithms e will iscuss: it graph Dijkstra s algorithm algorithm (GDU) 3. References

More information

6.854J / J Advanced Algorithms Fall 2008

6.854J / J Advanced Algorithms Fall 2008 MIT OpenCourseWare http://ocw.mit.eu 6.854J / 18.415J Avance Algorithms Fall 2008 For inormation about citing these materials or our Terms o Use, visit: http://ocw.mit.eu/terms. 18.415/6.854 Avance Algorithms

More information

Questions? Post on piazza, or Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)!

Questions? Post on piazza, or  Radhika (radhika at eecs.berkeley) or Sameer (sa at berkeley)! EE122 Fall 2013 HW3 Instructions Recor your answers in a file calle hw3.pf. Make sure to write your name an SID at the top of your assignment. For each problem, clearly inicate your final answer, bol an

More information

On-path Cloudlet Pricing for Low Latency Application Provisioning

On-path Cloudlet Pricing for Low Latency Application Provisioning On-path Cloulet Pricing for Low Latency Application Provisioning Argyrios G. Tasiopoulos, Onur Ascigil, Ioannis Psaras, Stavros Toumpis, George Pavlou Dept. of Electronic an Electrical Engineering, University

More information

Comparison of Wireless Network Simulators with Multihop Wireless Network Testbed in Corridor Environment

Comparison of Wireless Network Simulators with Multihop Wireless Network Testbed in Corridor Environment Comparison of Wireless Network Simulators with Multihop Wireless Network Testbe in Corrior Environment Rabiullah Khattak, Anna Chaltseva, Laurynas Riliskis, Ulf Boin, an Evgeny Osipov Department of Computer

More information

Coupon Recalculation for the GPS Authentication Scheme

Coupon Recalculation for the GPS Authentication Scheme Coupon Recalculation for the GPS Authentication Scheme Georg Hofferek an Johannes Wolkerstorfer Graz University of Technology, Institute for Applie Information Processing an Communications (IAIK), Inffelgasse

More information

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides

Threshold Based Data Aggregation Algorithm To Detect Rainfall Induced Landslides Threshol Base Data Aggregation Algorithm To Detect Rainfall Inuce Lanslies Maneesha V. Ramesh P. V. Ushakumari Department of Computer Science Department of Mathematics Amrita School of Engineering Amrita

More information

Using Vector and Raster-Based Techniques in Categorical Map Generalization

Using Vector and Raster-Based Techniques in Categorical Map Generalization Thir ICA Workshop on Progress in Automate Map Generalization, Ottawa, 12-14 August 1999 1 Using Vector an Raster-Base Techniques in Categorical Map Generalization Beat Peter an Robert Weibel Department

More information

Lab work #8. Congestion control

Lab work #8. Congestion control TEORÍA DE REDES DE TELECOMUNICACIONES Grao en Ingeniería Telemática Grao en Ingeniería en Sistemas e Telecomunicación Curso 2015-2016 Lab work #8. Congestion control (1 session) Author: Pablo Pavón Mariño

More information

Principles of B-trees

Principles of B-trees CSE465, Fall 2009 February 25 1 Principles of B-trees Noes an binary search Anoe u has size size(u), keys k 1,..., k size(u) 1 chilren c 1,...,c size(u). Binary search property: for i = 1,..., size(u)

More information