
FPGA Based Custom Accelerator Architecture Framework for Complex Event Processing

Kavinga Upul Bandara Ekanayaka
Department of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka

Ajith Pasqual
Department of Electronic and Telecommunication Engineering, University of Moratuwa, Sri Lanka

Abstract: Complex Event Processing (CEP) is an emerging field in high performance computing in which real-time (low latency) response is expected while processing big data (high throughput). A significant number of software architectures have been developed to improve throughput while reducing latency, but maintaining both aspects at once reaches the limits of software platforms. This paper proposes a novel custom hardware accelerator architecture framework for CEP in the big data domain. The proposed design improves throughput more than 10 times over its software counterpart while keeping latency below 100 nanoseconds. The same Structured Query Language (SQL) type queries used in the reference software architecture are used, in order to preserve flexibility. A query compiler based on the same query language grammar was designed to convert the queries into Hardware Description Language (HDL) modules. All modules were parameterized to improve the scalability of the design. The generated modules were synthesized with vendor tools and programmed into a Field Programmable Gate Array (FPGA) platform to implement the system. The proposed hardware architecture framework was verified using a sensor network data set from a football field, and the results were compared with the software counterpart to show the performance improvement.

Keywords: Complex Event Processing, Hardware Acceleration, FPGA, Big data.

I. INTRODUCTION

Big data applications such as stock exchanges, smart grids, wireless sensor networks, RFID networks and social networks have an essential need to process huge amounts of serial data in real time.
Complex Event Processing (CEP) is one of the most rapidly emerging fields in data processing, and it is a principal technology for processing large volumes of moving data in real time. A CEP engine identifies meaningful patterns, relationships and data abstractions among apparently unrelated events and fires an immediate response. A significant number of software CEP engines with different algorithms have been developed over the past few years, such as Aurora [1], PIPES [2], STREAM [3], Borealis [4] and S4 [5], to satisfy the requirement of high throughput and low latency (real-time) processing. Siddhi [6] is a recently published software architecture for CEP. It uses concepts such as pipelining and multithreading to achieve these two main targets of a CEP system, which led the authors of this paper to select it as the base architecture to follow. Siddhi shows a significant performance improvement over ESPER [7], one of the well-established CEP architectures, as a result of its architectural improvements over traditional CEP designs. Its throughput, however, still lies in the range of a few megabits per second.

None of the software architectures mentioned above can deliver both main aspects of a CEP system at the same time, because of software platform limitations such as CPU processing power and the CPU-memory data latency bottleneck. Software CEP platforms will therefore fail to satisfy the requirements of today's high performance computing applications, which are expected to process data in near real time at a throughput of at least around 1 Gbps. A system with a hardware co-processor in line with the CPU is a promising way to give such software CEP systems hardware acceleration that improves both latency and throughput.

/14/$31.00 © 2014 IEEE
The two main hardware co-processor design techniques, parallel processing and pipelining, map elegantly onto the high-throughput and low-latency requirements of a CEP architecture. Hardware accelerated systems built on Field Programmable Gate Arrays (FPGAs) show strong performance in stream processing and pattern matching applications, as suggested in [8] and [9]. This research proposes a custom hardware acceleration architecture framework to enhance the performance of individual software CEP systems. Here, the architecture and the SQL-based query language follow the Siddhi software CEP platform. Moreover, the hardware design approach of this research acts as a generalized framework for CEP in hardware and can function as a co-processor for any such CEP software application with minor changes to the query compiler, which is explained in Section IV.

An SQL-based hardware approach to CEP is proposed in [10] and a C-based approach in [11]; the two designs are very similar and both take a market trading application as the motivating example. Both process the data streams directly from the network port and hence achieve a throughput of 20 Gbps. A hardware CEP design with a query compiler is proposed in [12], which inspired the authors to use the query compiler approach. A hardware pattern matching architecture is proposed in [13], with a detailed overview of the advantages of Nondeterministic Finite Automata (NFA) for pattern-recognizing state machines. That work led the authors to use NFA design methodologies to implement highly scalable and generalized sequence and pattern matching modules. Both of the latter papers achieved a data throughput of 1 Gbps.

All of the hardware CEP designs identified above work as standalone processors, which leaves a considerable gap between them and the software platform. This paper proposes a novel approach in which the hardware accelerator for CEP is a co-processor that works in line with the software platform, connected to it by a high-speed PCI-Express communication link. The design is more flexible than the earlier hardware approaches because it functions as a hardware API to the software platform, while delivering higher throughput than the software platform alone. This makes it possible to move parts of the whole design back and forth between the hardware and software platforms according to the trade-off between flexibility and efficiency.

The rest of the paper is organized as follows. Section II details the modeling of the custom hardware accelerator system with its basic building blocks and the underlying digital system design principles, and Section III explains the overall system architecture. Section IV describes the query compiler. An evaluating example for the system is discussed in Section V, and the results are compared with the software counterpart in Section VI. Finally, Section VII concludes the paper.

II. MODELING OF THE CUSTOM ARCHITECTURE COMPONENTS

The main architecture of this research is built on custom component models based on the basic query types of the reference CEP system. Any complex application in the CEP domain can be divided into five main query types: select, filter, window and aggregation, pattern recognition, and sequence recognition. These are defined in the Siddhi language specification [14]; Siddhi is also the reference software platform for this research. Hence any complex CEP architecture can be built from the basic component models, which increases the flexibility of the system. Since the whole architecture is based on component models of the basic query types, the flexibility of the hardware accelerated CEP architecture is much closer to that of its software counterpart. This design approach is the novelty of this research, as it works as a hardware co-processor in line with the existing software-based system architecture. Therefore any system can be designed so that part of it is in hardware and the other part is on the software platform, with the partitioning depending on the flexibility and efficiency concerns of the particular application.

One basic component model may be part of the architecture of another component model. Every input event is collected in an input buffer register as long as the event data. The length of the input buffer register is determined by the input stream event definition given in the define query. The output event is likewise collected in an output buffer register, whose length is determined by the output data fields named in the select query.

A. Selector Modeling

The selector module is the most basic design component of the architecture, because all the other basic component models use it as a sub-part. This component model is built directly on the select query type [14].
As depicted in Fig. 1, the selector module is constructed as a simple hardwired connection of the selected output data fields between the input buffer register and the output buffer register. Its function is to select particular data fields from all the input data fields in the input buffer register and copy them into the output buffer register.

Fig. 1: Selector module

An advanced selector module has, in addition to the basic hardwired connections from the input register to the output register, hardwired connections from the aggregator outputs and from the stored registers of the pattern and sequence filters to the output register. In every case a single clock cycle is enough to transfer data between the input and output registers, and because the module contains no complex combinational logic it can be operated at a high clock frequency. This basic architecture shows how a simple hardware parallelism technique increases throughput while keeping latency very low.

B. Filter Modeling

The next most basic module type is the filter module, which is based on the filter query type of the reference query language [14]. Fig. 2 shows the hardware implementation of a filter module. Here, one or more data fields are filtered according to given conditions. The input register is partitioned into registers of separate data types. The data fields to be filtered are sent through a set of filters that are designed identically and differ only in the type of comparison operator. Six operators are supported: less than (<), less than or equal (<=), greater than (>), greater than or equal (>=), equal (==) and not equal (!=). Each of these filters consists of two inputs, namely the value of the data field being filtered from the input register and the constant value to compare it with, together with the combinational logic of the comparison operator.

Fig. 2: Filter module

An advanced implementation of such a filter takes as its comparison input the stored filter output of an earlier event instead of the constant used in a normal filter. Filters of this type are used in pattern and sequence matching. The single-bit outputs of the filters are sent through a set of AND or OR gates according to the combination defined in the filter query. The whole process is a combinational circuit only a few gate levels deep. Since all the filters function independently, they operate in parallel in the circuit, which delivers the parallelism benefit of hardware acceleration. The filter module can therefore also operate at a high frequency and within just one clock cycle. The selector module is included in the main filter module as a sub-module that functions in parallel with the internal filters. The selection takes effect only if the final output of the filter combination is true, and this combined filter output is used as the event detection signal. Fig. 3 shows the latency improvement of the hardware design compared with the traditional sequential approach.
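The filter structure described in this subsection can be mimicked in software to make the data flow concrete. The following Python sketch is only an illustration under stated assumptions: the names `make_filter` and `filter_module`, the dictionary representation of an event and the example field names are hypothetical and come neither from the paper nor from Siddhi. In hardware all comparators are evaluated concurrently within one clock cycle; here they are simply evaluated one after another.

```python
import operator
from typing import Callable, Dict, List, Optional

# Hypothetical event representation: field name -> numeric value.
Event = Dict[str, float]

# The six comparison operators listed in the text.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def make_filter(field: str, op: str, constant: float) -> Callable[[Event], bool]:
    """Build one comparator: a data field compared against a constant."""
    compare = OPERATORS[op]

    def condition(event: Event) -> bool:
        return compare(event[field], constant)

    return condition


def filter_module(
    event: Event,
    filters: List[Callable[[Event], bool]],
    combine: str,
    selected_fields: List[str],
) -> Optional[Event]:
    """Return the selected fields if the combined condition holds, else None."""
    # One single-bit result per comparator (concurrent in hardware).
    bits = [condition(event) for condition in filters]
    if combine == "and":
        detected = all(bits)
    elif combine == "or":
        detected = any(bits)
    else:
        raise ValueError("combine must be 'and' or 'or'")
    if not detected:
        return None
    # Selector sub-module: copy only the chosen fields to the output.
    return {name: event[name] for name in selected_fields}


# Illustrative use with made-up field names and thresholds.
conditions = [make_filter("speed", ">", 10.0), make_filter("sid", "!=", 0)]
fast_event = {"sid": 4, "speed": 12.5, "ts": 100}
slow_event = {"sid": 4, "speed": 3.0, "ts": 101}
selected = filter_module(fast_event, conditions, "and", ["sid", "speed"])
rejected = filter_module(slow_event, conditions, "and", ["sid", "speed"])
```

The combined boolean plays the role of the event detection signal, and the dictionary comprehension stands in for the hardwired selector connections.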
If the design followed the sequential approach, it would have to wait several clock cycles for the data to pass through all the filters and the combinational logic before the result is available. In the parallel approach all of this is done within one clock cycle, although the clock period increases by a very small amount.

Fig. 3: Latency advancement of parallel filter modeling. (a) Sequential approach timing. (b) Parallel approach timing.

C. Window & Aggregator Modeling

The window and aggregation module is built on the window and aggregator query type [14]. Fig. 4 shows an architecture model of the module. Both of the modules modeled above are used here as sub-modules that work in parallel and in a pipeline with the window and aggregation modules. First, basic filters are applied to the input buffer register on each event to filter and select the events to be inserted into the window. If all the filter conditions are satisfied, the input event is inserted into a First In First Out (FIFO) window memory consisting of words the size of an input event. The window size is defined by one of two parameters, a time value or a length value. In a time window, an event expires once a defined time has elapsed from the timestamp of its arrival at the window. In a length window, the size is a defined number of events held in the window. In both cases an aggregation function is applied continuously to a defined data field of the events stored in the window. A separate aggregator module is implemented to handle this functionality, and five aggregation functions are supported: sum, average, count, min and max. The aggregator issues an output at every expired event or every input event, as defined in the reference query. Finally, another filter module is applied to the aggregator output to filter the final output according to a given condition. All these modules are arranged in a pipelined architecture, and the usual select module functions in parallel with this pipeline to create the output event.

Fig. 4: Window and Aggregation module

In this module the aggregator function causes considerable latency compared with the basic modules above. The module therefore runs at a comparatively low frequency, which is still more than enough to keep throughput far above that of the software counterpart. Latency also remains very low, since every stage of the pipeline takes only one clock cycle and there are only a few stages.

D. Pattern Recognizer Modeling

One of the important modules in this architecture is the pattern recognizer module, which is designed on the pattern query type [14]. The pattern recognizer module is shown in Fig. 6. The goal of this module is to check for a pattern of two or more simple conditional events. The conditions for each event in the pattern are checked by a parallel set of the filter modules discussed in Subsection II-B. Recognition of the pattern, based on the output signals of the condition-checking filters, is done by a Finite State Machine (FSM). The FSM for the pattern module scales easily to a considerable number of events, since all the state changes are alike, as depicted in Fig. 5. The states advance one by one according to the filter condition outputs at each event, and an event detection is fired when the last event matches. Here too the select module works in parallel, and it creates the output from the input buffer register together with some of the events stored when filter conditions matched. The filter modules take only one clock cycle to output their conditions, and the FSM takes another clock cycle to change state.

Fig. 5: Pattern recognizer state machine
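The state-by-state behaviour of the pattern recognizer described here can be illustrated with a small software model. The Python sketch below is a deliberately simplified, hypothetical rendering: the class name `PatternFSM` and the event fields are invented for illustration, it tracks only a single partial match at a time (whereas the paper builds its FSMs on an NFA architecture), and it assumes that a non-matching event leaves the current state unchanged.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical event representation: field name -> numeric value.
Event = Dict[str, float]


class PatternFSM:
    """Advance one state per matched condition; fire on the last one."""

    def __init__(self, conditions: List[Callable[[Event], bool]]) -> None:
        self.conditions = list(conditions)
        self.state = 0                  # index of the condition awaited next
        self.stored: List[Event] = []   # events kept for the selector

    def step(self, event: Event) -> Optional[List[Event]]:
        """Feed one event; return the matched events when the pattern completes."""
        if not self.conditions[self.state](event):
            return None                 # assumption: stay in the current state
        self.stored.append(event)
        self.state += 1
        if self.state < len(self.conditions):
            return None
        # Last condition matched: fire the detection and reset.
        matched, self.stored, self.state = self.stored, [], 0
        return matched


# Illustrative pattern: a value above 10 followed later by a value below 5.
fsm = PatternFSM([lambda e: e["v"] > 10, lambda e: e["v"] < 5])
outputs = [fsm.step({"v": v}) for v in (3, 12, 7, 2)]
```

In this run only the fourth event completes the pattern, and the stored events returned at that point correspond to the data the select module would combine with the input buffer register to form the output event.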
Therefore the pattern module can also operate at a high frequency with very low latency for the whole process.

E. Sequence Recognizer Modeling

The last basic module type is the sequence recognizer, which is designed on the sequence query type [14]. Its abstract architecture is almost the same as that of the pattern recognizer module, but the sequence-recognizing FSM has a totally different architecture. The events of a sequence must match consecutively, and each event is specified with one of the following regular expression cases: exactly one match, one or more matches (+), zero or more matches (*), zero or one match (?), and "or" between two events. Fig. 7 shows the generalized state transition for each regular expression case. The main FSM module is a combination of two or more of these, in the order defined in the query. The design of the sequence-recognizing FSM shows the scalability and generalization that this research achieves even for a considerably complex design.

Fig. 6: Pattern recognizer module

Fig. 7: Sequence recognizer FSM states

Both of the above FSMs are built on an NFA architecture, since state explosion is minimal compared with a DFA approach, as clearly explained in [13]. The NFA architecture suits a hardware platform well because it supports parallel implementation, which enhances throughput. All of the above modules are highly parameterized to increase the generality of the architecture. Therefore any type of basic query can be built from these modules just by changing the parameters in the top module of the HDL design.

III. OVERALL SYSTEM ARCHITECTURE

Any complex CEP system consists of a combination of one or more of the basic modules discussed above. The overall architecture of the custom accelerated system is shown in Fig. 8. The main architecture consists of a software CEP, a hardware CEP and a PCIe communication link between them.
The PCIe communication link is built using a PCIe kernel driver on the software platform side and a PCIe core on the hardware (FPGA) side. PCIe was chosen as the communication link because, in the current context, it is the only option that can sustain a data transfer rate in the gigabits per second range. The software CEP system (which already exists [6]) runs on a CPU-based processor architecture, while the hardware CEP system (the contribution of this research) runs on an FPGA platform.

Fig. 8: System architecture

A complex CEP system can be divided into separate CEP engines cascaded together to form the whole system, which increases flexibility while reducing complexity; this is explained further in Section V with an evaluation example. This design approach allows the total CEP system to be partitioned into software components and hardware components, which enhances the flexibility and scalability of the design. The software CEP partition sees the hardware co-processor partition as an API connected through the high-speed PCIe link. The whole system is a PC-master design: the software system writes data to the hardware system to create the input data stream for the hardware CEP, and reads back the output stream returned by the hardware. The data writes and reads go through a kernel driver module specially designed to handle the PCIe protocol, which communicates with a PCIe IP core implemented on the hardware (FPGA) side. A receiver engine and a transmitter engine handle the data transfer between the PCIe core and the hardware CEP application.

The reconfigurability of the custom accelerated hardware system is achieved by a fully parameterized module design at the HDL level. Therefore, for any CEP application only the top module (the CEP engine) has to be designed, instantiating the other basic modules with the required parameters. The architecture design approach of this research can serve as a starting point for high-performance big data processing hardware architectures in the cloud computing domain.

IV. QUERIES TO CIRCUIT PROCESS

Fig. 9: Queries to circuit process

The CEP system on the software platform is built with a software query compiler, as stated in [6], which is part of the software CEP application.
This research proposes a similar approach, with a software query compiler that compiles the queries, identifies the basic building blocks and extracts their parameters. Fig. 9 shows the design process. The query compiler generates the top module of the hardware CEP engine according to a predefined mapping between the basic queries, their equivalent hardware modules and the connections between them. This query compiler design is inspired by a similar design in [12]. The query compiler needs to generate only the top module of the design, because all the other basic modules are fully parameterized, which reflects the generality and flexibility of the research outcome. The final user of the application sees the same query abstraction level as in the software system.

V. EVALUATING EXAMPLE

The authors used a real-world application example both to show the flexibility of the design and to evaluate its actual performance. The evaluating example is based on the DEBS Grand Challenge, an application problem on CEP described at the 7th ACM International Conference on Distributed Event Based Systems.

A. Query 1 - Running data analysis

The goal of this query is to analyze the running performance of each of the players currently participating in the game. This use case is implemented with a sequence-recognizing query that uses CEP event sequences to detect whenever a player crosses a speed threshold. Four filters are used here: the first and last events share the same set of filters, and two filters are combined in parallel in each case. The whole scenario can be expressed with the sequence matching basic query type, with filter and select query types acting as sub-queries inside it.

B. Query 2 - Ball possession

This query calculates the time of ball possession for each player. The query is divided into three parts to reduce the complexity of the design. The three queries are designed separately from basic modules and then cascaded to form the whole system. As depicted in Fig. 10, one CEP engine detects and outputs "Hit at ball" events while another detects the "Ball leave the ground" event, and the two can function in parallel since they act independently. Both outputs are sent simultaneously to another CEP engine, called Ball Possession, which detects ball possession by a particular player.

Fig. 10: Ball Possession detection CEP architecture

In the "Hit at ball" query, a hit is identified by the distance between the ball and the player. The getdistance function calculates that distance, and its output is compared with a constant value. This function can also be built in hardware as a user-defined function. The implementation of this query shows how a highly complex system can be built from basic modules cascaded in a pipeline, which demonstrates the flexibility and scalability of the system to a great extent.

VI. RESULTS

The evaluating example discussed above was implemented on a Xilinx Virtex6 ML605 [15] for the hardware CEP part, and the data stream handling software application and driver were implemented on a PC running a Linux OS. The hardware modules were developed using Xilinx ISE. The communication link between the PC and the FPGA was a PCIe Gen 1 x8 link. Sensor data were stored in a file on the PC and sent to the hardware CEP as a data stream by the PC software application. The total data size was about 49.5 million events of 56 bytes each.

The Running analysis query design processed data at a rate of approximately 1 million events per second. The processing rate required by the challenge is 15,000 events per second, and the software counterpart, Siddhi, achieved approximately 100,000 events per second. Even with the basic PCIe communication link (8 lanes, Gen 1), the hardware design is 10 times faster. The hardware implementation of the Ball possession query achieved a throughput of about 0.74 million events per second, which is more than 7 times faster than its software counterpart, Siddhi, and far above the data rate required by the challenge.
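The figures quoted in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text (event rates and the 56-byte event size), plus one piece of general background that is not from the paper: a PCIe Gen 1 lane signals at 2.5 GT/s with 8b/10b encoding. The function name `payload_gbps` is hypothetical. The calculation counts event payload only and is not an attempt to reproduce any data rate quoted in the paper, whose accounting the source does not detail.

```python
EVENT_BYTES = 56                 # event size stated in the text


def payload_gbps(events_per_second: float, event_bytes: int = EVENT_BYTES) -> float:
    """Event payload rate in Gbit/s (payload only, no protocol overhead)."""
    return events_per_second * event_bytes * 8 / 1e9


# Rates stated in the text (events per second).
running_hw = 1_000_000           # Running analysis query in hardware
possession_hw = 740_000          # Ball possession query in hardware
siddhi_running = 100_000         # software counterpart, Running analysis
required = 15_000                # rate required by the challenge

speedup_running = running_hw / siddhi_running     # 10x, as stated
margin_over_required = running_hw / required      # about 66.7x

running_gbps = payload_gbps(running_hw)           # 0.448 Gbit/s of payload
possession_gbps = payload_gbps(possession_hw)     # about 0.332 Gbit/s

# Background assumption (not from the paper): PCIe Gen 1 runs at 2.5 GT/s
# per lane with 8b/10b encoding, so 8 lanes carry 16 Gbit/s per direction
# before packet overhead.
pcie_gen1_x8_gbps = 8 * 2.5 * 8 / 10
```

Under these assumptions the event payload occupies only a small fraction of the raw Gen 1 x8 link capacity.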
In both cases the data rate is about 1 Gbps and the latency is only a few nanoseconds.

VII. CONCLUSION

This research has designed and implemented a hardware-based CEP system with a custom accelerator design approach that is highly scalable and flexible compared with existing approaches. The hardware CEP system is able to work as a hardware co-processor in line with an existing software CEP system and shares the same queries to implement the CEP architecture. The design idea has been demonstrated with a practical example, which shows throughput and latency far ahead of the software system together with scalability and flexibility very close to those of its software counterpart. Dynamic reconfigurability is not yet supported by the current design. The design also uses fixed-size string variables because of hardware design limitations of the HDL. The authors hope to add dynamic reconfigurability and a method for handling variable-length string variables as future work. In addition, system performance can be improved further by using a PCIe communication link with higher performance than the one used here. This design approach broadens the scope for developing big data processing systems in cloud computing architectures with hardware acceleration support, while keeping system flexibility very close to that of existing software platforms.

REFERENCES

[1] Daniel J. Abadi, Don Carney, Uğur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12, 2 (August 2003).
[2] M. Cammert, C. Heinz, et al. PIPES: A multi-threaded publish-subscribe architecture for continuous queries over streaming data sources. Technical report, Citeseer, 2003.
[3] D. Arvind, A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein, and J. Widom. STREAM: the Stanford stream data manager. IEEE Data Engineering Bulletin, 2003.
[4] D. Abadi, Y. Ahmad, et al. The design of the Borealis stream processing engine. Second Biennial Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, CA.
[5] Neumeyer, L.; Robbins, B.; Nair, A.; Kesari, A. S4: Distributed Stream Computing Platform. Data Mining Workshops (ICDMW), 2010 IEEE International Conference on, pp. 170-177, Dec.
[6] Suhothayan, Sriskandarajah and Gajasinghe, Kasun and Narangoda, Isuru Loku and Chaturanga, Subash and Perera, Srinath and Nanayakkara, Vishaka. Siddhi: a second look at complex event processing architectures. SC-GCE, ACM, pages 43-50, 2011.
[7] EsperTech - event stream intelligence. [Online]
[8] Sidhu, R.; Prasanna, V.K. Fast Regular Expression Matching Using FPGAs. Field-Programmable Custom Computing Machines, FCCM '01. The 9th Annual IEEE Symposium on, pp. 227-238, March April.
[9] Woods, L.; Teubner, J.; Alonso, G. Real-time pattern matching with FPGAs. Data Engineering (ICDE), 2011 IEEE 27th International Conference on, pp. 1292-1295, April.
[10] Takenaka, T.; Takagi, M.; Inoue, H. A scalable complex event processing framework for combination of SQL-based continuous queries and C/C++ functions. Field Programmable Logic and Applications (FPL), International Conference on, pp. 237-242, Aug.
[11] Inoue, H.; Takenaka, T.; Motomura, M. 20Gbps C-Based Complex Event Processing. Field Programmable Logic and Applications (FPL), 2011 International Conference on, pp. 97-102, 5-7 Sept.
[12] Rene Mueller, Jens Teubner, and Gustavo Alonso. Streams on wires: a query compiler for FPGAs. Proc. VLDB Endow. 2, 1 (August 2009).
[13] Louis Woods, Jens Teubner, and Gustavo Alonso. Complex event detection at wire speed with FPGAs. Proc. VLDB Endow. 3, 1-2 (September 2010).
[14] Siddhi Language Specification. [Online]
[15] Xilinx Virtex6 ML605. [Online]

ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS

ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS ACCELERATED COMPLEX EVENT PROCESSING WITH GRAPHICS PROCESSING UNITS Prabodha Srimal Rodrigo Registration No. : 138230V Degree of Master of Science Department of Computer Science & Engineering University

More information

An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA

An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA An Efficient and Scalable Implementation of Sliding-Window Aggregate Operator on FPGA Yasin Oge, Masato Yoshimi, Takefumi Miyoshi, Hideyuki Kawashima, Hidetsugu Irie, and Tsutomu Yoshinaga Graduate School

More information

StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments

StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments StreamGlobe Adaptive Query Processing and Optimization in Streaming P2P Environments A. Kemper, R. Kuntschke, and B. Stegmaier TU München Fakultät für Informatik Lehrstuhl III: Datenbanksysteme http://www-db.in.tum.de/research/projects/streamglobe

More information

UDP Packet Monitoring with Stanford Data Stream Manager

UDP Packet Monitoring with Stanford Data Stream Manager UDP Packet Monitoring with Stanford Data Stream Manager Nadeem Akhtar #1, Faridul Haque Siddiqui #2 # Department of Computer Engineering, Aligarh Muslim University Aligarh, India 1 nadeemalakhtar@gmail.com

More information

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture

A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture A Comparison of Memory Usage and CPU Utilization in Column-Based Database Architecture vs. Row-Based Database Architecture By Gaurav Sheoran 9-Dec-08 Abstract Most of the current enterprise data-warehouses

More information

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No.

StreamOLAP. Salman Ahmed SHAIKH. Cost-based Optimization of Stream OLAP. DBSJ Japanese Journal Vol. 14-J, Article No. StreamOLAP Cost-based Optimization of Stream OLAP Salman Ahmed SHAIKH Kosuke NAKABASAMI Hiroyuki KITAGAWA Salman Ahmed SHAIKH Toshiyuki AMAGASA (SPE) OLAP OLAP SPE SPE OLAP OLAP OLAP Due to the increase

More information

System Verification of Hardware Optimization Based on Edge Detection

System Verification of Hardware Optimization Based on Edge Detection Circuits and Systems, 2013, 4, 293-298 http://dx.doi.org/10.4236/cs.2013.43040 Published Online July 2013 (http://www.scirp.org/journal/cs) System Verification of Hardware Optimization Based on Edge Detection

More information

Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA. Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

Published in: Proceedings of the 2010 International Conference on Field-programmable

High-Performance Event Processing Bridging the Gap between Low Latency and High Throughput Bernhard Seeger University of Marburg

common work with Nikolaus Glombiewski, Michael Körber, Marc Seidemann

Staying FIT: Efficient Load Shedding Techniques for Distributed Stream Processing

Nesime Tatbul Uğur Çetintemel Stan Zdonik Talk Outline Problem Introduction Approach Overview Advance Planning with an

High Ppeed Circuit Techniques for Network Intrusion Detection Systems (NIDS)

The University of Akron IdeaExchange@UAkron Mechanical Engineering Faculty Research Mechanical Engineering Department 2008 Ajay

High Level Synthesis with Stream Query to C Parser:

R5-4 SASIMI 2013 Proceedings High Level Synthesis with Stream Query to C Parser: Eliminating Hardware Development Difficulties for Software Developers Eric Shun Fukuda Takashi Takenaka Hiroaki Inoue Hideyuki

class 17 updates prof. Stratos Idreos

HTTP://DASLAB.SEAS.HARVARD.EDU/CLASSES/CS165/ early/late tuple reconstruction, tuple-at-a-time, vectorized or bulk processing, intermediates format, pushing selects

Jakub Cabal et al. CESNET

CONFIGURABLE FPGA PACKET PARSER FOR TERABIT NETWORKS WITH GUARANTEED WIRE-SPEED THROUGHPUT 2018/02/27 FPGA, Monterey, USA Packet parsing INTRODUCTION It is among basic operations

Parallel graph traversal for FPGA

LETTER IEICE Electronics Express, Vol.11, No.7, 1–6 Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

LegUp: Accelerating Memcached on Cloud FPGAs

Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. COMPUTE IS BECOMING SPECIALIZED GPU Nvidia graphics cards are

FPGP: Graph Processing Framework on FPGA

Guohao DAI, Yuze CHI, Yu WANG, Huazhong YANG E.E. Dept., TNLIST, Tsinghua University dgh14@mails.tsinghua.edu.cn Big graph is widely used

SoftFlash: Programmable Storage in Future Data Centers Jae Do Researcher, Microsoft Research

The world's most valuable resource Data is everywhere! May. 2017 Values from Data! Need infrastructures for

HARDWARE IMPLEMENTATION OF LOSSLESS LZMA DATA COMPRESSION ALGORITHM

Parekar P. M. 1, Thakare S. S. 2 1,2 Department of Electronics and Telecommunication Engineering, Amravati University Government College

Multi-Model Based Optimization for Stream Query Processing

Ying Liu and Beth Plale Computer Science Department Indiana University {yingliu, plale}@cs.indiana.edu Abstract With recent explosive growth of

An Efficient Execution Scheme for Designated Event-based Stream Processing

DEIM Forum 2014 D3-2 Yan Wang and Hiroyuki Kitagawa Graduate School of Systems and Information Engineering, University of Tsukuba

An Efficient Implementation of LZW Compression in the FPGA

Xin Zhou, Yasuaki Ito and Koji Nakano Department of Information Engineering, Hiroshima University Kagamiyama 1-4-1, Higashi-Hiroshima, 739-8527

An FPGA-based smart database storage engine

Research Collection Master Thesis Author(s): Nie, Chongling Publication Date: 2012 Permanent Link: https://doi.org/10.3929/ethz-a-007554630 Rights / License:

Design and Evaluation of an FPGA-based Query Accelerator for Data Streams

by Yasin Oge A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Engineering

Database Acceleration Solution Using FPGAs and Integrated Flash Storage

HK Verma, Xilinx Inc. August 2017 FPGA Analytics in Flash Storage System In-memory or Flash storage based DB reduce disk access

Load Shedding for Aggregation Queries over Data Streams

Brian Babcock Mayur Datar Rajeev Motwani Department of Computer Science Stanford University, Stanford, CA 94305 {babcock, datar, rajeev}@cs.stanford.edu

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Anni Benitta.M #1 and Felcy Jeba Malar.M *2 1# Centre for excellence in VLSI Design, ECE, KCG College of Technology, Chennai, Tamilnadu

High Speed Fault Injection Tool (FITO) Implemented With VHDL on FPGA For Testing Fault Tolerant Designs

Vol. 3, Issue. 5, Sep - Oct. 2013 pp-2894-2900 ISSN: 2249-6645 M. Reddy Sekhar Reddy, R.Sudheer Babu

New Approach for Affine Combination of A New Architecture of RISC cum CISC Processor

Volume 2 Issue 1 March 2014 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org

FPGA Implementation of I2C and SPI Protocols using VHDL

Satish M Ghuse 1, Prof. Surendra K. Waghmare 2 1, 2 Department of ENTC 1, 2 SPPU/G.H.Raisoni College of Engineering and Management, Pune, Maharashtra/Zone,

Dealing with Overload in Distributed Stream Processing Systems

Nesime Tatbul Stan Zdonik Brown University, Department of Computer Science, Providence, RI USA E-mail: {tatbul, sbz}@cs.brown.edu Abstract

A Dynamic Attribute-Based Load Shedding Scheme for Data Stream Management Systems

Brigham Young University BYU ScholarsArchive All Faculty Publications 2007-07-01 Amit Ahuja Yiu-Kai D. Ng ng@cs.byu.edu

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

Computer Science Department, Technical University of Cluj-Napoca, 26-28, Bariţiu St., 3400 Cluj-Napoca,

An Overlay Architecture for FPGA-based Industrial Control Systems Designed with Functional Block Diagrams

R2-7 SASIMI 26 Proceedings Taisei Segawa, Yuichiro Shibata, Yudai Shirakura, Kenichi Morimoto,

An In-Kernel NOSQL Cache for Range Queries Using FPGA NIC

Korechika Tamura Dept. of ICS, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan Email: tamura@arc.ics.keio.ac.jp Hiroki Matsutani Dept.

A SIMPLE 1-BYTE 1-CLOCK RC4 DESIGN AND ITS EFFICIENT IMPLEMENTATION IN FPGA COPROCESSOR FOR SECURED ETHERNET COMMUNICATION

Abstract In the field of cryptography till date the 1-byte in 1-clock is the best

BlueDBM: An Appliance for Big Data Analytics*

Arvind *[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King (Quanta) BigData@CSAIL Annual Meeting

Parallel Patterns for Window-based Stateful Operators on Data Streams: an Algorithmic Skeleton Approach

Tiziano De Matteis, Gabriele Mencagli University of Pisa Italy INTRODUCTION The recent years have

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMSs, with the aim of achieving

Semantic Event Correlation Using Ontologies

Thomas Moser 1, Heinz Roth 2, Szabolcs Rozsnyai 3, Richard Mordinyi 1, and Stefan Biffl 1 1 Complex Systems Design & Engineering Lab, Vienna University of Technology

Dynamically Configurable Online Statistical Flow Feature Extractor on FPGA

Da Tong, Viktor Prasanna Ming Hsieh Department of Electrical Engineering University of Southern California Email: {datong, prasanna}@usc.edu

Exploiting Predicate-window Semantics over Data Streams

Thanaa M. Ghanem Walid G. Aref Ahmed K. Elmagarmid Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398 {ghanemtm,aref,ake}@cs.purdue.edu

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

N.Rajagopala krishnan, k.sivasuparamanyan, G.Ramadoss Abstract Field Programmable Gate Arrays (FPGAs) are widely

Service-oriented Continuous Queries for Pervasive Systems

Yann Gripay Université de Lyon, INSA-Lyon, LIRIS UMR 5205 CNRS 7 avenue Jean Capelle F-69621 Villeurbanne, France yann.gripay@liris.cnrs.fr ABSTRACT

Parallel HITS Algorithm Implemented Using HADOOP GIRAPH Framework to resolve Big Data Problem

I J C T A, 9(41) 2016, pp. 1235-1239 International Science Press Hema Dubey *, Nilay Khare *, Alind Khare **

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

Introduction Data center operators constantly seek more server performance. Currently

VISIRI - Distributed Complex Event Processing System for Handling Large Number of Queries

Malinda Kumarasinghe, Geeth Tharanga, Lasitha Weerasinghe, Ujitha Wickramarathna, Surangika Ranathunga To cite

Automatic compilation framework for Bloom filter based intrusion detection

Dinesh C Suresh, Zhi Guo*, Betul Buyukkurt and Walid A. Najjar Department of Computer Science and Engineering *Department of Electrical

Efficient Self-Reconfigurable Implementations Using On-Chip Memory

10th International Conference on Field Programmable Logic and Applications, August 2000. Sameer Wadhwa and Andreas Dandalis University

Large-scale Multi-flow Regular Expression Matching on FPGA*

2012 IEEE 13th International Conference on High Performance Switching and Routing Yun Qu Ming Hsieh Dept. of Electrical Eng. University of Southern

Single Pass Connected Components Analysis

D. G. Bailey, C. T. Johnston, Single Pass Connected Components Analysis, Proceedings of Image and Vision Computing New Zealand 2007, pp. 8 87, Hamilton, New Zealand, December 2007.

HADP Talk BlueDBM: An appliance for Big Data Analytics

Sang-Woo Jun* Ming Liu* Sungjin Lee* Jamey Hicks+ John Ankcorn+ Myron King+ Shuotao Xu* Arvind* *MIT Computer Science and Artificial Intelligence

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

LETTER IEICE Electronics Express, Vol.14, No.21, 1–11 Rongshan Wei a) and Xingang Zhang College of Physics

Measuring Performance of Complex Event Processing Systems

Torsten Grabs, Ming Lu Microsoft StreamInsight Microsoft Corp., Redmond, WA {torsteng, milu}@microsoft.com Agenda Motivation CEP systems and performance

Performance of Multihop Communications Using Logical Topologies on Optical Torus Networks

X. Yuan, R. Melhem and R. Gupta Department of Computer Science University of Pittsburgh Pittsburgh, PA 156 fxyuan,

Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory

IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 PP 11-18 www.iosrjen.org S.Parkavi (1) And S.Bharath

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-mail: 1 engr_serfs@yahoo.com,

"On the Capability and Achievable Performance of FPGAs for HPC Applications"

Wim Vanderbauwhede School of Computing Science, University of Glasgow, UK Or in other words "How Fast Can Those FPGA Thingies

Design and Implementation of High Performance DDR3 SDRAM controller

Mrs. Komala M 1 Suvarna D 2 Dr K. R. Nataraj 3 Research Scholar PG Student(M.Tech) HOD, Dept. of ECE Jain University, Bangalore SJBIT,Bangalore

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

Generation of Multigrid-based Numerical Solvers for FPGA Accelerators

Christian Schmitt, Moritz Schmid, Frank Hannig, Jürgen Teich, Sebastian Kuckuk, Harald Köstler Hardware/Software Co-Design, System

The Design of Sobel Edge Extraction System on FPGA

Yu ZHENG 1, * 1 School of software, Beijing University of technology, Beijing 100124, China; Abstract. Edge is a basic feature of an image, the purpose

A Hardware Task-Graph Scheduler for Reconfigurable Multi-tasking Systems

Abstract Reconfigurable hardware can be used to build a multitasking system where tasks are assigned to HW resources at run-time

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Assertion Based Verification of I2C Master Bus Controller with RTC Sagar T. D. M.Tech Student, VLSI Design and Embedded Systems BGS Institute of Technology,

Is There A Tradeoff Between Programmability and Performance?

Robert Halstead Jason Villarreal Jacquard Computing, Inc. Roger Moussalli Walid Najjar Abstract While the computational power of Field Programmable

what do we mean by event processing now, a checklist of capabilities in current event processing tools and applications,

A View of the Current State of Event Processing what do we mean by event processing now, complex event processing, a checklist of capabilities in current event processing tools and applications, next steps

Image Filtering with MapReduce in Pseudo-Distribution Mode

Tharindu D. Gamage, Jayathu G. Samarawickrama, Ranga Rodrigo and Ajith A. Pasqual Department of Electronic & Telecommunication Engineering, University

SAMOA. A Platform for Mining Big Data Streams. Gianmarco De Francisci Morales Yahoo Labs

Barcelona gdfm@apache.org @gdfm7 Agenda Streams Applications, Model, Tools SAMOA Goal, Architecture, Advantages

Memory-efficient and fast run-time reconfiguration of regularly structured designs

Brahim Al Farisi, Karel Heyse, Karel Bruneel and Dirk Stroobandt Ghent University, ELIS Department Sint-Pietersnieuwstraat

A Framework for Rule Processing in Reconfigurable Network Systems

Michael Attig and John Lockwood Washington University in Saint Louis Applied Research Laboratory Department of Computer Science and Engineering

Reconfigurable hardware for big data. Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland

www.systems.ethz.ch Systems Group 7 faculty ~40 PhD ~8 postdocs Researching all

Uğur Çetintemel Department of Computer Science, Brown University, and StreamBase Systems, Inc.

The 8 Requirements of Real-Time Stream Processing Michael Stonebraker Computer Science and Artificial Intelligence Laboratory, M.I.T., and StreamBase Systems, Inc. stonebraker@csail.mit.edu Uğur Çetintemel

AN OVERVIEW OF MICRON S

1 Ke Wang, 1 Kevin Angstadt, 1 Chunkun Bo, 1 Nathan Brunelle, 1 Elaheh Sadredini, 2 Tommy Tracy II, 1 Jack Wadden, 2 Mircea Stan, 1 Kevin Skadron Center for Automata Computing 1

Analytical and Experimental Evaluation of Stream-Based Join

Henry Kostowski Department of Computer Science, University of Massachusetts - Lowell Lowell, MA 01854 Email: hkostows@cs.uml.edu Kajal T. Claypool

An Efficient FPGA Implementation of the Advanced Encryption Standard (AES) Algorithm Using S-Box

Volume 5 Issue 2 June 2017 ISSN: 2320-9984 (Online) International Journal of Modern Engineering & Management Research Website: www.ijmemr.org

XPU A Programmable FPGA Accelerator for Diverse Workloads

Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

ISSN Vol.05, Issue.12, December-2017, Pages:

ISSN 2322-0929 Vol.05, Issue.12, December-2017, Pages:1174-1178 www.ijvdcs.org Design of High Speed DDR3 SDRAM Controller NETHAGANI KAMALAKAR 1, G. RAMESH 2 1 PG Scholar, Khammam Institute of Technology

Complex Event Processing in a High Transaction Enterprise POS System

Samuel Collins* Roy George** *InComm, Atlanta, GA 30303 **Department of Computer and Information Science, Clark Atlanta University,

Design of memory efficient FIFO-based merge sorter

LETTER IEICE Electronics Express, Vol.15, No.5, 1–11 Youngil Kim a), Seungdo Choi, and Yong Ho Song Department of Electronics and Computer Engineering,

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Outline Welcome to COE 405 Digital System Design Design Domains and Levels of Abstractions Synthesis

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms

Jingzhao Ou and Viktor K. Prasanna Department of Electrical Engineering, University of Southern California Los Angeles, California,

Distributed In-GPU Data Cache for Document-Oriented Data Store via PCIe over 10Gbit Ethernet

Shin Morishima 1 and Hiroki Matsutani 1,2,3 1 Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Japan 223-8522

Available online at ScienceDirect. Procedia Computer Science 98 (2016 )

Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 98 (2016) 515–521 The 3rd International Symposium on Emerging Information, Communication and Networks (EICN 2016) A Speculative

Performance Analysis of Mobile Ad Hoc Network in the Presence of Wormhole Attack

F. Anne Jenefer & D. Vydeki E-mail : annejenefer@gmail.com, vydeki.d@srmeaswari.ac.in Abstract Mobile Ad-Hoc Network (MANET)

TTCN-3 Test Architecture Based on Port-oriented Design and Assembly Language Implementation

Dihong Gong, Wireless Information Network Lab University of Science and Technology of China Hefei, China, 230027

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna

Ming-Hsieh Department of Electrical Engineering University of Southern California

IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY FPGA

FPGA Implementations of Tiny Mersenne Twister Guoping Wang Department of Engineering, Indiana University Purdue University Fort

Synthesis of VHDL Code for FPGA Design Flow Using Xilinx PlanAhead Tool

Md. Abdul Latif Sarker, Moon Ho Lee Division of Electronics & Information Engineering Chonbuk National University 664-14 1GA Dekjin-Dong

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGA-based platforms

Damian Karwowski, Marek Domański Poznan University of Technology, Chair of Multimedia Telecommunications and Microelectronics

A FAST AND EFFICIENT HARDWARE TECHNIQUE FOR MEMORY ALLOCATION

Fethullah Karabiber 1 Ahmet Sertbaş 1 Hasan Cam 2 1 Computer Engineering Department Engineering Faculty, Istanbul University 34320, Avcilar,

To Use or Not to Use: CPUs Cache Optimization Techniques on GPGPUs

D.R.V.L.B. Thambawita Department of Computer Science and Technology Uva Wellassa University Badulla, Sri Lanka Email: vlbthambawita@gmail.com

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency with ML accelerators Michael
