Resource-efficient regular expression matching architecture for text analytics

Similar documents
TOKEN-BASED DICTIONARY PATTERN MATCHING FOR TEXT ANALYTICS. Raphael Polig, Kubilay Atasu, Christoph Hagleitner

Hardware-accelerated regular expression matching with overlap handling on IBM PowerEN processor

HARDWARE-ACCELERATED REGULAR EXPRESSION MATCHING FOR HIGH-THROUGHPUT TEXT ANALYTICS

Compiling Text Analytics Queries to FPGAs

Regular Expression Acceleration at Multiple Tens of Gb/s

Large-scale Multi-flow Regular Expression Matching on FPGA*

Automation Framework for Large-Scale Regular Expression Matching on FPGA. Thilan Ganegedara, Yi-Hua E. Yang, Viktor K. Prasanna

2068 (I) Attempt all questions.

Design Once with Design Compiler FPGA

Question Bank. 10CS63:Compiler Design

Leveraging the Intel HyperFlex FPGA Architecture in Intel Stratix 10 Devices to Achieve Maximum Power Reduction

FPGA Implementation of Token-Based Clam AV Regex Virus Signatures with Early Detection

VASim: An Open Virtual Automata Simulator for Automata Processing Research

Hardware Implementation for Scalable Lookahead Regular Expression Detection

Dynamic Packet Fragmentation for Increased Virtual Channel Utilization in On-Chip Routers

EFFICIENT FAILURE PROCESSING ARCHITECTURE IN REGULAR EXPRESSION PROCESSOR

Developing Dynamic Profiling and Debugging Support in OpenCL for FPGAs

ECE1387 Exercise 3: Using the LegUp High-level Synthesis Framework

Highly Space Efficient Counters for Perl Compatible Regular Expressions in FPGAs

ReCPU: a Parallel and Pipelined Architecture for Regular Expression Matching

Configurable String Matching Hardware for Speeding up Intrusion Detection

Parallel graph traversal for FPGA

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

Computer Science at Kent

Automata Computing. Mircea R. Stan UVA-ECE Center for Automata Processing (CAP) Co-Director Kevin Skadron, UVA-CS, CAP Director

Improving Signature Matching using Binary Decision Diagrams

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Achieving Lightweight Multicast in Asynchronous Networks-on-Chip Using Local Speculation

Lab 1: Using the LegUp High-level Synthesis Framework

Accelerating Pattern Searches with Hardware. Kevin Angstadt CS 6354: Graduate Architecture 7. April 2016

! Readings! ! Room-level, on-chip! vs.!

INTRODUCTION TO FPGA ARCHITECTURE

Centip3De: A 64-Core, 3D Stacked, Near-Threshold System

NETWORK INTRUSION DETECTION SYSTEMS ON FPGAS WITH ON-CHIP NETWORK INTERFACES

Regular expression matching with input compression: a hardware design for use within network intrusion detection systems

TPUTCACHE: HIGH-FREQUENCY, MULTI-WAY CACHE FOR HIGH-THROUGHPUT FPGA APPLICATIONS. Aaron Severance, Guy G.F. Lemieux

Center Extreme Scale CS Research

PL in the Broader Research Community

Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs

RAPID Programming of Pattern-Recognition Processors

Visualizing Data Flow and Control Signaling Inside the Microprocessor

Scanners. Xiaokang Qiu Purdue University. August 24, ECE 468 Adapted from Kulkarni 2012

New Approach to Unstructured Data

Delay Time Analysis of Reconfigurable. Firewall Unit

Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm

Gujarat Technological University Sankalchand Patel College of Engineering, Visnagar B.E. Semester VII (CE) July-Nov Compiler Design (170701)

Park Sung Chul. AE MentorGraphics Korea

Last lecture CMSC330. This lecture. Finite Automata: States. Finite Automata. Implementing Regular Expressions. Languages. Regular expressions

Advanced VLSI Design Prof. Virendra K. Singh Department of Electrical Engineering Indian Institute of Technology Bombay

CSE450. Translation of Programming Languages. Lecture 20: Automata and Regular Expressions

Automatic compilation framework for Bloom filter based intrusion detection

An Operating System History of Operating Systems. Operating Systems. Autumn CS4023

General Overview of Compiler

8. Remote System Upgrades with Stratix II and Stratix II GX Devices

Comparing and Contrasting different Approaches of Code Generator(Enum,Map-Like,If-else,Graph)

Modular Multi-ported SRAMbased. Ameer M.S. Abdelhadi Guy G.F. Lemieux

A Methodology for Energy Efficient FPGA Designs Using Malleable Algorithms

Regular expression matching with input compression: a hardware design for use within network intrusion detection systems.

My First FPGA for Altera DE2-115 Board

Performance Issues and Query Optimization in Monet

Don t Forget Memories A Case Study Redesigning a Pattern Counting ASIC Circuit for FPGAs

Design of Adaptive Communication Channel Buffers for Low-Power Area- Efficient Network-on. on-chip Architecture

COE 561 Digital System Design & Synthesis Introduction

UNIT III & IV. Bottom up parsing

FPGA Design, Implementation and Analysis of Trigonometric Generators using Radix 4 CORDIC Algorithm

Compiler Optimisation

Torben./Egidius Mogensen. Introduction. to Compiler Design. ^ Springer

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN

AN 831: Intel FPGA SDK for OpenCL

An FPGA Architecture Supporting Dynamically-Controlled Power Gating

Monday, August 26, 13. Scanners

CS 432 Fall Mike Lam, Professor. Finite Automata Conversions and Lexing

Networks-on-Chip Router: Configuration and Implementation

THE UNIVERSITY OF CHICAGO EFFCLIP + UAP: UNIFIED, EFFICIENT REPRESENTATION AND ARCHITECTURE FOR AUTOMATA PROCESSING A DISSERTATION SUBMITTED TO

Wednesday, September 3, 14. Scanners

CS2 Practical 2 CS2Ah

Lecture 5: Instruction Pipelining. Pipeline hazards. Sequential execution of an N-stage task: N Task 2

Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

Power Efficient Solutions w/ FPGAs. Bill Jenkins Altera Sr. Product Specialist for Programming Language Solutions

IP Forwarding. CSU CS557, Spring 2018 Instructor: Lorenzo De Carli

CSE302: Compiler Design

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

The Nios II Family of Configurable Soft-core Processors

CHAPTER 1 INTRODUCTION

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

FPGAs Provide Reconfigurable DSP Solutions

High-Speed NAND Flash

Verilog for High Performance

Review. Pat Morin COMP 3002

INSTITUTE OF AERONAUTICAL ENGINEERING

CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

ALTDQ_DQS2 Megafunction User Guide

Intel Xeon with FPGA IP Asynchronous Core Cache Interface (CCI-P) Shim

Programmable Logic Devices FPGA Architectures II CMPE 415. Overview This set of notes introduces many of the features available in the FPGAs of today.

High-Performance Integer Factoring with Reconfigurable Devices

Utilizing Parallelism of TMR to Enhance Power Efficiency of Reliable ASIC Designs

Supporting Multithreading in Configurable Soft Processor Cores

Reconfigurable Multicore Server Processors for Low Power Operation

Transcription:

Resource-efficient regular expression matching architecture for text analytics Kubilay Atasu IBM Research - Zurich Presented at ASAP 2014

SystemT: an algebraic approach to declarative information extraction distill structured data from unstructured and semi-structured text exploit the extracted data in your applications For years, Microsoft Corporation CEO Bill Gates was against open source. But today he appears to have changed his mind. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access. Richard Stallman, founder of the Free Software Foundation, countered saying Annotations Name Title Organization Bill Gates CEO Microsoft Bill Veghte VP Microsoft Richard Stallman Founder Free Soft.. (from Cohen s IE tutorial, 2003) 2

A simple SystemT information extraction rule Find the names (regex) that are at most 20 chars after a title (dict.) Founder...Bill Gates dict. match regex match start offset end offset start offset end offset at most 20 chars start offset end offset result 3

Finding leftmost regular expression matches Assume that we are searching for the regex.*(a aa aaaa) in the input string aaaa Find the regex match with the smallest start offset value at each end offset position The leftmost maches are marked using solid lines 4

Regex matching: background Consider the regex.*(a b aa aba) Can be transformed into NFA/DFA NFA DFA Traditional architectures do not support start offset reporting & leftmost matching: Reconfigurable NFAs (Sidhu FCCM 2001, Bispo FPT 2006, Yang ANCS 2008) Programmable DFAs (Smith SIGCOMM 2008, Van Lunteren MICRO 2012) 5

Previous solution: network of state machines active_reg state_reg start_offset_reg -Dimension of the network (# state machines) is statically computed for each regex -DRAWBACK: replication of the state transition logic K. Atasu, R. Polig, C.Hagleitner, F. R. Reiss: Hardware-accelerated regular expression matching for high-throughput text analytics. FPL 2013: 1-7 6

Contributions of this work 1. Extending Sidhu and Prasanna s NFA architecture to support start offset reporting 2. A graph coloring based register clustering method to minimize the register usage 3. An efficient leftmost match computation method without using offset comparisons NFA Sidhu and Prasanna s NFA Architecture 7

A straightforward extension of Sidhu & Prasanna s architecture Add a start offset register to each NFA state offset_reg[0] = value of current offset position DRAWBACK: redundant start offset registers offset_reg [0] offset_reg [1] offset_reg [3] offset_reg [4] offset_reg [2] NFA Baseline Architecture 8

Clustering offset registers NFA DFA Build a conflict graph and apply graph coloring States with the same color can share registers 9

Leftmost match computation Assume that state 0 and state 1 are active and the current input is a We have to compute offset_reg[2] = MIN(offset_reg[0], offset_reg[1]) offset_reg [0] offset_reg [1] offset_reg [3] offset_reg [4] offset_reg [2] NFA Baseline Architecture 10

Leftmost match computation without offset comparisons (1) Assume that state K has M incoming state transitions We have to compute the minimum of M offset values offset_reg [1] offset_reg [2] offset_reg [0] offset_reg [M-1] K tree-based implementation (long latency) fully parallel implementation (expensive) offset_reg [K] Better solution: each state keeps track of the states that are activated earlier -Define an N N bit matrix: earlier[i,j]=1 if state j is activated before state i 11

Leftmost match computation without offset comparisons (2) offset_reg[0] = value of the current offset pointer, earlier[0, :] = active_reg Assume that state 1 is active (i.e., earlier[0,1] = 1), and the input is a we have two transitions into state 2: from state 0 and from state 1 since earlier[0,1] = 1, we choose the start offset provided by state 1 due to transitions 0 1 and 1 2, earlier[1,2]=1 in the next cycle offset_reg [0] offset_reg [1] tr[ 0,2] ( tr[1,2] earlier [0,1]) 2 tr[ 1,2] ( tr[0,2] earlier [1,0]) offset_reg [2] 12

Experiments (text analytics regexs) Altera Stratix IV GX530KH40C2, Altera Quartus II V11 tools 32-bit start offset registers, 250 MHz target clock frequency NFA representation: Follow Automata with character classes 13

Experiments (L7 filter regexs) Altera Stratix IV GX530KH40C2, Altera Quartus II V11 tools 32-bit start offset registers, 250 MHz target clock frequency NFA representation: Follow Automata with character classes 14

Summary & future work Support for start offset reporting and leftmost matching without replicating the state transition logic without using redundant offset registers without using expensive offset comparison > threefold reduction in the logic resource usage > 1.25-fold improvement in the clock frequency < 8.6-fold overhead w.r.t. Sidhu & Prasanna s architecture while using 32-bit offset registers Our current and future work includes design of intermediate fabrics for fast compilation analysis and optimization of the energy consumption 15

Resource-efficient regular expression matching architecture for text analytics Kubilay Atasu IBM Research - Zurich Presented at ASAP 2014