
ARCH
This work was supported by the European Research Council, the Israeli Centers of Research Excellence, the Neptune Consortium, and National Science Foundation award CNS-119748.

Outline
- Motivation
- Background
  - Regular Expression Matching
  - DPI over Compressed HTTP
- ARCH
  - Input-Depth Calculation
- Experiment
- Additional usages for Input-Depth

Deep Packet Inspection
- Processing of the packet payload
- Identify occurrences of predefined patterns: strings or regular expressions
[Figure: an IP packet from the Internet passes through a firewall that matches it against patterns]

Motivation
- High volume of compressed HTTP traffic
  - Compressed by the server, decompressed by the browser
  - 84% of the top 1000 sites, 60% of all web sites
- DPI is the current bottleneck of middleboxes
- ARCH: the first algorithm to accelerate regular expression matching of compressed HTTP

Regular Expression Matching
- Non-Deterministic Finite Automaton (NFA): space efficient
- Deterministic Finite Automaton (DFA): time efficient
- Hybrid FA (CoNEXT 2007): space/time efficiency
- Pattern: *d, where * means zero or more occurrences of the preceding character
[Figure: the NFA for the pattern (states 0 to 6) and the equivalent DFA]

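Regular Expression Matching
- An NFA may have multiple active states
- A DFA has only one current state
- An NFA contains ε transitions
[Figure, animated over several slides: the NFA (states 0 to 6) and the equivalent DFA for pattern *d consume the input one character at a time]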


Regular Expression Matching
- The automata are equivalent
- Both reach the accepting state together
[Figure: final animation step; after the last input character d, both the NFA and the DFA are in an accepting state]

Compressed HTTP
- Compressed HTTP is a standard of HTTP 1.1
- Mainly uses GZIP and DEFLATE
- Based on LZ77 (an adaptive compression)
[Figure: a plain text next to its compressed text]
Compression algorithm:
1. Identify repeated strings
2. Replace each string with the (distance, length) syntax
3. Further compress the syntax using Huffman coding
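To make the (distance, length) syntax concrete, here is a minimal sketch of how an LZ77 token stream expands back to plain text. The token representation (one-character strings for literals, tuples for pointers) and the name `lz77_expand` are assumptions chosen for illustration; real DEFLATE streams are additionally Huffman coded and are not handled here.

```python
def lz77_expand(tokens):
    """Expand literals (1-char strings) and (distance, length) pointers.

    Hypothetical token format for illustration only; it is not the
    DEFLATE bit-level format.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(out) - distance
            # Copy byte by byte so that a pointer may overlap its own output.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(tok)
    return "".join(out)


# "abcd" is repeated 6 bytes back, so it is sent as the pointer (6, 4).
tokens = list("abcdef") + [(6, 4), "x"]
print(lz77_expand(tokens))  # abcdefabcdx
```

The bytes produced by a pointer are exactly the bytes a DPI engine has already seen once, which is what makes skipping them attractive.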

DPI on Compressed HTTP
- An LZ77 pointer represents a repeated string
- It is possible to skip scanning most of it
- Borders must still be considered
- Existing works discuss matching acceleration but are limited to string matching (Infocom 2009)
[Figure: example traffic containing {distance, length} pointers, shown next to its uncompressed form, for pattern *d]

ARCH
Upon encountering a repeated string:
1. Scan the left border until the Input-Depth at the current byte is at most j
   - j is the index of the current byte inside the pointer
   - Input-Depth: the number of bytes that can be part of a future match
2. Skip the internal pointer area
3. Scan the right border
[Figure: the example traffic scanned against pattern *d, annotated with the Input-Depth and j values at each step]

ARCH
- ARCH is mainly based on Input-Depth
- Input-Depth(T) is the length of the shortest suffix of T for which inspection starting at S0 ends at S, the state reached after inspecting all of T
- For string matching, Input-Depth = DFA-Depth
- For regular expression matching it varies: it depends on both the automaton and the input
[Figure: DFA for pattern *d with an example input for which DFA-Depth = 3 and Input-Depth = 5]
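The definition above can be checked by brute force: scan T, then try ever longer suffixes until one of them, scanned from S0, lands in the same state. The sketch below does that on a hand-built DFA for the unanchored pattern `abc*d`. The pattern, the transition table and the function names are assumptions made for illustration; they are not the automaton from the slides.

```python
# Hypothetical DFA for the unanchored pattern abc*d (illustration only).
# States: 0 = S0, 1 = seen "a", 2 = seen "ab" followed by any number of "c",
# 3 = accepting. Any character not listed leads back to state 0.
TABLE = {
    0: {"a": 1},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "c": 2, "d": 3},
    3: {"a": 1},
}
# DFA-Depth: length of the shortest path from S0 to each state.
DFA_DEPTH = {0: 0, 1: 1, 2: 2, 3: 3}


def scan(text, state=0):
    """Run the DFA over text and return the final state."""
    for ch in text:
        state = TABLE[state].get(ch, 0)
    return state


def input_depth(text):
    """Length of the shortest suffix of text that reaches the same state."""
    target = scan(text)
    for k in range(len(text) + 1):
        if scan(text[len(text) - k:]) == target:
            return k


text = "xxabccc"
print(scan(text), DFA_DEPTH[scan(text)], input_depth(text))  # 2 2 5
```

Here the DFA-Depth of the final state is 2, but the Input-Depth is 5 because the repeated "c" keeps extending the suffix that a future match may still need. This quadratic check is only a reference for the definition, not something a packet scanner would run.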

Input-Depth for NFA
Algorithm for an active-states NFA:
- Keep an Input-Depth parameter for each active state
- When a state is added to the list of active states:
  - Input-Depth = predecessor's Input-Depth + 1 (labeled transition)
  - Input-Depth = predecessor's Input-Depth (epsilon transition)
- Total Input-Depth = max(Input-Depth[activeStates])
[Figure, animated over six slides: the NFA for pattern *d (states 0 to 6) consumes the input one character at a time; the total Input-Depth is 0 before any input and then 1, 2, 3, 4 and 5, the last value after the final character d]
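The per-state bookkeeping above can be sketched as an active-state NFA simulation that carries a depth alongside each active state. The NFA below (pattern `abc*d`, with one epsilon transition) is a hypothetical stand-in, not the seven-state NFA of the slides. Two further assumptions: the start state stays active with depth 0 because matching is unanchored, and when a state is reached along two paths the larger depth is kept as the conservative choice.

```python
# Hypothetical NFA for the unanchored pattern abc*d (illustration only).
LABELED = {(0, "a"): [1], (1, "b"): [2], (3, "c"): [2], (3, "d"): [4]}
EPSILON = {2: [3]}
START, ACCEPT = 0, 4


def closure(depths):
    """Follow epsilon transitions; the depth is inherited unchanged."""
    stack = list(depths)
    while stack:
        s = stack.pop()
        for t in EPSILON.get(s, []):
            if t not in depths or depths[t] < depths[s]:
                depths[t] = depths[s]
                stack.append(t)
    return depths


def step(depths, ch):
    """Consume one character; a labeled transition adds 1 to the depth."""
    nxt = {START: 0}  # assumption: the start state is always active
    for s, d in depths.items():
        for t in LABELED.get((s, ch), []):
            nxt[t] = max(nxt.get(t, 0), d + 1)
    return closure(nxt)


def trace(text):
    """Total Input-Depth (max over active states) after each character."""
    depths = closure({START: 0})
    totals = []
    for ch in text:
        depths = step(depths, ch)
        totals.append(max(depths.values()))
    return totals


print(trace("abccd"))  # [1, 2, 3, 4, 5]
print(trace("aba"))    # [1, 2, 1]
```

The second trace shows the depth dropping back to 1 once the partial match "ab" is abandoned and only the new "a" remains relevant.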

Input-Depth for DFA
- The NFA Input-Depth is exact
- A DFA transition may result in:
  - Increasing the Input-Depth by one
  - Decreasing the Input-Depth by any value (unlike NFA)
- For DFA we provide an upper bound, using:
  - Simple and complex states
  - Positive and negative transitions

Simple and Complex States
- A simple state S is a state for which all possible input strings that, when scanned from S0, terminate at S have the same length
- All other states are complex
- Identified during the construction algorithm
[Figure: DFA for pattern *d; complex states are marked in red]

Simple and Complex States
Upon traversal:
- to a simple state: Input-Depth = DFA-Depth
- to a complex state: Input-Depth += 1
[Figure, animated over several slides: the DFA for pattern *d, with complex states marked in red, consumes the input one character at a time; the approximated Input-Depth is 0 before any input and then 1, 2, 3, 4 and 5, while the actual Input-Depth after the last character is 1]
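The traversal rule can be sketched on a small DFA and compared against the exact value obtained by brute force. The DFA below is for a hypothetical unanchored pattern `ab*c`, chosen because its states 1 and 2 are complex (they are reached by strings of several lengths); the table, the set of complex states and the function names are all assumptions for illustration.

```python
# Hypothetical DFA for the unanchored pattern ab*c (illustration only).
# State 0 = S0, 1 = seen "a" then any number of "b", 2 = accepting.
TABLE = {0: {"a": 1}, 1: {"a": 1, "b": 1, "c": 2}, 2: {"a": 1}}
DFA_DEPTH = {0: 0, 1: 1, 2: 2}
COMPLEX = {1, 2}  # assumed to be identified during DFA construction


def step(state, ch):
    return TABLE[state].get(ch, 0)


def approx_depths(text):
    """Approximated Input-Depth after each character (rule from the slide)."""
    state, depth, out = 0, 0, []
    for ch in text:
        state = step(state, ch)
        if state in COMPLEX:
            depth += 1                 # complex state: Input-Depth += 1
        else:
            depth = DFA_DEPTH[state]   # simple state: Input-Depth = DFA-Depth
        out.append(depth)
    return out


def actual_depth(text):
    """Exact Input-Depth by brute force over suffixes (reference only)."""
    def scan(t):
        s = 0
        for ch in t:
            s = step(s, ch)
        return s
    target = scan(text)
    return next(k for k in range(len(text) + 1)
                if scan(text[len(text) - k:]) == target)


print(approx_depths("abaa"))  # [1, 2, 3, 4]
print(actual_depth("abaa"))   # 1
```

As on the slide, the approximation overshoots (4 versus an actual 1) because the rule cannot see that the second "a" restarted the match; it stays an upper bound, so correctness is kept at the cost of scanning more border bytes.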

Simple and Complex States
- The approximation maintains correctness but may impact performance
- It works well in practice:
  - Input-Depth is normally low (avg. = 1.1)
  - Most complex states are at high depths (avg. > 5)
- In theory we can approximate better

Positive and Negative Transitions
- Input-Depth depends on both the states and the transition between them
- We define two types of transitions:
  - A positive transition increases the Input-Depth by one
  - A negative transition decreases the Input-Depth by x
[Figure: DFA with positive and negative transitions]

Positive and Negative Transitions
During the DFA construction algorithm, determine:
- Transition type (positive or negative)
- Transition Input-Depth delta (for negative transitions)
[Figure: DFA in which negative transitions are dashed and red and labeled with their deltas (for example -1 and -2), with the approximated and actual Input-Depth for an example input]

Experiment
- Rulesets from the Snort IPS
- 201 compressed HTML pages from the Alexa top 500 global sites
- 58MB in uncompressed form and 61.2MB in compressed form
- Compared with a simple baseline algorithm, which does not perform any byte skipping

Experimental Results

Automaton Type   Average Skip Rate   Average Processing Time Improvement   Overhead
ARCH-NFA         77.99%              77.21%                                1%
ARCH-DFA         77.69%              69.19%                                11%
Hybrid-FA        77.88%              69.41%                                11%

- The overall processing time of ARCH-NFA is 40 times longer than that of ARCH-DFA
- The space requirements of ARCH-NFA are 18 times smaller than those of ARCH-DFA

Additional usages for Input-Depth
- Extract the string that relates to a matched pattern without rescanning the packet
- Determine the number of bytes that should be stored to handle cross-packet DPI
[Figure: an example of each usage on the automaton for pattern *d]
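Both usages can be sketched with the same per-state depth tracking: when the accepting state becomes active, its Input-Depth gives the length of the matched string, and the total Input-Depth at the end of a packet gives how many trailing bytes a following packet may still need. The NFA (pattern `abc*d`, written without epsilon transitions for brevity) and the function name `scan` are assumptions for illustration.

```python
# Hypothetical NFA for the unanchored pattern abc*d (illustration only).
LABELED = {(0, "a"): [1], (1, "b"): [2], (2, "c"): [2], (2, "d"): [3]}
START, ACCEPT = 0, 3


def scan(text):
    """Return (matched strings, Input-Depth at the end of text)."""
    depths = {START: 0}
    matches = []
    for i, ch in enumerate(text):
        nxt = {START: 0}  # assumption: the start state is always active
        for s, d in depths.items():
            for t in LABELED.get((s, ch), []):
                nxt[t] = max(nxt.get(t, 0), d + 1)
        depths = nxt
        if ACCEPT in depths:
            d = depths[ACCEPT]
            # The last d bytes are the match; no rescanning is needed.
            matches.append(text[i + 1 - d : i + 1])
    return matches, max(depths.values())


print(scan("xxabccdyy"))  # (['abccd'], 0)

# Cross-packet case: the packet ends in the middle of a possible match.
matches, tail = scan("xxabc")
print(matches, tail, "xxabc"[-tail:])  # [] 3 abc
```

In the second call the depth of 3 says that only the last three bytes ("abc") need to be kept in order to resume matching in the next packet.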

Conclusion
- First generic framework to accelerate any regular expression matching over compressed traffic
- Significant performance improvement compared to a plain scan: 70% faster
- Suitable for line-rate DPI
- Input-Depth is important for solving problems in other domains