ARCH
This work was supported by: the European Research Council, the Israeli Centers of Research Excellence, the Neptune Consortium, and National Science Foundation award CNS-119748
Outline: Motivation. Background: Regular Expression Matching; DPI over Compressed HTTP. ARCH. Input-Depth Calculation. Experiment. Additional usages for Input-Depth.
Deep Packet Inspection: Processing of the packet payload. Identify occurrences from predefined patterns: strings or regular expressions. [Figure: an IP packet from the Internet passes through a firewall that checks it against the patterns.]
Motivation: High volume of compressed HTTP traffic. Compressed by the server, decompressed by the browser. 84% of top 1000 sites, 60% of all web sites. DPI is the current bottleneck of middle-boxes. ARCH: first algorithm to accelerate regular expression matching of compressed HTTP.
Regular Expression Matching: Non-Deterministic Finite Automaton (NFA): space efficient. Deterministic Finite Automaton (DFA): time efficient. Hybrid FA (CoNEXT 2007): space/time efficiency. Pattern: *d, where * denotes zero or more occurrences of the preceding character. [Figure: the NFA for the pattern and its equivalent DFA.]
Regular Expression Matching: An NFA may have multiple active states. A DFA will have only one current state. An NFA contains ε transitions. [Figure: the NFA for the pattern *d and its equivalent DFA, stepped through an example input.]
Regular Expression Matching: The automata are equivalent. Both will reach the accepting state together. [Figure: the NFA for the pattern *d and its equivalent DFA after the final input byte d.]
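The contrast between the two automata can be sketched in a few lines of Python. This is only an illustration: the pattern `ab*cd`, the state numbering, and the function names are assumptions made for the example, not the automaton drawn on the slides. The NFA keeps a set of active states (with the start state always active, so a match may begin at any byte), while the DFA is derived lazily by subset construction and keeps a single current state.

```python
EPS = None  # marker for epsilon transitions

# Hypothetical NFA for the illustrative pattern ab*cd (state 4 accepts).
NFA = {
    (0, "a"): {1},
    (1, EPS): {2},
    (2, "b"): {2},
    (2, "c"): {3},
    (3, "d"): {4},
}
ACCEPT = {4}


def eps_closure(states):
    """Add every state reachable through epsilon transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in NFA.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def nfa_step(active, ch):
    moved = {0}  # the start state stays active for unanchored matching
    for s in active:
        moved |= NFA.get((s, ch), set())
    return eps_closure(moved)


def run_nfa(text):
    """Return the positions at which the NFA is in an accepting state."""
    active, hits = eps_closure({0}), []
    for i, ch in enumerate(text):
        active = nfa_step(active, ch)  # possibly several active states
        if active & ACCEPT:
            hits.append(i)
    return hits


def run_dfa(text):
    """Same scan, but each set of NFA states is one cached DFA state."""
    table, current, hits = {}, eps_closure({0}), []
    for i, ch in enumerate(text):
        if (current, ch) not in table:
            table[(current, ch)] = nfa_step(current, ch)
        current = table[(current, ch)]  # exactly one current state
        if current & ACCEPT:
            hits.append(i)
    return hits
```

On any input, `run_nfa` and `run_dfa` report the same match positions, which is the equivalence the slide states.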
Compressed HTTP: Compressed HTTP is a standard of HTTP 1.1. Mainly uses GZIP and DEFLATE. Based on LZ77 (an adaptive compression). [Figure: a plain-text example and its compressed form.]
Compression Algorithm:
1. Identify repeated strings
2. Replace each string with the (distance, length) syntax
3. Further compress the syntax using Huffman Coding
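The (distance, length) syntax is easiest to see from the decoding side. The following is a minimal sketch, assuming a simplified token stream in which each token is either a literal character or a `(distance, length)` tuple; the Huffman layer of real GZIP/DEFLATE is left out, and the name `lz77_decode` is made up for this example.

```python
def lz77_decode(tokens):
    """Expand literals and (distance, length) pointers into plain text."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)  # literal byte
            continue
        distance, length = tok
        start = len(out) - distance  # where the repeated string began
        for j in range(length):
            # Copy byte by byte so that overlapping pointers
            # (distance < length) also work.
            out.append(out[start + j])
    return "".join(out)
```

For instance, `lz77_decode(list("abcde") + [(5, 4)])` yields `"abcdeabcd"`: the pointer says "go back 5 bytes and copy 4".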
DPI on Compressed HTTP: An LZ77 pointer represents a repeated string. It is possible to skip scanning most of it. Borders must still be considered. Existing works discuss matching acceleration but are limited to string matching (INFOCOM 2009). [Figure: example traffic containing an LZ77 pointer, shown with its uncompressed form and the pattern *d.]
ARCH: Upon encountering a repeated string:
1. Scan the left border until Input-Depth(b) ≤ j, where b is the current byte and j is its index inside the pointer. Input-Depth is the number of bytes that can be part of a future match.
2. Skip the internal pointer area
3. Scan the right border
[Figure: example traffic with an LZ77 pointer and the pattern *d, annotated with the Input-Depth and j values at each step.]
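The border idea can be sketched for the simplest case, plain string matching with a single pattern, where the Input-Depth of a byte is just the DFA state reached after it. This is a simplified illustration, not the ARCH implementation: the names `next_state` and `scan_compressed`, the token format, and the KMP-style automaton are all assumptions. Instead of skipping the internal area in bulk and rescanning a right border, the sketch copies the stored state of the referred byte whenever both depths fit inside the pointer, and counts each such copy as a skipped byte.

```python
def next_state(pattern, state, ch):
    """KMP-style DFA step. The state is the length of the longest pattern
    prefix that is a suffix of the text read so far (its Input-Depth)."""
    seen = pattern[:state] + ch
    for k in range(min(len(pattern), len(seen)), 0, -1):
        if seen.endswith(pattern[:k]):
            return k
    return 0


def scan_compressed(tokens, pattern):
    """tokens: literal characters or (distance, length) LZ77 pointers.
    Returns (match end positions, number of bytes whose state was copied)."""
    text, states, matches = [], [], []
    state, skipped = 0, 0

    def emit(ch, new_state):
        text.append(ch)
        states.append(new_state)  # remembered for later pointers
        if new_state == len(pattern):
            matches.append(len(text) - 1)

    for tok in tokens:
        if isinstance(tok, str):
            state = next_state(pattern, state, tok)
            emit(tok, state)
            continue
        distance, length = tok
        ref = len(text) - distance
        for j in range(length):
            ch, ref_state = text[ref + j], states[ref + j]
            # Left border is over once the current depth fits inside the
            # j bytes already taken from the pointer. The stored state is
            # reusable if its own depth also fits inside the pointer.
            if state <= j and ref_state <= j + 1:
                state = ref_state  # reuse instead of scanning
                skipped += 1
            else:
                state = next_state(pattern, state, ch)
            emit(ch, state)
    return matches, skipped
```

With `tokens = list("abcde") + [(5, 4)]` and pattern `"bcd"`, the call returns `([3, 8], 4)`: all four pointer bytes reuse stored states. With `tokens = list("xbcda") + [(4, 3)]` and pattern `"abcd"`, it returns `([7], 0)`, because the match straddles the left border and every pointer byte has to be scanned.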
ARCH: ARCH is mainly based on Input-Depth. Input-Depth(T) is the length of the shortest suffix of T in which inspection starting at S0 ends at S, the state reached by scanning all of T. For string matching, Input-Depth = DFA-Depth. For regular expression matching it varies: it depends on both the automaton and the input. [Figure: DFA for the pattern *d with an example input whose Input-Depth is 5, larger than the DFA-Depth of the state it reaches.]
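The definition can be checked directly by brute force: try ever longer suffixes until one of them, scanned from S0, lands in the same state as the whole input. The sketch below does this on a small hand-written DFA. The pattern `ab*cd`, the state numbering, and the names `next_state`, `scan`, and `input_depth` are assumptions made for illustration and are not taken from the slides.

```python
# Hypothetical unanchored DFA for the illustrative pattern ab*cd.
# States: 0 = start, 1 = seen "ab*", 2 = seen "ab*c", 3 = match.
def next_state(state, ch):
    if ch == "a":
        return 1  # an 'a' can always start a new match
    if state == 1 and ch == "b":
        return 1
    if state == 1 and ch == "c":
        return 2
    if state == 2 and ch == "d":
        return 3
    return 0


# DFA-Depth: length of the shortest path from state 0 to each state.
DFA_DEPTH = {0: 0, 1: 1, 2: 2, 3: 3}


def scan(text, state=0):
    for ch in text:
        state = next_state(state, ch)
    return state


def input_depth(text):
    """Length of the shortest suffix that reaches the same state from S0."""
    target = scan(text)
    for k in range(len(text) + 1):
        if scan(text[len(text) - k:]) == target:
            return k
```

For the input `"abbcd"` the DFA ends in state 3, whose DFA-Depth is 3, yet `input_depth("abbcd")` is 5 because the whole string is needed to reach that state. This brute-force version rescans the input many times and is only meant to make the definition concrete.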
Input-Depth for NFA: Algorithm for Active States NFA: keep an Input-Depth parameter for each active state. When a state is added to the list of active states: Input-Depth = predecessor's Input-Depth + 1 (labeled transition); Input-Depth = predecessor's Input-Depth (epsilon transition). Total Input-Depth = max(Input-Depth[activeStates]). [Figure: NFA for the pattern *d annotated with per-state Input-Depth; as the example input is consumed byte by byte, the total Input-Depth steps through 0, 1, 2, 3, 4, 5.]
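The per-state bookkeeping can be sketched as follows. The NFA is a hypothetical one for the illustrative pattern `ab*cd`, and the names are invented for this example. Two details are assumptions of the sketch rather than statements from the slides: the start state is kept active with Input-Depth 0 at every step, and when a state is reachable along several paths the largest depth is kept, which is the conservative choice.

```python
EPS = None  # marker for epsilon transitions

# Hypothetical NFA for the illustrative pattern ab*cd (state 4 accepts).
NFA = {
    (0, "a"): {1},
    (1, EPS): {2},
    (2, "b"): {2},
    (2, "c"): {3},
    (3, "d"): {4},
}


def step(depths, ch):
    """depths maps each active state to its Input-Depth."""
    new = {0: 0}  # start state: always active, depth 0
    for state, depth in depths.items():
        for target in NFA.get((state, ch), ()):
            # Labeled transition: predecessor's Input-Depth + 1.
            new[target] = max(new.get(target, 0), depth + 1)
    stack = list(new)
    while stack:
        state = stack.pop()
        for target in NFA.get((state, EPS), ()):
            # Epsilon transition: predecessor's Input-Depth, unchanged.
            if new.get(target, -1) < new[state]:
                new[target] = new[state]
                stack.append(target)
    return new


def total_input_depths(text):
    """Total Input-Depth (max over active states) after each byte."""
    depths, trace = {0: 0}, []
    for ch in text:
        depths = step(depths, ch)
        trace.append(max(depths.values()))
    return trace
```

`total_input_depths("abbcd")` gives `[1, 2, 3, 4, 5]`, while `total_input_depths("abba")` gives `[1, 2, 3, 1]`: the final `a` drops every partial match except the one it starts itself.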
Input-Depth for DFA: NFA Input-Depth is exact. A DFA transition may result in: increasing the Input-Depth by one; decreasing the Input-Depth by any value (unlike NFA). For DFA we provide an upper bound, based on: simple and complex states; positive and negative transitions.
Simple and Complex States: A simple state S is a state for which all possible input strings that upon scan from S0 terminate at S have the same length. All other states are complex. Identified during the construction algorithm. [Figure: DFA for the pattern *d; complex states are marked in red.]
Simple and Complex States: Upon traversal: to a simple state, Input-Depth = DFA-Depth; to a complex state, Input-Depth += 1. [Figure: DFA for the pattern *d scanned byte by byte; the approximate Input-Depth steps through 0, 1, 2, 3, 4 and reaches 5 on the last byte, where the actual Input-Depth is 1.]
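The update rule is short enough to sketch together with a brute-force check of the true value. Everything concrete here is an assumption for illustration: the DFA is a hand-written one for the pattern `ab*cd`, the set `COMPLEX` is labeled by hand rather than produced by a construction algorithm, and state 0 is treated as a simple state of depth 0.

```python
# Hypothetical unanchored DFA for the illustrative pattern ab*cd.
# States: 0 = start, 1 = seen "ab*", 2 = seen "ab*c", 3 = match.
def next_state(state, ch):
    if ch == "a":
        return 1
    if state == 1 and ch == "b":
        return 1
    if state == 1 and ch == "c":
        return 2
    if state == 2 and ch == "d":
        return 3
    return 0


DFA_DEPTH = {0: 0, 1: 1, 2: 2, 3: 3}  # shortest path from state 0
COMPLEX = {1, 2, 3}  # reachable by strings of different lengths


def approx_input_depths(text):
    """Upper bound on Input-Depth after each byte."""
    state, depth, trace = 0, 0, []
    for ch in text:
        state = next_state(state, ch)
        if state in COMPLEX:
            depth += 1  # complex state: Input-Depth += 1
        else:
            depth = DFA_DEPTH[state]  # simple state: Input-Depth = DFA-Depth
        trace.append(depth)
    return trace


def exact_input_depth(text):
    """Brute force: shortest suffix that reaches the same state from S0."""
    def scan(s):
        state = 0
        for ch in s:
            state = next_state(state, ch)
        return state

    target = scan(text)
    for k in range(len(text) + 1):
        if scan(text[len(text) - k:]) == target:
            return k
```

On the input `"abbba"` the approximation ends at 5 while `exact_input_depth("abbba")` is 1. The estimate is never too small, so nothing is missed, but an overestimate lengthens the border that has to be scanned.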
Simple and Complex States: The approximation maintains correctness but may impact performance. It works well in practice: Input-Depth is normally low (avg. = 1.1); most complex states are at high depths (avg. > 5). In theory we can approximate better.
Positive and Negative Transitions: Input-Depth depends on both the states and the transition between them. We define two types of transitions: a positive transition increases the Input-Depth by one; a negative transition decreases the Input-Depth by x.
Positive and Negative Transitions: During the DFA construction algorithm determine: the transition type (positive or negative); the transition Input-Depth delta (for negative transitions). [Figure: DFA in which negative transitions are dashed and red and labeled with their Input-Depth deltas, comparing the approximate and actual Input-Depth for an example input.]
Experiment: Rulesets from the Snort IPS. 201 compressed HTML pages from Alexa top 500 global sites. 58MB in uncompressed form and 61.2MB in compressed form. Compared with a simple baseline algorithm, which does not perform any byte skipping.
Experimental Results
Automaton Type   Average Skip Rate   Average Processing Time Improvement   Overhead
ARCH-NFA         77.99%              77.21%                                1%
ARCH-DFA         77.69%              69.19%                                11%
Hybrid-FA        77.88%              69.41%                                11%
The overall processing time of ARCH-NFA is 40 times longer than ARCH-DFA. The space requirements of ARCH-NFA are 18 times smaller than those of ARCH-DFA.
Additional usages for Input-Depth: Extract the string that relates to a matched pattern without rescanning the packet. Determine the number of bytes that should be stored to handle cross-packet DPI.
Conclusion: First generic framework to accelerate any regular expression matching over compressed traffic. Significant performance improvement compared to a plain scan: 70% faster. Suitable for line-rate DPI. Input-Depth is important for solving problems in other domains.