Design principles in parser design
Glen Gibb, Dept. of Electrical Engineering
Advisor: Prof. Nick McKeown
Header parsing?
- Identify headers & extract fields
- [Figure: a packet with headers A, B, C; the extracted fields include Source, Dest., and Proto.]
- Next Hop lookup: Dest. selects one of output ports 1-4
- Firewall: Source, Dest., and Proto. select ALLOW, DENY, or ALLOW except via HTTP (e.g., host X can talk to host Y)
- > 1 billion packets / second: a new packet every ns
Almost no prior work

Leaping Multiple Headers in a Single Bound: Wire-Speed Parsing Using the Kangaroo System
C. Kozanitis, J. Huber, S. Singh, & G. Varghese, INFOCOM 2010
- Programmable parser
- Parses multiple headers per cycle
- Receives all headers before parsing: high latency

400 Gb/s Programmable Packet Parsing on a Single FPGA
M. Attig & G. Brebner, ANCS 2011
- Language to describe header sequences
- Compile into efficient designs on FPGA
- FPGA-centric; commercial switches are ASICs
- Extremely deep pipeline (100+ stages)

Neither paper analyzes design trade-offs or presents design principles
Outline
1. Packet parsing
2. Understanding parser design
3. Providing flexibility

Packet parsing
- Network review
- Parsing process
Internet
[Figure: a switch with ports 1-4 forwards each packet using a table that maps Packet Color to Output Port]
Packet
- A packet is Header 1 (Ethernet), Header 2 (VLAN), Header 3 (IPv4), then Payload
- Each header is Field 1, Field 2, Field 3, ..., Field n (e.g., Source Address, Destination Address)
- Forwarding table (Destination -> Port): A -> 1, B -> 2, C -> 3, D -> 4
[Figure: switch pipeline] Packets In -> Parser -> Header fields -> Match Tables (Ethernet Forwarding, IP Routing, Access Control List) -> Action Processing -> Queues -> Out
- Headers parsed: Ethernet, VLAN, IP, TCP
- Header fields: Src MAC, Dst MAC, Eth Type, VLAN ID, Priority, Src IP, Dst IP, Protocol, Src Port, Dst Port
- Ethernet Forwarding uses: Src MAC, Dst MAC, Eth Type, VLAN ID
- IP Routing uses: Eth Type, Dst IP
- Access Control List uses: Src MAC, Dst MAC, Eth Type, VLAN ID, Src IP, Dst IP, Protocol, Priority, Src Port, Dst Port
Packet parsing
- Network review
- Parsing process
Parsing: identify headers & extract fields
- [Figure: a parse graph over header types A, B, C, D; the parser steps through a packet containing headers A, B, C]
- Header A: extract fields; Next: B
- Header B: Len: 20B; Next: C; extract fields
- Header C: Next: (none); extract fields
- The extracted fields feed the Next Hop lookup (output ports 1-4)
Parse graphs
- [Figure: parse graph with nodes A, B, C, D, E, F; one highlighted path is A, C, D, F]
- Fields extracted per header: A: 1, 2; B: 2; C: 1; D: 2, 4; E: 2; F: 1, 2
- The parse graph is the state machine
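The statement that the parse graph is the state machine can be sketched in a few lines of Python. This is a minimal illustration rather than the talk's implementation: the header lengths, field offsets, and next-header codes in `PARSE_GRAPH` are hypothetical, and the first byte of every header is assumed to carry the next-header code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical parse graph. For each header type:
#   length  - header length in bytes
#   extract - byte offsets (within the header) of the fields to extract
#   next    - map from the header's first byte to the next header type
PARSE_GRAPH: Dict[str, dict] = {
    "A": {"length": 4, "extract": [1, 2], "next": {0x0B: "B", 0x0C: "C"}},
    "B": {"length": 2, "extract": [1], "next": {0x0C: "C"}},
    "C": {"length": 3, "extract": [1], "next": {}},
}


def parse(packet: bytes, start: str = "A"):
    """Walk the parse graph: identify headers, then extract their fields."""
    headers: List[Tuple[str, int]] = []      # (header type, byte offset)
    fields: List[Tuple[str, int, int]] = []  # (header type, field offset, value)
    offset = 0
    state: Optional[str] = start
    while state is not None:
        node = PARSE_GRAPH[state]
        if offset + node["length"] > len(packet):
            break  # truncated packet: stop at the last complete header
        headers.append((state, offset))
        for field_offset in node["extract"]:
            fields.append((state, field_offset, packet[offset + field_offset]))
        # The current header identifies the next one (None ends the walk).
        next_state = node["next"].get(packet[offset])
        offset += node["length"]
        state = next_state
    return headers, fields


example = bytes([0x0B, 1, 2, 3, 0x0C, 9, 0x00, 7, 8])
found_headers, found_fields = parse(example)
```

Each loop iteration is one state transition: the current state selects the header format, and a field inside that header selects the next state.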
Parse graphs in the field
- [Figure: parse graphs for four use cases: Enterprise, Data center, Service provider, Enterprise edge. Headers across them include Ethernet, VLAN, MPLS, ARP/RARP, IPv4, IPv6, TCP, UDP, ICMP, SCTP, GRE, NVGRE, VXLAN, IPsec AH, IPsec ESP.]
- [Figure: combined parse graph: Ethernet, VLAN (802.1ad), PBB (802.1ah), VLAN (802.1Q), ARP, RARP, EoMPLS, MPLS, IPv4, IPv6, ICMP, ICMPv6, TCP, UDP, SCTP, GRE, NVGRE, VXLAN, IPsec ESP, IPsec AH, and an inner Ethernet]
What makes parsing hard?
- Many headers, many paths, variable path lengths [Figure: combined parse graph]
- Variable header lengths: Ethernet (Len: 14B), IPv4 (Len: 20-60B), TCP (Len: 20-60B), Payload
- Header identified by previous: Ethernet gives Next: IPv4; IPv4 gives Len: 20B, Next: TCP; TCP gives Len: 20B
- Line rate: a 64 x 10 Gb/s switch sees 1 billion pkts/sec
- Aggressive latency: 250 ns port-to-port
- Area & power constrained: 40 W
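The line-rate figure can be checked with quick arithmetic. The sketch below assumes worst-case minimum-size Ethernet frames (64 B) plus 20 B of preamble and inter-frame gap per packet; that assumption is not stated on the slide.

```python
MIN_FRAME_BYTES = 64      # minimum Ethernet frame (assumed worst case)
WIRE_OVERHEAD_BYTES = 20  # 8 B preamble/SFD + 12 B inter-frame gap


def packets_per_second(ports: int, gbps_per_port: float) -> float:
    """Worst-case packet rate for a switch with the given port configuration."""
    bits_per_packet = (MIN_FRAME_BYTES + WIRE_OVERHEAD_BYTES) * 8
    return ports * gbps_per_port * 1e9 / bits_per_packet


pps = packets_per_second(ports=64, gbps_per_port=10)
ns_per_packet = 1e9 / pps
print(f"{pps / 1e9:.2f} billion packets/s, one packet every {ns_per_packet:.2f} ns")
```

Under these assumptions the result is about 0.95 billion packets per second, or roughly one packet per nanosecond, which matches the slide's rounded figure.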
Implementing a parser
- [Figure: packet data feeds two blocks. Header Identification walks the parse graph (A-F) and outputs header types & locations; Field Extraction uses them to copy fields into the Extracted Field Buffer, which outputs the extracted fields.]
- Example: Field (Source), Field (Dest), and Field (Proto) are extracted and passed to Access Control (ALLOW / DENY)
- Header Identification is a state machine
- Field Extraction is table-driven:
    Header | Extract Fields
    A      | A1, A2
    B      | B1
    C      | C2, C4
Data processing width?
- [Figure: a header region with packet positions 0, 4, 8, 12 (B) parsed against the parse graph A-F at two processing widths]
- Narrower width: 4 cycles, 1 decision/cycle
- Wider width: 2 cycles, 2 decisions/cycle
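The cycle counts on this slide follow from simple division. The sketch below assumes a 16-byte header region made of aligned 4-byte headers and widths of 4 B and 8 B; those sizes are inferred from the packet positions in the figure, not stated explicitly.

```python
def cycles_needed(region_bytes: int, width_bytes: int) -> int:
    """Cycles to stream a header region through a parser of the given width."""
    return -(-region_bytes // width_bytes)  # ceiling division


def max_decisions_per_cycle(header_bytes: int, width_bytes: int) -> int:
    """Upper bound on next-header decisions per cycle, assuming aligned,
    fixed-size headers: one decision per header that fits in the window."""
    return max(1, -(-width_bytes // header_bytes))


for width in (4, 8):
    print(width, cycles_needed(16, width), max_decisions_per_cycle(4, width))
```

Doubling the width halves the cycle count but doubles the number of parse decisions the hardware must resolve in a single cycle, which is the trade-off the slide illustrates.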
Parser construction
- [Figure: a grid of candidate parser designs, each labelled with a processing width (1B, 2B, 3B, ..., 16B) and a rate (10 Gb/s, 20 Gb/s, 100 Gb/s)]
- Prototype: 2 months
Understanding parser design
- Parser generator
- Trade-offs in parser design
Parser generator
- Inputs: parse graph, clock, processing width, parsers per chip
- Flow: Parser Generator -> Parser (Verilog, .v) -> Synthesis -> Netlist -> Layout -> Reports: area, power, timing
- Built with Genesis [Shacham et al., IEEE Micro '10]: Architectural Template + Per-Application Configuration (A = 1, B = 12) -> Design Instance
- Parser architectural template: mixed Perl/Verilog

    //; foreach my $header (@headers) {
    //;   my $hdrparser = generate('hdr_parser',
    //;                            "hdr_parser_" . $n++,
    //;                            Header => $header);
        `$hdrparser->instantiate()` (.pkt_data (pkt_data),
Parser generator: parse graph + processing width -> Parser design (.v)
- Parse Graph & Header Formats are described as:
    header { name: fields: extract: next-header: } ...
- [Figure: parse graph A-F alongside the header sequences the generator derives from it]
[Figure: two schedules for Next Header processing over headers A, B, C, D]
- One schedule requires buffering to delay processing
- The other processes all data by packet end, with more data in some cycles
Meeting throughput needs
- r = f x w, where r is throughput (rate), f is frequency, and w is data width
- Either one Parser of width w, or n parsers (Parser 1 ... Parser n), each of width w/n: r = n x f x (w/n)
Understanding parser design
- Parser generator
- Trade-offs in parser design
Data processing width?
- r = n x f x w, where r is fixed for the switch and each of the n parsers (Parser 1 ... Parser n) has width w
- Single instance: build a single parser of rate r (r = const, n = 1, so f is proportional to 1/w)
- Multiple instances: build multiple parsers with total rate r (r = const, f = const, so n is proportional to 1/w)
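The two scaling rules can be made concrete by solving r = n x f x w for the free variable in each case. The sketch below is illustrative only; the 1 GHz clock used for the multiple-instance case is an assumed value, not one taken from the talk.

```python
import math


def clock_hz(rate_bps: float, width_bytes: int) -> float:
    """Single instance (n = 1): clock needed to sustain rate r at width w."""
    return rate_bps / (width_bytes * 8)


def instances(total_rate_bps: float, clock: float, width_bytes: int) -> int:
    """Multiple instances (f fixed): parsers needed to reach total rate r."""
    return math.ceil(total_rate_bps / (clock * width_bytes * 8))


# Single 10 Gb/s parser: f falls as 1/w.
for w in (2, 4, 8, 16):
    print(f"w = {w:2d} B -> f = {clock_hz(10e9, w) / 1e6:.3f} MHz")

# 640 Gb/s aggregate at an assumed 1 GHz clock: n falls as 1/w.
for w in (1, 2, 4, 8):
    print(f"w = {w} B -> n = {instances(640e9, 1e9, w)}")
```

Either way the product n x f x w stays pinned to the required rate, so choosing the width fixes the remaining variable.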
Single parser instance (10 Gb/s, big parse graph)
- [Plot: gates (0-8 M) and power (0-600 mW) vs. processing width (2, 4, 8, 16 B)]
- Area: narrow width
- Power: slow clock
Aggregating parsers (640 Gb/s, big parse graph)
- [Plot: size in gates (0-2 M) and power (0-600 mW) vs. rate per instance (10-80 Gb/s)]
- Area: independent of instance rate and count
- Power: prefer fewer fast parsers
Parse graph impacts area (640 Gb/s aggregate)
- [Plot: gates (0-2 M) vs. rate per instance (10, 20, 40, 80 Gb/s) for four parse graphs: Enterprise, Enterprise Edge, Service Provider, Big]
- Why?
Extracted fields dominate area (640 Gb/s, 40 Gb/s per instance)
- [Plot: gates (0-2 M), broken down into Header Identification, Field Extraction, and Field Result Buffer, for the Enterprise, Enterprise Edge, Service Provider, and Composite parse graphs]
- Field result buffer widths: Enterprise 672 b, Enterprise Edge 888 b, Service Provider 688 b, Composite 1664 b
[Plot: gates (0-2 M) vs. field result buffer width (0-2000 b) at 640 Gb/s, 40 Gb/s per instance; points at 672b, 888b, 688b, and 1672b, plus a 3-header parse graph with extracted fields of 1672b]
- Area determined by extracted field buffer size
Design principles
- Single parser instances
  - area: minimize by reducing width
  - power: minimize by reducing clock
- Aggregating instances for throughput
  - area: independent of instance rate & count
  - power: minimize using few fast instances
- Extracted field buffer dominates area
  - Area determined by total extracted field size
Providing flexibility
- RMT model
- Programmable parser
- Generating parse table entries
Parser specific to one parse graph
[Figure: a Parser hard-wired for parse graph S1, so Switch = S1]
[Figure: switch pipeline] Packets In -> Parser -> Header fields -> Match Tables (Ethernet Forwarding, IP Routing, Access Control List) -> Action Processing -> Queues -> Out
- CPU, GPU, FPGA, OpenFlow/SDN?
Multiple Match Table (MMT): Packets In -> Parser -> Header fields -> Match Tables (Ethernet Forwarding, IP Routing, Access Control List) -> Action Processing -> Queues -> Out
Reconfigurable Multiple Table (RMT): Packets In -> Programmable Parser -> Reconfigurable Match + Action Tables -> Recombine -> Queues -> Out
RMT architecture
[Figure: at IN the packet is split into HEADER and DATA; the header (H) passes through Stage 1 ... Stage n, each a Match Table followed by an Action; Recombine rejoins the header with the data, which then goes to the Output Queues and OUT]
RMT Match Tables
[Figure: Logical Tables 1-6 mapped onto Physical Stage 1, Physical Stage 2, ..., Physical Stage n]
Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN
P. Bosshart, G. Gibb, H.S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, & M. Horowitz
SIGCOMM 2013 [to appear]
Providing flexibility
- RMT model
- Programmable parser
- Generating parse table entries
Providing programmability
- Replace hard-coded Header Identification logic with programmable logic
- [Figure: the parser block diagram with the hard-wired parse graph replaced by a state table with columns Curr. State, Match Values, Next State]
- Field Extraction stays table-driven:
    Header | Extract Fields
    A      | A1, A2
    B      | B1
    C      | C2, C4
[Figure: parse graph A-F next to its state table, with columns Current State, Match Values, Next State; the rows are filled in one parse graph edge at a time]
Parser state table
- Match side (TCAM or RAM): Current State, Match Values
- Result side (RAM): Next State, Header Length, Next Match Offsets, Next Lookup Mask, Extract Fields
- Header Length gives the next header location
- Next Match Offsets give the next match locations
- [Figure: part of the table is labelled Optional]
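A ternary lookup against such a table can be sketched as follows. This is a simplified software model, not the hardware design: the `StateTableEntry` layout, the state numbering, and the field names are hypothetical, and a first-match linear scan stands in for the TCAM's parallel priority match. The EtherType values (0x0800 for IPv4, 0x8100 for VLAN) and the 14-byte Ethernet header length are standard.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StateTableEntry:
    # Match side (TCAM): current state plus a ternary match on lookup bits.
    state: int
    value: int
    mask: int            # 1 bits are compared, 0 bits are "don't care"
    # Result side (RAM).
    next_state: int
    header_length: int   # bytes to advance to reach the next header
    extract: Tuple[str, ...] = ()


def lookup(table: List[StateTableEntry], state: int,
           match_bits: int) -> Optional[StateTableEntry]:
    """Return the first entry matching (state, match_bits), like a TCAM."""
    for entry in table:
        if entry.state == state and \
                (match_bits & entry.mask) == (entry.value & entry.mask):
            return entry
    return None


ETHERNET, IPV4, VLAN, DONE = 0, 1, 2, 3  # hypothetical state numbering
TABLE = [
    StateTableEntry(ETHERNET, 0x0800, 0xFFFF, IPV4, 14, ("dst_mac", "src_mac")),
    StateTableEntry(ETHERNET, 0x8100, 0xFFFF, VLAN, 14, ("dst_mac", "src_mac")),
    # Wildcard entry: any other EtherType ends parsing.
    StateTableEntry(ETHERNET, 0x0000, 0x0000, DONE, 14, ("dst_mac", "src_mac")),
]

hit = lookup(TABLE, ETHERNET, 0x0800)
```

Entry order matters in this model: the all-zero mask at the end acts as a default that only fires when no more specific entry matches.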
Cost of programmability
- [Plot: gates (0-7 M) for Fixed vs. Programmable parsers, broken down into Extracted Field Buffer, Hdr Ident/Field Extract, TCAM (State Table), and RAM (State Table); annotated areas 2.6 mm² (Fixed) and 4.4 mm² (Programmable)]
- Programmability costs 1.5-3x
- State table size determines area increase
Take-aways
- Cost of programmability: 1.5-3x fixed parser area
- State table dominates additional area
  - area: minimize TCAM and RAM
- Parse graph edge count determines table size
Providing flexibility
- RMT model
- Programmable parser
- Generating parse table entries
Naïve generation of state table entries
[Plot: TCAM table size (0-150 Kb) vs. processing width (1-16 B)]
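In its simplest form, naive generation emits one table entry per parse graph edge. The sketch below covers only that base case, where a single header is decided per cycle; the graph and its match values are hypothetical. Wider processing windows need entries for combinations of headers, which is why the table grows in the plot, and that case is not modelled here.

```python
from typing import Dict, List, Tuple

# Hypothetical parse graph: state -> {match value: next state}.
PARSE_GRAPH: Dict[str, Dict[int, str]] = {
    "A": {0x01: "B", 0x02: "C"},
    "B": {0x02: "C", 0x03: "D"},
    "C": {0x03: "D"},
    "D": {},
}


def naive_entries(graph: Dict[str, Dict[int, str]]) -> List[Tuple[str, int, str]]:
    """One (current state, match value, next state) entry per edge."""
    return [
        (state, value, next_state)
        for state, edges in graph.items()
        for value, next_state in edges.items()
    ]


entries = naive_entries(PARSE_GRAPH)
print(len(entries), "entries for", sum(len(e) for e in PARSE_GRAPH.values()), "edges")
```

In this base case the entry count equals the edge count, so anything that reduces the number of edges reduces the table.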
State table entry generation
- [Figure: parse graph A-F next to its state table, with columns Current State, Match Values, Next State]
- Merge nodes to minimize edges
- Problem: graph clustering is NP-hard
Kangaroo
- Intuition: iteratively identify minimal edge clustering starting at leaves
- Kangaroo's algorithm:
  - access to data anywhere in header region
  - non-minimal solutions for non-trees
Improving solution for non-trees
- [Figure: a non-tree parse graph in which a shared region ends up with two independent solutions]
- Solution: solve shared regions independently
Streaming-aware algorithm
[Figure: Kangaroo vs. Streaming processing of the same packet, with the Next Hdr fields marked in each]
Streaming-aware algorithm

Kangaroo:
\[
\mathrm{OPT}(n, b) = \min_{c \in \mathrm{clusters}(n)} \left( \mathrm{entries}(c) + \sum_{j \in \mathrm{fringe}(c)} \mathrm{OPT}(j, \ldots) \right)
\]

Streaming:
\[
\mathrm{OPT}(n, b, w) = \min_{c \in \mathrm{clusters}(n, w)} \left( \mathrm{entries}(c) + \sum_{j \in \mathrm{fringe}(c)} \mathrm{OPT}(j, \ldots, \mathrm{NewLoc}(w, j, c)) \right)
\]

- New parameter: window location
- Node clusters restricted by:
  - window location
  - window size
- Updated location for subgraphs
Algorithm performance
O( E V d k )

Method                               | 40b TCAM (8b state + 2 x 16b inputs) | 56b TCAM (8b state + 3 x 16b inputs)
Naive                                | 342 entries, 0.48s                   | 641 entries, 0.48s
Algorithm (excluding non-tree logic) | 177 entries, 2.6s                    | 170 entries, 5.5s
Algorithm                            | 112 entries, 128.7s                  | 106 entries, 207.6s
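The entry counts above translate directly into TCAM bits once multiplied by the entry width. The short calculation below uses only the numbers from the table; the dictionary layout and method labels are just a convenient way to hold them.

```python
# (entry width in bits) -> {method: entries}, taken from the table above.
RESULTS = {
    40: {"naive": 342, "no_non_tree": 177, "full": 112},
    56: {"naive": 641, "no_non_tree": 170, "full": 106},
}


def tcam_bits(width_bits: int, entries: int) -> int:
    """Total TCAM storage for a table of the given width and depth."""
    return width_bits * entries


for width, methods in RESULTS.items():
    naive = tcam_bits(width, methods["naive"])
    full = tcam_bits(width, methods["full"])
    print(f"{width}b TCAM: naive {naive} b, algorithm {full} b, "
          f"{naive / full:.1f}x smaller")
```

For the 40b configuration this gives 13,680 bits for the naive table against 4,480 bits for the full algorithm, about a 3x reduction; for 56b it gives 35,896 against 5,936 bits, about 6x.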
Benefits of parallel lookups?
- [Plot: table entries required (0-120) and TCAM bits required (0-8000) vs. data arrival rate (32, 64 bits/cycle) for 1, 2, 3, and 4 lookups; some configurations are unable to process at arrival rate]
- Minimize parallel lookups for single instance
Contributions
- Parser generator
- Parser design trade-off analysis & principles
  - Fixed parsers
    - Single parser instances
      - area: minimize by reducing width
      - power: minimize by reducing clock
    - Aggregating instances for throughput
      - area: independent of instance rate & count
      - power: minimize using few fast instances
    - Extracted field buffer dominates area
  - Programmable parsers
    - Cost of programmability is low (1.5-3x)
    - State table dominates area increase
- RMT model
- State table generation algorithm
Publications
- Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN. Bosshart, P., Gibb, G., et al. SIGCOMM 2013 [to appear].
- Outsourcing network functionality. Gibb, G., Zeng, H., and McKeown, N. HotSDN '12.
- Initial Thoughts on the Waypoint Service. Gibb, G., Zeng, H., and McKeown, N. WISH '11.
- Can the Production Network be the Testbed? Sherwood, R., Gibb, G., et al. OSDI '10.
- A Packet Generator on the NetFPGA platform. Covington, G.A., Gibb, G., et al. FCCM '09.
- NetFPGA: An Open Platform for Teaching How to Build Gigabit-rate Network Switches and Routers. Gibb, G., et al. IEEE Transactions on Education '08.
- NetFPGA: Reusable Router Architecture for Experimental Research. Naous, J., Gibb, G., et al. PRESTO '08.
- Building a RCP (Rate Control Protocol) Test Network. Dukkipati, N., Gibb, G., et al. Hot Interconnects '07.
- NetFPGA: An Open Platform for Gigabit-Rate Network Switching and Routing. Lockwood, J., et al. MSE '07.