NetCheck: Network Diagnoses from Blackbox Traces


Yanyan Zhuang *^, Eleni Gessiou *, Fraida Fund *, Steven Portzer @, Monzur Muhammad ^, Ivan Beschastnikh ^, Justin Cappos *
(*) New York University, (^) University of British Columbia, (@) University of Washington

Goal
Find bugs in networked applications
- Large, complex, unknown applications
- Large, complex, unknown networks
- Understandable output / fix

Motivation
Setting: a Chrome client and an Apache server. Existing ways to diagnose a problem between them:
- Probing (ping): different traffic (ICMP), often a different result
- Packet capture: requires detailed protocol / app knowledge
- Model apps (Magpie, Xtrace, Pip...): need a model per application
- Network config analysis (Header Space Analysis, etc.): needs detailed network knowledge (HW + config)

NetCheck
Chrome Client (programmer) <-> Apache Server (programmer)
Model: the programmer's understanding (Deutsch's Fallacies)

Outline
- Motivation
- NetCheck Overview
- Trace Ordering
- Network Model
- Fault Classification
- Results / Conclusion

NetCheck overview
- Input: traces of the application on each host, gathered by a blackbox tracer (strace, ktrace) when the application fails
- Ordering Algorithm: passes each input syscall to the Network Model and gets back a simulation result
- Network Model: reports simulation state and errors to the Diagnoses Engine
- Output: a diagnosis listing likely faults (e.g., Network Configuration Issues, Traffic Statistics, Problem Detected)

Outline (current section: Trace Ordering)

Traces
- Series of locally ordered system calls
- Don't want to modify apps or use a global clock
- Gathered by strace, ktrace, systrace, truss, etc.
- Record call arguments and return values

    socket() = 3
    bind(3, ...) = 0            <- call arguments
    listen(3, 1) = 0
    accept(3, ...) = 4
    recv(4, "HTTP", ...) = 4    <- return buffer
    close(4) = 0                <- return value
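A trace line of this shape can be split into call name, arguments, return value, and error code. The following is a minimal sketch, assuming the simplified `name(args) = ret [ERRNO]` layout shown on the slides rather than the full output format of strace or ktrace; the names `Syscall` and `parse_line` are hypothetical and not part of NetCheck.

```python
import re
from typing import NamedTuple, Optional


class Syscall(NamedTuple):
    name: str             # e.g. "recv"
    args: str             # raw argument text between the parentheses
    ret: int              # return value
    errno: Optional[str]  # e.g. "ECONNREFUSED", or None on success


# Assumed simplified layout: name(args) = ret [ERRNO]
_LINE = re.compile(r'^\s*(\w+)\((.*)\)\s*=\s*(-?\d+)(?:,?\s+([A-Z0-9]+))?')


def parse_line(line: str) -> Syscall:
    """Parse one simplified trace line into a Syscall record."""
    m = _LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized trace line: {line!r}")
    name, args, ret, errno = m.groups()
    return Syscall(name, args.strip(), int(ret), errno)


if __name__ == "__main__":
    print(parse_line('recv(4, "HTTP", ...) = 4'))
    print(parse_line('connect(3, ...) = -1 ECONNREFUSED'))
```

Keeping the arguments as raw text avoids guessing at per-syscall argument formats, which differ between tracers.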

What we see is this:

    Node A                           Node B
    1. socket() = 3                  1. socket() = 3
    2. bind(3, ...) = 0              2. connect(3, ...) = 0
    3. listen(3, 1) = 0              3. send(3, "Hello", ...) = 5
    4. accept(3, ...) = 4            4. close(3) = 0
    5. recv(4, "Hello", ...) = 5
    6. close(4) = 0

- one trace per host
- local order but no global order
Q: how do we reconstruct what really happened?

What we want is this (the ground truth):

    A1. socket() = 3
    B1. socket() = 3
    A2. bind(3, ...) = 0
    A3. listen(3, 1) = 0
    B2. connect(3, ...) = 0
    A4. accept(3, ...) = 4
    B3. send(3, "Hello", ...) = 5
    A5. recv(4, "Hello", ...) = 5
    B4. close(3) = 0
    A6. close(4) = 0

Goal: find an equivalent interleaving

Observation 1: Order Equivalence
Using the same two traces (one per host, local order but no global order):
- The socket() calls are not visible to the other side
- Some orders are equivalent!

Observation 2: Return Values Guide Ordering
The return values recorded in the same two traces constrain how the calls on Node A and Node B can interleave.

Return values guide ordering
One valid ordering (all syscalls returned successfully):

    A2. bind(3, ...) = 0
    A3. listen(3, 1) = 0
    B2. connect(3, ...) = 0

A second valid ordering (connect failed with ECONNREFUSED):

    A2. bind(3, ...) = 0
    B2. connect(3, ...) = -1, ECONNREFUSED
    A3. listen(3, 1) = 0

- A call's return value may depend on a remote call's action
- The result indicates the order of calls
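To make the idea concrete, the sketch below enumerates every interleaving of A's bind/listen with B's connect and keeps only those consistent with connect's return value. It assumes a deliberately simplified rule (connect succeeds only if the peer is already listening, and otherwise fails with ECONNREFUSED); the function names are hypothetical and this is not NetCheck's implementation.

```python
from typing import List, Optional, Tuple

Call = Tuple[str, str]  # (host, syscall name)


def interleavings(a: List[Call], b: List[Call]) -> List[List[Call]]:
    """All merges of a and b that preserve each trace's local order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in interleavings(a[1:], b)] +
            [[b[0]] + rest for rest in interleavings(a, b[1:])])


def consistent(order: List[Call], connect_errno: Optional[str]) -> bool:
    """Assumed rule: connect succeeds iff the peer is already listening.

    connect_errno is None for success or "ECONNREFUSED" for failure.
    """
    listening = False
    for _host, name in order:
        if name == "listen":
            listening = True
        elif name == "connect":
            succeeded = connect_errno is None
            if succeeded != listening:
                return False
    return True


trace_a = [("A", "bind"), ("A", "listen")]
trace_b = [("B", "connect")]

# Orderings compatible with connect(...) = 0
ok = [o for o in interleavings(trace_a, trace_b) if consistent(o, None)]
# Orderings compatible with connect(...) = -1 ECONNREFUSED
refused = [o for o in interleavings(trace_a, trace_b)
           if consistent(o, "ECONNREFUSED")]

if __name__ == "__main__":
    print(ok)
    print(refused)
```

Brute-force enumeration is only practical for tiny traces like this one; it is used here purely to show how a single return value prunes the set of possible orderings.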

Deciding call order
[Figure: full set of may-depend-on relations among the syscalls socket; bind; listen; accept; connect; getsockname; getpeername; getsockopt, setsockopt; poll, select; send, sendto, sendmsg, write, writev, sendfile; recv, recvfrom, recvmsg, read; close, shutdown]

Ordering Algorithm (worked example)
Input traces:

    A: socket, bind, listen, accept, recv
    B: socket, connect, send

The algorithm repeatedly tries the next call of a host against the model and appends it to the output ordering once the model accepts it.
- Try socket on host A: accepted. Output ordering: A socket
- Try connect on host B: rejected. Output ordering so far: A socket, B socket, A bind
- Try listen on host A: accepted. Output ordering: A socket, B socket, A bind, A listen
- Try recv on host A (the trace shows "Hola!"): rejected. TCP buffer: empty. Output ordering so far: A socket, B socket, A bind, A listen, B connect, A accept
- Try send on host B: accepted. TCP buffer: Hello. Output ordering: A socket, B socket, A bind, A listen, B connect, A accept, B send
- Try recv on host A: Fatal Error. The TCP buffer holds "Hello", B has no calls left (None), and the traced recv shows "Hola!"

Outline (current section: Network Model)

Network Model
Simulates invocation of a syscall
- datagrams sent/lost; reordering / duplication is notable
- tracks pending connections
- tracks buffer lengths and contents
  o send -> put data into buffer
  o recv -> pop data from buffer
Simulation outcome
- Accept: can process (correct buffer)
- Reject: wrong order (incomplete buffer)
- Fatal Error: permanent reject, abnormal behavior (incorrect buffer)
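The three outcomes can be illustrated with a toy buffer model. This is a sketch under strong assumptions (a single one-directional TCP byte stream, no loss, no connection tracking); `TcpBufferModel` and `Outcome` are hypothetical names, not NetCheck's classes.

```python
from enum import Enum


class Outcome(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    FATAL = "fatal error"


class TcpBufferModel:
    """Toy one-directional TCP byte buffer (illustrative only)."""

    def __init__(self) -> None:
        self.buffer = b""

    def send(self, data: bytes) -> Outcome:
        # send -> put data into buffer
        self.buffer += data
        return Outcome.ACCEPT

    def recv(self, data: bytes) -> Outcome:
        # recv -> pop data from buffer, if the traced data matches
        if self.buffer.startswith(data):
            self.buffer = self.buffer[len(data):]
            return Outcome.ACCEPT      # correct buffer
        if data.startswith(self.buffer):
            # Buffer holds only a prefix of the traced data:
            # a later send may still complete it.
            return Outcome.REJECT      # incomplete buffer
        return Outcome.FATAL           # incorrect buffer


if __name__ == "__main__":
    model = TcpBufferModel()
    print(model.recv(b"Hello"))   # nothing sent yet
    print(model.send(b"Hello"))
    print(model.recv(b"Hola!"))   # buffer content contradicts the trace
```

A rejected call leaves the model state untouched, so the ordering algorithm can retry it after another host's call has been simulated.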

Network Model
Simulates invocation of a syscall
Captures programmer assumptions
- Assumes a simplified network view
  o Transitive connectivity
  o Little, random loss
  o No middle boxes
- Assumes a uniform platform
  o Flags OS differences

How Model Return Values Impact Trace Ordering
[Figure: blackbox tracing mechanism -> host traces -> Ordering Algorithm <-> Network Model (input syscall, simulation result) -> Diagnoses Engine (simulation state, errors) -> diagnosis output]
Trace Ordering: linear running time, (total trace length) * number of traces

Outline (current section: Fault Classification)

Fault Classifier
Goal: decide what to output
Problem: show relevant information
Fault classifier: a global (rather than local) view uncovers high-level patterns by extracting low-level features
Examples: middleboxes, non-transitive connectivity, MTU, mobility, network disconnection
All look like loss, but have different patterns in the context of other flows

Fault Classifier
Options to show different levels of detail
- Network admins / developers: detailed info
- End users: classification, recommendations
Example output: Network Configuration Issues, Traffic Statistics, Problem Detected

Outline (current section: Results / Conclusion)

Evaluation: Production Application Bugs
Reproduce reported bugs from bug trackers (Python, Apache, Ruby, Firefox, etc.)
- A total of 71 bugs, grouped into 23 categories
- Virtualization-incurred / portability bugs
  o SO_REUSEADDR behaves differently across OSes
  o accept inherits O_NONBLOCK
Correct analysis of >95% of bugs

Evaluation: Observed Network Faults
Twenty faults observed in practice on a live network
- MTU bug
- Intermediary device
- Port forward
- Traffic sent to non-relevant addresses
Provide supplemental info
- Packet loss
- Buffers being closed with data in them
90% of cases correctly detected

General Findings in Practice
Middle boxes
  o Multiple unaccepted connections: client behind NAT in FTP
TCP/UDP
  o Non-transitive connectivity in VLC
Complex failures
  o VirtualBox: send data larger than buffer size
  o Pidgin: returned IP different from bind
  o Skype: NAT + close socket from a different thread
Used on Seattle Testbed: seattle.poly.edu

NetCheck Performance Overhead
[Figure: performance overhead for VLC, Skype, Telnet, Firefox, SSH]

Conclusion
Built and evaluated NetCheck, a tool to diagnose network failures in complex apps
Key insights:
- Model the programmer's misconceptions
- Relation between calls -> reconstruct order
NetCheck is effective:
- Everyday applications & networks
- Real network / application bugs
- No per-network knowledge
- No per-application knowledge
Try it here: https://netcheck.poly.edu/

Backup slides

What is NetCheck?
- No app- or network-specific knowledge
- No modification to apps/infrastructure
- No synchronized global clock
- Blackbox tracing mechanism (e.g., strace)
- Reconstructs a plausible total ordering of syscall traces from multiple hosts
- Uses simulation and captured state to identify network-related issues
- Maps low-level issues to higher-level characterizations of failure

Diagnosis Model
[Figure: blackbox tracing mechanism -> traces -> Trace Ordering (call dependency) <-> Application-Agnostic Model (accept / reject / fatal error) -> Collating -> Fault Classifier]
Trace Ordering: linear running time; a reject triggers a reorder

Pseudocode and Analysis
(n = number of traces, L = total trace length)

    push trace t_0 in stack s_0, ..., trace t_{n-1} in stack s_{n-1}
    while (s_0, ..., s_{n-1}) not empty:                    # O(L) iterations
        q = peek_stack(s_0, ..., s_{n-1}); q.sort(priority)
        while True:                                         # best case O(1), worst case O(n)
            if q empty: raise FatalError
            i_j = q.dequeue()
            outcome = model_simulate(i_j)
            if outcome == ACCEPT:                           # accept: traverse
                ordered_trace.push(s_j.pop()); break
            elif outcome == REJECT: continue                # reject: backtrack
            elif outcome == FatalError: raise FatalError

Overall: best case O(L), worst case O(n*L)
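The pseudocode can be turned into a small runnable sketch. Several things here are assumptions made for illustration: the priority sort is replaced by plain host order, the model is a toy covering only listen/connect/accept/send/recv on one connection, and the names `order_traces` and `make_toy_model` are hypothetical. Because the priority differs, the resulting ordering may not match the slides' example exactly, although it respects the same dependencies.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Call = Tuple[str, str, str]  # (host, syscall name, traced data or "")
ACCEPT, REJECT, FATAL = "accept", "reject", "fatal"


class FatalError(Exception):
    pass


def order_traces(traces: Dict[str, List[Call]],
                 simulate: Callable[[Call], str]) -> List[Call]:
    """Merge per-host traces into one ordering accepted by the model."""
    pending: Dict[str, Deque[Call]] = {h: deque(t) for h, t in traces.items()}
    ordered: List[Call] = []
    while any(pending.values()):
        # Peek at the head of every non-empty trace (no priority sort here).
        heads = [q[0] for q in pending.values() if q]
        for call in heads:
            outcome = simulate(call)
            if outcome == ACCEPT:
                ordered.append(pending[call[0]].popleft())
                break
            if outcome == FATAL:
                raise FatalError(f"model cannot explain {call}")
            # REJECT: fall through and try the next host's head
        else:
            raise FatalError("every candidate call was rejected")
    return ordered


def make_toy_model() -> Callable[[Call], str]:
    """Toy single-connection model; rejected calls never change state."""
    state = {"listening": False, "connected": False, "buffer": ""}

    def simulate(call: Call) -> str:
        _host, name, data = call
        if name == "listen":
            state["listening"] = True
        elif name == "connect":
            if not state["listening"]:
                return REJECT
            state["connected"] = True
        elif name == "accept":
            if not state["connected"]:
                return REJECT
        elif name == "send":
            state["buffer"] += data
        elif name == "recv":
            if not state["buffer"].startswith(data):
                # Incomplete buffer -> reject; contradicting buffer -> fatal.
                return REJECT if data.startswith(state["buffer"]) else FATAL
            state["buffer"] = state["buffer"][len(data):]
        return ACCEPT

    return simulate


if __name__ == "__main__":
    a = [("A", "socket", ""), ("A", "bind", ""), ("A", "listen", ""),
         ("A", "accept", ""), ("A", "recv", "Hello")]
    b = [("B", "socket", ""), ("B", "connect", ""), ("B", "send", "Hello")]
    for host, name, _ in order_traces({"A": a, "B": b}, make_toy_model()):
        print(host, name)
```

Each outer iteration consumes exactly one call, and each inner pass examines at most one head per trace, which matches the O(L) best case and O(n*L) worst case stated in the analysis.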

NetCheck input
Syscall traces:

    Node A                           Node B
    1. socket() = 3                  1. socket() = 3
    2. bind(3, ...) = 0              2. connect(3, ...) = 0
    3. listen(3, 1) = 0              3. send(3, "Hello", ...) = 5
    4. accept(3, ...) = 4            4. close(3) = 0
    5. recv(4, "Hello", ...) = 5
    6. close(4) = 0

connect depends on listen
Order 1

    A1 bind(3, ...) = 0
    A2 listen(3, 5) = 0
    B1 connect(3, ...) = 0

Order 2

    A1 bind(3, ...) = 0
    B1 connect(3, ...) = -1 ECONNREFUSED
    A2 listen(3, 5) = 0

Order 3

    B1 connect(3, ...) = -1 ECONNREFUSED
    A1 bind(3, ...) = 0
    A2 listen(3, 5) = 0

Example fault classifier rules
Middle boxes
- Multiple unaccepted connections: client behind NAT in FTP
- Missing connect on accepted connections: server behind NAT or port forwarding
- Multiple connect non-standard failure: firewall filtering connections
  o Multiple connect to listening address get refused
  o Multiple non-blocking connect failure
- Traffic sent to non-relevant addresses: NAT or 3rd party proxy/traffic forwarding
TCP
- select/poll timeout
- send data after connection closed

Example rules (cont.)
UDP
  o datagram sent/lost per connection
  o high datagram loss rate: non-transitive connectivity in VLC
Misc
  o apps send data larger than default OS buffer size: bug report from VirtualBox bug tracker
  o returned IP different from bind: simultaneous net disconnect/reconnect in Pidgin
  o Skype attempted to close socket from a different thread
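Rules of this kind can be pictured as predicates over counters gathered during simulation. The sketch below is purely illustrative: the `TraceFeatures` fields, the thresholds, and the message wording are assumptions made here, not NetCheck's actual rule set or data structures.

```python
from typing import Callable, List, NamedTuple, Tuple


class TraceFeatures(NamedTuple):
    """Hypothetical low-level features collected while ordering traces."""
    unaccepted_connects: int      # connects with no matching accept
    accepts_without_connect: int  # accepts with no matching connect
    sends_after_close: int        # sends issued after the connection closed


Rule = Tuple[Callable[[TraceFeatures], bool], str]

# Each rule pairs a predicate with a higher-level characterization.
RULES: List[Rule] = [
    (lambda f: f.unaccepted_connects > 1,
     "multiple unaccepted connections: client may be behind NAT"),
    (lambda f: f.accepts_without_connect > 0,
     "missing connect on accepted connection: "
     "server may be behind NAT or port forwarding"),
    (lambda f: f.sends_after_close > 0,
     "data sent after connection closed"),
]


def classify(features: TraceFeatures) -> List[str]:
    """Return the message of every rule whose predicate matches."""
    return [message for predicate, message in RULES if predicate(features)]


if __name__ == "__main__":
    for line in classify(TraceFeatures(3, 0, 1)):
        print(line)
```

Keeping rules as data makes it easy to report several matching characterizations at once, in line with showing different levels of detail to different audiences.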

Evaluation: Everyday Applications

    Application   Observed problem                            Diagnosis
    FTP           All reverse connections from server lost    Client behind NAT
    Pidgin        getsockname returns different IP            Client's poor connection results in IP changes
    Skype         Poor call quality, msg drop                 Network delay, NAT; Skype closes socket from a different thread
    VLC           Packet loss                                 Non-transitive connectivity issue