Introduction. Can we use Google for networking research?

Similar documents
Mosaic: Quantifying Privacy Leakage in Mobile Networks

BLINC: Multilevel Traffic Classification in the Dark

Network traffic classification: From theory to practice

TOWARDS HIGH-PERFORMANCE NETWORK APPLICATION IDENTIFICATION WITH AGGREGATE-FLOW CACHE

Trisul Network Analytics - Traffic Analyzer

Link Homophily in the Application Layer and its Usage in Traffic Classification

EECS 122: Introduction to Computer Networks Switch and Router Architectures. Today s Lecture

NetFlow Multiple Export Destinations

Internet Inter-Domain Traffic. C. Labovitz, S. Iekel-Johnson, D. McPherson, J. Oberheide, F. Jahanian, Proc. of SIGCOMM 2010

Internet Inter-Domain Traffic

Cisco Cloud Security. How to Protect Business to Support Digital Transformation

Bloom Filters. References:

An Overview of Search Engine. Hai-Yang Xu Dev Lead of Search Technology Center Microsoft Research Asia

Anonymous Communication and Internet Freedom

Anonymous Communication and Internet Freedom

Improved Classification of Known and Unknown Network Traffic Flows using Semi-Supervised Machine Learning

Generic Architecture. EECS 122: Introduction to Computer Networks Switch and Router Architectures. Shared Memory (1 st Generation) Today s Lecture

Detecting Malicious Hosts Using Traffic Flows

Multimedia Streaming. Mike Zink

Data Sources for Cyber Security Research

Subscriber Data Correlation

Filtering Trends Sorting Through FUD to get Sanity

Network infrastructure, routing and traffic. q Internet inter-domain traffic q Traffic estimation for the outsider

NeighborWatcher: A Content-Agnostic Comment Spam Inference System

Topology-Based Spam Avoidance in Large-Scale Web Crawls

Master Course Computer Networks IN2097

Network Forensics. CSF: Forensics Cyber-Security. Section II. Basic Forensic Techniques and Tools. MSIDC, Spring 2017 Nuno Santos

PoP Level Mapping And Peering Deals

Automated Application Signature Generation Using LASER and Cosine Similarity

Web Caching and Content Delivery

Understanding Online Social Network Usage from a Network Perspective

Network Forensics Prefix Hijacking Theory Prefix Hijacking Forensics Concluding Remarks. Network Forensics:

HOW TO CHOOSE A NEXT-GENERATION WEB APPLICATION FIREWALL

Connection Logging. Introduction to Connection Logging

Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network

Searching the Deep Web

CONTENT DISTRIBUTION. Oliver Michel University of Illinois at Urbana-Champaign. October 25th, 2011

ASA Access Control. Section 3

Compare Security Analytics Solutions

Connection Logging. About Connection Logging

Master Course Computer Networks IN2097

Features of a proxy server: - Nowadays, by using TCP/IP within local area networks, the relaying role that the proxy

Antonio Cianfrani. Access Control List (ACL) Part I

Lecture 12. Application Layer. Application Layer 1

Configuring the Botnet Traffic Filter

How to Configure ATP in the HTTP Proxy

Detecting Botnets Using Cisco NetFlow Protocol

Keywords Traffic classification, Traffic flows, Naïve Bayes, Bag-of-Flow (BoF), Correlation information, Parametric approach

News Filtering and Summarization System Architecture for Recognition and Summarization of News Pages

MPLS, THE BASICS CSE 6067, UIU. Multiprotocol Label Switching

A Flexible Model for Resource Management in Virtual Private Networks. Presenter: Huang, Rigao Kang, Yuefang

Design and Development of Secure Data Cache Framework. Please purchase PDF Split-Merge on to remove this watermark.

Pro ling-by-association: A Resilient Traf c Pro ling Solution for the Internet Backbone

Efficient Resource Management for the P2P Web Caching

Cisco Tetration Analytics

Introduction. IP Datagrams. Internet Service Paradigm. Routers and Routing Tables. Datagram Forwarding. Example Internet and Conceptual Routing Table

Malicious Activity and Risky Behavior in Residential Networks

KNOM Tutorial Internet Traffic Matrix Measurement and Analysis. Sue Bok Moon Dept. of Computer Science

Searching the Deep Web

Statistical based Approach for Packet Classification

Configuring Application Visibility and Control for Cisco Flexible Netflow

CS 268: Route Lookup and Packet Classification

WCCPv2 and WCCP Enhancements

Using NetFlow Sampling to Select the Network Traffic to Track

Cisco Firepower NGFW. Anticipate, block, and respond to threats

WhitePaper: XipLink Real-Time Optimizations

Dynamic Orchestration & Operation of Chained Network Services

DATA MINING II - 1DL460. Spring 2014"

GTIC Monthly Threat Report June 2017

Perimeter Defenses T R U E N E T W O R K S E C U R I T Y DEPENDS ON MORE THAN

Traffic and Performance Visibility for Cisco Live 2010, Barcelona

Competitive Analysis. Version 1.0. February 2017

Improved C&C Traffic Detection Using Multidimensional Model and Network Timeline Analysis

Lecture 10.1 A real SDN implementation: the Google B4 case. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

Search Engine Optimization

This course incorporates a variety of hands-on lab exercises allowing participants to put the lesson content into action.

deseo: Combating Search-Result Poisoning Yu USF

Power of Slicing in Internet Flow Measurement. Ramana Rao Kompella Cristian Estan

A Survey And Comparative Analysis Of Data

Automated Threat Management - in Real Time. Vectra Networks

NAT Support for Multiple Pools Using Route Maps

Detecting Network Reconnaissance with the Cisco Cyber Threat Defense Solution 1.0

AMP-Based Flow Collection. Greg Virgin - RedJack

Cisco Firepower NGFW. Anticipate, block, and respond to threats

Replicate It! Scalable Content Delivery: Why? Scalable Content Delivery: How? Scalable Content Delivery: How? Scalable Content Delivery: What?

DATA MINING II - 1DL460. Spring 2017

Configuring IP SLAs TCP Connect Operations

Internet Traffic Classification Using Machine Learning. Tanjila Ahmed Dec 6, 2017

This chapter provides information to configure Cflowd.

Tanium Endpoint Detection and Response. (ISC)² East Bay Chapter Training Day July 13, 2018

CHAPTER 4 OPTIMIZATION OF WEB CACHING PERFORMANCE BY CLUSTERING-BASED PRE-FETCHING TECHNIQUE USING MODIFIED ART1 (MART1)

Configuring QoS. Finding Feature Information. Prerequisites for QoS. General QoS Guidelines

IP Profiler. Tracking the activity and behavior of an IP address. Author: Fred Thiele (GCIA, CISSP) Contributing Editor: David Mackey (GCIH, CISSP)

YouLighter: An Unsupervised Methodology to Unveil YouTube CDN Changes

Detect Cyber Threats with Securonix Proxy Traffic Analyzer

Drafting Behind Akamai (Travelocity-Based Detouring)

Doing Analysis Carnegie Mellon University

Cache Management for TelcoCDNs. Daphné Tuncer Department of Electronic & Electrical Engineering University College London (UK)

Evaluating external network bandwidth load for Google Apps

Using Flexible NetFlow Flow Sampling

Transcription:

Unconstrained Profiling of Internet Endpoints via Information on the Web ( Googling the Internet) Ionut Trestian1 Soups Ranjan2 Aleksandar Kuzmanovic1 Antonio Nucci2 1 Northwestern 2 Narus University Inc. http://networks.cs.northwestern.edu http://www.narus.com

Introduction Can we use Google for networking research? Huge amount of endpoint information available on the web Can we systematically exploit search engines to harvest endpoint information available on the Internet? 2

Application: Googling IP-addresses for Network Forensics

Where Does the Information Come From? Some popular proxy services also display logs Even P2P information is available Websites run logging software on the Internet and display since statistics the first point of contact with a P2P swarm is a publicly available IP address Blacklists, banlists, spamlists also have web interfaces Malicious Servers Clients P2P Popular servers (e.g., gaming) IP addresses are listed 4

Detecting Application Usage Trends Can we infer what applications people are using across the world without having access to network traces? 5

Traffic Classification Problem traffic classification Current approaches (port-based, payload signatures, numerical and statistical etc.) Our approach Use information about destination IP addresses available on the Internet 6

Methodology Web Classifier and IP Tagging IP Address xxx.xxx.xxx.xxx 58.61.33.40 QQ Chat Server Rapid Match IP tagging URL Hit text URL Hit text URL Hit text.. Search hits Domain name Domain name. Keywords Keywords. Website cache 7

Evaluation Ground Truth from Traces France China United States No traces available Packet level traces available Sampled NetFlow available Brasil Packet level traces available 8

Inferring Active IP Ranges in Target Networks Actual endpoints from trace Google hits XXX.163.0.0/17 network range Overlap is around 77% 9

Application Usage Trends 10

Correlation Between Network Traces and UEP 11

Traffic Classification 12

Traffic Classification 165.124.182.169 Mail server 193.226.5.150 Website 68.87.195.25 Router 186.25.13.24 Halo server Tagged IP Cache Hold a small % of the IP addresses seen Is this scalable? Look at source and destination IP addresses and classify traffic 13

Traffic Classification 5% of the destinations sink 95% of traffic 14

BLINC vs. UEP UEP BLINC (SIGCOMM 2008) 2005, NANOG 35) - Uses Works information in the dark available (doesn t on examine the webpayload) - Constructs Uses graphlets a semantically to identify rich traffic endpoint patterns database - Very Uses flexible thresholds (can to be further used classify in a variety traffic of scenarios) 15

BLINC vs. UEP (cont.) Traffic[%] UEP classifies twice as much traffic as BLINC BLINC doesn t find some categories UEP also provides better semantics Classes can be further divided into different services 16

UEP vs. Signature-based Traffic[%] Unconstrained Endpoint Profiling based Traffic Classification Based on ip-addresses L7 signature based UEP has comparable performance

Working with Sampled Traffic Sampled data is considered to be poorer in information However ISPs consider scalable to gather only sampled data X X X X X X X X Each packet has a 1/Sampling rate chance of being kept (Cisco Netflow) 18

Working with Sampled Traffic Most of the popular IP addresses still in the trace A quarter of the IP addresses still in the trace at sampling rate 100 19

Working with Sampled Traffic When no sampling is done UEP outperforms BLINC UEP maintains a large classification ratio even at higher sampling rates BLINC stays in the dark 2% at sampling rate 100 UEP retains high classification capabilities with sampled traffic 20

Endpoint Clustering Performed clustering of endpoints in order to cluster out common behavior Please see the paper for detailed results Real strength: We managed to achieve similar results both by using the trace and only by using UEP 21

Conclusions Key contribution: Shift research focus from mining operational network traces to harnessing information that is already available on the web Our approach can: Predict application and protocol usage trends in arbitrary networks Dramatically outperform classification tools Retain high classification capabilities when dealing with sampled data 22

Thanks Ionut Trestian, Soups Ranjan, Aleksandar Kuzmanovic, Antonio Nucci http://ccr.sigcomm.org/online/?q=node/396