Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine

Detecting Spammers with SNARE: Spatio-temporal Network-level Automatic Reputation Engine
Shuang Hao, Nadeem Ahmed Syed, Nick Feamster, Alexander G. Gray, Sven Krasser

Motivation - Spam: More than Just a Nuisance
  Spam: unsolicited bulk email
  Ham: legitimate email from desired contacts
  95% of all email traffic is spam (Sources: Microsoft security report, MAAWG and Spamhaus)
  In 2009, lost productivity costs were estimated at $130 billion worldwide (Source: Ferris Research)
  Spam is the carrier of other attacks
    Phishing
    Viruses, Trojan horses

Motivation - Current Anti-spam Methods
  Content-based filtering: What is in the mail?
    More spam arrives in formats other than text (PDF spam ~12%)
    Customized emails are easy to generate
    High cost to filter maintainers
  IP blacklist: Who is the sender? (e.g., DNSBL)
    ~10% of spam senders are from previously unseen IP addresses (due to dynamic addressing, new infection)
    ~20% of spam received at a spam trap is not listed in any blacklist

Motivation - SNARE: Our Idea
  Spatio-temporal Network-level Automatic Reputation Engine
  Network-based filtering: How is the email sent?
    Fact: > 75% of spam can be attributed to botnets
    Intuition: Sending patterns should look different from those of legitimate mail
    Example features: geographic distance, neighborhood density in IP space, hosting ISP (AS number), etc.
  Automatically determine an email sender's reputation
    70% detection rate for a 0.2% false positive rate

Motivation - Why Network-Level Features?
  Lightweight
    Do not require content parsing
    Can work even from a single packet
    Need little collaboration across a large number of domains
    Can be applied at high-speed networks
    Can be done anywhere in the middle of the network, before reaching the mail servers
  More robust
    More difficult to change than content
    More stable than IP assignment

Outline - Talk Outline
  Motivation
  Data From McAfee
  Network-level Features
  Building a Classifier
  Evaluation
  Future Work
  Conclusion

Data - Data Source
  McAfee's TrustedSource email sender reputation system
  Time period: 14 days, October 22 to November 4, 2007
  Message volume: each day, 25 million email messages from 1.3 million IPs
  Reported appliances: 2,500 distinct appliances (recipient domains)
  Reputation score: certain ham, likely ham, certain spam, likely spam, uncertain
  [Diagram labels: 1) Email (mail server in the user's domain), 2) Lookup (repository server), 3) Feedback]

Features - Finding the Right Features
  Question: Can sender reputation be established from just a single packet, plus auxiliary information?
    Low overhead
    Fast classification
    In-network
    Perhaps more evasion resistant
  Key challenge: What features satisfy these properties and can distinguish spammers from legitimate senders?

Features - Network-level Features
  Feature categories
    Single-packet features
    Single-header and single-message features
    Aggregate features
  A combination of features is used to build a classifier
    No single feature needs to be perfectly discriminative between spam and ham
  Measurement study
    McAfee's data, October 22-28, 2007 (7 days)

Features - Summary of SNARE Features
  Category: Single-packet
    geodesic distance between the sender and the recipient
    average distance to the 20 nearest IP neighbors of the sender
    probability ratio of spam to ham when getting the message
    status of email-service ports on the sender
    AS number of the sender's IP
  Category: Single-header/message
    number of recipients
    length of message body
  Category: Aggregate features
    average of message length in previous 24 hours
    standard deviation of message length in previous 24 hours
    average recipient number in previous 24 hours
    standard deviation of recipient number in previous 24 hours
    average geodesic distance in previous 24 hours
    standard deviation of geodesic distance in previous 24 hours
  Total of 13 features in use
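
The six aggregate features above are means and standard deviations over a sender's previous 24 hours. A minimal sketch of that computation follows. The record layout (`time`, `length`, `recipients`, `distance`), the function name, and the use of the population standard deviation are assumptions of this sketch; the slides do not say which estimator SNARE uses.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def aggregate_features(history, now, window=timedelta(hours=24)):
    """Summarize one sender's messages seen in the `window` before `now`.

    `history` is a list of dicts with the hypothetical keys
    'time' (datetime), 'length', 'recipients' and 'distance'.
    Returns None when the sender has no messages in the window.
    """
    recent = [m for m in history if now - window <= m["time"] < now]
    if not recent:
        return None
    features = {}
    for key in ("length", "recipients", "distance"):
        values = [m[key] for m in recent]
        features["avg_" + key] = mean(values)
        # Population standard deviation: an assumption, not from the slides.
        features["std_" + key] = pstdev(values)
    return features
```

Messages older than the window are ignored, so a sender with no recent history yields `None` and has to be classified from the single-packet and single-message features alone.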

Features - Single-packet Based: What Is In a Packet?
  Packet format (incoming SMTP example)
    IP header: source IP, destination IP
    TCP header: destination port 25
    SMTP text command: empty for the first packet
  Auxiliary knowledge that helps
    Timestamp: the time at which the email was received
    Routing information
    Sending history from neighbor IPs of the email sender
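
To make concrete what a single packet exposes, here is a small sketch that reads the source IP, destination IP, and destination port from the raw bytes of an IPv4/TCP packet. The function name and the returned dictionary are hypothetical; only the header layout is standard.

```python
import ipaddress
import struct


def parse_first_packet(packet: bytes) -> dict:
    """Extract the fields a single-packet classifier can see (IPv4 + TCP only)."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if packet[9] != 6 or len(packet) < header_len + 4:
        raise ValueError("not a TCP segment")
    src_ip = ipaddress.IPv4Address(packet[12:16])
    dst_ip = ipaddress.IPv4Address(packet[16:20])
    # The TCP header starts right after the IP header: source port, destination port.
    src_port, dst_port = struct.unpack("!HH", packet[header_len:header_len + 4])
    return {
        "src_ip": str(src_ip),
        "dst_ip": str(dst_ip),
        "src_port": src_port,
        "dst_port": dst_port,
        "is_smtp": dst_port == 25,
    }
```

Everything else used by the single-packet features (timestamp, routing information, neighbor history) comes from auxiliary data keyed on the source IP.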

Features - Single-packet Based (1): Sender-receiver Geodesic Distance
  [Diagram: a legitimate sender close to the recipient, a spammer distant from it]
  Intuition:
    Social structure limits the region of contacts
    The geographic distance travelled by spam from bots is close to random

Features - Single-packet Based (1): Distribution of Geodesic Distance
  Find the physical latitude and longitude of IPs using MaxMind's GeoIP database
  Calculate the distance along the surface of the earth
  90% of legitimate messages travel 2,500 miles or less
  Observation: Spam travels further
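
The distance along the surface of the earth can be approximated with the haversine formula once both endpoints have a latitude and longitude. The sketch below assumes a spherical earth with a mean radius of 3958.8 miles; the slides do not state which formula or radius was used, and the GeoIP lookup itself is left out.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius; an assumption of this sketch


def geodesic_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))
```

The result for a sender and recipient pair can then be compared against a threshold such as the 2,500-mile figure quoted above.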

Features - Single-packet Based (2): Sender IP Neighborhood Density
  [Diagram labels: Subnet, Legitimate sender, Spammer, Recipient]
  Intuition:
    The infected IP addresses in a botnet are close to one another in numerical space
    Often even within the same subnet

Features - Single-packet Based (2): Distribution of Distance in IP Space
  IPs as a one-dimensional space (0 to 2^32 - 1 for IPv4)
  Measure of email sender density: the average distance to its k nearest neighbors (in the past history)
  For spammers, the k nearest senders are much closer in IP space
  Observation: Spammers are surrounded by other spammers
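
The density measure can be sketched by mapping each IPv4 address to an integer and averaging the distance to the k closest previously seen senders. The function name, the handling of an empty history, and the default k=20 (taken from the feature summary) are choices made for this sketch.

```python
import heapq
import ipaddress


def avg_knn_distance(sender_ip, history_ips, k=20):
    """Average numeric distance from sender_ip to its k nearest past senders."""
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = [
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in set(history_ips)
        if ip != sender_ip
    ]
    if not distances:
        return float("inf")  # no neighbors seen yet: treat as maximally sparse
    nearest = heapq.nsmallest(k, distances)
    return sum(nearest) / len(nearest)
```

A small value means the sender sits in a crowded region of IP space, which the slide associates with botnet members.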

Features - Single-packet Based (3): Local Time of Day at Sender
  [Diagram labels: Legitimate sender, Spammer, Recipient]
  Intuition:
    Different senders follow different diurnal sending patterns
    Legitimate email sending patterns may more closely track workday cycles

Features - Single-packet Based (3): Differences in Diurnal Sending Patterns
  Local time at the sender's physical location
  Relative percentages of messages at different times of the day (hourly)
  Spam peaks at a different local time of day
  Observation: Spammers send messages according to machine power cycles
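
One way to turn the diurnal pattern into a number is to build hourly profiles for spam and ham and take their ratio at the sender's local hour, which is how the feature summary describes this feature. In the sketch below the local hour is approximated from longitude (15 degrees per hour); that approximation and all function names are assumptions, since the slides only say that the sender's physical location is used.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_hour(utc_timestamp, longitude_deg):
    """Approximate local hour at the sender from UTC time and longitude."""
    return (utc_timestamp + timedelta(hours=longitude_deg / 15.0)).hour


def hourly_profile(hours):
    """Percentage of messages sent in each of the 24 local hours."""
    counts = Counter(hours)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 24
    return [100.0 * counts.get(h, 0) / total for h in range(24)]


def spam_to_ham_ratio(spam_profile, ham_profile, hour):
    """Ratio of the spam share to the ham share for one local hour."""
    if ham_profile[hour] == 0:
        return float("inf")
    return spam_profile[hour] / ham_profile[hour]
```

A ratio well above 1 for a given hour means that hour is relatively more typical of spam than of legitimate mail in the training data.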

Features - Single-packet Based (4): Status of Service Ports
  Ports supported by email service providers
    Protocol    Port
    SMTP        25
    SSL SMTP    465
    HTTP        80
    HTTPS       443
  Intuition:
    Legitimate email is sent from other domains' MSA (Mail Submission Agent)
    Bots send spam directly to victim domains

Features - Single-packet Based (4): Distribution of Number of Open Ports
  Actively probe each sender's IP to check which service ports are open
  Sampled IPs for the test, October 2008 and January 2009
  90% of spamming IPs have none of the standard mail service ports open
  [Figure: distribution of open-port combinations for spammers and legitimate senders; labeled values include 90% (spammers) and 55% (legitimate senders)]
  Observation: Legitimate mail tends to originate from machines with open ports
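
Probing a sender for the four service ports can be sketched with plain TCP connection attempts. The port table comes from the slides; the function names, the 2-second timeout, and the injectable `probe` argument are assumptions of this sketch, and the slides do not describe how the actual probing was implemented.

```python
import socket

# Protocol-to-port table as listed on the slide.
MAIL_SERVICE_PORTS = {"SMTP": 25, "SSL SMTP": 465, "HTTP": 80, "HTTPS": 443}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_status(host, ports=MAIL_SERVICE_PORTS, probe=is_port_open):
    """Map each service name to whether its port accepted a connection."""
    return {name: probe(host, port) for name, port in ports.items()}


def has_no_mail_ports(status):
    """True when none of the probed service ports is open."""
    return not any(status.values())
```

Passing a different `probe` callable makes it possible to replay stored probe results instead of touching the network.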

Features - Single-packet Based (5): AS of Sender's IP
  Intuition: Some ISPs may host more spammers than others
  Observation: A significant portion of spammers come from a relatively small collection of ASes*
    More than 10% of unique spamming IPs originate from only 3 ASes
    The top 20 ASes host ~42% of spamming IPs
  *RAMACHANDRAN, A., AND FEAMSTER, N. Understanding the network-level behavior of spammers. In Proceedings of the ACM SIGCOMM (2006).
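
Concentration figures of this kind can be computed from a mapping of spamming IPs to AS numbers. The sketch below is illustrative only: the function name and the input mapping are hypothetical, and the IP-to-AS lookup is assumed to have been done elsewhere.

```python
from collections import Counter


def top_as_share(ip_to_asn, n):
    """Fraction of unique IPs that belong to the n most populated ASes."""
    counts = Counter(ip_to_asn.values())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total
```

For example, `top_as_share(spam_ip_to_asn, 20)` would give the share of spamming IPs hosted by the 20 largest ASes in a given data set.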


Classifier - SNARE: Building a Classifier
  RuleFit (ensemble learning)
    F(x) = a_0 + \sum_{m=1}^{M} a_m f_m(x)
    F(x) is the prediction result (label score)
    f_m(x) are base learners (usually simple rules)
    a_m are linear coefficients
  Example
    Rule 1 (coefficient 0.080): Geodesic distance > 63 AND AS in (1901, 1453, ...)
    Rule 2 (coefficient 0.257): Port status: no SMTP service listening
    Feature instance of a message: Geodesic distance = 92, AS = 1901, port SMTP is open
    Rule 1 fires and Rule 2 does not: 0.080 + 0 = 0.080
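
The worked example on this slide can be reproduced with a few lines that score a message as a weighted sum of fired rules. This is a sketch of the scoring step only, not of RuleFit training. The message field names, the zero intercept, and restricting the AS set to the two numbers visible on the slide are assumptions.

```python
def rule_distance_and_as(msg):
    """Rule 1 from the slide; only the two AS numbers shown there are used."""
    return msg["geodesic_distance"] > 63 and msg["asn"] in {1901, 1453}


def rule_no_smtp(msg):
    """Rule 2 from the slide: no SMTP service listening on the sender."""
    return not msg["smtp_open"]


# (coefficient, rule) pairs with the coefficients from the slide.
RULES = [(0.080, rule_distance_and_as), (0.257, rule_no_smtp)]


def rulefit_score(msg, rules=RULES, intercept=0.0):
    """Linear combination of binary rule outputs; intercept is assumed to be 0."""
    return intercept + sum(coef * float(rule(msg)) for coef, rule in rules)


example = {"geodesic_distance": 92, "asn": 1901, "smtp_open": True}
score = rulefit_score(example)  # only Rule 1 fires for this message
```

With the slide's message, only the first rule contributes, so the score equals that rule's coefficient.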

Outline - Talk Outline
  Motivation
  Data From McAfee
  Network-level Features
  Building a Classifier
  Evaluation
    Setup
    Accuracy
    Detecting Fresh Spammers
    In Paper: Retraining, Whitelisting, Feature Correlation
  Future Work
  Conclusion

Evaluation - Evaluation Setup
  Data
    14-day data, October 22 to November 4, 2007
    1 million messages sampled each day (only certain spam and certain ham are considered)
  Training
    Train the SNARE classifier with equal amounts of spam and ham (30,000 in each category per day)
  Temporal cross-validation
    Temporal window shifting
    [Diagram: the train/test window shifts forward over the data subsets from Trial 1 to Trial 2]
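
Temporal window shifting can be sketched as a generator that slides a train/test window forward one day at a time, so that each trial tests only on data later than its training data. The window lengths are parameters here because the slides do not give them; the values used in the test are placeholders, not the talk's settings.

```python
def temporal_windows(days, train_days, test_days):
    """Yield (train, test) day lists, shifting the window forward by one day."""
    span = train_days + test_days
    for start in range(len(days) - span + 1):
        train = days[start:start + train_days]
        test = days[start + train_days:start + span]
        yield train, test
```

Each trial would train a classifier on the `train` days and evaluate it on the `test` days that follow.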

Evaluation - Receiver Operating Characteristic (ROC)
  False positive rate = misclassified ham / actual ham
  Detection rate = detected spam / actual spam (true positive rate)
  False positive rate at a 70% detection rate
    Feature set               False positive rate
    Single Packet             0.44%
    Single Header/Message     0.29%
    24+ Hour History          0.20%
  As a first line of defense, SNARE is effective
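
A single ROC operating point such as "false positive rate at 70% detection" can be read off by choosing the score threshold that catches the required share of spam and then counting the ham above it. The sketch below assumes that higher scores mean "more spam-like"; the function name and the tie handling are choices made for this illustration.

```python
import math


def fpr_at_detection_rate(spam_scores, ham_scores, detection_rate):
    """Return (threshold, achieved detection rate, false positive rate)."""
    ranked = sorted(spam_scores, reverse=True)
    needed = max(1, math.ceil(detection_rate * len(ranked)))
    threshold = ranked[needed - 1]  # lowest score that still must be flagged
    detected = sum(s >= threshold for s in spam_scores)
    false_pos = sum(s >= threshold for s in ham_scores)
    return threshold, detected / len(spam_scores), false_pos / len(ham_scores)
```

Tied scores can push the achieved detection rate slightly above the requested one, which is why it is returned alongside the false positive rate.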

Evaluation - Detection of Fresh Spammers
  Fresh senders
    IP addresses that did not appear in the previous training windows
  Accuracy
    With the detection rate fixed at 70%, the false positive rate is 5.2%
  SNARE is capable of automatically classifying fresh spammers (compared with DNSBL)
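
Selecting the fresh senders for this experiment amounts to keeping only test messages whose source IP was never seen during training. A minimal sketch, assuming each message is a dict with a hypothetical `ip` key:

```python
def fresh_sender_messages(test_messages, training_messages):
    """Keep test messages whose sender IP does not occur in the training data."""
    seen_ips = {m["ip"] for m in training_messages}
    return [m for m in test_messages if m["ip"] not in seen_ips]
```

The classifier is then evaluated on this subset only, which is the case where an IP blacklist built from the same history would have nothing to go on.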

Future Work
  Combine SNARE with other anti-spam techniques to get better performance
    Can SNARE capture spam undetected by other methods (e.g., content-based filters)?
  Make SNARE more evasion-resistant
    Can SNARE still work well under intentional evasion by spammers?

Conclusion
  Network-level features are effective at distinguishing spammers from legitimate senders
    Lightweight: sometimes even from the observation of a single packet
    More robust: spammers may find it hard to change all of these patterns, particularly without somewhat reducing the effectiveness of their spamming botnets
  SNARE is designed to automatically detect spammers
    A good first line of defense