Detecting Spam with Artificial Neural Networks

Andrew Edstrom
University of Wisconsin - Madison

Abstract

This is my final project for CS 539. In this project, I demonstrate the suitability of neural networks for the task of classifying spam emails. I discuss how I was able to attain a classification accuracy of 94.6% through minor changes in network configuration and the momentum alpha parameter, ultimately outperforming existing research on this same dataset.

Keywords: Artificial Intelligence, Machine Learning, Neural Networks, Spam Detection

I. Introduction

Neural networks are powerful tools for any machine learning task that involves classification. They are utilized in a wide range of applications, including recommendation engines, computer vision, and dashboard customization. Because of their versatility, they are emerging as one of the primary tools in the machine learning professional's toolkit. However, neural networks are not as widely used in spam email classification as one might expect. Instead, most modern spam filters employ naïve Bayes classifiers, due in large part to Paul Graham's famed article "A Plan for Spam" (Graham, 2002). Naïve Bayes is a great approach for spam classification, with high accuracy and a low false-positive rate, but by itself it may not be enough to achieve the 99.99+% accuracy we would like to see.
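As a concrete illustration of the Bayesian approach described above (a minimal sketch, not any production filter), a word-frequency naïve Bayes classifier can be built in a few lines with scikit-learn; the example messages are invented:

```python
# Minimal sketch of word-frequency Bayesian spam filtering, using
# scikit-learn's MultinomialNB on bag-of-words counts. The example
# messages below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win free money now, click here",   # spam
    "meeting rescheduled to thursday",  # ham
    "claim your free prize today",      # spam
    "lunch tomorrow with the team",     # ham
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)  # word-count features

clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["free money prize"])))  # -> [1]
```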

Google reported that introducing neural networks into Gmail's spam filters took them from 99.5% to 99.9% accuracy (Metz, 2015), suggesting that neural networks may be useful for enhancing spam filters, especially when used in conjunction with Bayesian classification and other methodologies. However, there is not much research on the use of neural networks for spam detection, and most of the existing research holds the network configuration, momentum, and learning rate constant, investigating the effectiveness of the network across datasets rather than the suitability of different network configurations for the task. In my project, I have done the opposite: holding the dataset constant while adjusting the network configuration and parameters, in order to find the ideal network configuration for spam classification.

II. Work Performed

Because I wanted to focus on network configuration rather than dataset preparation, I chose to use the UCI Spambase dataset. In this dataset, each email is assigned a label of spam or not spam. There are 4601 emails, all of which have been processed to extract a number of features, including the frequency of certain spammy words and the amount of capital letters used. Before performing any experiments, I randomized this dataset once. I used this same random ordering in each of my trials, so as not to skew my results.
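In Python, this preparation step might look roughly like the sketch below (not the MATLAB code actually used in the project; the local file name and the seed are placeholders):

```python
# Sketch of the dataset preparation described above: load the UCI Spambase
# data (57 numeric features plus a 0/1 spam label per email) and shuffle it
# once, so that every trial sees the same random ordering.
import numpy as np

data = np.loadtxt("spambase.data", delimiter=",")  # 4601 rows x 58 columns

rng = np.random.default_rng(seed=539)  # fixed seed -> one reproducible ordering
data = data[rng.permutation(len(data))]  # reuse this ordering in all trials

X, y = data[:, :-1], data[:, -1]  # features, spam/not-spam labels
```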

To implement my neural network, I first attempted to use Caffe, a deep learning library from UC Berkeley. However, I encountered numerous problems while attempting to get it to build on my computer. I found Caffe to be a poorly documented open-source library that depends on numerous other poorly documented open-source libraries, all of which were themselves quite tricky to install. It was shamefully complicated even to obtain all of the dependencies, some of which required days of waiting for an application to be approved before they could even be downloaded. Unfortunately, in order to compile Caffe, one must install the correct versions of all of these libraries and put them in the correct locations in the file system, which are completely different for each library. Once the dependencies are downloaded and installed, you must set numerous environment variables and manually configure a makefile of several hundred lines. Each time you make a change to any one of these pieces along the way, it takes about 30 minutes of compilation to determine whether the change fixed the problem. After well over 10 hours of fierce conflict with Caffe, I decided to explore other options.

After playing with several libraries, I settled on modifying a Matlab implementation of a feedforward MLP with backpropagation by Hesham Eraqi. I chose this implementation as the basis for my project because it made it easy to change the network configuration, momentum alpha, number of epochs, and learning rate, all by changing a single line in the Configurations/Parameters section. Eraqi's MLP implementation only supported calculation of training error, so I added code to evaluate the network with a testing set once it had finished training, along with additional code to calculate and display final results. After all 10 trials have completed, the testing errors are averaged. The average testing error and the average training error are both displayed, because a large difference between the two is a good indicator that the network is overfitting. I also display the network configuration, learning rate, and momentum alpha.

After some initial exploratory trials, I found that a learning rate of 0.1 was ideal. Trials with several epoch counts between 200 and 2000 showed that increasing the number of epochs gave no improvement in accuracy beyond 199 epochs, so I used 199 epochs for each trial. My actual experiments consisted of 29 trials, each with its own configuration and parameter settings.
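A rough stand-in for this evaluation loop is sketched below, written with scikit-learn's MLPClassifier rather than the MATLAB implementation the project modified. The learning rate (0.1), epoch count (199), and averaging over 10 trials come from the text; the 70/30 split and the library choice are assumptions:

```python
# Rough stand-in for the evaluation loop described above. The report's
# "momentum alpha" corresponds to MLPClassifier's `momentum` parameter.
# Assumes X, y prepared as in the earlier sketch.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def run_trials(X, y, hidden_layers=(11,), momentum=0.1, n_trials=10):
    train_errors, test_errors = [], []
    for trial in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, random_state=trial)  # split is an assumption
        net = MLPClassifier(hidden_layer_sizes=hidden_layers,
                            solver="sgd",            # plain backpropagation
                            learning_rate_init=0.1,  # learning rate from the text
                            momentum=momentum,
                            max_iter=199)            # epoch count from the text
        net.fit(X_tr, y_tr)
        train_errors.append(1 - net.score(X_tr, y_tr))  # training error
        test_errors.append(1 - net.score(X_te, y_te))   # testing error
    # A large gap between these averages is the overfitting signal the
    # report watches for.
    return np.mean(train_errors), np.mean(test_errors)
```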

III. Results

Figure 1 shows that all of my experiments yielded accuracies in the 92-95% range, demonstrating that neural networks achieve fairly high accuracy regardless of the configuration or parameters used. Across my trials I tested a wide range of configurations, from a single hidden layer of eight neurons, to two hidden layers of five neurons, to three hidden layers of 50, 50, and 200 neurons. It seems that any neural network will perform fairly well, no matter its set-up, but through fine-tuning we can increase the performance by several percentage points.

[Figure 1]

I tested several numbers of hidden layers (Figure 2), and I tried many different sizes for each layer. However, no matter the number of neurons per layer, a single layer proved to be ideal. Both my lowest error and my lowest average error across trials came from networks with one layer.
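Reusing the run_trials sketch from the previous section, the configurations quoted above could be swept roughly as follows (only the listed layer sizes come from the experiments; the loop itself is illustrative):

```python
# Sweep over the example configurations named in the text: one hidden layer
# of 8 neurons, two layers of 5, and three layers of 50, 50, and 200.
for config in [(8,), (5, 5), (50, 50, 200)]:
    train_err, test_err = run_trials(X, y, hidden_layers=config)
    print(f"{config}: avg train error {train_err:.3f}, "
          f"avg test error {test_err:.3f}")
```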

[Figure 2]

Once I determined that one layer was sufficient, I tried several different numbers of neurons for this layer (Figure 3). Preliminary tests showed that any number over 15 caused overfitting, though I ran one experiment with 40 neurons just to confirm. Interestingly, 11 neurons performed best, outperforming both 10 and 12 by almost 0.5%. Combined with the previous experiment, it became clear that a simple network always worked best: my best results came from networks with a single hidden layer of 11 neurons. Networks that were larger, either vertically or horizontally, often achieved a very low average training error (sometimes below 0.5%) while their testing error rose past 6%. I took this as a clear sign of overfitting.
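The width sweep can be sketched the same way; the widths beyond those named above and the numeric gap threshold are placeholders:

```python
# Vary the width of the single hidden layer and flag configurations whose
# train/test gap suggests overfitting (e.g. <0.5% train vs >6% test error).
for width in [8, 10, 11, 12, 15, 40]:
    train_err, test_err = run_trials(X, y, hidden_layers=(width,))
    overfit = test_err - train_err > 0.05  # threshold is an assumption
    print(f"{width} neurons: train {train_err:.3f}, test {test_err:.3f}"
          + ("  <- likely overfitting" if overfit else ""))
```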

[Figure 3]

The final parameter I tried adjusting was the momentum alpha (Figure 4). This variable had a surprisingly large effect on the error rate. I performed several experiments, holding the network configuration and learning rate constant, and found that networks with a momentum alpha of 0.1 dramatically outperformed those with higher or lower momentum alphas.
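A momentum sweep in the same style might look like this (candidate values other than the best-performing 0.1 are placeholders):

```python
# Hold the configuration and learning rate fixed and vary only the momentum.
for m in [0.01, 0.05, 0.1, 0.5, 0.9]:
    _, test_err = run_trials(X, y, hidden_layers=(11,), momentum=m)
    print(f"momentum {m}: avg test error {test_err:.3f}")
```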

[Figure 4]

IV. Conclusions

Through my experiments, I found that the best configuration for spam detection on the UCI Spambase dataset with a neural network is 11 hidden neurons in a single hidden layer, with a momentum alpha of 0.1. My results confirmed the findings of Idris (2014), who used a neural network to classify spam on this same dataset and attained an accuracy of 94.3%. Most of my results fell in this general range, though after tweaking and experimentation I was able to train a network that slightly beat that result: using what I found to be the ideal configuration, I attained an accuracy of 94.6%. This goes to show that fine-tuning of network configuration and parameters is quite important in neural network research. Even though all neural networks will perform quite well,

adding just a single neuron can have a nontrivial effect on error rate. In my case, tiny changes like this sometimes reduced my error rate by as much as half a percent.

V. Future Work

Future researchers on this topic might consider looking at the false-positive rate of networks with different configurations. In spam detection, false positives are essentially unacceptable, and one of the primary advantages of naïve Bayes is that it promises a low false-positive rate. If one could develop a neural network with a very low false-positive rate, neural networks would seem a much more viable option for commercial spam detection. It would be quite interesting to see whether the network which yielded the lowest error rate also yielded the lowest false-positive rate.
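As a sketch of this proposed measurement, the false-positive rate falls out of a confusion matrix on the test set (variable names assume a trial like those sketched earlier):

```python
# The false-positive rate is the fraction of legitimate emails that a
# trained network flags as spam. `net`, `X_te`, and `y_te` are assumed to
# come from a trial like the ones sketched above (1 = spam, 0 = not spam).
from sklearn.metrics import confusion_matrix

y_pred = net.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
false_positive_rate = fp / (fp + tn)  # ham wrongly flagged as spam
print(f"false-positive rate: {false_positive_rate:.4f}")
```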

References

Graham, P. (2002). A Plan for Spam.

Idris, I. (2014). E-mail Spam Classification with Artificial Neural Network and Negative Selection Algorithm. International Journal of Computer Science, 1.

Massey, B., et al. (2003). Learning Spam: Simple Techniques for Freely-Available Software. Proceedings of the FREENIX Track, 2003 USENIX Annual Technical Conference, pp. 63-76. Berkeley, CA, USA.

Metz, C. (2015, July 9). Google Says Its AI Catches 99.9 Percent of Gmail Spam. Wired.com. Accessed May 12, 2016. http://www.wired.com/2015/07/google-says-ai-catches-99-9-percent-gmail-spam/all/1

Sallab, A. A., & Rashwan, M. A. (2012). E-Mail Classification Using Deep Networks. Journal of Theoretical and Applied Information Technology, 37(2), 241-251.