Chapter-8. Conclusion and Future Scope

This thesis has addressed the problem of Spam E-mails and has proposed a framework for managing it. The framework rests on three pillars: Legislative measures, Behavioural measures and Technical measures. The three pillars are equally important in the fight against Spam E-mails. The conclusions drawn from studying each set of measures were used to propose an effective framework for Spam management.

The first three sections of this chapter summarise the findings for the Legislative, Behavioural and Technological measures. The last section focuses on directions for future research.

8.1 Legislative Measures

The study of legislative measures examined the legislative mechanisms currently implemented around the world to fight Spam E-mails. The parameters considered were the type of subscription, the scope of the subscription, the type of sender and receiver, and the group of possible accusers. India has no Anti-Spam law and no general identity theft law, but relevant provisions have been made in the criminal law, including provisions on reporting identity theft and related issues. To address cyber security and privacy issues, several amendments have been made to the Information Technology Act 2000 (IT Act 2000), which was notified on 17th October, 2000 by the Indian Parliament.

The findings of the study of legislative measures are summarised as follows:

- Only a few countries have enacted Spam legislation, which also includes identity theft legislation. Traditional provisions covering fraud, forgery and cybercrime are also in place. India needs separate Anti-Spam legislation.

- Different countries have different legislation with a variety of options, and the methods of investigation and prosecution also vary. This variation can lead to situations in which the investigation process of one country is blocked by another country. There is therefore a need for homogeneous legislation on Spam E-mail all over the world.
- Reporting mechanisms are lacking. Only a few countries provide a reporting mechanism, whether online or offline. Each country should establish at least one online reporting mechanism through which samples of Spam E-mails and incidents of Spamming can be reported. In India, only two metro cities, Mumbai and Bangalore, have an online mechanism for reporting identity theft, and it does not cover Spamming.
- Users should be made aware of these reporting mechanisms and of the punishments provided under Anti-Spam law so that the law can be implemented effectively.
- Reporting mechanisms should also inform victims about follow-up and the action taken so far on the complaints they have registered.
- The list of Spammers who have been punished for Spamming should be widely publicised.
- Reporting mechanisms would become a useful data collection tool, helping a Content based Filter to track the current patterns of Spam E-mails so that it can be updated.

8.2 Behavioural Measures

The study of behavioural measures was carried out to find behavioural patterns that are common in the sending of Spam E-mails. These patterns were found useful as a foundation for the technological measures when proposing an Anti-Spam solution. The E-mail delivery pattern was studied, and content analysis was carried out on both the header and the body of E-mails. This content analysis played an important role in the Content based Filter proposed under technological measures.
The findings of the behavioural study, summarised below, proved very useful when suggesting a technological solution to block Spam E-mail.

- An E-mail in which almost all words of the subject field, the body, or both are in uppercase is definitely Spam.

- An E-mail with an empty subject field is definitely Spam.
- An E-mail with different domain names in the From field and the Reply-To field is Spam.
- Some Spam E-mails contain many E-mail addresses in the To field. The presence of many addresses in the To field indicates Spam.
- Many Spam E-mails contain no E-mail address in the To field. The recipients are generally added to the CC or BCC field instead.
- Spam E-mail does not contain information in the Return-Path field.
- E-mails with typical words, or combinations of them, in the From field are Spam. Examples are NOKIA MOBILE LOTTREY DRAW, Promo Enlargement, BBC NATIONAL LOTTERY, UNITED KINGDOM LOTTERY, COCA COLA DRAW and Free Trial Men's Supplement.
- Some words were identified whose presence, alone or in combination, increases the chance that an E-mail is Spam. These words include WON POUNDS, job offers, UK-LOTTERY, huge stick, increase your length, desired proportion and size, Customer Survey, WON 500,000GBP, LOAN OFFER!!, WINNING NOTIFICATION..!!, making money, income going down, LOTTERY DRAW, Weight Loss, Diet, WON 750,000.GBP, SEX PILL, Buy Viagra at Half Price, Winner, MyDailyFlog!, HasDonated (,,500,000.GBP) etc.
- Some Spammers intentionally break up or misspell Spam words in order to bypass filtering mechanisms.

8.3 Technological Measures

To propose a technological solution for blocking Spam E-mail, existing solutions were implemented first. The proposed Anti-Spam Framework combines Origin based Filters with Content based Filters. Origin based Filters such as the White-list and the Black-list were implemented. The Challenge Response (C-R) System, which is used to differentiate between humans and machines, was also implemented. The drawbacks of the C-R System were addressed by proposing new architectures.
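The White-list and Black-list check, together with the header observations summarised in Section 8.2, could be expressed as simple rules. The following is only a minimal sketch and not the implementation used in this work. The function names, the matching of lists on the full sender address, and the two thresholds (`UPPERCASE_RATIO`, `MAX_TO_ADDRESSES`) are assumptions made for illustration, since the chapter gives no numeric values.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical thresholds; the chapter gives no numeric values.
UPPERCASE_RATIO = 0.9    # "almost all words" are in uppercase
MAX_TO_ADDRESSES = 10    # "many" addresses in the To field


def origin_verdict(raw_message, whitelist, blacklist):
    """Origin based check on the sender address (assumed matching rule).

    Returns 'ham' for a white-listed sender, 'spam' for a black-listed
    sender, and None when the sender is on neither list.
    """
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From") or "")
    sender = sender.lower()
    if sender in whitelist:
        return "ham"
    if sender in blacklist:
        return "spam"
    return None


def domain_of(header_value):
    """Return the lower-cased domain of the address in a header, or ''."""
    _, addr = parseaddr(header_value or "")
    return addr.rpartition("@")[2].lower()


def mostly_uppercase(text):
    """True when nearly every alphabetic word in the text is uppercase."""
    words = [w for w in text.split() if any(c.isalpha() for c in w)]
    if not words:
        return False
    upper = sum(1 for w in words if w.isupper())
    return upper / len(words) >= UPPERCASE_RATIO


def header_flags(raw_message):
    """Return the names of the Section 8.2 header rules that a message triggers."""
    msg = message_from_string(raw_message)
    flags = []

    subject = (msg.get("Subject") or "").strip()
    if not subject:
        flags.append("empty_subject")
    elif mostly_uppercase(subject):
        flags.append("uppercase_subject")

    from_domain = domain_of(msg.get("From"))
    reply_domain = domain_of(msg.get("Reply-To"))
    if from_domain and reply_domain and from_domain != reply_domain:
        flags.append("from_reply_to_mismatch")

    to_addresses = [a for _, a in getaddresses(msg.get_all("To", [])) if a]
    if not to_addresses:
        flags.append("no_to_address")
    elif len(to_addresses) > MAX_TO_ADDRESSES:
        flags.append("many_to_addresses")

    if not (msg.get("Return-Path") or "").strip():
        flags.append("missing_return_path")
    return flags
```

In such a sketch the origin check would run first, and only messages from unlisted senders would go on to the header rules and the Content based Filter.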

After the content of Spam E-mail had been studied under behavioural measures, feature extraction was applied to the standard datasets Enron, LingSpam and PU123A and to PEM, and important features were extracted on the basis of the observed patterns. Content based Filters were implemented with machine learning (ML) based classifiers and with a semantic similarity, edge based classifier. The ML based classifiers implemented include Decision Tree, Rough Sets, k-nearest Neighbour (k-NN) and Support Vector Machine (SVM). The Rough Set classifier was implemented with several rule generation methods: Genetic Algorithm, Learn by Example Method (LEM), Covering Algorithm and Exhaustive Algorithm. The SVM was implemented with several kernel functions: Linear Kernel, Multi Layer Perceptron, Quadratic Function and Radial Basis Function. These classifiers were run on the features extracted from the standard datasets Enron, LingSpam, PU123A and Spambase and from PEM. Frequency of occurrence was added as an attribute to the extracted features, and it contributed to improved results.

The overall performance of SVM with a polynomial kernel is high for the PU123A, LingSpam and PEM datasets. The polynomial kernel has degree three, and there are two classification categories (Spam and Ham). The hyperplane formed by the polynomial SVM performs binary classification. Since the input data is linearly separable, the results achieved are promising.

During empirical analysis it was found that accurate feature extraction reduced the gap between the low-level and high-level features of an E-mail. As a result, the accuracy of Spam filtering improved and Spam misclassification was reduced. The empirical analysis of the ML based classifiers shows that the Naive Bayesian classifier suits a dataset like Enron, while Rough Set with Genetic Algorithm suits the Spambase dataset.
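The two ingredients named above, frequency-of-occurrence features and a polynomial kernel of degree three, can be illustrated with a small sketch. It is not the feature extractor or SVM used in the experiments. The vocabulary, the whitespace tokenisation and the constant term `coef0 = 1` are assumptions, because the chapter states only the degree of the polynomial.

```python
from collections import Counter


def term_frequencies(text, vocabulary):
    """Count how often each vocabulary term occurs in the text (case-insensitive)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """Polynomial kernel K(x, y) = (x . y + coef0) ** degree.

    degree=3 follows the chapter; coef0=1.0 is an assumed constant.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Hypothetical three-word vocabulary, used only for this illustration.
vocabulary = ["lottery", "winner", "meeting"]
spam_vec = term_frequencies("Lottery winner claim your lottery prize", vocabulary)
ham_vec = term_frequencies("meeting agenda for the project meeting", vocabulary)

print(spam_vec)                               # [2, 1, 0]
print(ham_vec)                                # [0, 0, 2]
print(polynomial_kernel(spam_vec, spam_vec))  # 216.0
print(polynomial_kernel(spam_vec, ham_vec))   # 1.0
```

The kernel value is large for two messages that share frequent terms and small for messages with no terms in common. An SVM uses this contrast when it separates the two categories, Spam and Ham.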
The SVM with polynomial kernel performs best on datasets like LingSpam, PU123A and PEM. The experimental results show that the ML based classifier is both an effective and an efficient Anti-Spam filter.

The Content based Filter using semantic similarity with an edge based classifier was implemented with the intent of improving on the results of the ML classifiers, and its results were compared with theirs. Table-7.8 shows that the semantic similarity, edge based approach outperforms the ML based classifiers, with misclassification (false positives and false negatives) close to zero.
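As an illustration of what an edge based similarity measure does, the sketch below scores two words by the number of edges on the shortest path between them in a graph of related words. It is an assumption-laden toy and not the classifier evaluated in this work. The graph `RELATED`, the function names and the scoring formula 1 / (1 + path length) are all hypothetical choices, since this chapter does not give the formula.

```python
from collections import deque

# Hypothetical toy graph: an edge links two words treated as synonyms
# or closely related terms. A real system would use a lexical database.
RELATED = {
    "prize": {"award", "reward"},
    "award": {"prize"},
    "reward": {"prize", "bonus"},
    "bonus": {"reward"},
    "meeting": {"conference"},
    "conference": {"meeting"},
}


def path_length(graph, start, goal):
    """Number of edges on the shortest path between two words, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        for neighbour in graph.get(word, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def edge_similarity(graph, a, b):
    """Assumed score: 1 / (1 + shortest path length); 0.0 if unconnected."""
    dist = path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


print(edge_similarity(RELATED, "prize", "prize"))    # 1.0
print(edge_similarity(RELATED, "award", "bonus"))    # 0.25
print(edge_similarity(RELATED, "prize", "meeting"))  # 0.0
```

Because the score depends only on the word graph, a measure of this kind does not need to be retrained on a particular E-mail corpus.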

The Content based Filter is made adaptive so that its accuracy improves over time. It has been shown that semantic relationships, specifically synonyms, play an important role in Spam classification. The semantic similarity, edge based classifier has the advantage that it does not depend on the corpora. Its experimental results outperform those of the earlier ML based classifiers, and it reduces misclassification.

The overall analysis shows that Naive Bayesian, SVM with Polynomial Kernel, and the Semantic Similarity with Edge based approach are promising techniques for fighting Spam E-mails. Finally, combining the Origin based Filter with the Content based Filter would produce the best results. The results demonstrate that the proposed Anti-Spam Framework, being adaptive, can filter Spam E-mail effectively with very little misclassification (100% correct classification is impossible).

8.4 Future Scope

Although this thesis has worked towards solving the problem of Spam E-mail using legislative, behavioural and technological measures, the solutions proposed are not complete. Spam and Anti-Spam solutions are a game of cat and mouse, since Spammers come up with new techniques for sending Spam E-mails every day. This work has given a potential direction for the classification of Spam E-mails. Future efforts would be directed towards:

- Achieving accurate classification, with zero percent (0%) misclassification of Ham E-mail as Spam and of Spam E-mail as Ham.
- Blocking Phishing E-mails, which carry phishing attacks and are nowadays a greater matter of concern.
- Extending the work to keep away Denial of Service (DoS) attacks, which have now emerged in distributed form as Distributed Denial of Service (DDoS) attacks.
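The first goal above can be made concrete by measuring the two kinds of misclassification separately. The following sketch assumes labels given as the strings "ham" and "spam". The function name and the sample labels are hypothetical and do not come from the experiments reported in this thesis.

```python
def misclassification_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    A false positive is a Ham E-mail classified as Spam.
    A false negative is a Spam E-mail classified as Ham.
    """
    pairs = list(zip(actual, predicted))
    ham_total = sum(1 for a, _ in pairs if a == "ham")
    spam_total = sum(1 for a, _ in pairs if a == "spam")
    false_pos = sum(1 for a, p in pairs if a == "ham" and p == "spam")
    false_neg = sum(1 for a, p in pairs if a == "spam" and p == "ham")
    fp_rate = false_pos / ham_total if ham_total else 0.0
    fn_rate = false_neg / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Hypothetical labels for six messages, used only for this illustration.
actual = ["ham", "ham", "ham", "ham", "spam", "spam"]
predicted = ["ham", "ham", "ham", "spam", "spam", "ham"]
print(misclassification_rates(actual, predicted))  # (0.25, 0.5)
```

The stated target of zero percent misclassification corresponds to both rates being 0.0.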