Improving anti-spam filtering, based on Naive Bayesian and neural networks in multi-agent filters


J. Appl. Environ. Biol. Sci., 5(7S) 381-386, 2015. © 2015, TextRoad Publication. ISSN: 2090-4274. Journal of Applied Environmental and Biological Sciences, www.textroad.com

Improving anti-spam filtering, based on Naive Bayesian and neural networks in multi-agent filters

Adeleh Jafari Gholibeik 1, Mohammad Amin Edalati Rad 2
1 Department of Computer Science, Shush Branch, Islamic Azad University, Shush, Iran
2 Department of Computer Science, Dezful Branch, Islamic Azad University, Dezful, Iran

Received: March 19, 2015. Accepted: May 2, 2015.

ABSTRACT
This article discusses adding a pre-processing filter in the stage before the filter processes message content. One proposed alternative is a Naive Bayesian classifier that is automatically taught to detect spam messages. In this study, a new version based on the Naive Bayesian approach is provided using multi-agent filters. Its advantage is the simultaneous use of the user's interests, keywords, and a content-based evaluation of the message. This version is then developed further, and we identify spam using the MLP neural network algorithm.
KEYWORDS: spam, multi-agent filter, email, Naive Bayesian, neural network.

1. INTRODUCTION
The problem of unwanted e-mails, known as spam, has become serious: 57-80% of e-mail volume is spam. Spam creates several problems. It generates traffic and consumes storage space and computing power; it costs users a great deal of time spent separating and removing unwanted mail; it harms users and causes insecurity; and it leads to legal problems such as pornography, pyramid schemes, economic fraud, and phishing. According to studies by Siponen and Stucke on the different types of anti-spam tools and techniques, filtering is the most common method of protection against spam [1]. Research shows that spam filtering based on machine learning is among the best of the learning-based methods, both in efficiency of spam capture and in having the lowest rate of false positives.
A common filtering method uses the Naive Bayesian approach, which has performed better than similar methods such as decision trees and SVM (Support Vector Machine). A study in 1997 found that spam messages formed almost 10 percent of the messages received on a shared network [2], and the situation appears to be getting worse. Legislative measures against spam are gradually being concluded, but despite this they have had little and very limited effect. Currently, anti-spam filters are software tools that attempt to block spam messages automatically.

2. Multi-agent filters
Since spam has several different aspects with different characteristics, methods using several agents have been suggested. If there is only one agent, named A, then when A receives an e-mail x, the probability that x is spam is denoted PS(x). If PS(x) is a value on which a single agent cannot decide, A sends x to the group-identification section. In the group state, A sends the e-mail x to the other agents AC in the multi-agent system; each agent AC estimates the probability PS(x) that x is spam and sends it back to the source agent. The source agent A makes the final diagnosis based on the other agents' answers. To grade the ability of each agent, an agent receives a positive score if its spam decision is right and a negative score otherwise.

Methodology
In this method, every agent has its own detection ability, so the agents are called heterogeneous. To train each agent we use methods such as Naive Bayesian, decision trees, and neural networks. An agent learns in two layers: in the first layer, learning is based on the agent's prior knowledge, without the participation and control of the user; in the second layer, the user's monitoring, with knowledge of the characteristics of spam and real e-mail, aids the agent in learning more. In effect, every agent uses these smart techniques for its diagnosis.

2.1.
Naive Bayesian method
This algorithm is one of the simplest and most widely used machine-learning methods for text classification, and it arises from Bayesian decision theory [3]. The method uses a training set (a number of e-mails tagged as spam or as correct mail) to calculate the probability of each word (feature) in the messages being spam or legitimate. (Corresponding Author: Adeleh Jafari Gholibeik, Department of Computer Science, Shush Branch, Islamic Azad University, Shush, Iran.) Then, when a new letter arrives in the user's mailbox, by using the spam-or-legitimate probabilities of the words it has

in it, the possibility that it is a spam or a proper one is calculated. Using the Bayesian principle, the probability of a class occurring is expressed using the feature vector x. To represent an e-mail, a feature vector x = {x_1, x_2, ..., x_n} is used, where n is the number of features (words) contained in that letter and each x_i is the probability value obtained (via the training set) for class C. To use the Naive Bayesian principle, it is assumed that the properties of a message occur in a letter independently of one another. Assuming the feature vector of a mail has features x_1, x_2, ..., x_n, formula (1) results:

    P(C|X) = P(C) * Π_{i=1..n} P(x_i|C) / Π_{i=1..n} P(x_i)    (1)

where P(x_i|C) is the probability that feature x_i occurs in class C, P(C) is the probability that class C occurs in the set of mails, and P(x_i) is the probability that word x_i occurs in a message. The output is the class whose probability is higher.

2.2. Neural Networks
Neural networks are among the most widely used and practical methods for modelling complex and large problems (involving hundreds of variables). They can be used for classification (the output is a class) or for regression (the output is a numeric value). A neural network includes an input layer in which each node corresponds to one of the predictor variables. Each input node is connected to all of the nodes in the hidden layer. Nodes in a hidden layer can be connected to nodes in another hidden layer or to the output layer. The output layer contains one or more output variables. Each edge between two nodes has a weight. The weights are used in the calculation of the intermediate layers: each node in the middle layers (layers other than the first) receives inputs from various edges, each, as mentioned earlier, with a specific weight.
Each node in a middle layer multiplies the value of each input by its edge weight, adds these products together, applies a predetermined function (the activation function) to the sum, and passes the result as output to the nodes of the next layer. The edge weights are unknown parameters that are determined by the training function and the training data given to the system. The number of nodes, the number of hidden layers, and how the nodes connect to each other specify the architecture (topology) of the neural network. The user, or the software that designs the neural network, must specify the number of nodes, the number of hidden layers, the activation function, and any constraints on the edge weights. Neural networks are powerful estimators that estimate as well as other techniques and sometimes better; the major disadvantage of the method is the large amount of time spent on parameter selection and training of the network. An example of the neural network used is a nonlinear feedforward type trained by back-propagation, with the sigmoid activation function:

    f(x) = 1 / (1 + e^(-x))    (2)

This activation function produces output in the range (0, 1). As the figure below shows, the network has one input for every word in the dictionary, one hidden layer, and one output neuron. The input and hidden layers each include a bias neuron with constant activity 1 [4, 5, 6].

Fig. 1. The structure of the neural network

When an e-mail is presented to the network, the inputs for the words found in the mail are set to 1 and the other inputs to 0. If the output value is more than 0.5, the e-mail is identified as spam. When the network is being trained, the desired output is taken near 0 for correct letters and near 1 for spam. The numbers of correct letters and spam letters presented to the network should differ only slightly, because if one class has more instances than the other, the method tends toward that class and tries to classify its letters better. A gradient-descent learning algorithm using back-propagation is used to optimize the weights of the network.
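Before moving to the proposed method, the Naive Bayesian scoring of formula (1) in Section 2.1 can be made concrete. The sketch below estimates per-word probabilities from a tiny labelled corpus and picks the more probable class; the words, counts, and Laplace smoothing (+1/+2) are illustrative assumptions, not the authors' data or exact estimator.

```python
import math
from collections import Counter

def train_naive_bayes(emails):
    """Estimate P(C) and per-word presence probabilities P(x_i|C) from
    (word-list, label) pairs, where label is 'spam' or 'ham'."""
    class_counts = Counter()
    word_counts = {"spam": Counter(), "ham": Counter()}
    for words, label in emails:
        class_counts[label] += 1
        word_counts[label].update(set(words))  # count each word once per message
    total = sum(class_counts.values())
    priors = {c: class_counts[c] / total for c in class_counts}
    return priors, word_counts, class_counts

def classify(words, priors, word_counts, class_counts):
    """Formula (1): choose the class C maximizing P(C) * prod_i P(x_i|C).
    The denominator prod_i P(x_i) is identical for both classes, so it is
    omitted; Laplace smoothing avoids zero probabilities for unseen words."""
    best_class, best_score = None, float("-inf")
    for c in priors:
        score = math.log(priors[c])
        for w in set(words):
            score += math.log((word_counts[c][w] + 1) / (class_counts[c] + 2))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Miniature corpus, invented purely for illustration.
corpus = [
    (["free", "winner", "prize"], "spam"),
    (["free", "credit", "offer"], "spam"),
    (["meeting", "tomorrow", "report"], "ham"),
    (["project", "report", "schedule"], "ham"),
]
model = train_naive_bayes(corpus)
label = classify(["free", "prize"], *model)
```

Working in log space keeps the product of many small word probabilities from underflowing, without changing which class wins.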

2.3. The proposed method of multi-agent filters with MLP artificial neural networks
Artificial neural networks are dynamical systems that, by processing experimental data, transfer the knowledge or rules behind the data into the network structure. A single-neuron network has limitations, including the inability to map nonlinear functions. The Multilayer Perceptron (MLP) artificial neural network, by contrast, is able to approximate and implement nonlinear functions. The perceptron network model is a feedforward network with the back-propagation (BP) learning law. In the feedforward pass, the input vector is applied to the first layer and its effects propagate through the intermediate layers to the output layer; along this route, the network parameters are held fixed. In the second pass, which follows the BP law, the MLP network parameters are adjusted, and this change is made according to the error-correction law [5, 6].

Fig. 2. Multilayer Perceptron network

The error is the difference between the desired response and the actual response of the network. After the error is calculated, it is distributed backward through the network layers. Since this distribution occurs in the direction opposite to the weight connections, the term back-propagation is used for it. An MLP network consists of an input layer, a number of intermediate layers, and an output layer.

Fig. 3. The structure of an MLP network

2.3.1. MLP neural network learning algorithm
Terms used in the algorithm [7]:
Input vector: X = (x_1, ..., x_i, ..., x_n)
Desired output vector: t = (t_1, ..., t_k, ..., t_m)
δ_k: the portion of the error used to correct the weight w_jk; it is also used to distribute the error from output layer Y_k back to the previous layer Z_j (hidden layer).
δ_j: the portion of the error used to correct the weight v_ij; this error is transferred from the middle layer Z_j to the input layer X_i.
α: learning rate
X_i: input unit i
Z_j: hidden unit j. The net input to Z_j is denoted z_in_j:
    z_in_j = v_0j + Σ_{i=1..n} x_i v_ij    (3)

The output signal of Z_j is denoted z_j:

    z_j = f(z_in_j)    (4)

Y_k: output unit k. The net input to Y_k is denoted y_in_k:

    y_in_k = w_0k + Σ_{j=1..p} z_j w_jk    (5)

The output signal of Y_k is denoted y_k:

    y_k = f(y_in_k)    (6)

2.3.2. Training algorithm
Network training proceeds by propagation in three stages:
1. Feedforward propagation of the input training patterns
2. Back-propagation of the resulting error
3. Weight adjustment

The training algorithm runs as follows:
Step 0: Initialize the weights.
Step 1: While the stopping conditions are false, do steps 2 to 9.
Step 2: For each pair of corresponding input and output, do steps 3 to 8.

2.3.3. Feedforward phase
Step 3: Each input unit (X_i, i = 1, ..., n) receives the input signal x_i and sends it to all units of the layer above (the hidden units).
Step 4: Each hidden unit (Z_j, j = 1, ..., p) calculates the sum of its weighted input signals,

    z_in_j = v_0j + Σ_{i=1..n} x_i v_ij    (7)

applies the activation function to calculate its output signal, and sends it to the units of the next layer:

    z_j = f(z_in_j)    (8)

Step 5: Each output unit (Y_k, k = 1, ..., m) calculates the sum of its weighted input signals,

    y_in_k = w_0k + Σ_{j=1..p} z_j w_jk    (9)

and applies its activation function to calculate its output signal:

    y_k = f(y_in_k)    (10)

2.3.4. Error back-propagation
Step 6: Each output unit Y_k receives the desired output corresponding to the input training pattern and calculates δ_k from the obtained and desired outputs:

    δ_k = (t_k − y_k) f′(y_in_k)    (11)

    Δw_0k = α δ_k,   Δw_jk = α δ_k z_j    (12)

It then sends δ_k to the underlying layers.
Step 7: Each hidden unit Z_j calculates the sum of its delta inputs,

    δ_in_j = Σ_{k=1..m} δ_k w_jk    (13)

and multiplies the result by the derivative of the activation function to obtain its error-information term:

    δ_j = δ_in_j f′(z_in_j)    (14)

From δ_j, the corrections Δv_ij and Δv_0j that are later needed to update v_ij and v_0j are computed.

2.3.5. Weight and bias correction
Step 8: Each output unit Y_k corrects its bias and weights (j = 0, ..., p):

    w_jk(new) = w_jk(old) + Δw_jk    (15)

Each hidden unit Z_j corrects its bias and weights as follows:

    v_ij(new) = v_ij(old) + Δv_ij    (16)

Step 9: Finally, check the stopping conditions.
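Steps 0-9 above can be sketched in Python for a single-hidden-layer network with one output unit, using the sigmoid activation of formula (2). The word-vector training set, layer sizes, learning rate, and epoch count below are invented for illustration and are not the authors' data or settings.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(net):
    s = sigmoid(net)
    return s * (1.0 - s)

def train_mlp(patterns, n_in, n_hidden, alpha=0.5, epochs=2000, seed=0):
    """Backpropagation (steps 0-9) for one hidden layer and one output unit."""
    rng = random.Random(seed)
    # Step 0: initialize weights; v[i][j] input->hidden (i=0 is the bias),
    # w[j] hidden->output (j=0 is the bias).
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in + 1)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):                       # Step 1
        for x, t in patterns:                     # Steps 2-3
            # Step 4: hidden net inputs and outputs, formulas (7)-(8)
            z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n_in))
                    for j in range(n_hidden)]
            z = [sigmoid(zi) for zi in z_in]
            # Step 5: output net input and output, formulas (9)-(10)
            y_in = w[0] + sum(z[j] * w[j + 1] for j in range(n_hidden))
            y = sigmoid(y_in)
            # Step 6: output error term, formulas (11)-(12)
            delta_k = (t - y) * sigmoid_prime(y_in)
            # Step 7: hidden error terms, formulas (13)-(14)
            delta_j = [delta_k * w[j + 1] * sigmoid_prime(z_in[j])
                       for j in range(n_hidden)]
            # Step 8: weight and bias corrections, formulas (15)-(16)
            w[0] += alpha * delta_k
            for j in range(n_hidden):
                w[j + 1] += alpha * delta_k * z[j]
                v[0][j] += alpha * delta_j[j]
                for i in range(n_in):
                    v[i + 1][j] += alpha * delta_j[j] * x[i]
    def predict(x):
        z = [sigmoid(v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n_in)))
             for j in range(n_hidden)]
        return sigmoid(w[0] + sum(z[j] * w[j + 1] for j in range(n_hidden)))
    return predict

# Illustrative dictionary ["free", "winner", "meeting", "report"]:
# a mail's vector has 1 where the word occurs, 0 elsewhere (Section 2.2).
data = [
    ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1), ([1, 1, 0, 0], 1),   # spam
    ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0), ([0, 0, 1, 1], 0),   # legitimate
]
predict = train_mlp(data, n_in=4, n_hidden=3)
# As in Section 2.2, an output above 0.5 is classified as spam.
```

Each pattern updates the weights immediately (online gradient descent), which matches the per-pattern loop of steps 2-8; batching the corrections would be an equally valid variant.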

3. Implementation and experimental results of the multi-agent filter method with a neural network
3.1. Simulation
To obtain an optimum structure and understand its behaviour, it is necessary to investigate the neural network error for different numbers of iterations, a variety of activation functions, and different numbers of layers. The network with the lowest root mean square error is the desired network.

3.2. Optimized architecture
As mentioned previously, the number of network inputs was set equal to the number of words in the chosen dictionary, and the number of outputs to 2. The number of middle layers was obtained by trial and error: 1 to 3 hidden layers and 2 to 20 neurons per layer were tested, with two activation functions considered. A table of example results shows the root mean square error for a number of these tests; the best architecture has the lowest root mean square error. The number of iterations is also of particular importance, because each structure behaves differently for different numbers of iterations. Figure (4) shows the root mean square error (RMSE) of each structure against the number of iterations.

Fig. 4. RMSE of each structure for different numbers of iterations

3.3. Network forecasting
After defining the network architecture in the previous section, in order to assess and compare the proposed method against real data, we applied the test series to the network. Figure (5) compares the real and proposed interest values. The horizontal axis is allocated to the network response and the vertical axis to the predicted value. The distribution of points around the first bisector indicates how the network behaves: the closer the points are to the bisector, the closer the network response is to the expected value.

Fig. 5. Comparison (A) of network response and predicted value, and (B) of real and recommended interest

3.4. Network evaluation
To show what the trained neural network can do, Figure (6) shows the user's interests as output.
Testing is done by subject, by content, and by their combination, which indicates better performance.
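The trial-and-error architecture search of Section 3.2 amounts to the loop below. `train_and_eval` is a hypothetical caller-supplied routine (the paper does not specify one) that trains a candidate network and returns its validation RMSE; the two activation-function names are placeholders for the two functions the paper mentions without naming.

```python
import math
from itertools import product

def rmse(predicted, actual):
    """Root mean square error used to compare candidate architectures."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def select_architecture(train_and_eval, layer_options=(1, 2, 3),
                        neuron_options=range(2, 21),
                        activations=("sigmoid", "tanh")):
    """Try every combination of hidden-layer count (1-3), neurons per layer
    (2-20), and activation function, keeping the configuration with the
    lowest RMSE, as in Section 3.2."""
    best_cfg, best_err = None, float("inf")
    for layers, neurons, act in product(layer_options, neuron_options, activations):
        err = train_and_eval(layers, neurons, act)
        if err < best_err:
            best_cfg, best_err = (layers, neurons, act), err
    return best_cfg, best_err
```

Since each structure also behaves differently across iteration counts, `train_and_eval` could itself sweep the number of training epochs and report the best RMSE it reaches for that structure.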

Fig. 6. Combining the methods with the user's interests: (A) the user's interests; (B) combined methods

Table 1. Data of 5,000 private e-mails from mass, publicly published mailings

Method                              | Spam detected by subject | by sender | by content | by user interests
Subject + user interests            | 61.09% | 52.86% | 72.60% | 88.50%
Subject + sender                    | 61.09% | 52.60% | 71.82% | 88.50%
Sender + user interests             | 61.09% | 52.60% | 72.60% | 98.18%
Subject + sender + user interests   | 61.09% | 52.60% | 8.60%  | 99.49%

4. Conclusion
As observed, adding a pre-processing stage before the processing of e-mail content, which categorizes users' mail by content, by subject, by sender, and by a combination of the three methods based on user interests, and which determines the false-positive and false-negative rates, improves the accuracy of statistical filters in the processing stage and also makes these filters faster and better functioning. With these calculations, an e-mail server considers the interests of the users in a category as the threshold; e-mails that the users of that group have shown no interest in seeing are considered spam, so there is no need to send such mail for processing at a later stage.

REFERENCES
1. Siponen, M., and Stucke, C., Effective anti-spam strategies in companies: An international study. In Proceedings of HICSS '06, 2006, vol. 6, pp. 245-252.
2. Androutsopoulos, I., Karkaletsis, V., Sakkis, G., Spyropoulos, C., Stamatopoulos, P., and Paliouras, G., Learning to filter spam e-mail: A comparison of a Naive Bayesian and a memory-based approach. In H. Zaragoza, P. Gallinari, and M. Rajman, editors, Proceedings of the Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2000, pp. 1-13.
3. Golbeck, J., and Hendler, J., Reputation network analysis for email filtering. In Proceedings of the First Conference on Email and Anti-Spam, 2004, vol. 4825.
4. Bhandari, N.M., and Kumar, P., Artificial Neural Network - A tool for Load Rating of Masonry Arch Bridges, 2006, Advances in Bridge Engineering.
5. Menhaj, M.B., Fundamentals of neural networks. Computational Intelligence, International MultiConference of Engineers and Computer Scientists, 2008, vol. 1.
6. Tang, X., Hybrid Hidden Markov Model and Artificial Neural Network for Automatic Speech Recognition, 2009 Pacific-Asia Conference on Circuits, Communications and Systems, pp. 682-685.
7. Seyed Mostafa Kia, Neural networks in MATLAB. Tehran, 2009, Kian Rayaneh Sabz Publication.
Bhandar, N.M., and Kumar P., Rtfcal Neural Networ - A tool for Load Ratng of Masonry Arch Brdges,2006, Advances n Brdge Engneerng. 5. Menha M.B., Fundamentals of neural networs. Computatonal ntellgence, Internatonal MultConference of Engneers and Computer Scents, 2008,vol1. 6. Tang X., Hybrd Hdden Marov Model and Artfcal Neural Networ for Automatc Speech Recogonaton,2009 Pacfc-Asa Conference on Crcuts, Communcatons and Systems, pp 682-685. 7. Seyed Mostafa Ka.,Neural networs n Matlab.Tehran,2009, Kan Publcaton of Rayaneh Sabz. 386