Improving anti-spam filtering, based on Naive Bayesian and neural networks in multi-agent filters


J. Appl. Environ. Biol. Sci., 5(7S) 381-386, 2015. © 2015, TextRoad Publication. ISSN: 2090-4274. Journal of Applied Environmental and Biological Sciences, www.textroad.com

Improving anti-spam filtering, based on Naive Bayesian and neural networks in multi-agent filters

Adeleh Jafari Gholibeik 1, Mohammad Amin Edalati Rad 2
1 Department of Computer Science, Shush Branch, Islamic Azad University, Shush, Iran
2 Department of Computer Science, Dezful Branch, Islamic Azad University, Dezful, Iran

Received: March 19, 2015. Accepted: May 2, 2015.

ABSTRACT
This article discusses adding a pre-processing filter in the stage before the filter processes message content. One proposed alternative is a Naive Bayesian classifier that is automatically taught to detect spam messages. In this study, a new version based on the Naive Bayesian approach is provided using multi-agent filters. Its advantage is the simultaneous use of the user's interests, keywords, and a content-based evaluation of the message. This version is then developed further, and we identify spam using the MLP neural network algorithm.
KEYWORDS: spam, multi-agent filter, email, Naive Bayesian, neural network.

1. INTRODUCTION
The problem of unwanted e-mails, known as spam, has become serious: 57-80% of e-mail volume is spam. Spam creates several problems. It generates traffic and consumes storage space and computing power; it costs users a great deal of time spent separating and removing unwanted mail; it harms users and causes insecurity; and it leads to legal problems such as pornography, pyramid schemes, economic fraud, and phishing. According to studies by Siponen and Stucke on the different types of anti-spam tools and techniques, filtering is the most common method of protection against spam [1]. Research shows that spam filtering based on machine learning is among the best of the learning-based methods, both in efficiency of spam capture and in having the lowest rate of false positives.
A common filtering method uses the Naive Bayesian approach, which has performed better than similar methods such as decision trees and SVM (Support Vector Machine). A study in 1997 found that spam messages formed almost 10 percent of the messages received on a shared network [2], and the situation appears to be getting worse. Legislative measures against spam are gradually being concluded, but despite this they have had little and very limited effect. Currently, anti-spam filters are software tools that attempt to block spam messages automatically.

2. Multi-agent filters
Since spam has several different aspects with different characteristics, methods using several agents have been suggested. If there is only one agent, named A, then when A receives an e-mail x, the probability that x is spam is denoted PS(x). If PS(x) is a value on which a single agent cannot decide, A sends x to the group-identification section. In the group state, A sends the e-mail x to the other agents AC in the multi-agent system; each agent AC estimates the probability PS(x) that x is spam and sends it back to the source agent. The source agent A makes the final diagnosis based on the other agents' answers. To grade the ability of each agent, an agent receives a positive score if its spam decision is right and a negative score otherwise.

Methodology
In this method, every agent has its own detection ability, so the agents are called heterogeneous. To train each agent we use methods such as Naive Bayesian, decision trees, and neural networks. An agent learns in two layers: in the first layer, learning is based on the agent's prior knowledge, without the participation and control of the user; in the second layer, the user's monitoring, with knowledge of the characteristics of spam and real e-mail, aids the agent in learning more. In effect, every agent uses these smart techniques for its diagnosis.

2.1.
Naive Bayesian method
This algorithm is one of the simplest and most widely used machine-learning methods for text classification, and it arises from Bayesian decision theory [3]. The method uses a training set (a number of e-mails tagged as spam or as correct mail) to calculate the probability of each word (feature) in the messages being spam or legitimate. (Corresponding Author: Adeleh Jafari Gholibeik, Department of Computer Science, Shush Branch, Islamic Azad University, Shush, Iran.) Then, when a new letter arrives in the user's mailbox, by using the spam-or-legitimate probabilities of the words it has

in it, the possibility that it is a spam or a proper one is calculated. Using the Bayesian principle, the probability of a class occurring is expressed using the feature vector x. To represent an e-mail, a feature vector x = {x_1, x_2, ..., x_n} is used, where n is the number of features (words) contained in that letter and each x_i is the probability value obtained (via the training set) for class C. To use the Naive Bayesian principle, it is assumed that the properties of a message occur in a letter independently of one another. Assuming the feature vector of a mail has features x_1, x_2, ..., x_n, formula (1) results:

    P(C|X) = P(C) * Π_{i=1..n} P(x_i|C) / Π_{i=1..n} P(x_i)    (1)

where P(x_i|C) is the probability that feature x_i occurs in class C, P(C) is the probability that class C occurs in the set of mails, and P(x_i) is the probability that word x_i occurs in a message. The output is the class whose probability is higher.

2.2. Neural Networks
Neural networks are among the most widely used and practical methods for modelling complex and large problems (involving hundreds of variables). They can be used for classification (the output is a class) or for regression (the output is a numeric value). A neural network includes an input layer in which each node corresponds to one of the predictor variables. Each input node is connected to all of the nodes in the hidden layer. Nodes in a hidden layer can be connected to nodes in another hidden layer or to the output layer. The output layer contains one or more output variables. Each edge between two nodes has a weight. The weights are used in the calculation of the intermediate layers: each node in the middle layers (layers other than the first) receives inputs from various edges, each, as mentioned earlier, with a specific weight.
Each node in a middle layer multiplies the value of each input by its edge weight, adds these products together, applies a predetermined function (the activation function) to the sum, and passes the result as output to the nodes of the next layer. The edge weights are unknown parameters that are determined by the training function and the training data given to the system. The number of nodes, the number of hidden layers, and how the nodes connect to each other specify the architecture (topology) of the neural network. The user, or the software that designs the neural network, must specify the number of nodes, the number of hidden layers, the activation function, and any constraints on the edge weights. Neural networks are powerful estimators that estimate as well as other techniques and sometimes better; the major disadvantage of the method is the large amount of time spent on parameter selection and training of the network. An example of the neural network used is a nonlinear feedforward type trained by back-propagation, with the sigmoid activation function:

    f(x) = 1 / (1 + e^(-x))    (2)

This activation function produces output in the range (0, 1). As the figure below shows, the network has one input for every word in the dictionary, one hidden layer, and one output neuron. The input and hidden layers each include a bias neuron with constant activity 1 [4, 5, 6].

Fig. 1. The structure of the neural network

When an e-mail is presented to the network, the inputs for the words found in the mail are set to 1 and the other inputs to 0. If the output value is more than 0.5, the e-mail is identified as spam. When the network is being trained, the desired output is taken near 0 for correct letters and near 1 for spam. The numbers of correct letters and spam letters presented to the network should differ only slightly, because if one class has more instances than the other, the method tends toward that class and tries to classify its letters better. A gradient-descent learning algorithm using back-propagation is used to optimize the weights of the network.
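Before moving to the proposed method, the Naive Bayesian scoring of formula (1) in Section 2.1 can be made concrete. The sketch below estimates per-word probabilities from a tiny labelled corpus and picks the more probable class; the words, counts, and Laplace smoothing (+1/+2) are illustrative assumptions, not the authors' data or exact estimator.

```python
import math
from collections import Counter

def train_naive_bayes(emails):
    """Estimate P(C) and per-word presence probabilities P(x_i|C) from
    (word-list, label) pairs, where label is 'spam' or 'ham'."""
    class_counts = Counter()
    word_counts = {"spam": Counter(), "ham": Counter()}
    for words, label in emails:
        class_counts[label] += 1
        word_counts[label].update(set(words))  # count each word once per message
    total = sum(class_counts.values())
    priors = {c: class_counts[c] / total for c in class_counts}
    return priors, word_counts, class_counts

def classify(words, priors, word_counts, class_counts):
    """Formula (1): choose the class C maximizing P(C) * prod_i P(x_i|C).
    The denominator prod_i P(x_i) is identical for both classes, so it is
    omitted; Laplace smoothing avoids zero probabilities for unseen words."""
    best_class, best_score = None, float("-inf")
    for c in priors:
        score = math.log(priors[c])
        for w in set(words):
            score += math.log((word_counts[c][w] + 1) / (class_counts[c] + 2))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Miniature corpus, invented purely for illustration.
corpus = [
    (["free", "winner", "prize"], "spam"),
    (["free", "credit", "offer"], "spam"),
    (["meeting", "tomorrow", "report"], "ham"),
    (["project", "report", "schedule"], "ham"),
]
model = train_naive_bayes(corpus)
label = classify(["free", "prize"], *model)
```

Working in log space keeps the product of many small word probabilities from underflowing, without changing which class wins.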

2.3. The proposed method of multi-agent filters with MLP artificial neural networks
Artificial neural networks are dynamical systems that, by processing experimental data, transfer the knowledge or rules behind the data into the network structure. A single-neuron network has limitations, including the inability to map nonlinear functions. The Multilayer Perceptron (MLP) artificial neural network, by contrast, is able to approximate and implement nonlinear functions. The perceptron network model is a feedforward network with the back-propagation (BP) learning law. In the feedforward pass, the input vector is applied to the first layer and its effects propagate through the intermediate layers to the output layer; along this route, the network parameters are held fixed. In the second pass, which follows the BP law, the MLP network parameters are adjusted, and this change is made according to the error-correction law [5, 6].

Fig. 2. Multilayer Perceptron network

The error is the difference between the desired response and the actual response of the network. After the error is calculated, it is distributed backward through the network layers. Since this distribution occurs in the direction opposite to the weight connections, the term back-propagation is used for it. An MLP network consists of an input layer, a number of intermediate layers, and an output layer.

Fig. 3. The structure of an MLP network

2.3.1. MLP neural network learning algorithm
Terms used in the algorithm [7]:
Input vector: X = (x_1, ..., x_i, ..., x_n)
Desired output vector: t = (t_1, ..., t_k, ..., t_m)
δ_k: the portion of the error used to correct the weight w_jk; it is also used to distribute the error from output layer Y_k back to the previous layer Z_j (hidden layer).
δ_j: the portion of the error used to correct the weight v_ij; this error is transferred from the middle layer Z_j to the input layer X_i.
α: learning rate
X_i: input unit i
Z_j: hidden unit j. The net input to Z_j is denoted z_in_j:
    z_in_j = v_0j + Σ_{i=1..n} x_i v_ij    (3)

The output signal of Z_j is denoted z_j:

    z_j = f(z_in_j)    (4)

Y_k: output unit k. The net input to Y_k is denoted y_in_k:

    y_in_k = w_0k + Σ_{j=1..p} z_j w_jk    (5)

The output signal of Y_k is denoted y_k:

    y_k = f(y_in_k)    (6)

2.3.2. Training algorithm
Network training proceeds by propagation in three stages:
1. Feedforward propagation of the input training patterns
2. Back-propagation of the resulting error
3. Weight adjustment

The training algorithm runs as follows:
Step 0: Initialize the weights.
Step 1: While the stopping conditions are false, do steps 2 to 9.
Step 2: For each pair of corresponding input and output, do steps 3 to 8.

2.3.3. Feedforward phase
Step 3: Each input unit (X_i, i = 1, ..., n) receives the input signal x_i and sends it to all units of the layer above (the hidden units).
Step 4: Each hidden unit (Z_j, j = 1, ..., p) calculates the sum of its weighted input signals,

    z_in_j = v_0j + Σ_{i=1..n} x_i v_ij    (7)

applies the activation function to calculate its output signal, and sends it to the units of the next layer:

    z_j = f(z_in_j)    (8)

Step 5: Each output unit (Y_k, k = 1, ..., m) calculates the sum of its weighted input signals,

    y_in_k = w_0k + Σ_{j=1..p} z_j w_jk    (9)

and applies its activation function to calculate its output signal:

    y_k = f(y_in_k)    (10)

2.3.4. Error back-propagation
Step 6: Each output unit Y_k receives the desired output corresponding to the input training pattern and calculates δ_k from the obtained and desired outputs:

    δ_k = (t_k − y_k) f′(y_in_k)    (11)

    Δw_0k = α δ_k,   Δw_jk = α δ_k z_j    (12)

It then sends δ_k to the underlying layers.
Step 7: Each hidden unit Z_j calculates the sum of its delta inputs,

    δ_in_j = Σ_{k=1..m} δ_k w_jk    (13)

and multiplies the result by the derivative of the activation function to obtain its error-information term:

    δ_j = δ_in_j f′(z_in_j)    (14)

From δ_j, the corrections Δv_ij and Δv_0j that are later needed to update v_ij and v_0j are computed.

2.3.5. Weight and bias correction
Step 8: Each output unit Y_k corrects its bias and weights (j = 0, ..., p):

    w_jk(new) = w_jk(old) + Δw_jk    (15)

Each hidden unit Z_j corrects its bias and weights as follows:

    v_ij(new) = v_ij(old) + Δv_ij    (16)

Step 9: Finally, check the stopping conditions.
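Steps 0-9 above can be sketched in Python for a single-hidden-layer network with one output unit, using the sigmoid activation of formula (2). The word-vector training set, layer sizes, learning rate, and epoch count below are invented for illustration and are not the authors' data or settings.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(net):
    s = sigmoid(net)
    return s * (1.0 - s)

def train_mlp(patterns, n_in, n_hidden, alpha=0.5, epochs=2000, seed=0):
    """Backpropagation (steps 0-9) for one hidden layer and one output unit."""
    rng = random.Random(seed)
    # Step 0: initialize weights; v[i][j] input->hidden (i=0 is the bias),
    # w[j] hidden->output (j=0 is the bias).
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in + 1)]
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):                       # Step 1
        for x, t in patterns:                     # Steps 2-3
            # Step 4: hidden net inputs and outputs, formulas (7)-(8)
            z_in = [v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n_in))
                    for j in range(n_hidden)]
            z = [sigmoid(zi) for zi in z_in]
            # Step 5: output net input and output, formulas (9)-(10)
            y_in = w[0] + sum(z[j] * w[j + 1] for j in range(n_hidden))
            y = sigmoid(y_in)
            # Step 6: output error term, formulas (11)-(12)
            delta_k = (t - y) * sigmoid_prime(y_in)
            # Step 7: hidden error terms, formulas (13)-(14)
            delta_j = [delta_k * w[j + 1] * sigmoid_prime(z_in[j])
                       for j in range(n_hidden)]
            # Step 8: weight and bias corrections, formulas (15)-(16)
            w[0] += alpha * delta_k
            for j in range(n_hidden):
                w[j + 1] += alpha * delta_k * z[j]
                v[0][j] += alpha * delta_j[j]
                for i in range(n_in):
                    v[i + 1][j] += alpha * delta_j[j] * x[i]
    def predict(x):
        z = [sigmoid(v[0][j] + sum(x[i] * v[i + 1][j] for i in range(n_in)))
             for j in range(n_hidden)]
        return sigmoid(w[0] + sum(z[j] * w[j + 1] for j in range(n_hidden)))
    return predict

# Illustrative dictionary ["free", "winner", "meeting", "report"]:
# a mail's vector has 1 where the word occurs, 0 elsewhere (Section 2.2).
data = [
    ([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1), ([1, 1, 0, 0], 1),   # spam
    ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0), ([0, 0, 1, 1], 0),   # legitimate
]
predict = train_mlp(data, n_in=4, n_hidden=3)
# As in Section 2.2, an output above 0.5 is classified as spam.
```

Each pattern updates the weights immediately (online gradient descent), which matches the per-pattern loop of steps 2-8; batching the corrections would be an equally valid variant.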

3. Implementation and experimental results of the multi-agent filter method with a neural network
3.1. Simulation
To obtain an optimum structure and understand its behaviour, it is necessary to investigate the neural network error for different numbers of iterations, a variety of activation functions, and different numbers of layers. The network with the lowest root mean square error is the desired network.

3.2. Optimized architecture
As mentioned previously, the number of network inputs was set equal to the number of words in the chosen dictionary, and the number of outputs to 2. The number of middle layers was obtained by trial and error: 1 to 3 hidden layers and 2 to 20 neurons per layer were tested, with two activation functions considered. A table of example results shows the root mean square error for a number of these tests; the best architecture has the lowest root mean square error. The number of iterations is also of particular importance, because each structure behaves differently for different numbers of iterations. Figure (4) shows the root mean square error (RMSE) of each structure against the number of iterations.

Fig. 4. RMSE of each structure for different numbers of iterations

3.3. Network forecasting
After defining the network architecture in the previous section, in order to assess and compare the proposed method against real data, we applied the test series to the network. Figure (5) compares the real and proposed interest values. The horizontal axis is allocated to the network response and the vertical axis to the predicted value. The distribution of points around the first bisector indicates how the network behaves: the closer the points are to the bisector, the closer the network response is to the expected value.

Fig. 5. Comparison (A) of network response and predicted value, and (B) of real and recommended interest

3.4. Network evaluation
To show what the trained neural network can do, Figure (6) shows the user's interests as output.
Testing is done by subject, by content, and by their combination, which indicates better performance.
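The trial-and-error architecture search of Section 3.2 amounts to the loop below. `train_and_eval` is a hypothetical caller-supplied routine (the paper does not specify one) that trains a candidate network and returns its validation RMSE; the two activation-function names are placeholders for the two functions the paper mentions without naming.

```python
import math
from itertools import product

def rmse(predicted, actual):
    """Root mean square error used to compare candidate architectures."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def select_architecture(train_and_eval, layer_options=(1, 2, 3),
                        neuron_options=range(2, 21),
                        activations=("sigmoid", "tanh")):
    """Try every combination of hidden-layer count (1-3), neurons per layer
    (2-20), and activation function, keeping the configuration with the
    lowest RMSE, as in Section 3.2."""
    best_cfg, best_err = None, float("inf")
    for layers, neurons, act in product(layer_options, neuron_options, activations):
        err = train_and_eval(layers, neurons, act)
        if err < best_err:
            best_cfg, best_err = (layers, neurons, act), err
    return best_cfg, best_err
```

Since each structure also behaves differently across iteration counts, `train_and_eval` could itself sweep the number of training epochs and report the best RMSE it reaches for that structure.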

Fig. 6. Combining the methods with the user's interests: (A) the user's interests; (B) combined methods

Table 1. Data of 5,000 private e-mails from mass, publicly published mailings

Method                              | Spam detected by subject | by sender | by content | by user interests
Subject + user interests            | 61.09% | 52.86% | 72.60% | 88.50%
Subject + sender                    | 61.09% | 52.60% | 71.82% | 88.50%
Sender + user interests             | 61.09% | 52.60% | 72.60% | 98.18%
Subject + sender + user interests   | 61.09% | 52.60% | 8.60%  | 99.49%

4. Conclusion
As observed, adding a pre-processing stage before the processing of e-mail content, which categorizes users' mail by content, by subject, by sender, and by a combination of the three methods based on user interests, and which determines the false-positive and false-negative rates, improves the accuracy of statistical filters in the processing stage and also makes these filters faster and better functioning. With these calculations, an e-mail server considers the interests of the users in a category as the threshold; e-mails that the users of that group have shown no interest in seeing are considered spam, so there is no need to send such mail for processing at a later stage.

REFERENCES
1. Siponen, M., and Stucke, C., Effective anti-spam strategies in companies: An international study. In Proceedings of HICSS '06, 2006, vol. 6, pp. 245-252.
2. Androutsopoulos, I., Karkaletsis, V., Sakkis, G., Spyropoulos, C., Stamatopoulos, P., and Paliouras, G., Learning to filter spam e-mail: A comparison of a Naive Bayesian and a memory-based approach. In H. Zaragoza, P. Gallinari, and M. Rajman, editors, Proceedings of the Workshop on Machine Learning and Textual Information Access, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, 2000, pp. 1-13.
3. Golbeck, J., and Hendler, J., Reputation network analysis for email filtering. In Proceedings of the First Conference on Email and Anti-Spam, 2004, vol. 4825.
4. Bhandari, N.M., and Kumar, P., Artificial Neural Network - A tool for Load Rating of Masonry Arch Bridges, 2006, Advances in Bridge Engineering.
5. Menhaj, M.B., Fundamentals of neural networks. Computational Intelligence, International MultiConference of Engineers and Computer Scientists, 2008, vol. 1.
6. Tang, X., Hybrid Hidden Markov Model and Artificial Neural Network for Automatic Speech Recognition, 2009 Pacific-Asia Conference on Circuits, Communications and Systems, pp. 682-685.
7. Seyed Mostafa Kia, Neural networks in MATLAB. Tehran, 2009, Kian Rayaneh Sabz Publication.
Bhandar, N.M., and Kumar P., Rtfcal Neural Networ - A tool for Load Ratng of Masonry Arch Brdges,2006, Advances n Brdge Engneerng. 5. Menha M.B., Fundamentals of neural networs. Computatonal ntellgence, Internatonal MultConference of Engneers and Computer Scents, 2008,vol1. 6. Tang X., Hybrd Hdden Marov Model and Artfcal Neural Networ for Automatc Speech Recogonaton,2009 Pacfc-Asa Conference on Crcuts, Communcatons and Systems, pp 682-685. 7. Seyed Mostafa Ka.,Neural networs n Matlab.Tehran,2009, Kan Publcaton of Rayaneh Sabz. 386