Improvement of SPIT prevention technique based on Turing test. Alexander Joseph Johansen


Improvement of SPIT prevention technique based on Turing test
Alexander Joseph Johansen
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Technology
Faculty of Information Science and Technology
Mahanakorn University of Technology
2010

Thesis Title: Improvement of SPIT prevention technique based on Turing test
Author: Alexander Joseph Johansen
Student ID:
Degree Programme: Master of Science, Information Technology
Year: 2010
Thesis Advisor: Dr. Woraphon Lilakiatsakun

ABSTRACT

This thesis presents the design of a novel SPIT prevention system based on a modular mechanism design. It presents the different prototype modules that have been developed to fight SPIT. With the modular design, new modules can be created and added on the fly. We then analyze the effectiveness of the two primary modules against a simulation of 100,000 calls entering the system. For this revision, further experiments have been added to demonstrate that our Voice CAPTCHAs cannot be broken with current computer software.

Acknowledgments

This thesis would not have been successful without the guidance of Dr. Woraphon Lilakiatsakun. I'm very grateful for your support throughout the whole thesis-writing process. I would also like to thank my family and friends for their emotional support. I would also like to dedicate this work to my father, who has passed away. I'm very thankful for all his support until his death. You will always be in my mind.

Table of contents

Abstract
Acknowledgments
Table of contents
List of tables
List of figures
Publications

Chapter 1 Introduction
  Problem statement
  Aim of the study
  Limitations
  Hypothesis
  Research questions
  Outline

Chapter 2 Related Work and Literature Review
  Session Initiation Protocol (SIP)
    SIP Network Elements
    SIP Messages
    Session Establishment (Call Setup)
  Spam over Internet Telephone (SPIT)
    VoIP spam from the technical point of view
    Why People Spam
  Literature Review
    Introduction
    Survey of the literature
    Conclusion
      Black listing
      Grey listing
      White listing
      Traffic Analysis
      Reputation filtering of users
      Rate limiting
      Computational Puzzle / Proof of effort
      Handshake / Challenge / Turing test
      Content filtering
  Evaluation of current Voice CAPTCHA
    reCAPTCHA
    Captchas.net
    DanCaptcha
    Google's CAPTCHA
    eBay CAPTCHA
  2.5 SPIT Proposed Solution

Chapter 3 Modular Anti-SPIT framework
  Framework structure
    Logging
      Overview
      Implementation
    Detection
      Overview
      Implementation
    Action
      Overview
      Implementation
  Detection modules
    Whitelist module
    Blacklist module
    DNSBL module
    Rate-limit module
    Turing test module
    Enhanced Turing test module
    Enhanced Turing test module (version 2)

Chapter 4 Experiment
  Experimental environment
  Call Simulation
  Call Set-up Delay (CSD)
  Maximum Concurrent Calls
  Turing test
  Call set-up Delay (CSD) on a high performance machine
  Maximum Concurrent Calls on a high performance machine
  Voice CAPTCHA
  Audio CAPTCHA comparative overview
  Evaluation of selected audio CAPTCHA
  SPHINX bot experimental of Captcha V

Chapter 5 Conclusion and Future Work
  Conclusion
  Future Work

References

Appendices
  Appendix A Source code
    A.1 DBLOGGER
    A.2 White list module
    A.3 Black list module
    A.4 DNSBL module
    A.5 Rate-limit module
    A.6 Rand.pl for Turing test module
    A.7 Resetter.pl for Rate-limit module
    A.8 Asterisk dialplan for Turing test module
    A.9 CAPTCHA creation script for enhanced Turing test module
    A.10 att_sound.pl
    A.11 accuracy_generator.pl
    A.12 extract_sound_length.pl
  Appendix B Call Simulation
  Appendix C Presented and Published Papers
    C.1 A VOIP anti-spam Sys. based on Audio Turing Test Server
    C.2 A VOIP anti-spam Sys. based on Modular Mech. Design.
    C.3 A VOIP anti-spam Sys. based on Modular Mech. Design.

List of tables

Table 2.1 SIP Request method
Table 2.2 SIP response code
Table 4.1 Possible sums from the Turing test
Table 4.2 Processing time introduced by modules
Table 4.3 Call Setup Delay
Table 4.4 Elapsed time on high performance server
Table 4.5 Audio CAPTCHA comparative overview

List of figures

Fig. 2.1: An example of an INVITE message
Fig. 2.2: SIP Requests and Responses
Fig. 2.3: Session establishment of voice connection with two different domains
Fig. 3.1: Design of Modular Anti-SPIT framework
Fig. 3.2: Activity diagram of Modular Anti-SPIT framework
Fig. 3.3: Flowchart of DBLOGGER
Fig. 3.4: Flowchart of white list module
Fig. 3.5: Flowchart of black list module
Fig. 3.6: Flowchart of DNSBL module
Fig. 3.7: Flowchart of rate-limit module
Fig. 3.8: Flowchart of Turing test module
Fig. 3.9: Flowchart of Voice CAPTCHA creation
Fig. Visual design of Voice CAPTCHA
Fig. noise(2) 1 - noise(2) 0 - noise(2) 8 - noise(2)
Fig. 4.1: Testing's network topology
Fig. 4.2: Call Distribution
Fig. 4.3: Rate-limit Effectiveness
Fig. 4.4: Turing Test
Fig. 4.5: SIP call setup (initial portion)
Fig. 4.6: Code insertion for elapsed time
Fig. 4.7: A live call entering the system
Fig. 4.8: Concurrent call and CSD
Fig. 4.9: Testing of the transcriber
Fig. 4.10: Elapsed time of modules
Fig. 4.11: Success rate of selected audio CAPTCHA
Fig. 4.12: Accuracy of Sphinx deciphering CAPTCHAs
Fig. 4.13: Success rate of Sphinx deciphering CAPTCHAs

Publications

Johansen, A., Lilakiatsakun, W. (2010). A VOIP anti-spam System based on Audio Turing Test Server. Proceedings of the ECTI-CARD 2nd Conference on Application Research and Development, May 2010, Thailand.

Johansen, A., Lilakiatsakun, W. (2010). A VOIP anti-spam System based on Modular Mechanism Design. Proceedings of the 2010 International Conference on Intelligence and Information Technology, October 2010, Pakistan.

Johansen, A., Lilakiatsakun, W. (2010). A VOIP anti-spam System based on Modular Mechanism Design. Proceedings of the 3rd National Conference on Information Technology, October 2010, Thailand.

Chapter 1 Introduction

1.1 Problem statement

One term that everyone knows today is spam. Spam refers to unsolicited bulk messages in any form: voice, text messages or digital pictures. Spam is a problem that the Internet community is facing now, and it is increasing every year. Worldwide, it is estimated that spam will cost businesses $130 billion; in the U.S. alone, $42 billion. That's a 30% increase over 2007 estimates, which themselves were a 100% increase over 2005 figures [36].

Many people have never heard of VoIP spam. Even if they use Internet telephony, they have probably never heard the term VoIP either. SPIT, or Spam over Internet Telephony, is a new threat that we are going to face in the Internet telephony world. SPIT refers to any unsolicited bulk voice messages sent over the Internet. SPIT is not limited to voice messages; it can also include instant messaging spam and presence spam.

Technically, the problem of SPIT differs from e-mail spam in the amount of data that has to be transferred to the end user. E-mail spam requires less data than SPIT because it is normally made by composing a text message and sending it over the Internet, while SPIT has to carry voice data, which means more data to transfer.

Designing a solution for fighting SPIT is a challenging task. We believe that a single simple method is not good enough to be proposed as an anti-SPIT solution. In this thesis we have therefore designed an architecture that uses different prototype modules as the base for a VoIP anti-spam system. Our design includes both intrusive and non-intrusive modules for fighting SPIT. The strength of the proposed solution is that it prevents SPIT from circulating into the Internet cloud by stopping it at the call. It also has a detection mechanism for inbound SPIT targeting the callee's domain.

Our protocol of interest is SIP [37], a widely used application protocol for setting up a session or call.

1.2 Aim of the study

The first aim of this thesis is for the student to understand the SIP protocol better. The second is to understand how SPIT is created using current computer programs. The aim of the research is to develop modules and a VoIP anti-SPIT framework that tackle SPIT on a well-known open source PBX system, Asterisk PBX, by analyzing previous methods in the current literature. The Turing test module is studied in depth; in it, we evaluate the Voice CAPTCHA that we have designed. We also describe the framework for fighting SPIT, but our focus is on the Turing test module using Voice CAPTCHA as an anti-SPIT solution.

1.3 Limitations

In order to limit the scope and effort of this thesis, the following parts were not considered:
- Protocols other than SIP. Several VoIP signaling protocols are present in the literature; H.323 and MGCP are also widely implemented.
- Non-automated calls. This thesis focuses on how to prevent and detect pre-recorded call spam. Human call spam is not considered.
- Spam on traditional telephones. This thesis takes only the problem of VoIP spam into account, not spam on traditional telephones.
- VoIP spam solutions. Because of the sheer quantity of solutions and approaches, it is not possible to take all of them into account. The thesis discusses the most promising papers.

1.4 Hypothesis

The main goal of this research, and of any research, is and should be to gather information objectively. The research questions are based on previous research and

results and try to avoid leading in any direction. Nevertheless, there is always a hypothesis behind a piece of research. There are several methods to fight SPIT in the literature, and our work is to combine them and test by experiment whether it is feasible to fight SPIT with a modular framework. Our hypothesis is quite simple: we ask whether it is feasible to fight SPIT by designing a framework consisting of different modules, both intrusive and non-intrusive.

1.5 Research questions

- How can SPIT be created with current computer programs?
- Would it be feasible to prevent and detect SPIT by using a modular design?
- Can a machine transcribe the question in the Turing test into readable text?
- Is there any way to prevent the machine from being able to transcribe it?
- Would humans be able to understand the announcement with some background noise in the Turing test?
- How can we design a system that doesn't interrupt the users too much?
- How can the system be implemented successfully with the six HIP (Human Interactive Proof) properties?

1.6 Outline

The following chapter discusses the thesis from the theoretical point of view. It defines the term VoIP spam and then discusses the different VoIP spam solutions, taking into account the personal information users have to offer in order for the different solutions to work. The thesis then turns to the research and explains the reason for it. It discusses why the research is necessary, what the expectations are and what kinds of problems might occur. Moreover, it gives information about the background of the topic and the research of this thesis. The methodology section explains how the research took place and defines in greater detail what the research looked like, how it was structured and what research methods were used. Afterwards the research itself is presented. The research results are described in chapter 4, results and discussion.
The chapter discusses the different experiments and their results. It describes the conclusions that were drawn from the research, describes surprises and puts all results in relation to the different VoIP

spam solutions. Finally, the most accepted solution is evaluated. Chapter 5, summary and conclusion, sums up the thesis, reflects upon the research and discusses further research in relation to the thesis.

Chapter 2 Related Work and Literature Review

2.1 Session Initiation Protocol (SIP)

SIP is an application-layer signaling protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia conferences, etc. The following are the five aspects of SIP that facilitate call establishment and call maintenance, as mentioned in [37]:
1. User location: Determination of the end system to be used for communication.
2. User availability: Determination of the willingness of the called party to engage in communications.
3. User capabilities: Determination of the media and media parameters to be used.
4. Session setup: Establishment of session parameters at both called and calling party.
5. Session management: Including transfer and termination of sessions, modifying session parameters, and invoking services.

SIP Network Elements

A SIP network usually consists of User Agents, SIP Registrar Servers, SIP Proxy Servers and SIP Redirect Servers. Below is a high-level description of each component and the role it plays in this process. A SIP user agent (UA) is the end-user device, such as a PDA, cellphone, softphone or PC, used to create and manage SIP sessions. A UA has two roles. The first is to send requests; in this mode the UA acts as a User Agent Client (UAC). The second is to respond to requests by sending responses. In

this mode the UA acts as a User Agent Server (UAS). These UAC and UAS roles only last for the duration of a SIP transaction. Each resource of a SIP network, such as a UA, is identified by a Uniform Resource Identifier (URI). A typical SIP URI is of the form: sip:username:password@host:port

SIP also defines server network elements. Although two SIP endpoints can communicate without any intervening SIP infrastructure, which is why the protocol is described as peer-to-peer, this approach is often impractical for a public service. RFC 3261 [37] defines these server elements:

A proxy server "is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, which means its job is to ensure that a request is sent to another entity "closer" to the targeted user. Proxies are also useful for enforcing policy (for example, making sure a user is allowed to make a call). A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it."

"A registrar is a server that accepts REGISTER requests and places the information it receives in those requests into the location service for the domain it handles."

"A redirect server is a user agent server that generates 3xx responses to requests it receives, directing the client to contact an alternate set of URIs. The redirect server allows SIP Proxy Servers to direct SIP session invitations to external domains."

The RFC specifies: "It is an important concept that the distinction between types of SIP servers is logical, not physical."

SIP Messages

SIP is a client-server protocol similar to HTTP. A message consists of a message header and an optional message body. Messages are classified as either requests or responses. The original RFC 3261 [37] presents six types of request (also called methods): INVITE, BYE, ACK, OPTIONS, CANCEL, and REGISTER.
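As an illustration of the SIP URI form quoted earlier in this section (sip:username:password@host:port), the following is a minimal Python sketch that splits such a URI into its parts. The function name `parse_sip_uri` and the regular expression are assumptions made for this example; they cover only the simplified form shown in the text, not the full URI grammar of RFC 3261 (URI parameters, headers and IPv6 hosts are not handled). The default port 5060 is the usual SIP port.

```python
import re

# Simplified pattern for sip:username[:password]@host[:port].
# Illustrative only: URI parameters, headers and IPv6 hosts are not handled.
SIP_URI_RE = re.compile(
    r"^sip:(?P<user>[^:@]+)"        # username
    r"(?::(?P<password>[^@]*))?"    # optional password
    r"@(?P<host>[^:;?]+)"           # host name or IPv4 address
    r"(?::(?P<port>\d+))?$"         # optional port
)

DEFAULT_SIP_PORT = 5060


def parse_sip_uri(uri):
    """Split a simplified SIP URI into user, password, host and port."""
    match = SIP_URI_RE.match(uri)
    if match is None:
        raise ValueError("not a SIP URI of the simplified form: %r" % uri)
    port = match.group("port")
    return {
        "user": match.group("user"),
        "password": match.group("password"),  # None when absent
        "host": match.group("host"),
        "port": int(port) if port else DEFAULT_SIP_PORT,
    }


if __name__ == "__main__":
    print(parse_sip_uri("sip:alice:secret@example.com:5070"))
    print(parse_sip_uri("sip:bob@example.org"))
```

In practice passwords are rarely carried in the URI itself; the sketch keeps the field only because the form quoted in the text includes it.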
In short, the INVITE method is used to start a call, the BYE method is used to


end a call, the OPTIONS method is used to negotiate capabilities using SIP options negotiation, the CANCEL method is used to abort a call setup, and the REGISTER method is used to register the current location of a user at the registrar.

Table 2.1 SIP Request method

If a SIP entity receives a request, it performs the corresponding action and then sends back a response to the originator of the request. Responses are three-digit status codes (similar to those in HTTP/1.1), categorized into six classes. Concrete examples of response codes are 180 (Ringing), 302 (Moved Temporarily) and 404 (Not Found).

Table 2.2 SIP response code
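The split between requests and responses described above can be sketched in a few lines of Python. The sketch below classifies the first line of a SIP message as a request (one of the six methods listed in the text) or a response, and maps a status code to one of the six response classes of RFC 3261. The function name `classify_start_line` is an assumption made for this example, and the parsing is deliberately simplified.

```python
# The six request methods named in the text.
REQUEST_METHODS = {"INVITE", "BYE", "ACK", "OPTIONS", "CANCEL", "REGISTER"}

# The six response classes, keyed by the first digit of the status code.
RESPONSE_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}

SIP_VERSION = "SIP/2.0"


def classify_start_line(line):
    """Classify the first line of a SIP message.

    Returns ("request", method, request_uri) or
            ("response", status_code, response_class).
    """
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        raise ValueError("malformed start line: %r" % line)
    if parts[0] == SIP_VERSION:
        # Status line: SIP/2.0 <code> <reason phrase>
        code = int(parts[1])
        if not 100 <= code <= 699:
            raise ValueError("status code out of range: %d" % code)
        return ("response", code, RESPONSE_CLASSES[code // 100])
    if parts[0] in REQUEST_METHODS and parts[2] == SIP_VERSION:
        # Request line: <method> <request URI> SIP/2.0
        return ("request", parts[0], parts[1])
    raise ValueError("unrecognized start line: %r" % line)


if __name__ == "__main__":
    for start_line in (
        "INVITE sip:bob@example.org SIP/2.0",
        "SIP/2.0 180 Ringing",
        "SIP/2.0 302 Moved Temporarily",
        "SIP/2.0 404 Not Found",
    ):
        print(classify_start_line(start_line))
```

Later extensions to SIP define further methods; the set here is limited to the six the text attributes to the original RFC.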

Fig. 2.1: An example of an INVITE message

Session Establishment (Call Setup)

Figure 2.2 illustrates a SIP session setup between two endpoints that belong to a single operator with a proxy server.

Fig. 2.2: SIP Requests and Responses

The setup and termination of a voice connection between two users is illustrated in Figure 2.2. It shows the messages (requests and responses) that are exchanged when user agent A wants to initiate a session with user agent B. Consider the

case where two users belong to the same operator. Both user agents here use the same proxy. User agent A starts the request (INVITE); the proxy passes it on to the receiver (user agent B) and sends back a 100 (Trying) response message. User agent B responds to the request first with a 180 (Ringing) message, and eventually with a 200 (OK) after the user has picked up the phone. Both of these messages are forwarded to user agent A by the proxy. User agent A can now request the start of the media transfer (ACK). Note that the proxy is not needed for this. After a session has been established, user agents can communicate with each other directly. At the end of the conversation, one of the user agents (here: B) terminates the session by sending a BYE request to its counterpart.

Figure 2.3 illustrates the session establishment in the case where two different domain proxies are involved. In this example, it is assumed that user agents A and B exist in different domains and have different proxies. First, user agent B needs to register with its local registrar (1) to be able to receive calls from any user. The registrar stores the location information at a location server (2). When user agent A wants to call user agent B, it sends an INVITE request to its local SIP proxy (3), which passes on the request (possibly after a DNS lookup) to the proxy of user B's domain (4). The proxy in domain B needs to look up the IP address of user agent B at the location server (5, 6) before it can send the request to user agent B (7). In this example, the response message for user agent A takes the same route back (8, 9, 10), possibly for billing purposes. The remaining steps (11, 12, 13) are performed in a similar manner as shown in Figure
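The single-proxy exchange described above (user agents A and B sharing one proxy) can be written down as an ordered list of messages. The following Python sketch only encodes what the text states: which message goes from whom to whom, and that ACK and BYE travel directly between the user agents. The function names are assumptions made for this illustration, and the acknowledgment of the BYE request is left out because the text does not describe it.

```python
def single_proxy_call_flow():
    """Return the message sequence as (sender, receiver, message) tuples."""
    return [
        ("A", "proxy", "INVITE"),       # A starts the request
        ("proxy", "B", "INVITE"),       # proxy passes it on to B
        ("proxy", "A", "100 Trying"),   # proxy tells A it is working on it
        ("B", "proxy", "180 Ringing"),  # B's phone is ringing
        ("proxy", "A", "180 Ringing"),  # forwarded to A by the proxy
        ("B", "proxy", "200 OK"),       # the user at B picked up
        ("proxy", "A", "200 OK"),       # forwarded to A by the proxy
        ("A", "B", "ACK"),              # sent directly, proxy not needed
        ("B", "A", "BYE"),              # B ends the session directly
    ]


def direct_messages(flow):
    """List the messages that bypass the proxy."""
    return [msg for sender, receiver, msg in flow
            if "proxy" not in (sender, receiver)]


if __name__ == "__main__":
    for sender, receiver, message in single_proxy_call_flow():
        print("%-5s -> %-5s : %s" % (sender, receiver, message))
    print("Direct between user agents:", direct_messages(single_proxy_call_flow()))
```

Listing the flow this way makes it easy to see that only the signaling up to the 200 (OK) passes through the proxy, which is where a proxy-side anti-SPIT check would have to act.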

Fig. 2.3: Session establishment of voice connection with two different domains

2.2 Spam over Internet Telephone (SPIT)

The term spam refers to sending any type of unsolicited message using any medium. It is often sent from an unknown sender. Spam can also be sent from a known sender, for example a friend; one example of this type of spam is jokes sent to you as e-mail. This kind of spam can be stopped by contacting the sender (your friend). The other kind of spam, which we deal with in this thesis, is the kind where we can't contact the spammer and ask them to stop. A spammer is a person who sends spam. This can be done on any type of medium, including SMS, e-mail, phone, mobile phone, telemarketing, etc.

Because spam describes unwanted messages on every platform, it is essential for this thesis to define spam on the telephone: VoIP spam. VoIP spam is widely known and described as SPIT (spam over Internet telephony), sometimes known as vam (voice or VoIP spam): unsolicited bulk messages broadcast over VoIP (Voice over Internet Protocol) to phones connected to the Internet.

The definition of spam stated beforehand is quite similar to the one from Tech Target. To make the definition clearer: VoIP spam is unwanted calls from senders the receiver does not know.

VoIP spam from the technical point of view

To understand what VoIP spam actually looks like, the reader should read the previous section for SIP background. SIP is the signaling protocol that many VoIP providers use today for their VoIP service. VoIP is based on the idea of sending large amounts of digital data over the Internet. Unlike in a traditional analogue telephone system, the voice is split into packets. The packets are then reassembled at the receiver's end. Voice is carried by the RTP protocol, encoded with a codec agreed on by both ends. The codec might be G.711, G.729, iLBC, etc. Each codec has its own trade-offs. G.711, for example, requires a large amount of bandwidth to transfer voice, but it has the best voice quality and supports sending fax over the Internet. Packet loss with the G.711 codec decreases the quality dramatically. G.729 requires less bandwidth, but the quality is lower. G.729 works well with some packet loss and is mostly used where limited bandwidth is a factor when making calls with VoIP.

VoIP technology has enabled users from all over the world to make free calls to each other. This kind of communication is called PC-to-PC calling. Users normally install software (a softphone) on their computers and can then start calling each other for free. Many providers have this kind of service embedded in their softphone. Skype and Voip Discount enable their users to call each other for free. Another kind of service lets the user make calls to traditional phones. Many providers have unlimited calling plans. For example,
Telio in Norway has an unlimited calling plan for local calls and calls to 130 other countries for 159 NOK per month.

Why People Spam

Many might wonder why there are people who send spam. Spamming is an actual business process. As with other businesses, the goal is to make a profit. With the cheap monthly rate for making calls over the Internet, it enables

spammers to send SPIT cheaply over the Internet. As in any other business, spammers must perform a few essential activities in order to make a profit:
1. Find potential customers. For spammers this involves obtaining a list of working telephone numbers or a list of IP addresses where SIP endpoints can be reached. Two methods can be used to obtain the list: address harvesting and list purchasing.
2. Offer a product or service to the potential customers. This involves sending information or an offer to the list of telephone numbers or IP addresses.
3. Sell and deliver the product or service to some percentage of the potential customers.

Direct mail can cost $1.21 per recipient, about 2,400 times more than sending spam. Direct mailers usually require a response rate of about 2 percent; spammers, on the other hand, can break even with response rates about 2,000 times lower. For example, a spammer can send 500,000 messages and still be pleased and profitable with five responses.

Technically, sending spam requires little technical knowledge. For address harvesting, a war dialer might be used to find live telephone numbers. iwar, a war dialer that is able to do war dialing over the Internet, can be used for detecting working telephone numbers. Once we have a list of working telephone numbers, the next stage is to send SPIT to them. There is software that enables us to send SPIT with Asterisk PBX. To send SPIT, we need to record a voice prompt (a voice commercial) to be played back when the receiver has answered the call. To send SPIT with SPITTER (the program that actually works with Asterisk to send SPIT), a call file needs to be created. This call file includes the telephone number that we are going to send SPIT to. The call file can easily be created with Perl by reading the log file created by iwar. Another way to send SPIT is to send it directly with the SIP protocol.
This can be done by using nmap to scan a range of IP addresses for an open SIP port (port 5060 UDP). Once we have a list of open SIP ports, we can write a macro in Asterisk to dial those SIP phones directly using the SIP protocol, bypassing any SIP proxies and gateways.

In conclusion, SPIT is likely to be a problem in the future, caused by the growth of Internet telephony services and the decreasing calling rates with VoIP. We will soon see that today's spammers are likely to use VoIP as a medium to send spam in the future. To solve this problem, a novel design is needed to combat this kind of spam. In this thesis, we focus on how we can prevent and detect spam by designing modules for the Asterisk PBX system. The system is to be used in conjunction with a SIP proxy. These modules will be discussed in a later chapter.

Literature Review

Introduction

This literature review is done for these reasons: to give the student a firm foundation in the topic of interest, to find the gap in the current topic and to set the direction for future work. The review focuses on spam applied to Voice over IP technology (SPIT, SPam over Internet Telephony). The trade-offs of the different techniques are shown in the conclusion section. The literature review is written in order of appearance of the papers.

Survey of the literature

Dantu and Kolan [1] describe the Voice Spam Detector (VSD), a multi-stage SPIT filter based on trust, reputation, and feedback among the various filter stages. The primary filter stages are call pattern and volume analysis, black and white lists of callers, per-caller behavior profile based on Bayesian classification and prior history, and reputation information from the callee's contacts and social network. They provide a formal model for trust and reputation in a voice network, based on intuitive human behavior. They evaluate their system in a laboratory experiment using a small number of real users and injected SPIT calls.

With the belief that spam should be prevented at the signaling exchange stage, R. Macintosh [2] proposed an innovative method to detect and block spam (unsolicited bulk calls) in IP telephony networks by using signaling protocol analysis.
Since analysing data content may not be legal and is sometimes impractical, they have chosen to do signaling analysis. The spam detection they propose is based on three main constituents. First, the observable signaling routing data of the voice spam are valid and may point either to an anonymizer, a gateway or the spammer. Secondly, spam calls are

unidirectional: the spammer initiates calls to the target network, but nobody initiates calls to him. Third, spam call termination behavior is statistically consistent, i.e. these calls are terminated mostly by the same conversation party. They outline five voice spam distribution scenarios. The statistics they consider are calculated per SPIT source over the number of calls made from this source to the recipients in the monitored network. Even with the advantage of signaling analysis over content filtering, the proposed solution has some drawbacks. First, some legitimate phone services or entities, such as automated phone notification services, may behave like spam. Secondly, if the spammer manages to re-register himself often enough, there is no way to build a statistic for this volatile identity.

PMG as proposed in [3] monitors call patterns from each caller and detects Spam over Internet Telephony (SPIT) based on the call patterns. The essence of this algorithm is that, as a caller attempts to make numerous calls through the call server in a certain time span, his or her gray level increases, classifying the caller as a potential spam source. Once the gray level becomes higher than the given spam threshold, the caller is not able to make any more calls. When the spam caller stops initiating SPIT for a certain time period, the gray level decreases and eventually stays below the threshold. PMG computes two levels based on the call pattern: the short-term gray level and the long-term gray level. The short-term gray level covers a short period of time (for example one minute) during which a voice spam source is able to generate many calls to attack a server. It increases and decreases very quickly and lasts only for a short period of time. The long-term gray level considers the call pattern over a much longer period of time (for example one hour or one day).
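The gray-level idea described above can be illustrated with a small Python sketch. This is not the PMG algorithm of [3]: the update rule used here (a fixed increment per call attempt and a linear decay over time), the parameter values and the class names are all assumptions chosen only to show the behaviour the text describes, namely a fast-moving short-term level, a slow-moving long-term level, and blocking once their sum reaches a threshold.

```python
class GrayLevel:
    """One gray level: grows with each call attempt, decays linearly over time."""

    def __init__(self, increment, decay_per_second):
        self.increment = increment
        self.decay_per_second = decay_per_second
        self.level = 0.0
        self.last_time = None

    def _decay(self, now):
        # Reduce the level according to the time elapsed since the last update.
        if self.last_time is not None:
            elapsed = now - self.last_time
            self.level = max(0.0, self.level - elapsed * self.decay_per_second)
        self.last_time = now

    def current(self, now):
        self._decay(now)
        return self.level

    def record_call(self, now):
        self._decay(now)
        self.level += self.increment


class GrayLevelFilter:
    """Hypothetical per-caller filter combining a short- and a long-term level."""

    def __init__(self, threshold=10.0):
        self.threshold = threshold
        # Short term: large steps, fast decay. Long term: small steps, slow decay.
        self.short_term = GrayLevel(increment=2.0, decay_per_second=0.5)
        self.long_term = GrayLevel(increment=0.5, decay_per_second=0.001)

    def allow_call(self, now):
        """Return True if the call is allowed; every attempt is recorded."""
        level = self.short_term.current(now) + self.long_term.current(now)
        self.short_term.record_call(now)
        self.long_term.record_call(now)
        return level < self.threshold


if __name__ == "__main__":
    caller = GrayLevelFilter()
    # A burst of attempts at the same instant is blocked after a few calls.
    print([caller.allow_call(0.0) for _ in range(6)])
    # After a long quiet period the levels have decayed and calls pass again.
    print(caller.allow_call(10000.0))
```

One design choice worth noting: blocked attempts are still recorded here, so a caller that keeps hammering the server stays blocked, while one that stops is eventually let through again.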
The long-term gray level value increases and decreases very slowly. It also considers the history of a caller that has been detected as a spam generator: the long-term gray level value is multiplied by the number of times a particular caller was detected as spam.

Y. Rehabi [4] presented a reputation-based spam blocking algorithm in which a reputation network manager is built from the SIP repository. This technique requires that the SIP providers allow their users to set their preferences through a contact list. Secondly, some agreement must be established between different SIP providers

allowing them to exchange information related to their users' contact lists. Another requirement for this technique is that every SIP user must be able to score the users on his contact list. The first function of the Reputation Network Manager (RNM) is to find all possible paths in the reputation network between the SIP request destination and the SIP request sender. If no path is found in the domain receiving the request, the RNM contacts its neighboring RNMs to check whether they have any entry in their SIP repositories. Once a path is found, the RNM computes the reputation value corresponding to this path and compares it to a predefined threshold. If the reputation value is greater than the threshold, the RNM informs the SIP proxy to process the request; otherwise, the SIP server rejects it.

The SPam over Internet Telephony (SPIT) Prevention Framework suggested by R. Schegel [5] is a design for two-stage SPIT prevention which is scalable because modules can be added and removed on the fly. The system contains two stages for detecting SPIT. The first stage contains modules which analyze a call only by looking at information that is available before the call is actually answered. The prototype modules include White / Black List, Simultaneous Calls, Call Rate and IP/Domain Correlation. A proper combination of all of them provides a robust SPIT prevention solution. The second stage, on the other hand, consists of modules which actually interact with either the caller or the callee to refine detection. These modules are more intrusive, as they introduce an inconvenience either to the caller or to the callee. A Turing test prototype module has been developed for this stage. The advantage of this architecture is that it minimizes the interaction with the callee when determining whether a call is SPIT. However, most of the modules are not effective if the SPIT originator changes the SIP identity for each subsequent call.
One way to circumvent this is to use strongly authenticated identities.

Detecting SPIT Calls by Checking Human Communication Patterns, proposed by J. Quittek [8], is another work done by people at NEC Europe. In this proposed solution, they have created a module for their VoIP SEAL, which is a modular SPIT detection platform. The module is a Turing test that monitors the communication patterns of a call. Human communication patterns in phone conversations have been studied extensively and some basic patterns are assumed to be well understood. A simple conversation model has four states. In state M (mutual silence) both participants are

silent, in states A and B exactly one of them speaks, and in state D (double talk) both are speaking at the same time. Based on this model, one can monitor the voice signal energy in a given call to perform a Turing test that checks human conversation patterns. They have developed 2 Turing tests: "Silence Checking" and "Answer Length Checking". For silence checking, the voice energy of the caller during the initial greeting at call start is evaluated. If the signal energy exceeds a certain value, it is assumed that the caller is violating the call start pattern and the caller is hence classified as a machine. Answer length checking can be applied if silence checking does not produce a clear result. Here the communication pattern after the initial greeting is checked. The check is based on a question the caller is asked at the end of the greeting; for example, the caller can be asked for the name of the person that is to be called. In conclusion, the Turing tests are based on checking human communication patterns and have the advantage that they are hidden from the caller and therefore will not be perceived as impolite call interference. A problem for full evaluation of the system is the small number of available SPIT call records that can be used for testing the failure rate.

J. Ahn [10] proposed an approach to fine-tune the performance of the PMG algorithm using the concept of frequency detection by introducing two new parameters: short and long term call density thresholds. If the incoming caller exceeds the short or long term call density threshold, he is treated as a high frequency user and a set of configuration parameters best suited to blocking the spammer is applied at that particular instant. Otherwise the user is treated as a low frequency user and a set of configuration parameters best suited to normal users is applied.
Thus Adaptive PMG provides an automated way to apply multiple configuration parameters to the PMG based on the user's instantaneous behavior to combat SPIT more efficiently. They have developed a plug-in that works on top of the PMG algorithm to fine-tune its performance. Their experimental results demonstrate the values of the configuration parameters that best suit normal users and spammers, and confirm that, with their fine-tuning approach, the Adaptive PMG works more efficiently than the original PMG algorithm. They believe

that the Adaptive PMG could prove itself to be a powerful weapon to combat SPIT when used with strong user authentication.

In the paper "A SIP-oriented SPIT Management Framework", D. Gritzalis [11] proposed 2 new methods for detecting and preventing SPIT. In this article they introduced new definitions in the context of SPIT:
D1. Prevent SPIT: To keep SPIT from happening, or to make unsolicited bulk calls or messages impossible.
D2. Detecting SPIT: To discover the presence (existence) of bulk unsolicited calls.
Although these definitions look alike, they have a subtle difference that can affect the overall understanding and approach of SPIT handling. The difference is that prevent corresponds to an event that has not happened (i.e., no impact) and that we suppose will not happen, while detect implies that some event has already happened (i.e., some impact) and we just identify it. Their new methods include SIP SPIT detection through attack modeling.

B. Mathieu [12] proposed a solution framework combining well-known detection schemes, including blacklists and white lists, with methods based on statistical traffic analysis such as the number and duration of calls a user conducts. The SPIT Detection and Reactor System (SDRS) also takes into account users' and operators' preferences. One of the approaches their SDRS solution uses to detect SPIT is based on identifying anomalies in the number or duration of calls a user conducts and the percentage of failed calls. They first classified normal user behavior, that is, the behavior of non-spammers. To do so, they monitored and analyzed the VoIP traffic of 8,700 France Telecom VoIP users, capturing usage in nine locations in France over a one-month period. The average number of calls per user was about two and a half per working day (Monday through Friday) and about two per weekend day.
The maximum number of calls for a single user was about 20 per day, a figure still small enough to fall into the normal range for users such as teenagers. SDRS uses four primary detection criteria for computing a SPIT likelihood score before call establishment: white lists, blacklists, caller behavior monitoring and spoofing detection. To improve detection and user classification efficiency, SDRS also collects information about calls identified as SPIT only after receipt. As part of offline detection, SDRS monitors call duration. This test consists of classifying a call as SPIT based on the duration of initiated calls. To help with detection after call establishment, SDRS also implements feedback mechanisms.

P. Patankar [13] examined two different frameworks for identifying spammers in large VoIP networks. The first technique uses a centralized SIP-based approach that relies on the SIP proxy servers to identify spam calls during the call establishment phase. The second technique is based on a decentralized referral social network model, where a user is assigned a reputation score by its neighbors. The reputation of a node changes depending on the type of calls it makes. Based on the reputation, a callee can decide to either accept or decline a call. They conducted an in-depth evaluation of both frameworks to analyze their sensitivity and specificity with single and multiple spammers. Their simulation results show that the proposed referral social network model can provide better spam isolation than the SIP-based approach by correctly detecting spam calls over 98% of the time. On the other hand, the SIP-based approach can always identify legitimate calls. Moreover, the referral network model provides a better opportunity to tune the system parameters for tradeoffs between sensitivity and specificity. Based on these results, the social network model seems a viable approach to handle spam in VoIP networks.

To treat SIP spam without increasing the complexity of SIP or of the SIP network, P. So Young [15] proposes a SIP spam labeling method that needs no SIP extension, no new function and no new equipment. The system only requires the labeling function of the spammer's terminal, the label analysis function of the proxy server and the management function for the spam treatment policy of call receivers. The label analysis function can be part of the packet analysis function of the proxy server and may extend the function scope of the server. They also propose to insert the spam indicator into the SIP INVITE message, which is the first SIP request message used to establish a SIP session.
Very fast identification of SIP spam is essential, since most SIP services are real-time. Analyzing only the SIP INVITE message to identify SIP spam can minimize the time and the load required for identification. Finally, they suggest that SIP spam be treated according to the spam management policy of SIP service users. This is expected to improve how well SIP service users can deal with SIP spam. The spam treatment policy includes blocking, filtering, storing, and so on. It can vary according to the spam management policies of SIP service providers. However, it seems that national regulation is required to apply the SIP spam labeling method, since there is little motivation for spammers to label spam messages voluntarily. They

expect that applying the SIP spam labeling method can reduce the potential damage of SIP spam without much effort, time and cost.

Y. Soupionis [16] proposed an anti-spit policy-based management system (aspm), with an eye towards the effective management of the SPIT phenomenon. The suggested approach was primarily based on a SIP protocol threat and vulnerability analysis, which results in the identification of a series of attack scenarios. The attack scenarios were then analyzed in an effort to define SPIT detection rules. These rules led to the identification and description of specific actions and controls capable of countering and mitigating SPIT attacks. An XML schema was proposed as a means for both describing the detection rules and stipulating the SPIT handling controls and actions. Finally, it was demonstrated how the aspm approach could be practically integrated into a real SIP environment.

M. Hirschbichler [19] presented a SPIT detection method using a well-known blacklisting technique that has already been widely implemented for preventing e-mail SPAM. This technique uses a DNS lookup to find out whether an IP address is on the blacklist or not. They believe that a VoIP call setup from a client with an IP address already blacklisted for e-mail SPAM has a higher probability of being SPIT than a VoIP call initiated from a non-blacklisted IP address. When parsing SIP messages, the hostnames and IP addresses must be extracted. They can be found in the Via headers, the Contact header, the SDP messages and the source IP address. For their implementation, they developed a module for OpenSER (a SIP proxy server) by modifying SpamAssassin, which is a modular Perl application used by mail server providers to analyze and mark incoming mails. They also showed that the use of DNSBLs for one or more checked IP addresses per call does not increase the call setup delay to an unacceptable value as long as the CPU power is adequate.
A SPIT Detection Method Using Voice Activity Analysis, proposed by H. Huang [20], distinguishes 2 types of spam detection: signaling-based detection and content-based detection. Their method is a content-based detection. Based on the difference in behavior between human-human dialogue and human-computer dialogue, the paper identifies spam using behavioral characteristic parameters. The parameters of the behavioral characteristics are obtained by
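As an illustration only, the DNS blacklist lookup described for [19] can be sketched in Python. This is not the OpenSER/SpamAssassin module of that work. It relies on the common DNSBL convention of reversing the IPv4 octets and appending the blacklist zone, and the zone name `dnsbl.example.org` and both function names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the resolver finds a record for the DNSBL name.

    `resolve` is injectable so the check can be exercised without network
    access; a failed lookup (OSError) is treated as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical zone; prints the name that would be looked up.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

In a SIP setting the same check would be repeated for every address extracted from the Via headers, Contact header, SDP body and source IP, which is why the per-call lookup delay matters.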

analyzing the voice activity status of the calling and called parties. Because it needs no semantic identification, the method avoids problems of capacity bottlenecks and legal obstacles. The paper establishes an SVM-based spam voice detection model according to spam phone call behavior. Experimental results show that it can reach a high spam call detection rate. The model can be used by regulatory authorities and operators to monitor spam voice in telecommunications networks. In addition, as the main computational cost of the model lies in the VAD of the voice signals, the model is easy to apply on a CODEC chip with VAD function or on a VoIP gateway to build spam voice detection equipment.

J. Jeong [21] divides the VoIP service scope into three domains: the outbound, the intermediary, and the inbound domain. They implemented modules for countering spam attacks in each domain. In the outbound domain, they focus on preventing spam and detecting callers who generate abnormal call traffic similar to that of spammers. In the intermediary domains, they focus on preventing the abuse of the intermediary domain name as the originator of forged SIP messages. In the inbound domain, they focus on handling spam information immediately as well as on the prevention and detection of spam. They use a blacklist filter for screening calls from black-listed SIP URIs and a Turing Tester for screening real spammers from suspicious callers. They also use a Graylist filter for detecting users whose call traffic pattern is similar to that of a spammer. The Easy SPAM reporter, which they have implemented on a hardphone, enables a victim to report received spam information to the administrator by pushing a button on the hardphone. The proposed system lets the administrator take into account both objective and subjective bases for distinguishing spam calls.
Considering the SPIT level, the administrator can distinguish suspicious callers. The Easy SPAM reporter function enables the administrator to obtain a definite list of spam callers.

In Simulation of SPIT filtering: Quantitative Evaluation of Parameter Tuning, F. Menna [23] tunes parameters for different SPIT detection modules in a simulation. The modules tuned are: blacklist, call-rate, IP-domain, and statistical. In particular, they test the efficiency of such

mechanisms for a variety of attack scenarios and investigate how configuration parameters can impact the performance of the single modules. The four modules contribute to the final decision to accept or reject a call (note that for legal reasons a call cannot be rejected by the system; the word rejection is therefore not used in a strict sense in that paper, but rather means that the call is identified as a SPIT call and might, for example, be redirected to a voicemail system). The scores of the modules are expressed as percentages and summed up. Any call whose total score is 100% or more is rejected and does not reach the callee. The results show that a suitable design and combination of detection methodologies analyzing different characteristics of the call setup can be tuned to obtain a high SPIT rejection ratio and low false alarm rates (both positive and negative) at the same time. A very innovative contribution of this paper is a configuration able to defend users from SPIT in a wide range of possible attack scenarios, thanks to proper parameter tuning and self-adaptation improvements of one of the key modules.

Conclusion

From the literature survey, we can conclude that SPIT handling can be done in 2 different phases. The first is the signaling phase. We can extract multiple pieces of information from this signaling exchange. The second is after the call has been established. This phase includes content filtering, interaction with the caller/callee and voice activity detection. There are three types of SPIT in the literature: Voice SPAM, Presence SPAM and Message SPAM. We can also conclude that a combination of different techniques is needed to handle SPIT efficiently. The protocol discussed most is SIP, which is a text-based protocol for setting up, manipulating and terminating calls. There has been little discussion of how SPIT can be prevented.
Most of the techniques that have been proposed are mainly for the detection of SPIT. This leads us to the development of a SPIT prevention system in which we combine different techniques that have already been proposed, in a preventive way. There are many techniques that can be used to combat SPIT. Some of these techniques are being used successfully on the Internet today against spam. From the literature, we can group the techniques in the following groups:

Black listing

Blacklisting is a very common solution that is used for many different security problems. At its simplest, a blacklist is a list of things that are bad. The accompanying system that uses the blacklist checks whether a certain action, packet, program, etc. is on the list and, if so, removes or stops it. Examples of blacklists include:
- Virus scanners
- Default allow firewall ACLs
- Intrusion Detection Systems
- Spammer addresses
- The no-fly list

Advantages: Blacklisting does not require any changes to the VoIP architecture; the filtering can be performed at either the Proxy or the User Agent. Blacklisting does not require any changes to the protocols in use in SIP.

Disadvantages: Although making a list of bad things seems like a good idea while the list is still small, it quickly grows in size until it becomes unmanageable. E-mail spammers have been using fake addresses for such a long time that hardly anybody still uses blacklists. In VoIP it is likely that spammers will quickly adopt the strategies that e-mail spammers are using and circumvent the blacklist by using randomly generated SIP URIs.

Grey listing

Although the name grey listing would lead you to believe that this method is a middle ground between blacklisting and whitelisting, it is in fact a separate method. Greylisting basically disallows all incoming messages or calls unless those messages or calls have already been sent to you once before. The idea behind this is that legitimate users will notice that the message or call did not get through and will try again, while the spammer doesn't respond to error messages and simply moves on to the next user to spam.
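The greylisting rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an implementation from the literature: the class name `Greylist` is hypothetical, callers are identified by their SIP URI, and the record of earlier attempts is kept in memory and never expires.

```python
class Greylist:
    """Reject a caller's first attempt and accept later attempts."""

    def __init__(self):
        # SIP URIs that have already attempted a call (assumed identifier).
        self._seen = set()

    def check(self, caller_uri: str) -> bool:
        """Return True if the call may pass, False if it is turned away."""
        if caller_uri in self._seen:
            return True
        # First contact: remember the caller and refuse this attempt.
        self._seen.add(caller_uri)
        return False


if __name__ == "__main__":
    greylist = Greylist()
    print(greylist.check("sip:alice@example.com"))  # first attempt
    print(greylist.check("sip:alice@example.com"))  # retry
```

Because the decision depends only on the caller identity, a spammer who generates a fresh SIP URI for every call is always treated as a first-time caller, while one who reuses an identity gets through on the second attempt.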

Advantages: No changes need to be made to the architecture. The Proxy or User Agent can simply keep a record of which callers have previously made a call and respond with a 200 OK message the second time a call is made.

Disadvantages: Since all calls have to be attempted twice in order to get through, normal callers will be inconvenienced. Callers that are not aware of this security feature might believe that the user cannot be reached and give up.

White listing

White listing is the opposite of blacklisting. Instead of making a big list of all the bad people, the white list attempts to make a list of all the good people. Since each person will not have more than a hundred contacts, this list will remain small enough to be useful.

Advantages: No changes need to be made to the architecture. Assuming that user identities cannot be faked by spammers, this method eliminates spam entirely. Only messages and calls from authorized people can get through; all other messages are dropped immediately.

Disadvantages: This method is very unfriendly to strangers. Since only messages that come from people whose identities are stored in the list are let through, it becomes impossible for strangers to call users. Although this drawback can be mitigated somewhat by white listing entire organizations, it will still be very inconvenient for normal users to use their phone.

Traffic Analysis

Traffic analysis is an age-old technique used in cryptography where, instead of looking at the content of the message, the attacker looks at who is talking to whom and where the signals are physically coming from. In the field of cryptography this is considered a bad thing, but it can be used for good. Telephone companies have for a long time been monitoring call traffic. If a certain customer suddenly starts making many long distance calls, the telephone company blocks the service because it

assumes that the phone has been stolen or cloned. A similar method can be used to determine whether incoming calls are from legitimate users or spammers.

Advantages: No changes in the VoIP architecture are required. This method is completely transparent to the user. The user will never know that a call has been dropped and does not need to take action. Since the filtering happens at the provider, the User Agent will not even show a record of the dropped calls.

Disadvantages: Traffic analysis is reasonably difficult and requires significant resources in order to be effective. The VoIP provider might not be willing to accept the extra costs.

Reputation filtering of users

A reputation system works by having users rate each other. If a certain user does not have any negative marks against him, it is likely that the user is not a spammer and can be allowed to connect; if the user has a bad reputation or no reputation at all, that user can be denied access. The user reputations are stored in a central repository and can be accessed by anybody. Users would have an account on this server in which the trusted users list is stored. There are two ways in which users can choose to make use of the reputations:
- Users can choose to allow any other user to contact them provided they have a favorable reputation.
- Users can choose to only allow other users to contact them which have a link to them. In this case it is assumed that trust is transitive and that friends of friends of friends, ad infinitum, can therefore be trusted.

Advantages: The method does not stop professional spammers, but it does make harassment by amateurs more difficult.

Disadvantages: The user will have to determine which users are trustworthy based on their ranking; the user can either trust only those people he knows (friends of friends), or choose to trust people who have a high ranking. Neither option is very appealing.
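The friend-of-a-friend option described above amounts to a reachability check in a graph of trust links. The following Python sketch shows one way to express it, as an illustration rather than a method taken from the cited works. The function name `is_trusted`, the dictionary layout of the trust lists and the `max_hops` bound are all assumptions; the text itself assumes unbounded transitivity, which corresponds to a very large `max_hops`.

```python
from collections import deque
from typing import Dict, List


def is_trusted(trust: Dict[str, List[str]], callee: str, caller: str,
               max_hops: int = 3) -> bool:
    """Breadth-first search from the callee along trusted-user lists.

    trust[u] is the list of users that u trusts directly. The caller is
    accepted if reachable from the callee within max_hops trust links.
    """
    if caller == callee:
        return True
    visited = {callee}
    queue = deque([(callee, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop bound
        for friend in trust.get(user, ()):
            if friend == caller:
                return True
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return False


if __name__ == "__main__":
    links = {"alice": ["bob"], "bob": ["carol"], "carol": ["dave"]}
    print(is_trusted(links, "alice", "carol"))
    print(is_trusted(links, "alice", "mallory"))
```

Bounding the number of hops limits how far a single mistaken vouching can propagate, at the price of rejecting more distant but legitimate contacts.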

This method suffers from an information poisoning attack if it is set up to allow unknown but highly ranked persons to contact the user. This attack would involve spammers creating many different identities and then ranking the identities up among each other. The friend-of-a-friend option does not suffer from this flaw but can still be abused if a friend is tricked into vouching for the spammer.

Rate limiting

Rate limiting involves setting limits on the number of calls that a given user can receive or will allow during a set period of time. This method can be implemented in a very simple fashion by not discriminating between calls, or in a more advanced way by looking at where calls are coming from or when they are being made. A rule could say that a given user will only accept 5 calls a day from any place that is not his home country. If we assume that spammers will try to do jurisdiction shopping and that we can detect where the connection comes from, this could seriously limit the exposure of the user to spam.

Advantages: The solution does not require any architectural changes. The method can simply be implemented by the User Agent or the Proxy. The method is user friendly in the sense that the user will not be bothered by many calls from spammers, and it requires little or no configuration.

Disadvantages: Legitimate users can be disadvantaged by this method. Users will be forced to make a trade-off between spam and legitimate strangers. Stringent rate limiting will remove almost all spam, but it will also make it very hard for unknown callers to reach the user. Lax rate limiting makes it easier for unknown callers to get through, but the user will also have to endure more spam.

Computational Puzzle / Proof of effort

One of the reasons spam has proliferated is its economic attractiveness; it is very easy and very cheap to send thousands of messages.
The cost of bandwidth is falling all the time and it is therefore increasingly profitable to send spam. The proof-of-effort solution attempts to fight this profitability by making it

more expensive to send messages. The easiest way to accomplish this is to provide a mathematical challenge to the sender; the sender calculates the result and sends it back to the receiver. If the answer is correct, the message is let through. In e-mail this mechanism is not being used, mostly because there is no two-way communication: a user deposits a message at a server and that server then sends it on to wherever it needs to go. Eventually the message reaches the recipient's mail server and is deposited into the user's mailbox. VoIP, on the other hand, has direct two-way real-time communication and is therefore much better suited to implementing a proof-of-effort system.

Advantages: The proof-of-effort method does not require changes to the architecture. The proof of effort can be given by the caller to the proxy or to the user agent. The proof-of-effort method is effective against robots and human callers alike. Everybody has to make the effort to get through.

Disadvantages: The proof-of-effort method works by having the caller solve a mathematical equation and verifying the result. This equation needs to be defined, and its input and output need to be communicated between the caller's user agent and the proxy, or between the caller's user agent and the callee's user agent. Protocol changes will have to be made to implement this communication.

Handshake / Challenge / Turing test

This method has several names, but they all describe a system that is in essence the same. The system provides some test to the caller which the caller has to answer in order to prove that he is a human being and not a machine. It is similar to proof-of-effort, with the change that the test should not be possible for a machine to accomplish.

Advantages: Just like the proof-of-effort method, this solution does not require any changes to the architecture of VoIP. The test can be generated by the proxy or user agent of the callee and answered by the user agent of the caller.
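The challenge and response cycle of such a test can be sketched as follows. This Python fragment is an illustration under assumptions, not a design from the surveyed works: it assumes the challenge is a short string of digits that would be spoken to the caller and answered on the telephone keypad, and the names `make_challenge` and `verify` are hypothetical. Rendering the digits as audio that is hard for software to recognize, which is the difficult part of a real Turing test, is outside the scope of the sketch.

```python
import secrets

DIGITS = "0123456789"


def make_challenge(length: int = 4) -> str:
    """Return a random digit string to be announced to the caller."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def verify(challenge: str, keypad_input: str) -> bool:
    """Compare the digits the caller keyed in with the challenge."""
    return keypad_input.strip() == challenge


if __name__ == "__main__":
    challenge = make_challenge()
    # In a real system the digits would be played as audio and the
    # answer collected as keypad tones; here the answer is simulated.
    print(verify(challenge, challenge))
```

Using a cryptographically strong random source keeps an automated caller from predicting the next challenge from earlier ones.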

Disadvantages: The method is very unfriendly towards callers. The callers will have to jump through a lot of hoops just to be able to reach a person. In the current phone system this requirement does not exist.

Content filtering

Content filtering is the primary defense against spam currently in use. The technique runs the message through a filter which compares the message against some reference data. Eventually this system gives a spam-or-not-spam rating, and the spam is removed while the good messages are put into the inbox. There are many different forms of filters; some might look only for the existence or non-existence of some data, others might give each message an overall rating and throw away everything that is below or above some value. Combinations of these approaches exist. For VoIP the content filter will have to perform voice recognition.

Advantages: Content filtering does not require any changes to be made to the protocols in use. Messages will simply be let through and analyzed while they are sitting in the voice mail box, or on the fly in the case of a direct call. The method is very user-friendly. Content filtering can be turned on by default and work in the background without the user providing any input.

Disadvantages: Voice recognition is difficult; it takes a lot of time and a lot of processing capacity to determine what is being said. In the current architecture there are only two places where these resources can come from: the proxy or the user agent. The proxy cannot be used since it potentially services thousands of users, and the user agent cannot be used because mobile phones would simply run out of battery power.

2.4 Evaluation of current Voice CAPTCHA

Since it is important to evaluate the current implementations of audio CAPTCHA, this section provides the reader with some insight into them. The evaluation of current audio CAPTCHAs is carried out to find a suitable candidate for our Turing test module.

2.4.1 reCAPTCHA

reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old time radio shows. It is a more secure version of the CAPTCHA implementation. reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

Advantages: This solution is easy to implement on a website, and instructions for doing so are provided. There are also many plug-ins ready to be used with well-known CMSs. Their audio CAPTCHAs are easy for a human to solve if the listener is a native English speaker. The background noise, which is the noise of people chatting, makes it harder for a computer to decipher the words. Since they have a large vocabulary and use multiple announcers, making a training set for ASR would be almost impossible.

Disadvantages: The words in the CAPTCHA are very hard for a non-native English speaker to solve. In an Internet Telephony environment, we are only equipped with a keypad which has only the digits 0-9. It could be possible to enter those words with the keypad, but it would be very uncomfortable.

Captchas.net

The Captchas.net audio CAPTCHA is provided as a supplement to the visual CAPTCHA. It can be configured to have only digits in the CAPTCHA. This CAPTCHA service is a good alternative for VoIP since it is configurable for use in a VoIP environment. The problems with this audio CAPTCHA are that it doesn't have any background noise inserted and that the interval between digits is fixed. Furthermore, they use only one announcer. With only one announcer, a recognizer could easily be trained on the CAPTCHA and decipher it into readable digits.

Advantages: The advantage of this implementation is that you can select what should be included in the CAPTCHA. For VoIP, we would need a CAPTCHA consisting of digits. CAPTCHAs are created through their web form, and a script that automates the creation is easy to implement.

Disadvantages: In the CAPTCHA, there is only 1 announcer and a fixed interval between digits. These make it very easy to train on the CAPTCHA and crack it. In our lab, we have successfully cracked the CAPTCHA from captchas.net. A CAPTCHA is considered weak if the success rate of an automated attack is over 5%.

DanCaptcha

DanCaptcha is implemented for the World Wide Web environment. DanCaptcha is an AJAX-based CAPTCHA system that is accessible to blind people since it uses sound. The system lacks an English announcer, but this could be added by the developers themselves. In their algorithm, the creation of the CAPTCHA is straightforward. There is no background noise insertion, nor is there a variable interval between prompts, which makes the CAPTCHA vulnerable to attacks. Also, the speaker is always the same.

Advantages: The creation of the audio CAPTCHA is simple. It should be easy for a human to understand what is being said in the CAPTCHA. The script is very easy to integrate with your system.

Disadvantages: In the CAPTCHA, there is only 1 announcer. This makes it very easy to train on the CAPTCHA and crack it. Their current version lacks English pronunciation.

Google's CAPTCHA

This audio CAPTCHA might be considered the hardest CAPTCHA that has been implemented. Google uses the CAPTCHA when users register for services on their website. The audio CAPTCHA is played twice, but it is still hard for a native speaker to solve.

Advantages: It is very hard for a computer to decipher the characters in the CAPTCHA. The users have the opportunity to hear the CAPTCHA twice. The background noise is very high compared to other CAPTCHAs. They stream their CAPTCHA over the Internet, which makes it harder to capture the CAPTCHA for training.

Disadvantages: Not solvable by non-native speakers. It contains too many characters to be suitable for a VoIP environment.

eBay CAPTCHA

The eBay CAPTCHA is implemented to protect the website from being used by automated programs. Their CAPTCHA usually appears when user interaction is required. The author has seen the eBay CAPTCHA when he was going to contact a seller via their contact system. The eBay CAPTCHA contains 6 digits that the user has to decipher.

Advantages: It is easy for humans to solve the CAPTCHA. It has a limited data field of ten (10) digits (0-9). The number of spoken characters is always six (6). They use multiple announcers to announce the digits in the CAPTCHA. It is also a streaming reproduction.

Disadvantages: The fixed duration and interval between digits make it easy to train on and crack the CAPTCHA. There is also no intermediate noise between the digits, which makes it easy to decipher the digits with a computer program.

2.5 SPIT Proposed Solution

Although none of these candidate techniques works on its own, it is possible to combine them so that the combined solution satisfies all our requirements. In the literature, three researchers [5][8][21] have proposed solutions for fighting SPIT with a Turing test. The researchers haven't explained how their Turing test has been implemented. We have taken this into account and ask how

we could implement a Turing test that will satisfy the HIP (Human Interactive Proof). Our Turing test should be easy for a human to solve and hard for current computer programs to solve. As one technique is not enough to fight SPIT, our system will analyze the pre-call phase (signaling exchange stage) before we decide whether the caller should be sent to the Turing test or not. We will also evaluate whether it is feasible to fight SPIT created by robots (automated SPIT calls) by combining multiple techniques. Our system should be a standalone server handling spam instead of having the SIP proxy do this work. This is an innovative approach to handling SPIT, since the previously proposed solutions are based on the SIP proxy, which could be overloaded by the extra work of handling SPIT. As mentioned before, our solution will use different techniques for handling SPIT. A new module will be developed for the first stage of analyzing the SIP messages. Instead of restricting users to receiving a certain number of calls, we will limit the number of calls that callers can make. We call this the Reverse Call Limit technique. Our design prevents SPIT from leaving the caller's domain. This will prevent SPIT messages from circulating in the Internet cloud. This hasn't been done before, since all the proposed solutions have been concerned with how to detect a SPIT message after it has arrived at the callee's domain. Finally, we will evaluate the system by having humans take the Turing test, which has never been done before in the literature.
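To make the idea of limiting callers rather than callees concrete, a per-caller sliding-window counter is sketched below in Python. It is only an illustration of the principle and should not be read as the module developed in this thesis: the class name `ReverseCallLimit`, the window length and the call budget are hypothetical, and what happens to a call that exceeds the limit is left to the code that uses the check.

```python
from collections import defaultdict, deque


class ReverseCallLimit:
    """Limit how many calls each caller may place within a time window."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        # Timestamps of recently allowed calls, kept per caller identity.
        self._history = defaultdict(deque)

    def allow(self, caller: str, now: float) -> bool:
        """Return True if the caller is still within the call budget."""
        calls = self._history[caller]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


if __name__ == "__main__":
    limiter = ReverseCallLimit(max_calls=2, window_seconds=60.0)
    for t in (0.0, 10.0, 20.0):
        print(t, limiter.allow("sip:caller@example.com", now=t))
```

Passing the timestamp in explicitly keeps the check deterministic; a deployment would typically supply the current time from a monotonic clock.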


Chapter 3 Modular Anti-SPIT framework

The software developed in this research is called the Modular Anti-SPIT framework (MASF). It is designed to be flexible and extensible, because the software must be able to adapt as SPIT changes. MASF is split into two parts named logging and detection. The logging part stores the IP addresses that access the system and writes the collected data to a database. The database is then read and analyzed by the detection modules. If any of the detection modules detects something suspicious, an action is taken, such as terminating the call or sending the call to the Turing test.

Fig. 3.1: Design of Modular Anti-SPIT framework

This chapter starts by covering the structure of the framework and the code of the modules that have been developed as a proof of concept. We then continue with a more detailed description of the different MASF parts in the following order:
- Logging
- Detection
- Action
- Detection modules

3.1 Framework structure

The framework is based on the Asterisk dialplan program structure. In Asterisk, functions or programs can be implemented either externally, through an Asterisk Gateway Interface (AGI) script (in much the same way that a Common Gateway Interface [CGI] script can add functionality to a web page), or internally, through functions and applications in the dialplan. The dialplan is defined in the extensions.conf configuration file. The dialplan syntax resembles an archaic language from the 1950s and looks somewhat like a Fortran II program. The administrator can implement features and call flow using a simple scripting language.

Program structure

Each telephone number defined in the Asterisk dialplan (/etc/asterisk/extensions.conf) is really a small program. In Asterisk, the program is called an "extension." An extension looks like this:

    exten => 1001,1,Answer()
    exten => 1001,n,Playback(hello-world)
    exten => 1001,n,Hangup()

Priorities may also be numbered sequentially:

    exten => 1001,1,Answer()
    exten => 1001,2,Playback(hello-world)
    exten => 1001,3,Hangup()

The two extensions depicted here are functionally identical. Using n, however, makes adding and deleting entries in the extension much easier later on.

MASF has been designed so that new modules are fast to develop. A new module can be written directly in the dialplan, as an Asterisk macro, or as an AGI script, which can be written in multiple languages. Asterisk AGI currently supports ActiveX, Java, Pascal/ObjectPascal, Perl, PHP, Python, Ruby, C and .NET.

Currently, two variables are common to all modules: 'SPAM' and 'IP'. The 'SPAM' variable indicates whether a module has detected that the call is SPIT. If the call is flagged as SPIT, it is forwarded to the Turing test for further investigation. The 'IP' variable holds the IP address of the caller, which is extracted from the SIP Via header of the current call.

Below is the dialplan for our framework. The first-stage (non-intrusive) modules have been developed as AGI scripts written in Perl. They are followed in the second (intrusive) stage by the Turing test module, which is written as an Asterisk macro.

    exten => _X.,1,Set(IP=${CUT(CUT(SIP_HEADER(Via),,2),:,1)})
    exten => _X.,n,Set(TESTAT=${CUT(SIP_HEADER(From),@,2)})
    exten => _X.,n,Set(SPAM=0)
    exten => _X.,n,AGI(dblogger.pl,${IP})
    exten => _X.,n,AGI(whitelist.pl,${IP})
    exten => _X.,n,AGI(blacklist.pl,${IP})
    exten => _X.,n,AGI(dnsbl.pl,${IP})
    exten => _X.,n,AGI(ratelimit.pl,${IP})
    exten => _X.,n,noop(SPAM = ${SPAM})
    exten => _X.,n,GotoIf($[${SPAM} = 1]?turing:dial)
    exten => _X.,n(turing),Macro(turingtest, ,${EXTEN})
    exten => _X.,n(dial),goto(demo,${EXTEN},1)
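The first dialplan line derives the caller IP by cutting the Via header on a space and then on a colon. The following minimal Python sketch mirrors that field-cutting logic outside Asterisk. It is an illustration only: the helper names `cut` and `caller_ip_from_via` and the sample header are assumptions, and the thesis modules themselves are Perl AGI scripts.

```python
def cut(value: str, delimiter: str, field: int) -> str:
    """Return the 1-based field of value split on delimiter, or '' if absent."""
    parts = value.split(delimiter)
    return parts[field - 1] if 0 < field <= len(parts) else ""


def caller_ip_from_via(via_header: str) -> str:
    """Extract the sent-by host from a Via header value (hypothetical helper)."""
    sent_by = cut(via_header.strip(), " ", 2)  # "host:port;params"
    host = cut(sent_by, ":", 1)                # drop the port, if any
    return cut(host, ";", 1)                   # drop parameters when no port is given


# Sample header value using a documentation address (not taken from the thesis).
via = "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds"
print(caller_ip_from_via(via))  # 192.0.2.10
```

The sketch only handles a single Via value with an IPv4 host. A request that has passed through several proxies carries several Via entries, so a production module would need to decide which entry identifies the caller.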

Fig. 3.2: Activity diagram of Modular Anti-SPIT framework

3.2 Logging

This section gives a detailed description of how the logging part of MASF works and how it is implemented.

3.2.1 Overview

To log the IP addresses of the calls entering our system, an AGI script called dblogger.pl has been written to collect this data. First, the IP variable is sent to the module as an argument. This can be considered the input to the function from the program that calls it. The module then inserts the IP and the current time into a table in the database. This is shown in figure 3.3.

Fig. 3.3: Flowchart of DBLOGGER
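The logging step described above can be sketched in a few lines. This is not the thesis code: dblogger.pl is a Perl script that talks to MySQL through DBI, whereas the sketch below uses Python's built-in sqlite3 module so that it runs without a database server. The column names `ip` and `ts` and the function names are assumptions.

```python
import sqlite3
import time


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open the log database and create the access table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS access (ip TEXT NOT NULL, ts REAL NOT NULL)"
    )
    return conn


def log_call(conn: sqlite3.Connection, ip: str, now=None) -> None:
    """Insert the caller IP and the current time, as the logger module does."""
    ts = time.time() if now is None else now
    conn.execute("INSERT INTO access (ip, ts) VALUES (?, ?)", (ip, ts))
    conn.commit()


conn = open_log()
log_call(conn, "192.0.2.10")
print(conn.execute("SELECT COUNT(*) FROM access").fetchone()[0])  # 1
```

Passing the IP as a bound parameter rather than interpolating it into the SQL string matters here, because the value originates from a SIP header that the caller controls.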

3.2.2 Implementation

The module has been written as a Perl script. We have used the Perl DBI as the interface to the MySQL database. The DBI is the standard database interface module for Perl. It defines a set of methods, variables and conventions that provide a consistent database interface independent of the actual database being used.

The module starts by declaring its use of the DBI module. It then connects to the database and executes an INSERT query, which inserts the IP and the current time into the access table in our database. Lastly, it disconnects from the database to terminate the connection.

3.3 Detection

This section gives a detailed description of how the detection part of MASF works and how it is implemented. It also covers the concept of detection modules.

3.3.1 Overview

The job of the detection part is to read information from multiple sources to indicate whether the call is SPIT or not. The prototype modules that have been developed currently use information from the database, DNS queries and the user's interaction. The modules, or mods, can then either return a normal status or execute an action in response to suspicious behavior. An action can be call termination or a flag indicating that SPIT has been detected.

3.3.2 Implementation

The job of detection is to determine whether the call is legitimate or SPIT. The detection part is mostly implemented by writing Perl modules that detect SPIT. How the modules work and how they are developed is covered in the Detection modules section.

3.4 Action

This section explains how the action part of MASF works. It also includes a detailed explanation of the implementation.

3.4.1 Overview

The purpose of the action part is to act on the alerts raised by the detection part. What action should be taken, and how it should be accomplished, differs for each detection module. Possible actions include:
- Call termination
- Setting a flag
- E-mailing the administrator
- Sending the call to the Turing test (intrusive test)
- Adding a SIP header to the SIP dialog

3.4.2 Implementation

In our proof-of-concept implementation of this system, we have included three possible actions: call termination, setting a flag, and sending the call to the Turing test. All of these are done within the different modules. How each module takes action is covered in the Detection modules section.

3.5 Detection modules

In this section we cover the five detection modules that we have implemented. Each subsection starts with a short overview, followed by more detailed information on the implementation.

3.5.1 Whitelist module

This section describes the whitelist module implementation.

Overview

The whitelist module takes an IP and tries to match it to an entry in a predefined whitelist database. If the IP is matched, the module takes action by calling the callee and skipping all other modules. If no match is made, the module takes no action and the test continues. The flow of the whitelist module is presented in figure 3.4.
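How the actions listed in Section 3.4 combine across the first-stage modules can be sketched as a small dispatcher. This is an interpretation of the description and the dialplan above, not the thesis implementation (which chains Perl AGI scripts from the dialplan). The `Action` enum, the route names and the sample modules are all assumptions.

```python
from enum import Enum
from typing import Callable, Iterable


class Action(Enum):
    CONTINUE = "continue"    # no match: run the next module
    ACCEPT = "accept"        # whitelist hit: call the callee, skip the rest
    TERMINATE = "terminate"  # blacklist or DNSBL hit: end the call
    FLAG = "flag"            # suspected SPIT: send to the Turing test


Module = Callable[[str], Action]


def run_first_stage(ip: str, modules: Iterable[Module]) -> str:
    """Run the non-intrusive modules in order and return the call route."""
    spam = False
    for module in modules:
        action = module(ip)
        if action is Action.ACCEPT:
            return "dial"
        if action is Action.TERMINATE:
            return "hangup"
        if action is Action.FLAG:
            spam = True
    return "turing" if spam else "dial"


# Hypothetical stand-ins for the whitelist, blacklist and rate-limit modules.
whitelist = lambda ip: Action.ACCEPT if ip == "192.0.2.1" else Action.CONTINUE
blacklist = lambda ip: Action.TERMINATE if ip == "192.0.2.66" else Action.CONTINUE
ratelimit = lambda ip: Action.FLAG if ip == "192.0.2.50" else Action.CONTINUE

for ip in ("192.0.2.1", "192.0.2.66", "192.0.2.50", "192.0.2.9"):
    print(ip, run_first_stage(ip, [whitelist, blacklist, ratelimit]))
```

Keeping every module behind the same one-argument interface is what makes it possible to add or reorder modules without touching the dispatcher.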

Fig. 3.4: Flowchart of white list module

Implementation

The whitelist module has been implemented in Perl as an Asterisk AGI module. It takes as its argument the IP to be checked against the database. The whitelist table has a single field, 'IP'. The IPs of known sources that are not SPIT should be added to the database so that their calls skip all other detection modules. This saves time and resources. For example, a brokerage's telephones usually make many calls within a given period, which could trigger a false-positive alert in the rate-limit module.

The whitelist module works by first getting the IP, which is sent to the module as an argument. It then queries the database. If the IP is found, the call is forwarded to the callee, skipping all the other modules. If the IP is not in the database, the test continues.

3.5.2 Blacklist module

This section describes the blacklist module implementation.

Overview

The blacklist module takes an IP and tries to match it to an entry in a predefined blacklist database. If the IP is matched, the module takes action by terminating the call. If no match is made, the module takes no action and the test continues. The flow of the blacklist module is presented in figure 3.5.

Fig. 3.5: Flowchart of black list module
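The whitelist and blacklist lookups described above share the same shape: a single-column table queried by IP. The sketch below illustrates this with Python's built-in sqlite3 module. The thesis modules are Perl AGI scripts using MySQL, so the table layout, function names and return values here are assumptions.

```python
import sqlite3


def make_list_db(whitelisted=(), blacklisted=()) -> sqlite3.Connection:
    """Create in-memory whitelist and blacklist tables with one 'ip' field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE whitelist (ip TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE blacklist (ip TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO whitelist VALUES (?)", [(ip,) for ip in whitelisted])
    conn.executemany("INSERT INTO blacklist VALUES (?)", [(ip,) for ip in blacklisted])
    return conn


def ip_in_table(conn: sqlite3.Connection, table: str, ip: str) -> bool:
    """Return True if ip has an entry in the named list table."""
    if table not in ("whitelist", "blacklist"):  # table names cannot be bound
        raise ValueError(f"unknown list table: {table}")
    row = conn.execute(f"SELECT 1 FROM {table} WHERE ip = ? LIMIT 1", (ip,)).fetchone()
    return row is not None


def check_lists(conn: sqlite3.Connection, ip: str) -> str:
    """Whitelist hit dials the callee, blacklist hit hangs up, otherwise continue."""
    if ip_in_table(conn, "whitelist", ip):
        return "dial"
    if ip_in_table(conn, "blacklist", ip):
        return "hangup"
    return "continue"


conn = make_list_db(whitelisted=["192.0.2.1"], blacklisted=["192.0.2.66"])
print(check_lists(conn, "192.0.2.1"), check_lists(conn, "192.0.2.66"), check_lists(conn, "192.0.2.9"))
```

Checking the whitelist first matches the module order in the dialplan, where whitelist.pl runs before blacklist.pl.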

Implementation

The blacklist module has been implemented in Perl as an Asterisk AGI module. It takes as its argument the IP to be checked against the database. The blacklist table has a single field, 'IP'. The IPs of known SPIT sources should be added to the database so that their calls skip all other detection modules and are terminated. This saves time and resources. An entry could be a known SPIT source that has not been added to the DNSBL database, or an IP that has been trying to spam the system. An entry can be added to the blacklist manually by the administrator or automatically by the system. The system could add an entry automatically when, for example, the caller fails the Turing test multiple times.

The blacklist module works by first getting the IP, which is sent to the module as an argument. It then queries the database. If the IP is found, the call is terminated. If the IP is not in the database, the test continues.

3.5.3 DNSBL module

This section describes the DNSBL module implementation.

Overview

The DNSBL module takes an IP and tries to get a response to a DNS query. If an IP is returned, the call is terminated. If nothing is returned, the module takes no action and the test continues. A DNSBL is a list of IP addresses published through the Internet Domain Name Service in a particular format. DNSBLs are most often used to publish the addresses of computers or networks linked to spamming. Most mail server software can be configured to reject or flag messages that have been sent from a site listed on one or more such lists. The flow of the DNSBL module is presented in figure 3.6.
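The "particular format" of a DNSBL lookup is a hostname built from the reversed octets of the IPv4 address followed by the list's zone. A listed address resolves to an answer, and an unlisted one does not resolve. The sketch below shows this convention in Python. The zone `dnsbl.example.org` is a placeholder, the function names are assumptions, and the resolver is injectable so the example runs without network access.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL hostname: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the DNSBL answers for ip, False if the name does not resolve."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False
    return True


# The zone below is a placeholder, not a real blocklist.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
# 99.2.0.192.dnsbl.example.org
```

Treating every successful answer as "listed" is a simplification. Real lists encode the listing reason in the returned address and may also use special return codes to signal query errors, so a production module should inspect the answer rather than only its presence.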


Fig. 3.6: Flowchart of DNSBL module

Implementation

The DNSBL module has been implemented in Perl as an Asterisk AGI module. It takes the IP as an argument and uses it to make a DNS query against a DNSBL.

The DNSBL module works as follows. It first extracts the IP address from the Via header and reverses the order of its octets. The query is then made. If the result is positive, meaning that an IP address is returned, the caller's IP address is a spam source. If the result is negative, the call is not SPIT and is forwarded to the network. This module is quite effective, because it needs very little time to determine whether the call comes from a spam source. A major advantage of this module is that we do not have to maintain the spam source list ourselves. This is done by the provider. The Spamhaus Block List is an example of a DNSBL provider.

The DNSBL module has been implemented using multiple Perl modules to do the IP reversing and the DNS query. For a detailed design of the module, please see the appendix.

3.5.4 Rate-limit module

This section describes the rate-limit module implementation.

Overview

The rate-limit module works by querying the database that has been populated by the logger module. It classifies the call as SPIT if the call rate is over the configured threshold. If the call rate is over the threshold, the call is sent to the Turing test module for further inspection. This module is a novel technique that has been developed to fight SPIT. The flow of the rate-limit module is presented in figure 3.7.


Fig. 3.7: Flowchart of rate-limit module

Implementation

The rate-limit module is a novel method of fighting SPIT that limits the number of calls that can be made in a given period of time. The limit is configurable in the script to suit the environment. A legitimate caller would not make more than a few calls into the system within 1 hour. Spitters, on the other hand, try to make as many calls as possible in a given period of time. This module works by getting the IP address from the Via header and running an SQL query to find how many calls have been made from that IP address. If the number exceeds the rate limit, the call is considered SPIT and is sent to the second stage (the Turing test).
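The rate-limit check described above amounts to counting a caller's logged calls inside a time window and comparing the count with a threshold. The sketch below illustrates this with Python's built-in sqlite3 module. The thesis module is a Perl AGI script querying MySQL, so the table layout (`access` with columns `ip` and `ts`) and function names are assumptions, and the defaults of 20 calls per hour are taken from the simulation in Chapter 4 purely as an example.

```python
import sqlite3


def calls_in_window(conn, ip: str, now: float, window_seconds: float) -> int:
    """Count logged calls from ip within the last window_seconds."""
    row = conn.execute(
        "SELECT COUNT(*) FROM access WHERE ip = ? AND ts > ?",
        (ip, now - window_seconds),
    ).fetchone()
    return row[0]


def is_over_limit(conn, ip: str, now: float, limit: int = 20,
                  window_seconds: float = 3600.0) -> bool:
    """Return True when the caller has exceeded the rate limit."""
    return calls_in_window(conn, ip, now, window_seconds) > limit


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access (ip TEXT NOT NULL, ts REAL NOT NULL)")
# 21 calls from the same address within about 20 seconds.
conn.executemany(
    "INSERT INTO access VALUES (?, ?)",
    [("192.0.2.10", float(t)) for t in range(21)],
)
print(is_over_limit(conn, "192.0.2.10", now=30.0))  # True
```

Because the logger runs before the rate-limit module in the dialplan, the current call is already counted, so with a limit of 20 it is the 21st call in the window that gets flagged.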

The rate-limit module has been implemented in Perl as an Asterisk AGI module. It first reads the configuration file "config.ini" to set the values of the rate limit and the time interval. Next, it runs the query to find how many calls have been made from the calling IP. If the call count is over the threshold, the module takes action by setting the SPAM flag to '1'. There is also a module that resets the call rate after the caller has completed the Turing test. This module works by deleting the database entries for a specific IP, which resets the call rate for the given period of time.

3.5.5 Turing test module

This section describes the Turing test module implementation.

Overview

The Turing test module interacts with the caller, so an intrusive test is conducted. It has been designed by following the recommendations for a HIP (Human Interactive Proof). Generally speaking, a successful HIP should have the following desired properties [38, 39, 40]:
- The test should be automatically generated and graded by computer;
- The test should be user-friendly;
- The test should be easy for human users;
- The test should be hard for machines to pass;
- The test should be resistant to no-effort attacks;
- The test should be robust when its database is publicized;
- The test should be universal for global users.

The flow of the Turing test is presented as a flowchart in figure 3.8.

Fig. 3.8: Flowchart of Turing test module

Implementation

The Turing test module has been implemented as an Asterisk macro with an internal Perl module that generates the voice prompts. The Turing test starts by asking the caller to add two numbers and to answer using the keypad. If the caller solves the challenge, the data in the logger database is reset and the call is forwarded to the callee. Technically, this module does the sound mixing with the SoX utility. The sound is mixed so that the test is hard for a machine to pass and easy for human users. The background music is composed by Mozart and is freely available on the Internet.

3.5.6 Enhanced Turing test module

This section describes the enhanced Turing test module implementation.

Overview

The enhanced Turing test module is an improved version of the Turing test module described in the previous section. In this implementation, a Voice CAPTCHA is used to generate the challenge-response token for the caller. Voice CAPTCHAs were originally developed as an alternative to visual CAPTCHAs. We cannot expect a blind user to be able to solve a visual CAPTCHA, so a Voice CAPTCHA is offered to people with this kind of disability. There are many implementations of Voice CAPTCHA, most of which are best suited to a keyboard as the input interface. Our Voice CAPTCHA design is better suited to a numeric keypad, as this is the main interface on every phone, including softphones and hardphones. The module generates a series of digits for the user to enter on the keypad. If the caller keys in the digits in the correct order, the caller is granted access to the VoIP network and the call is forwarded to the callee. The process of creating the Voice CAPTCHA is presented in figure 3.9.
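The basic Turing test described above (an addition challenge, a prompt mixed with background music by SoX, and a keypad answer) can be sketched as follows. This is only an illustration under stated assumptions: the operand ranges, file names, function names and the 0.3 music volume are hypothetical, and the sketch builds the SoX command line without running it. `-m` (mix inputs) and `-v` (per-input volume factor) are standard SoX options, but the thesis does not state which options its Perl module passes.

```python
import random


def make_addition_challenge(rng: random.Random):
    """Pick two small operands and return them with their expected sum."""
    a, b = rng.randint(0, 4), rng.randint(0, 5)  # ranges are an assumption
    return a, b, a + b


def sox_mix_command(prompt_wav: str, music_wav: str, out_wav: str,
                    music_volume: float = 0.3) -> list:
    """Build a SoX command that mixes the spoken prompt with background music."""
    return ["sox", "-m", "-v", "1.0", prompt_wav,
            "-v", str(music_volume), music_wav, out_wav]


def check_answer(expected_sum: int, dtmf_digits: str) -> bool:
    """Compare the digits keyed in by the caller with the expected sum."""
    digits = dtmf_digits.strip()
    return digits.isascii() and digits.isdigit() and int(digits) == expected_sum


a, b, expected = make_addition_challenge(random.Random())
print(f"How much is {a} plus {b}?")
print(" ".join(sox_mix_command("prompt.wav", "mozart.wav", "challenge.wav")))
print(check_answer(expected, str(expected)))  # True
```

Returning the command as a list rather than a shell string avoids quoting problems if file names ever contain spaces, and lets the caller hand it directly to a process runner.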

Fig. 3.9: Flowchart of Voice CAPTCHA creation

Implementation

The enhanced Turing test module has been implemented as an Asterisk macro with an internal Perl module that generates the Voice CAPTCHA. The Turing test starts by asking the caller to enter what they hear in the challenge token using the keypad. If the caller solves the challenge, the data in the logger database is reset and the call is forwarded to the callee.

Technically, this module uses several randomization algorithms to make it more robust against attacks. Multiple announcers announce the digits in the CAPTCHA. The offset at which noise is inserted is selected at random. The interval between digits is randomized to make it even harder for a computerized transcriber to decipher the digits into readable text. For a more detailed description of the process, the reader can refer to the source code in the appendix.

The flow of the enhanced Turing test module is the same as in figure 3.8, with a minor difference: the Turing test expects 3 digits to be entered and compares them to the expected digits that have been generated beforehand. The rest of the process is the same. SoX, a Linux command-line audio processing program, has been used to mix and trim audio for the entire task of creating the Voice CAPTCHA.

3.5.7 Enhanced Turing test module (version 2)

This section describes the enhanced Turing test module (version 2) implementation.

Overview

The enhanced Turing test module (version 2) is an improved version of the enhanced Turing test module described earlier in this chapter. In this implementation, the Voice CAPTCHA used to generate the challenge-response token for the caller has one more attribute: random sounds of words from a dictionary. In this version, the CAPTCHA is divided into 7 periods. The odd periods (1, 3, 5, 7) contain noise generated from English words pronounced in Spanish. The even periods (2, 4, 6) contain digits pronounced in English by multiple announcers. This can be seen in figure 3.10.

Fig. 3.10: Visual design of Voice CAPTCHA

As one can see, the number of noise words in a period may be 0, 1 or 2. This makes it harder for a computer program to guess the start position of the digits. After the CAPTCHA has been created in the first stage, background noise is added to it. The final CAPTCHA is presented in figure 3.11.

Fig. 3.11: noise(2) 1 - noise(2) 0 - noise(2) 8 - noise(2)

In figure 3.11, 2 words are inserted as noise in each period. This is not always the case, since a randomization algorithm selects a number from 0 to 2 for each period. Some CAPTCHAs might not contain any noise words at all. Others might have only 1 word in the first period and 2 words in the third period. This is done to fool bots or computer programs that try to crack the CAPTCHA. The design can draw on a whole English dictionary, so the pool of words for generating noise is very large. More on this is discussed in the experiment chapter.

Implementation

The enhanced Turing test (version 2) module has been implemented as an Asterisk macro with an internal Perl module that generates the Voice CAPTCHA. The Turing test starts by asking the caller to enter what they hear in the challenge token using the keypad. If the caller solves the challenge, the data in the logger database is reset and the call is forwarded to the callee.

Technically, this module uses several randomization algorithms to make it more robust against attacks. Multiple announcers announce the digits in the CAPTCHA. The offset at which noise is inserted is selected at random. Noise generated from English words announced in Spanish is inserted between the digits. For a more detailed description of the process, the reader can refer to the source code in the appendix.

The flow of this module is the same as in figure 3.8, with a minor difference: the Turing test expects 3 digits to be entered and compares them to the expected digits that have been generated beforehand. The rest of the process is the same. SoX, a Linux command-line audio processing program, has been used to mix and trim audio for the entire task of creating the Voice CAPTCHA.
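The seven-period layout of the version 2 CAPTCHA can be sketched as a planning step that decides, before any audio is produced, which noise words and digits go where. This is an illustration only: the function name, the tuple layout, the sample word list and the announcer labels are assumptions, and the actual audio assembly (done with SoX in the thesis) is not shown.

```python
import random


def plan_captcha(rng: random.Random, noise_words, announcers,
                 digit_count: int = 3, max_noise: int = 2):
    """Return (segments, answer); segments alternate noise and digit periods."""
    segments, answer = [], ""
    for period in range(1, 2 * digit_count + 2):  # periods 1..7 for three digits
        if period % 2 == 1:
            # Odd period: between 0 and max_noise randomly chosen noise words.
            count = rng.randint(0, max_noise)
            segments.append(("noise", [rng.choice(noise_words) for _ in range(count)]))
        else:
            # Even period: one digit spoken by a randomly chosen announcer.
            digit = str(rng.randint(0, 9))
            segments.append(("digit", digit, rng.choice(announcers)))
            answer += digit
    return segments, answer


# Hypothetical word list and announcer labels.
segments, answer = plan_captcha(
    random.Random(), ["table", "window", "river", "garden"], ["voice_a", "voice_b"]
)
for segment in segments:
    print(segment)
print("expected keypad input:", answer)
```

Separating the plan from the audio rendering keeps the expected answer available for later comparison with the caller's keypad input, independent of how the sound file is built.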

Chapter 4 Experiment

This chapter presents the tests and experiments. We simulate 100,000 calls entering our system to find out the effectiveness of the rate-limit module and to test the ability of a computer program to guess the answer to the Turing test. The call set-up delay is calculated and evaluated to determine whether it is acceptable in a production environment. The number of concurrent calls the system can handle is determined by using the SIPp software to send calls into the system. Lastly, the noise inserted into the Turing test, which must defeat a transcriber while remaining understandable to humans, is tested and evaluated.

4.1 Experimental environment

For testing our system, we have chosen to use SIPp. SIPp is a performance testing tool for the SIP protocol. It includes a few basic SipStone user agent scenarios (UAC and UAS) and establishes and releases multiple calls with the INVITE and BYE methods. It can also read XML scenario files describing any performance testing configuration. It features the dynamic display of statistics about running tests (call rate, round trip delay, and message statistics), periodic CSV statistics dumps, TCP and UDP over multiple sockets or multiplexed with retransmission management, regular expressions and variables in scenario files, and dynamically adjustable call rates.

SIPp can be used to test many kinds of real SIP equipment, such as SIP proxies, B2BUAs, SIP media servers, SIP/x gateways and SIP PBXs. It is also very useful for emulating thousands of user agents calling a SIP system.

The three machines used have the following characteristics:

PC1:
System: Linux generic #48-Ubuntu SMP
CPU: Intel(R) Pentium(R) M processor 600 MHz

Memory: 0.50 GB RAM
Connection: 20 Mbps ADSL2+

PC2:
System: Linux 2.6.18-164.11.1.el5.028stab068.3 #1 SMP
CPU: Intel(R) Xeon(R) CPU X3220 @ 2.40GHz (QUAD)
Memory: 1.00 GB RAM
Connection: 100 Mbps fiber optic

PC3:
System: Linux el5.028stab068.3 #1 SMP
CPU: Intel(R) Xeon(R) CPU 2.40GHz (QUAD)
Memory: 1.00 GB RAM
Connection: 100 Mbps fiber optic

Machine 1 has Asterisk installed along with the modules that have been developed. On machine 2, we have installed SIPp to generate SIP traffic. Both machines are connected as shown in figure 4.1. For some of the experiments, machine 1 is replaced with machine 3, which has more CPU power and more RAM.

Fig. 4.1: Network topology for testing

It is also important to mention that machine 1 uses MySQL Version 14.14 Distribution 5.1.37, for debian-linux-gnu (i486) using EditLine wrapper, as its relational DBMS.

4.2 Call Simulation

The modules residing on the Asterisk SIP server are triggered when an INVITE message is received. On receiving the INVITE message, the DBLogger module reads the local time of the system on which it is installed, and the rate-limit module determines the interval between this call and the previous call placed by the same user from the same IP. MASF passes the INVITE message on to the SIP server concerned if the computed SPIT level for that user is below the threshold. Otherwise it blocks the INVITE message from reaching the SIP server.

A simulation was run using 100,000 individual callers, each randomly making 1 to 21 calls within a one-hour timeframe. Callers that were randomly assigned 20 or fewer calls were tagged as human callers for the simulation, while callers that were assigned 21 calls were randomly tagged as either a human caller or a SPIT program caller. Figure 4.2 shows the distribution of calls for the given simulation.

Fig. 4.2: Call Distribution

As shown in Figure 4.2, approximately 99% (98,721) of the callers in the simulation were tagged as human, while only about 1% (1,279) of them were tagged as SPIT. The average number of calls made by humans in the simulation was found to be

A constant rate limit of 20 calls was used, such that any call in excess of 20 was redirected to the Turing test. Figure 4.3 shows the number of calls redirected to the Turing test.

Fig. 4.3: Rate-limit Effectiveness

As seen in Figure 4.3, 1209 human callers were redirected to the Turing test. This is about 1% of the human callers. On the other hand, all of the SPIT callers were redirected to the Turing test. This is intuitive given the circumstances of the simulation. In an actual setting, a SPIT program could adjust the number of calls it makes so that it bypasses the rate limit. It could do so by running an algorithm that first determines the rate limit by calling the system until it reaches the Turing test. Once the rate limit is revealed, the SPIT program could keep its number of calls per unit time below the rate limit. To counteract this possibility, the system can be further developed to change the rate limit randomly, which may prevent SPIT programs from determining it. However, this would also increase the number of human callers redirected to the Turing test. Still, the relatively small probability that a human caller fails the Turing test makes randomizing the rate limit within a to-be-determined optimized interval a feasible option for further study.

Once the calls are redirected to the Turing test, it is assumed that human callers answer the Turing test question correctly with 99% accuracy. The literacy rate of the United States and the United Kingdom was used as the probability of successfully answering the Turing test [35]. On the other hand, it is assumed that the SPIT program attempts to answer the Turing test question by randomly selecting a number from 0 to 9. Table 4.1 lists the possible sums given the randomization algorithm used in the software.

As shown in Table 4.1, the most frequent sums produced by the randomization algorithm are 4 and 5, each of which appears 5 times. Next, the sums 3 and 6 both appear 4 times. Given these characteristics of the randomization algorithm, the probability that a SPIT program guesses the correct answer to the Turing test within three tries is computed for the following strategy: select either 4 or 5 as the first option, the other of the two as the second option if the first is incorrect, and either 3 or 6 if the second is also incorrect. Using this probability, the simulation was continued to determine how many of the calls that were redirected to the Turing test would be cut off. Figure 4.4 shows the results.
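The three-guess strategy above can be worked through numerically. The sketch below is a reconstruction under a loudly stated assumption: the operand ranges 0-4 and 0-5 are hypothetical, chosen only because they reproduce the frequencies quoted in the text (sums 4 and 5 appearing 5 times each, sums 3 and 6 appearing 4 times each). The result is therefore an illustration of the method and should not be read as the value reported in the thesis.

```python
from collections import Counter
from fractions import Fraction


def sum_frequencies(first_range, second_range) -> Counter:
    """Count how often each sum occurs over all operand pairs."""
    return Counter(a + b for a in first_range for b in second_range)


def best_guess_probability(freq: Counter, tries: int = 3) -> Fraction:
    """Probability of a hit when guessing the most frequent sums first."""
    total = sum(freq.values())
    top = sorted(freq.values(), reverse=True)[:tries]
    return Fraction(sum(top), total)


# Hypothetical operand ranges that match the frequencies stated in the text.
freq = sum_frequencies(range(0, 5), range(0, 6))
print(freq[4], freq[5], freq[3], freq[6])  # 5 5 4 4
print(best_guess_probability(freq))        # 7/15
```

Under this assumption there are 30 equally likely operand pairs, and the three best guesses cover 5 + 5 + 4 = 14 of them, giving 14/30 = 7/15. If every sum were produced by exactly one pair, the same three guesses would cover only 3 of the possible sums.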

Fig. 4.4: Turing Test

As seen in Figure 4.4, only 14 out of 1209 of the human callers were cut off. On the other hand, 549 out of the 1279 SPIT callers were cut off. This yields a total of 1744 correct decisions out of 2488 during the Turing test stage, which translates to an estimated accuracy of 70.10% under the given assumptions.

This accuracy may be improved primarily by altering the algorithm that determines the numbers used for the Turing test. The current algorithm creates an imbalance in the probability of each answer, making it possible for a SPIT program to capitalize on the most likely sums. This can be avoided by ensuring that only one pair of numbers given by the Turing test yields a particular sum. Doing so converts the probability distribution of the possible sums to a uniform distribution in which each sum is equally likely. The probability that the SPIT program guesses the correct answer to the Turing test within 3 tries would thus be reduced without any change in the software's interaction with human users.

4.3 Call Set-up Delay (CSD)

Call setup delay (also known as post-dialing delay or post-selection delay [34]) is defined as the interval between entering the last dialed digit and receiving ringback. Another, related, measure is the time between entering the last dialed digit and the moment the callee's phone starts to ring. We refer to this delay as the dial-to-ring delay, as there does not seem to be a standard designation. In a traditional phone system, there is no acoustic feedback between dialing and ringback, so an excessive delay until ringback may lead the caller to believe that something is wrong and abandon the call. Internet telephony has the advantage that it can provide additional feedback during call setup, before ringback. For example, SIP servers can send any number of provisional responses that indicate the progress of address translations or other network actions, as discussed in Section III. E.721 [34] recommends an average delay of no more than 3.0, 5.0 or 8.0 s for local, toll and international calls, respectively.

Fig. 4.5: SIP call setup (initial portion)

Figure 4.5 shows part of a basic SIP call setup. A client sends an INVITE call setup message to a User Agent Server (callee). Usually, the UAS returns one or more provisional response messages indicating receipt of the INVITE request and call progress. This is roughly equivalent to the ISDN IAM/ACM message exchange, with the delay representing the post-dial delay. This simple call setup, comprising the reliable exchange of an INVITE and provisional response messages (with the post-dial delay shown in the figure), is a key element of our comparative study.

The call setup delay introduced by our modules is calculated by inserting code into them. The code calculates the elapsed time that each module uses.
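Measuring the elapsed time of each module and comparing the total with the 3-second recommendation can be sketched as below. This is not the code inserted into the thesis modules, which are Perl scripts. The context manager `timed`, the `results` dictionary and the stand-in workloads are assumptions for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name: str, results: dict):
    """Record the wall-clock time spent inside the with-block under name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = time.perf_counter() - start


def within_budget(results: dict, budget_seconds: float = 3.0) -> bool:
    """Return True if the summed module times stay under the delay budget."""
    return sum(results.values()) < budget_seconds


results = {}
for module_name in ("dblogger", "whitelist", "blacklist", "dnsbl", "ratelimit"):
    with timed(module_name, results):
        time.sleep(0.01)  # stand-in for the real module work

print({name: round(seconds, 3) for name, seconds in results.items()})
print(within_budget(results))
```

Recording the time in a `finally` clause means a module that raises an error still contributes its elapsed time to the total.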


After summing all the elapsed times, the total should be less than 3 seconds to follow the recommendation. The code that has been inserted into all the non-intrusive modules is shown in figure 4.6.

Fig. 4.6: Code insertion for elapsed time

Below is a screen capture of a live call entering our system. As can be seen, the call passes through the different modules before the RING signal is initiated by the SIP server.

Fig. 4.7: A live call entering the system.

The total processing time of all the modules is clearly less than 3 seconds, which is acceptable for use in a production environment, as shown in table 4.2.

Table 4.2: Processing time introduced by modules
Modules          Elapsed time
DBLOGGER
Whitelist
Blacklist
DNSBL
Rate-limit
Total (sec)

4.4 Maximum Concurrent Calls

We start our testing by inserting calls into our Asterisk system, where all modules are processed for each call that enters the system. We then increase the call rate as shown in table 4.3. The table also shows the average CSD for each call rate that has been inserted into the VoIP network.

Table 4.3: Call Setup Delay
call-rate        CSD

Using the data from table 4.3, we have plotted the graph shown in figure 4.8. The CSD for each call rate is calculated in two steps. First, we find the CSD of each call by summing the elapsed times of all modules. We then find the average over the calls by summing every call's CSD and dividing by the number of calls.
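The two-step averaging described above can be written out directly. The sketch assumes that each call is represented as a mapping from module name to elapsed seconds. That data layout, the function names and the sample figures are hypothetical, not measurements from the thesis.

```python
def call_setup_delay(module_times: dict) -> float:
    """Step 1: the CSD of one call is the sum of its module elapsed times."""
    return sum(module_times.values())


def average_csd(calls: list) -> float:
    """Step 2: average the per-call CSD over all calls at one call rate."""
    if not calls:
        raise ValueError("at least one call is required")
    return sum(call_setup_delay(call) for call in calls) / len(calls)


# Made-up elapsed times in seconds, for illustration only.
calls = [
    {"dblogger": 0.25, "dnsbl": 0.5},
    {"dblogger": 0.25, "dnsbl": 1.0},
]
print(average_csd(calls))  # 1.0
```

The first call sums to 0.75 s and the second to 1.25 s, so the average CSD is 1.0 s.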

Fig: 4.8 Concurrent call and CSD

It is clear that the maximum number of concurrent calls the system can support without exceeding the 3-second call setup delay is 16 calls. Several factors lead to this result. The first is the low-end CPU used to test the system. The second is the I/O rate of machine 1: the hard disk of this machine rotates at only 4200 RPM, which could affect the read/write performance of the database. The third is that the MySQL DBMS might not be as fast as other DBMSs. For our future work, we would like to test this system against a DBMS other than MySQL. Several DBMSs could be integrated into the system, including but not limited to Postgres, Oracle and SQLite.

4.5 Turing test

Our Turing test design is one of the important parts of the implementation. The Turing test distinguishes between human calls and automated calls by asking the caller to add 2 numbers. The experiment in this section tests whether a computer-based transcriber can transcribe what is announced by the system. We have chosen 2 tools for this: Sphinx and SoX. Sphinx is an open source toolkit for speech recognition, a project by Carnegie Mellon University. Sound eXchange, abbreviated SoX, is a free cross-platform digital audio editor, licensed under the GNU General Public License, and distributed by Chris


Bagwell through SourceForge.net. SoX is written in standard C and has a command line interface. We first try to transcribe a sound file that has been generated by the system (with SoX). We have already edited the grammar file to recognize the numbers 0-9 and the words how, much, is and plus. As can be seen on line #5 in figure 4.9, the transcriber transcribes the announcement correctly. This means that the transcriber is working as it should and that our sound format is supported by the Sphinx toolkit. We then insert background noise into the announcement with the SoX utility. We have chosen Mozart as our background noise. We first insert 10% of background noise into the announcement and then increase it to 20%, 30%, 40%, and so on, until the transcriber can no longer transcribe the announcement. From our study, the level of background noise should be 30% to circumvent the transcriber.

Fig: 4.9 Testing of the transcriber

The next question is whether the announcement is still understandable for humans. We asked 5 people to listen to the announcement and compute the sum of the 2 announced numbers, and the success rate was 100%. All of them could understand the announcement and answered correctly on the

first try on the Turing test. The people in our sample were native Europeans who are educated and have had English as a subject at school since the second year of primary school. One more important thing to mention is that some sources say that audio CAPTCHA has been cracked. After a search on the Internet, we found that Google's audio CAPTCHA has been cracked. This has given us the idea of creating a new version of the Turing test, which will be discussed in the next chapter.

4.6 Call set-up Delay (CSD) on a high performance machine

We have replaced machine 1 with machine 3 and tested the CSD produced on a high-performance machine. The procedures for collecting data are the same as in section 4.3. The test was conducted in a closed environment where 2 machines are connected together via a 100 Mbps connection. All the modules were moved to a new server in our lab in Dallas, and we ran the same procedures as we did on machine 1.

Table 4.4: Elapsed time on high performance server (columns: Modules, Elapsed time; rows: DBLOGGER, Whitelist, Blacklist, DNSBL, Rate-limit, Total (sec))

The two data sets are compared in figure 4.10, and we can conclude that our modules run faster on server-grade hardware.

Fig: 4.10 Elapsed time of modules

4.7 Maximum Concurrent Calls on a high performance machine

We have replaced machine 1 with machine 3 and tested the maximum number of concurrent calls on a high-performance machine. The procedures for collecting data are the same as in section 4.4. The test was conducted in a closed environment where 2 machines are connected together via a 100 Mbps connection. To find the maximum number of concurrent calls, we injected SIP calls into the server with SIPp, starting with a small number of calls. The result is quite promising. In our test, Asterisk could handle up to 399 calls simultaneously without exceeding the 3-second CSD limit. From this result, we can conclude that running our framework on server-grade hardware allows the system to handle more calls simultaneously. We also found out that SIPp could send out cps (calls per second).

4.8 Voice CAPTCHA

We would like to introduce some basic principles of audio CAPTCHA and evaluate some well-known CAPTCHA services that offer audio CAPTCHA as an option in their implementation.

Several audio CAPTCHAs exist for filtering spam on the Internet, but not all of them are suitable for Internet telephony. Basically, an audio CAPTCHA is a supplement to a visual CAPTCHA for people with a visual disability. An audio CAPTCHA, or voice CAPTCHA, may consist of words, letters and/or digits. Most audio CAPTCHAs contain background noise to make it harder for a machine to decipher the challenge token. Users are expected to enter what they hear back into the system. If the response is correct, the user is granted access to the restricted information. As mentioned before, many CAPTCHA implementations are available today, but most of them are not suitable for Internet telephony. In the following, we search for the most suitable audio CAPTCHA and then explain why none of the existing ones is suitable to be proposed as a novel approach for handling SPIT.

Audio CAPTCHA comparative overview

In this section, a comparison is presented. It is drawn by comparing different current CAPTCHAs with our newly designed CAPTCHA. As can be seen from the table, many attributes are taken into account in this comparative review.

Table: 4.5 Audio CAPTCHA comparative overview

In this section we discuss which of the CAPTCHAs in table 4.5 could be candidates for anti-SPIT purposes. The first requirement is that the vocabulary be limited to digits (0,...,9), as the CAPTCHA is to be used in a SIP environment where the input capability is limited to DTMF tones. Sending letters with a phone keypad is possible, but it would require too much time to complete: pressing a key multiple times to send one character would cause a timeout on the server side. The second requirement is that the user success rate be high (>80%). Google and reCAPTCHA cannot satisfy this requirement. Moreover, reCAPTCHA

uses phrases in its CAPTCHA, which makes it very impractical for a VoIP environment. The eBay CAPTCHA has already been cracked and is not considered in this review. It also has a long character variation (6), which makes it almost impossible to memorize during the challenge period. DanCaptcha seems to be a good candidate, but it lacks English language support. DanCaptcha is an open source project whose source code can be modified to suit one's needs. The implementation of this project appears to be incomplete, since it lacks English support. DanCaptcha is therefore not considered for further study. The remaining CAPTCHA implementations (Captchas.net, Captcha V1 and Captcha V2) could, in principle, be used for anti-SPIT purposes. Even though Captchas.net has a long character variation, this can be changed in the configuration section of its website when creating a CAPTCHA. Captcha V1 and Captcha V2 are our own implementations, designed to be robust against bots and to have a high user success rate. The design of our CAPTCHA implementations is described in a separate chapter.

Evaluation of selected audio CAPTCHA

At this stage, we have decided upon the 3 selected CAPTCHAs. The next step was to evaluate them against the two bots (devoicecaptcha and SPHINX). For the devoicecaptcha bot we had to create a training session, because it works by comparison with a training set. We took 500 audio files of each CAPTCHA as a training set and tested the bot with the remaining 500 audio files. The result (figure 4.11) was a clear defeat for two of the CAPTCHAs: the bot had an 81% success rate against Captchas.net and an 83% success rate against Captcha V1. For Captcha V2, the bot could decipher only 3% of the CAPTCHAs. We can conclude that our CAPTCHA implementation version 2 is the clear winner. Its high user success rate and low bot success rate make it the most suitable for a VoIP environment.
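The comparison of bot success rates can be expressed as a short check. This is a hedged Python sketch with hypothetical helper names. The three rates are the ones reported above, the 5% breakability threshold is the criterion stated in the conclusion of this thesis, and the count of 15 correct answers out of 500 is only an illustrative figure consistent with a 3% rate.

```python
BREAKABLE_THRESHOLD = 0.05  # a CAPTCHA counts as breakable above a 5% bot success rate

# Bot (devoicecaptcha) success rates reported in the evaluation above.
bot_success = {"Captchas.net": 0.81, "Captcha V1": 0.83, "Captcha V2": 0.03}


def success_rate(correct: int, total: int) -> float:
    """Fraction of test CAPTCHAs that the bot deciphered correctly."""
    return correct / total


def is_breakable(rate: float, threshold: float = BREAKABLE_THRESHOLD) -> bool:
    """True when the bot success rate exceeds the breakability threshold."""
    return rate > threshold


resistant = [name for name, rate in bot_success.items() if not is_breakable(rate)]

# Illustrative count (assumption): 15 correct out of 500 test files is 3%.
example_rate = success_rate(15, 500)
```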

Fig: 4.11 Success rate of selected audio CAPTCHA

In the next evaluation, a sample of users tests the system by listening to the CAPTCHA. Each user is given 10 tries to decipher the CAPTCHA. The CAPTCHA is randomly generated on the fly in our Turing test module. The user's task is to identify the number in the CAPTCHA. There are 4 persons in our experiment, 3 male and 1 female. In the experiment for CAPTCHA version 2, the male participants deciphered 76.66% of the CAPTCHAs correctly, while the only female participant deciphered 100% correctly. Averaged over the 4 participants, (3 × 76.66% + 100%) / 4 ≈ 82.5%, so we can conclude that the human success rate is 82.5%. To sum up, based on the aforementioned tests and the VoIP system requirement (e.g., only digits in the vocabulary), we concluded that our implementation of CAPTCHA (version 2) could be considered efficient enough for a VoIP system.

SPHINX bot experiment on Captcha V1

For the SPHINX test environment, a small custom application was created to decode multiple wav files in batch and output the corresponding results. We used our CAPTCHA generation script to generate CAPTCHAs from the command line in order to find the accuracy. The first set has a fixed interval between digits and the second set has a variable interval between digits. Noise is injected into both sets in steps of 10% until the noise level reaches 100%. The noise is the sound of monks chanting in a Japanese temple.
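The noise sweep described above can be sketched as a simple loop. This Python sketch is only illustrative. `sweep` and `toy_decode` are hypothetical names, and the toy recognizer stands in for Sphinx with an arbitrary cut-off chosen for the example, so the resulting numbers are not experimental results.

```python
from typing import Callable, Dict, List

NOISE_LEVELS = list(range(10, 101, 10))  # 10%, 20%, ..., 100%


def sweep(samples: List[str], decode: Callable[[str, int], str]) -> Dict[int, float]:
    """Return the recognizer's success rate at each noise level.

    samples: the digit strings that were announced.
    decode:  maps (announced digits, noise level in %) to the recognizer's output.
    """
    results: Dict[int, float] = {}
    for noise in NOISE_LEVELS:
        hits = sum(1 for digits in samples if decode(digits, noise) == digits)
        results[noise] = hits / len(samples)
    return results


# Toy recognizer (assumption): succeeds only below an arbitrary 40% noise level.
def toy_decode(digits: str, noise: int) -> str:
    return digits if noise < 40 else ""


rates = sweep(["123", "907"], toy_decode)
```

Running the same sweep once for the fixed-interval set and once for the variable-interval set would give the two curves that the experiment compares.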

From our experiment, using multiple announcers does not help much to circumvent the transcriber. What really has a great impact is the variable interval between digits, where the interval is the time between consecutive announced digits. Referring to figure 4.12, we can conclude that the accuracy rate drops significantly when 10% noise is inserted into CAPTCHAs with variable intervals. Another conclusion that might be drawn is that the optimum noise level is 40%: the accuracy rates of both interval types have almost the same value at this noise level. The next question is whether a noise level of 40% is acceptable for humans.

Fig: 4.12 Accuracy of Sphinx deciphering CAPTCHAs

Figure 4.13 shows the success rate, which is the percentage of attempts in which the digits were deciphered correctly. As the graph shows, the variable interval has a very great impact on the success rate of the ASR. The success rate for deciphering CAPTCHAs with variable intervals dropped dramatically compared to those with fixed intervals. With fixed intervals, the ASR could decipher 64.6% on the first try without any noise inserted. With variable intervals, on the other hand, the success rate is only 4.8%. From these data, we may conclude that our design of the voice CAPTCHA should have this attribute (variable intervals). The noise level also has an impact on the success rate and the accuracy rate: the more noise is inserted, the better we are able to circumvent the ASR. The noise should be inserted intelligently, keeping in mind the human ability to distinguish between the announced digits and the inserted noise.
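The difference between fixed and variable intervals can be illustrated with a small scheduling sketch in Python. It is a simplification under stated assumptions: `build_schedule` is a hypothetical helper, the fixed gap of 0.5 s is an assumed value, and the variable gap values echo a list of silence durations that appears in Appendix A.9 without claiming that the script uses them in this way.

```python
import random
from typing import List, Tuple

FIXED_GAPS = [0.5]                                   # assumed fixed interval (seconds)
VARIABLE_GAPS = [0.1, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5]  # candidate intervals (seconds)


def build_schedule(digits: str, gaps: List[float],
                   rng: random.Random) -> List[Tuple[str, float]]:
    """Pair each digit with the silence, in seconds, that precedes it."""
    return [(digit, rng.choice(gaps)) for digit in digits]


rng = random.Random(0)  # seeded only to make the example repeatable
fixed = build_schedule("482", FIXED_GAPS, rng)
variable = build_schedule("482", VARIABLE_GAPS, rng)
```

With a single candidate gap every digit starts at a predictable offset, whereas drawing the gap at random for each digit removes that regularity, which is the property the experiment found to hinder the ASR.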

Fig: 4.13 Success rate of Sphinx deciphering CAPTCHAs

Chapter 5 Conclusion and Future Work

5.1 Conclusion

This thesis has focused primarily on the design of an anti-SPIT framework based on a modular mechanism design. From our design, we can conclude that the framework is capable of detecting the SPIT that was generated in the simulation. In the experiment, 43% of the SPIT was terminated (cut off) at the origin (the caller's domain, if the SPIT prevention framework is installed there). Because of a poor design of the random algorithm, the spitter could guess the sum of the 2 randomized numbers that are to be added. A new version of the Turing test has been designed and tested with the devoicecaptcha software to find out whether the CAPTCHA is secure enough. A CAPTCHA is considered breakable if the success rate for guessing it correctly is over 5%. In our experiment, devoicecaptcha could crack only 3% of our CAPTCHAs. This makes our CAPTCHA a superior alternative for a VoIP environment. As a spitter can determine the rate limit, a new rate-limit algorithm should be designed.

5.2 Future Work

Several modules should be improved. The Turing test could be redesigned using voice recognition. For further study, I would like to create a new scheme for protecting the callee from SPIT by using voice recognition. This could be done by having a database with the owner's name of all peers. This information should be exchanged securely between SIP providers. When a call is suspected to be SPIT, it would trigger an alarm and send the call to the Turing test, where the caller is asked for the callee's name. If the caller knows the name of the callee, the caller is granted access to the VoIP network. The callee's name could be queried securely over SSL upon request. After the SIP provider of the caller's domain receives

the information, it would then create a prompt asking for the name of the callee. This should be easy to implement and should be made a standard part of the SIP protocol.

On the rate-limit module, the algorithm should be redesigned. The rate limit could be made dynamic by randomizing it. A new rate-limit design could use the Progressive Multi Gray-leveling (PMG) algorithm described in [3]. PMG monitors call patterns from each caller and determines Spam over Internet Telephony (SPIT) based on the call patterns. The essence of this algorithm is that, as a caller attempts to make numerous calls through the call server in a certain time span, his/her gray level increases, thus classifying the caller as a potential spam source. Once the gray level becomes higher than the given spam threshold, the caller is not able to make any more calls. When the spam caller stops initiating SPIT for a certain time period, the gray level decreases and eventually stays below the threshold.

The number of concurrent calls might get higher if the framework is tested with a more robust SIP proxy. The framework could be implemented on OpenSER or SER to find out the maximum number of concurrent calls it could support in a real SIP proxy.
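The gray-level idea can be illustrated with a deliberately simplified Python sketch. It does not reproduce the formulas of the PMG algorithm in [3]. The class name, the linear increment and decay, and all parameter values are assumptions chosen only to show the behaviour described above: the level rises with each call, calls are refused above a threshold, and the level falls again while the caller stays idle.

```python
from dataclasses import dataclass


@dataclass
class GrayLevel:
    """Simplified per-caller gray level (an illustration, not the PMG formulas)."""
    threshold: float = 3.0         # assumed spam threshold
    increment: float = 1.0         # assumed rise per accepted call
    decay_per_second: float = 0.5  # assumed fall per idle second
    level: float = 0.0
    last_seen: float = 0.0

    def _decay(self, now: float) -> None:
        """Lower the level in proportion to the time since the last attempt."""
        idle = max(0.0, now - self.last_seen)
        self.level = max(0.0, self.level - idle * self.decay_per_second)
        self.last_seen = now

    def allow_call(self, now: float) -> bool:
        """Return True if the call may proceed at time `now` (in seconds)."""
        self._decay(now)
        if self.level > self.threshold:
            return False
        self.level += self.increment
        return True


caller = GrayLevel()
burst = [caller.allow_call(0.0) for _ in range(5)]  # five calls at the same instant
after_pause = caller.allow_call(4.0)                # one more call after 4 idle seconds
```

In this sketch the fifth call of the burst is refused because the level has risen above the threshold, and the call after the pause is accepted because the level has decayed below it again.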

References

[1] R. Dantu and P. Kolan, "Detecting spam in VoIP networks," presented at the Steps to Reducing Unwanted Traffic on the Internet Workshop, Cambridge, MA,
[2] R. MacIntosh and D. Vinokurov, "Detection and mitigation of spam in IP telephony networks using signaling protocol analysis," in Advances in Wired and Wireless Communication, 2005 IEEE/Sarnoff Symposium on, 2005, pp
[3] S. Dongwook, et al., "Progressive multi gray-leveling: a voice spam protection algorithm," Network, IEEE, vol. 20, pp ,
[4] Y. Rebahi, et al., "SIP Spam Detection," in Digital Telecommunications, ICDT '06. International Conference on, 2006, pp
[5] R. Schlegel, et al., "ISE03-2: SPam over Internet Telephony (SPIT) Prevention Framework," in Global Telecommunications Conference, GLOBECOM '06. IEEE, 2006, pp
[6] S. Dritsas, et al., "Threat Analysis of the Session Initiation Protocol Regarding Spam," in Performance, Computing, and Communications Conference, IPCCC IEEE International, 2007, pp
[7] G. F. Marias, et al., "SIP Vulnerabilities and Anti-SPIT Mechanisms Assessment," in Computer Communications and Networks, ICCCN Proceedings of 16th International Conference on, 2007, pp
[8] J. Quittek, et al., "Detecting SPIT Calls by Checking Human Communication Patterns," in Communications, ICC '07. IEEE International Conference on, 2007, pp

[9] A. Ruiz-Martinez, et al., "SIP extensions to support (micro)payments," in Advanced Information Networking and Applications, AINA '07. 21st International Conference on, 2007, pp
[10] J. Ahn, et al., "Enhancing the Blockage of Spam over Internet Telephony (SPIT) Using Adaptive PMG Algorithm," ed, 2008, pp
[11] D. Gritzalis and Y. Mallios, "A SIP-oriented SPIT Management Framework," Computers & Security, vol. 27, pp ,
[12] B. Mathieu, et al., "SDRS: A Voice-over-IP Spam Detection and Reaction System," Security & Privacy, IEEE, vol. 6, pp ,
[13] P. Patankar, et al., "Exploring Anti-Spam Models in Large Scale VoIP Systems," in Distributed Computing Systems, ICDCS '08. The 28th International Conference on, 2008, pp
[14] Y. Rebahi and C. S. Farres, "Spam over Internet telephony: Prototype implementation," in Telecommunications, ICT International Conference on, 2008, pp
[15] P. So Young and K. Shin Gak, "Labeling System for Countering SIP spam," in Advanced Communication Technology, ICACT th International Conference on, 2008, pp
[16] Y. Soupionis, et al., "An Adaptive Policy-Based Approach to SPIT Management," ed, 2008, pp
[17] R. Dantu, et al., "Issues and challenges in securing VoIP," Computers & Security, vol. In Press, Corrected Proof,
[18] S. Dritsas, et al., "OntoSPIT: SPIT management through ontologies," Computer Communications, vol. 32, pp ,
[19] M. Hirschbichler, et al., "Using SPAM DNS Blacklists for Qualifying the SPAM-over-Internet-Telephony Probability of a SIP Call," in Digital Society, ICDS '09. Third International Conference on, 2009, pp
[20] H. Huang, et al., "A SPIT Detection Method Using Voice Activity Analysis," in Multimedia Information Networking and Security, MINES '09. International Conference on, 2009, pp

[21] J. Jeong, et al., "VoIP SPAM Response System Adopting Multi-leveled Anti-SPIT Solutions," ed, 2009, pp
[22] T. Kusumoto, et al., "Using Call Patterns to Detect Unwanted Communication Callers," in Applications and the Internet, SAINT '09. Ninth Annual International Symposium on, 2009, pp
[23] F. Menna, et al., "Simulation of SPIT Filtering: Quantitative Evaluation of Parameter Tuning," in Communications, ICC '09. IEEE International Conference on, 2009, pp
[24] S. Phithakkitnukoon and R. Dantu, "Defense against SPIT using community signals," in Intelligence and Security Informatics, ISI '09. IEEE International Conference on, 2009, pp
[25] F. J. Puente, et al., "Anti-Spit Mechanism Based on Identity SIP," in New Trends in Information and Service Science, NISS '09. International Conference on, 2009, pp
[26] C. Sorge and J. Seedorf, "A Provider-Level Reputation System for Assessing the Quality of SPIT Mitigation Algorithms," in Communications, ICC '09. IEEE International Conference on, 2009, pp
[27] Y. Soupionis and D. Gritzalis, "Audio CAPTCHA: Existing solutions assessment and a new implementation for VoIP telephony," Computers & Security, vol. In Press, Corrected Proof,
[28] Y. Soupionis, et al., "Audio CAPTCHA for SIP-Based VoIP," ed, 2009, pp
[29] A. Sperotto, et al., "Detecting Spam at the Network Level," ed, 2009, pp
[30] L. TaiJin, et al., "User reputation based VoIP spam defense architecture," in Information Networking, ICOIN International Conference on, 2009, pp
[31] B. Yan, et al., "Detection and filtering Spam over Internet Telephony -- a user-behavior-aware intermediate-network-based approach," in Multimedia and

Expo, ICME IEEE International Conference on, 2009, pp
[32] B. Yan, et al., "Adaptive Voice Spam Control with User Behavior Analysis," in High Performance Computing and Communications, HPCC '09. 11th IEEE International Conference on, 2009, pp
[33] S. Dritsas and D. Gritzalis, "An Ontology-Driven antispit Architecture," ed, 2010, pp
[34] International Telecommunication Union, "Network grade of service parameters and target values for circuit-switched services in the evolving ISDN," Recommendation E.721, Telecommunication Standardization Sector of ITU, Geneva, Switzerland, May
[35] United Nations. (2009). Human Development Report. Retrieved September 23, 2010 from:
[36] Richi Jennings, "Cost of Spam is Flattening Our 2009 Predictions," [cited 2010 Oct 01], Available HTTP:
[37] J. Rosenberg, et al., "SIP: Session Initiation Protocol". RFC June
[38] Luis von Ahn, Manuel Blum, and John Langford, "Telling Humans and Computer Apart (Automatically) or How Lazy Cryptographers do AI," to appear in Communications of the ACM
[39] Mikhail M. Bongard. (1970). Pattern Recognition. Rochelle Park, N.J.: Hayden Book Co., Spartan Books.
[40] Y. Rui, Z. Liu, "ARTiFACIAL: Automated Reverse Turing test using FACIAL features," International Multimedia Conference Proceedings of the eleventh ACM international conference on Multimedia 2003, Berkeley, CA, USA, 2003, pp
[41] Ahmedy, I.; Portmann, M.;, "Using Captchas to Mitigate the VoIP Spam Problem," Computer Research and Development, 2010 Second International

Conference on, vol., no., pp , 7-10 May 2010
[42] Markkola, A; Lindqvist, J.;, "Accessible Voice CAPTCHAs for Internet Telephony," Internet: [Nov.29,2010].

Appendices

Appendix A Source code

A.1 DBLOGGER

#! /usr/bin/perl
# load module
use DBI;
use Asterisk::AGI;
use Time::HiRes;

$AGI = new Asterisk::AGI;
my $start = [ Time::HiRes::gettimeofday( ) ];

# connect
my $dbh = DBI->connect("DBI:mysql:database=logger;host=localhost", "root", "mysqlmysql*2", {'RaiseError' => 1});

# execute INSERT query
my $rows = $dbh->do("INSERT INTO access (IP, datetime) VALUES ('$ARGV[0]', NOW())");
print "$rows row(s) affected\n";

# clean up
$dbh->disconnect();

my $elapsed = Time::HiRes::tv_interval( $start );
my %input = $AGI->ReadParse();
my $uniqueid = $input{'uniqueid'};
$AGI->verbose("$uniqueid Elapsed time: $elapsed seconds!");

A.2 White list module

#! /usr/bin/perl
# load module
use Asterisk::AGI;
use DBI;
use Time::HiRes;

my $start = [ Time::HiRes::gettimeofday( ) ];
$AGI = new Asterisk::AGI;

my $IP=$ARGV[0];

# connect
my $dbh = DBI->connect("DBI:mysql:database=logger;host=localhost", "root", "mysqlmysql*2", {'RaiseError' => 1});

# execute SELECT query
my $sth = $dbh->prepare("SELECT COUNT(*) FROM whitelist WHERE IP='$IP'");
$sth->execute();

# iterate through resultset
# print values
# Read the results of the query, then clean up.
my $count = $sth->fetchrow_array();
$sth->finish();

# clean up
$dbh->disconnect();

$count = "(Couldn't obtain count)" if !defined($count);
print $count;
if ($count >= 1) {
    $AGI->verbose("NOT SPAM!");
    $AGI->exec('goto', 'dial');
    print " NOT SPAM!\n";
} else {
    print " SPAM\n";
}

my $elapsed = Time::HiRes::tv_interval( $start );
my %input = $AGI->ReadParse();
my $uniqueid = $input{'uniqueid'};
$AGI->verbose("$uniqueid Elapsed time: $elapsed seconds!");

A.3 Black list module

#! /usr/bin/perl
# load module
use Asterisk::AGI;
use DBI;
$AGI = new Asterisk::AGI;
use Time::HiRes;

my $start = [ Time::HiRes::gettimeofday( ) ];
my $IP=$ARGV[0];

# connect
my $dbh = DBI->connect("DBI:mysql:database=logger;host=localhost", "root", "mysqlmysql*2", {'RaiseError' => 1});

# execute SELECT query
my $sth = $dbh->prepare("SELECT COUNT(*) FROM blacklist WHERE IP='$IP'");
$sth->execute();

# iterate through resultset
# print values
# Read the results of the query, then clean up.
my $count = $sth->fetchrow_array();
$sth->finish();

# clean up
$dbh->disconnect();

$count = "(Couldn't obtain count)" if !defined($count);
print $count;
if ($count >= 1) {
    # the caller's IP is on the blacklist: treat the call as spam and hang up
    $AGI->verbose("Black Listed!");
    $AGI->hangup();
    print " SPAM\n";
} else {
    print " NOT SPAM!\n";
}

my $elapsed = Time::HiRes::tv_interval( $start );
my %input = $AGI->ReadParse();
my $uniqueid = $input{'uniqueid'};
$AGI->verbose("$uniqueid Elapsed time: $elapsed seconds!");

A.4 DNSBL module

#! /usr/bin/perl
use Net::IP;
use Net::DNS;
use Asterisk::AGI;
use Time::HiRes;

my $start = [ Time::HiRes::gettimeofday( ) ];
$AGI = new Asterisk::AGI;
my $DNSBL = "sbl.spamhaus.org";
my $res = Net::DNS::Resolver->new;
my $ip = new Net::IP($ARGV[0],4);

# reverse the octets of the address and append the DNSBL zone
my $ip2 = $ip->reverse_ip();
$ip2 =~ tr/./,/;
my @values = split(',', $ip2);
my $ip4 = "$values[0].$values[1].$values[2].$values[3].$DNSBL";

my $query = $res->search("$ip4");
if ($query) {
    foreach my $rr ($query->answer) {
        next unless $rr->type eq "A";
        print $rr->address, "\n";
        $returns = $rr->address;
        $AGI->verbose("SPAM - $returns");
    }
    # $AGI->exec('goto 1000,1');
    $AGI->hangup();
} else {
    #warn "query failed: ", $res->errorstring, "\n";
    $AGI->verbose("NOT SPAM!");

}

my $elapsed = Time::HiRes::tv_interval( $start );
my %input = $AGI->ReadParse();
my $uniqueid = $input{'uniqueid'};
$AGI->verbose("$uniqueid Elapsed time: $elapsed seconds!");

A.5 Rate-limit module

#! /usr/bin/perl
# load module
use DBI;
#use strict;
use Asterisk::AGI;
use Time::HiRes;

my $start = [ Time::HiRes::gettimeofday( ) ];
$AGI = new Asterisk::AGI;

# read the configuration file and evaluate it as Perl code
my $config_file = '/usr/share/asterisk/agi-bin/config.ini';
open CONFIG, "$config_file" or die "Program stopping, couldn't open the configuration file '$config_file'.\n";
my $config = join "", <CONFIG>;
close CONFIG;
eval $config;
die "Couldn't interpret the configuration file ($config_file) that was given.\nerror details follow: $@\n" if $@;

my $IP=$ARGV[0];
my $SPAM=1;

# time window: from "$mins minutes ago" until now
my $current_time = `date +"%Y%m%d%H%M%S"`;
my $previous_time = `date +"%Y%m%d%H%M%S" -d "$mins minutes ago"`;
$current_time = substr $current_time, 0, 14;
$previous_time = substr $previous_time, 0, 14;

# connect
my $dbh = DBI->connect("DBI:mysql:database=logger;host=localhost", "root", "mysqlmysql*2", {'RaiseError' => 1});

# execute SELECT query
my $sth = $dbh->prepare("SELECT COUNT(*) FROM access WHERE IP='$IP' AND datetime >= '$previous_time' AND datetime < '$current_time'");
$sth->execute();

# iterate through resultset
# print values

# Read the results of the query, then clean up.
my $count = $sth->fetchrow_array();
$sth->finish();

# clean up
$dbh->disconnect();

$count = "(Couldn't obtain count)" if !defined($count);
#print $count;
print $previous_time;
print "\n$current_time";
if ($count < $rate_limit) {
    $AGI->verbose("NOT SPAM!!!");
} else {
    $AGI->verbose("SPAM!!!");
    $AGI->set_variable('SPAM', $SPAM);
    #print " SPAM\n";}
}

my $elapsed = Time::HiRes::tv_interval( $start );
my %input = $AGI->ReadParse();
my $uniqueid = $input{'uniqueid'};
$AGI->verbose("$uniqueid Elapsed time: $elapsed seconds!");

A.6 Rand.pl for Turing test module

#!/usr/bin/perl
#use strict;
use warnings;
use Asterisk::AGI;

$AGI = new Asterisk::AGI;
my $range = 9;
do {
    $random_number1 = int(rand($range));
    $random_number2 = int(rand($range));
    $sum = $random_number1+$random_number2;
    #print $random_number1. "\n". $random_number2."\n";
} while ($sum >9);
print $random_number1. "\n". $random_number2."\n";
$AGI->verbose("$ARGV[0]");
$AGI->set_variable('var1', $random_number1);
$AGI->set_variable('var2', $random_number2);
system("sox", "/usr/share/asterisk/agi-bin/sounds/how-much-is.wav", "/usr/share/asterisk/agi-bin/sounds/$random_number1.wav", "/usr/share/asterisk/agi-bin/sounds/plus.wav", "/usr/share/asterisk/agi-bin/sounds/$random_number2.wav", "/tmp/$ARGV[0].wav");

$mix = `sox -m -v 1 /tmp/$ARGV[0].wav -v -0.3 /usr/share/asterisk/agi-bin/sounds/mozartoutnew.wav /tmp/$ARGV[0]-3.wav`;
$output= `sox /tmp/$ARGV[0]-3.wav -r 8000 /tmp/$ARGV[0]-2.wav resample -ql`;
$AGI->verbose("$ARGV[0]");
exit();

A.7 Resetter.pl for Rate-limit module

#! /usr/bin/perl
# load module
use DBI;
#use strict;

# read the configuration file and evaluate it as Perl code
my $config_file = '/usr/share/asterisk/agi-bin/config.ini';
open CONFIG, "$config_file" or die "Program stopping, couldn't open the configuration file '$config_file'.\n";
my $config = join "", <CONFIG>;
close CONFIG;
eval $config;
die "Couldn't interpret the configuration file ($config_file) that was given.\nerror details follow: $@\n" if $@;

my $IP=$ARGV[0];
my $SPAM=1;
my $current_time = `date +"%Y%m%d%H%M%S"`;
my $previous_time = `date +"%Y%m%d%H%M%S" -d "$mins minutes ago"`;
$current_time = substr $current_time, 0, 14;
$previous_time = substr $previous_time, 0, 14;

# connect
my $dbh = DBI->connect("DBI:mysql:database=logger;host=localhost", "root", "mysqlmysql*2", {'RaiseError' => 1});

# execute DELETE query
my $sth = $dbh->prepare("DELETE FROM access WHERE IP='$IP' AND datetime >= '$previous_time' AND datetime < '$current_time'");
$sth->execute();

# iterate through resultset
# print values
# Read the results of the query, then clean up.
#my $count = $sth->fetchrow_array();
$sth->finish();

# clean up
$dbh->disconnect();

A.8 Asterisk dialplan for Turing test module

[macro-turingtest]
exten => s,1,set(var1=0)
exten => s,n,set(var2=0)

exten => s,n,agi(rand.pl,${uniqueid})
exten => s,n,set(attempt=0)
exten => s,n,set(MAX_ATTEMPT=3)
exten => s,n,playback(/usr/share/asterisk/agi-bin/sounds/on-the-turing-test)
exten => s,n(start),noop(${attempt})
exten => s,n,gotoif($[${attempt} = ${MAX_ATTEMPT}]?dial3)
exten => s,n,playback(/tmp/${uniqueid}-2)
exten => s,n(collect),set(attempt=$[${attempt} + 1])
exten => s,n(collect),read(digit,,1)
exten => s,n,gotoif($["${digit}" = ""]?start)
exten => s,n,saynumber(${digit})
exten => s,n,set(i=${math(${var1}+${var2},int)})
exten => s,n,noop(${attempt})
exten => s,n,gotoif($[${i} = ${digit}]?dial1:dial2)
exten => s,n(dial1),noop("dial1")
exten => s,n(dial1),playback(/usr/share/asterisk/agi-bin/sounds/thank-you)
exten => s,n(dial1),agi(resetter.pl,${ip})
exten => s,n(dial1),goto(demo,${arg2},1)
exten => s,n(dial1),hangup
exten => s,n(dial2),noop("dial2 mismatch")
exten => s,n(dial2),playback(/usr/share/asterisk/agi-bin/sounds/wrong-answer)
exten => s,n(dial2),goto(start)
exten => s,n(dial3),noop("dial3 HANGUP")
exten => s,n(dial3),hangup

A.9 CAPTCHA creation script for enhanced Turing test module

#!/usr/bin/perl
#use strict;
use warnings;
use Switch;
# use Asterisk::AGI;
# $AGI = new Asterisk::AGI;
my $range = 10;
my $announcer=5;
my $wav_dir = "/usr/share/asterisk/agi-bin/spanish/";
#new method!!!
# we start off by adding the filenames into an array
open( FILE, "spanish/wav.lst" ) or die "Can't open $filename : $!";
while( <FILE> ) {
    chomp;
    $count++;
    $wavs[$count] = $_;

}
close FILE;
# end reading to array
$wav_size = @wavs;

#step 1
$announcer_one=int(rand($announcer)+1);
$announcer_two=int(rand($announcer))+1;
$announcer_three=int(rand($announcer))+1;

#step2
$digit_one=int(rand($range));
$digit_two=int(rand($range));
$digit_three=int(rand($range));
$tall="$digit_one$digit_two$digit_three";

#step3
$first_block = int(rand(3));
$second_block = int(rand(3));
$third_block = int(rand(3));
$fourth_block = int(rand(3));

#step4 (process each block)
switch ($first_block) {
    case 0 {
        $create_white_noise = `sox /usr/share/asterisk/agi-bin/sounds/silence_16khz.wav /tmp/$ARGV[0]_b1.wav trim 0 0`;
    }
    case 1 {
        $rand_file=$wavs[int(rand($wav_size)+1)];
        print $rand_file;
        system("sox", "$wav_dir$rand_file", "/tmp/$ARGV[0]_b1.wav");
    }
    case 2 {
        $rand_file1=$wavs[int(rand($wav_size)+1)];
        $rand_file2=$wavs[int(rand($wav_size)+1)];
        system("sox", "$wav_dir$rand_file1", "$wav_dir$rand_file2", "/tmp/$ARGV[0]_b1.wav");
    }
}
switch ($second_block) {

    case 0 {
        $create_white_noise = `sox /usr/share/asterisk/agi-bin/sounds/silence_16khz.wav /tmp/$ARGV[0]_b2.wav trim 0 0`;
    }
    case 1 {
        $rand_file=$wavs[int(rand($wav_size)+1)];
        print $rand_file;
        system("sox", "$wav_dir$rand_file", "/tmp/$ARGV[0]_b2.wav");
    }
    case 2 {
        $rand_file1=$wavs[int(rand($wav_size)+1)];
        $rand_file2=$wavs[int(rand($wav_size)+1)];
        system("sox", "$wav_dir$rand_file1", "$wav_dir$rand_file2", "/tmp/$ARGV[0]_b2.wav");
    }
}
switch ($third_block) {
    case 0 {
        $create_white_noise = `sox /usr/share/asterisk/agi-bin/sounds/silence_16khz.wav /tmp/$ARGV[0]_b3.wav trim 0 0`;
    }
    case 1 {
        $rand_file=$wavs[int(rand($wav_size)+1)];
        print $rand_file;
        system("sox", "$wav_dir$rand_file", "/tmp/$ARGV[0]_b3.wav");
    }
    case 2 {
        $rand_file1=$wavs[int(rand($wav_size)+1)];
        $rand_file2=$wavs[int(rand($wav_size)+1)];
        system("sox", "$wav_dir$rand_file1", "$wav_dir$rand_file2", "/tmp/$ARGV[0]_b3.wav");
    }
}
switch ($fourth_block) {
    case 0 {
        $create_white_noise = `sox /usr/share/asterisk/agi-bin/sounds/silence_16khz.wav /tmp/$ARGV[0]_b4.wav trim 0 0`;
    }
    case 1 {
        $rand_file=$wavs[int(rand($wav_size)+1)];
        print $rand_file;
        system("sox", "$wav_dir$rand_file", "/tmp/$ARGV[0]_b4.wav");
    }
    case 2 {
        $rand_file1=$wavs[int(rand($wav_size)+1)];
        $rand_file2=$wavs[int(rand($wav_size)+1)];
        system("sox", "$wav_dir$rand_file1", "$wav_dir$rand_file2", "/tmp/$ARGV[0]_b4.wav");
    }
}
#end step4 (process each block)

#join wav files
#@start_sec = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8,
@start_sec = (7.0, 20.5, 10.0, 22.6, 33.0, 44.4, 55.2, 26.6, 13.6, 15.8, 50.0);
$size = @start_sec;    # reconstructed from its use below: number of start offsets
$beginning = $start_sec[int(rand($size))];
#$end = 3.0;
#$end_test = `perl /usr/share/asterisk/agi-bin/extract_sound_length.pl /tmp/ wav`;

#@start_sec = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8,
@silence_sec = (0.1, 0.3, 0.5, 0.8, 1.0, 1.3, 1.5);    # array name reconstructed from the commented line below
$silence_size = @silence_sec;    # reconstructed: number of silence lengths
$silence_beginning = 0.0;
#$silence_end = $silence_sec[int(rand($silence_size))]; <--- silence_end is added in the create silence section
#$end_test = `perl /usr/share/asterisk/agi-bin/extract_sound_length.pl /tmp/ wav`;

print $random_number1 . "\n" . $random_number2 . "\n";
print $announcer1 . "\n" . $announcer2 . "\n" . $announcer3 . "\n";
print $digit1 . "\n" . $digit2 . "\n" . $digit3 . "\n";

#create normal announcement for training
#system("sox", "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_one/$digit_one.wav", "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_two/$digit_two.wav", "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_three/$digit_three.wav", "/tmp/train/$digit_one$digit_two$digit_three.wav");
# end create normal announcement for training

# interleave the four noise blocks with the three spoken digits
system("sox",
    "/tmp/$ARGV[0]_b1.wav",
    "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_one/$digit_one.wav",
    "/tmp/$ARGV[0]_b2.wav",
    "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_two/$digit_two.wav",
    "/tmp/$ARGV[0]_b3.wav",
    "/usr/share/asterisk/agi-bin/sounds/announcer/$announcer_three/$digit_three.wav",
    "/tmp/$ARGV[0]_b4.wav",
    "/tmp/$ARGV[0].wav");
$end = `perl /usr/share/asterisk/agi-bin/extract_sound_length.pl /tmp/$ARGV[0].wav`;

#$trim = `sox /usr/share/asterisk/agi-bin/sounds/mozartout2.wav /tmp/$ARGV[0]_trim.wav trim $beginning $end`;
# cut a background track of the same length and mix it under the announcement
$trim = `sox /usr/share/asterisk/agi-bin/sounds/chant_16khz.wav /tmp/$ARGV[0]_trim.wav trim $beginning $end`;
$mix = `sox -m -v 1 /tmp/$ARGV[0].wav -v -0.3 /tmp/$ARGV[0]_trim.wav /tmp/wavs/$digit_one$digit_two$digit_three.wav`;
$output = `sox /tmp/$ARGV[0]-3.wav -r 8000 /tmp/$ARGV[0]-2.wav resample -ql`;
#$output = `sox /tmp/$ARGV[0].wav -r 8000 /tmp/$ARGV[0]-2.wav resample -ql`;
#$AGI->verbose("$ARGV[0] $digit_one $digit_two $digit_three $end_test");
print $first_block;
exit();

A.10 att_sound.pl

#!/usr/bin/perl
# Script to emulate a browser for posting to a
# CGI program with method="post".

# Specify the URL of the page to post to.
my $URLtoPostTo = "";    # the URL is missing from this listing

# Specify the information to post, the form field name on
# the left of the => symbol and the value on the right.
my %Fields = (
    "voice"          => "alberto",
    "txt"            => "$ARGV[0]",
    "downloadbutton" => "DOWNLOAD",
);
# As seen above, "@" must be escaped when quoted.

# If you want to specify a browser name,
# do so between the quotation marks.
# Otherwise, nothing between the quotes.
my $BrowserName = "This Be Mine";

# It's a good habit to always use the strict module.
#use strict;

# Modules with routines for making the browser.
use LWP::UserAgent;
use HTTP::Request::Common;    # provides POST(); this module name is missing from the listing

# Create the browser that will post the information.
my $Browser = new LWP::UserAgent;

# Insert the browser name, if specified.
if ($BrowserName) { $Browser->agent($BrowserName); }

# Post the information to the CGI program.
my $Page = $Browser->request(POST $URLtoPostTo, \%Fields);

# Print the returned page (or an error message).
print "Content-type: text/html\n\n";
if ($Page->is_success) { $text = $Page->content; }
else { print $Page->message; }
# end of script

#print
@test = split(/\n/, $text);    # reconstructed: the assignment to @test is missing from the listing
foreach $line (@test) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip empty lines
    $text2 = $line if $line =~ m/<a/;
    $count++;
}
#$text =~ s/[^\d.]//g;

#find download link; messy
my $result = rindex($text2, 'HREF');
$portion = substr($text2, $result, 200);
$result = rindex($portion, '"');
$portion = substr($portion, $result+2, 200);
$ending = rindex($portion, '</a></p>');
$portion = substr($portion, 0, $ending);
#end find download link

$download_link = "";    # the URL is missing from this listing
$download = `wget $download_link -O $ARGV[0].wav`;
# print "    (statement truncated in the listing)

A.11 accuracy_generator.pl

#!/usr/bin/perl
#open file
open( FILE, "< $ARGV[0]" ) or die "Can't open $ARGV[0] : $!";
while( <FILE> ) {
    chomp;
    $first  = substr($_, 0, 3);    # reference digits
    $second = substr($_, 4, 3);    # recognized digits

    #make a array of char
    @array1 = split '', $first;
    @array2 = split '', $second;
    #end make a array of char

    $substitute = 0;
    #check for substitute
    if ($array1[0] ne $array2[0]) { $substitute++; }
    if ($array1[1] ne $array2[1]) { $substitute++; }
    if ($array1[2] ne $array2[2]) { $substitute++; }
    #end check for substitute

    $deletion = 0;
    #check for deletion
    $deletion = 3 - length($second);
    #end check for deletion

    #calculate accuracy
    $N = length($first);
    $D = $deletion;
    $S = $substitute;
    $accuracy = ($N-$D-$S)/$N;
    if ($accuracy < 0) { $accuracy = 0; }
    #end calculate accuracy

    print "$first $second $substitute $deletion $accuracy\n";
    #if ($first eq $second) {$count++; print $first." $count\n";}
    $count++;
    # accumulate inside the loop so that every line contributes to the average
    $total_accuracy = $total_accuracy + $accuracy;
}
close FILE;
$accuracy = ($total_accuracy/$count)*100;
print "accuracy = $accuracy %\n";
#end open file

A.12 extract_sound_length.pl

#!/usr/bin/perl
# reconstructed: the start of this command is missing from the listing; "sox <file> -n stat" is assumed
@info = `sox $ARGV[0] -n stat 2>&1`;
foreach $line (@info) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip empty lines
    $text = $line if $line =~ m/seconds/;
}
$text =~ s/[^\d.]//g;
print "$text\n";

A.13 java_transcriber.pl

#!/usr/bin/perl
# reconstructed: the start of this command is missing from the listing; "java -jar" is assumed
@info = `java -jar /home/alx/sphinx4/bin/transcriber.jar $ARGV[0]`;
foreach $line (@info) {
    chomp $line;
    next if $line =~ /^\s*$/;    # skip empty lines
    $text = $line;
}
#$text =~ s/[^\d.]//g;
#print
@digits = split(/ /, $text);    # array name reconstructed from its use below
$size = @digits;
#print "$size\n";
$file_name = substr($ARGV[0], 13, 3);
#print $file_name;

$sitedata = "noise10.txt";
open(DAT, ">>$sitedata") || die("Cannot Open File");
print DAT "$file_name:";
# map each recognized word to its digit
for ($i = 0; $i < $size; $i++) {
    # if ($digits[$i] =~ m/nine/i) {print DAT "9";}
    if ($digits[$i] =~ m/one/i)   { print DAT "1"; }
    if ($digits[$i] =~ m/seven/i) { print DAT "7"; }
    if ($digits[$i] =~ m/two/i)   { print DAT "2"; }
    if ($digits[$i] =~ m/three/i) { print DAT "3"; }
    if ($digits[$i] =~ m/four/i)  { print DAT "4"; }
    if ($digits[$i] =~ m/five/i)  { print DAT "5"; }
    if ($digits[$i] =~ m/six/i)   { print DAT "6"; }
    if ($digits[$i] =~ m/zero/i)  { print DAT "0"; }
    if ($digits[$i] =~ m/oh/i)    { print DAT "0"; }
    if ($digits[$i] =~ m/eight/i) { print DAT "8"; }
}
print DAT "\n";
close(DAT);

Appendix B Call Simulation

The call simulation output is very large, and for that reason it is not presented here. The complete simulation is available on the enclosed CD-ROM. The CD also contains the papers that have been published.

A VoIP Anti-Spam System Based on an Audio Turing Test Server

Alexander J. Johansen and Woraphon Lilakiatsakun
Faculty of Information Technology, Mahanakorn University of Technology

Abstract- Internet telephony has become a popular and easy way to call distant places over the Internet, and some providers let users of their software make free calls anywhere in the world. Many applications supporting Internet telephony are available today, such as Skype, NetMeeting and CoolTalk; these products are sometimes called VoIP products. In this paper we demonstrate the use of an audio Turing test server to identify spam voice calls and block them from entering the network. The test server recognizes spam calls with the help of a challenge token and a challenge key. It is intended to fight voice spam before the problem of unsolicited voice calls escalates to the level of e-mail spam. Our experiment with the Turing test server showed that it can distinguish between human calls and system-generated calls and can filter calls based on a mathematical calculation performed by the caller.

I. INTRODUCTION

Voice spam, also called SPIT (Spam over Internet Telephony), is the unsolicited bulk transfer of messages to phones over the Internet, broadcast through VoIP [5]. The content delivered can be voice, images or video. Typical kinds of voice spam are advertisements, telephone polls and telemarketing. SPIT is more obtrusive than e-mail spam: we can read e-mail whenever we have time, whereas voice spam can disturb us at any moment, even at midnight, and thus interrupts our daily activities. VoIP is also cheaper than traditional telephony, with a lower monthly fee and cost per call, so spammers find it easy and cheap to send their messages out to the world [8].

To identify SPIT calls we introduce an audio Turing test server, which identifies these calls with the help of a reverse Turing test. We place this server between the telephone and the VoIP server so that every call passes through it, and only a call that passes the challenge test can reach the VoIP server and thus the callee.

The Turing test underlying the reverse test used here was initially described by Alan Turing in 1950 [7]. It is a test of a machine's capability to perform human-like conversation. The test assumes that a human judge is engaged in a natural-language conversation with two parties, a human and a machine. If the judge is not able to identify which party is the human and which is the machine, the machine is considered to have passed the Turing test. In a reverse Turing test the judge is a machine rather than a human. If that machine automatically generates and grades the challenge questions, this kind of computer-administered reverse Turing test is called a CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart). A CAPTCHA, initially introduced by Luis von Ahn et al. in 2000, is a computer program that can generate and grade tests that (a) most humans can pass, but (b) current computer programs cannot pass [2][3]. The detailed design of our ATTS (Audio Turing Test Server) is given in the system design section.

II. PROBLEM STATEMENT

The use of VoIP is growing fast, and the problem of SPIT is likely to grow in the future as more people and companies switch to VoIP from the traditional telephone networks. According to one estimate, 25% of Western European households have switched to VoIP from the PSTN (Public Switched Telephone Network). As VoIP communication grows, its misuse will grow as well, and advertisers will take advantage of it by sending numerous messages to VoIP users. VoIP spam will put more strain on networks than e-mail spam, because a VoIP message is about 10 times larger than an e-mail message.

SPIT calls are increasing in number as more advertising companies and call centers use this technique to send messages in bulk to people connected over the Internet. A solution is needed to stop such calls, as they disturb the daily activities of the callee. With these needs in view, we provide a solution in the form of the ATTS (Audio Turing Test Server), which can be implemented as an application server between the telephone device and the VoIP network and is able to distinguish system-generated calls from human calls. The server thus passes only authenticated calls to the user. Its design and implementation are presented in the following sections.

III. RELATED WORK

VoIP systems, like e-mail and other text-based applications, are susceptible to abuse by malicious parties. SPIT has much in common with e-mail spam, as both consist of bulk unsolicited messages that disturb people's work. Since e-mail spam is not a new problem, many models are available to prevent it. One such model is the Bayesian model, which filters spam messages based on predefined words (content filtering) and can thus block spam mail. The same method cannot be applied directly to VoIP, because VoIP spam is synchronous in nature whereas e-mail spam is asynchronous. Research on a Bayesian model for voice SPIT is under way.

Many techniques have been developed to recognize SPIT. Blacklists block calls from the callers on the list, and whitelists accept calls only from the listed users. With these techniques, however, it is difficult to authenticate a caller on the first call. For that reason reverse Turing tests are combined with these approaches: a caller who passes the test is added to the whitelist, while one who fails is added to the blacklist. The next time a call is made, the system recognizes the caller from its lists. There is also a voice spam detection system based on user feedback, in which the blacklist and the whitelist are built from the users' feedback on calls. This algorithm calculates a reputation value through Bayesian learning.
Another spam protection algorithm is the Progressive Multi Gray-Leveling (PMG) method, which calculates two gray levels to form the blacklist and the whitelist. However, blacklists and whitelists alone are not sufficient for spam detection; a reverse Turing test such as a CAPTCHA can be used efficiently to detect spam in VoIP. Statistical filtering, which is similar to Bayesian filtering, has also been used, but finding SPIT in a real-time environment is difficult with this technique. A lot of work related to SPIT prevention is thus in progress, and our Turing test server is one of these new approaches.

IV. SYSTEM DETAILS

The aim of this paper is to develop an anti-SPIT solution that prevents automated, system-generated calls from bringing SPIT into VoIP networks. The author has chosen to use open source software to implement this system. The system has been designed to be easily integrated with production VoIP networks that use the SIP protocol.

A. System overview

The Audio Turing Test Server (ATTS) is a system that can tell the difference between an automated, computer-generated call and a regular human phone call made with open source or other VoIP applications. It is an authentication system that identifies SPIT calls using a challenge-response method and thereby fights SPIT over the VoIP network. The proposed method authenticates every incoming call and allows only valid human calls to reach the recipient, filtering out SPIT messages and giving a reliable method of protection for voice traffic over VoIP networks.

The system is a stand-alone server acting as an application server for a VoIP system. An incoming call is sent to the application server, which launches the Turing test application. If the caller solves the test, the call proceeds to the next stage, i.e. it is forwarded to the VoIP network to reach the receiver (who is registered on the PBX/SIP server); if the caller fails the test, the call is stopped. This process is shown in figure 1. A SPIT call consisting of a pre-recorded message is terminated at the application server (hung up, as no human interacts with the system), and a SIP BYE message is sent to the originator (caller).

Technically, the media stream could also be connected directly between the end-points, bypassing the application server, after the caller has solved the Turing test. This can be done with the SIP re-INVITE method. Since not all end-points support re-INVITE, we have chosen not to do it. The use of the SIP server and the routing of the call to the callee are shown in figure 2. The media stream (audio) is bridged with packet-to-packet (Packet2Packet) bridging after the recipient has answered the call; for compatibility reasons we do not use re-INVITE to bridge the RTP stream natively. With Packet2Packet bridging, the audio does not go through the Asterisk core (the base system used for the application server); it comes directly into the RTP stack and goes directly out. This decreases the amount of memory allocation and requires less processing [1].

B. Design and development

The system has been developed on a Linux operating system with Asterisk PBX installed. We chose Asterisk because we did not want to reinvent the wheel. The Turing test application is a set of code written as an Asterisk macro, which is executed for each incoming call entering the system. The application is simply an IVR (Interactive Voice Response) application running on the Asterisk server. It starts by challenging the caller to add two numbers. By default, the caller has a maximum of 3 attempts. If the caller answers correctly, the call is forwarded to the callee's destination. If the call fails the Turing test 3 times (the number of allowed failures is configurable), we assume that it is an automated call and therefore likely to be SPIT. SPIT calls are terminated at this stage and are not forwarded to the VoIP network. The flow of the ATTS is depicted in figure 3. The voice prompts for this system were generated online at the AT&T Labs Text-to-Speech demo site and converted to GSM format with sox.

Figure 1: System Overview of ATTS

Figure 2: System Design

C. System Limitations

The Audio Turing Test Server has many advantages: it can detect spam over VoIP and can distinguish between automated calls and human calls. However, the system has some limitations too. Because of the security check it applies, the caller has to answer some questions, which increases the overhead of establishing the connection. Another limitation is that, while the system protects the VoIP network from automated, system-generated calls, it can also block legitimate automated calls, such as those from banks, package delivery notifications, reverse 911, etc. [6].

V. EXPERIMENTAL RESULTS

So far we have seen the design and flow of the ATTS. In this section we look at the experiments we carried out to test the feasibility of the Turing test server. A system for automatically generating and testing VoIP SPIT has been set up in our lab. The basic VoIP network consists of a PBX, a registered client machine and a spammer machine. The PBX is the open source Asterisk PBX, version beta3. The software provides voice mail over IP services in many protocols and is configured in SIP mode in our system. Registered phones can make calls to each other through Asterisk. Asterisk runs on various operating systems such as Linux, BSD and Mac OS X; in our case it is Linux Fedora 5.

There are two basic types of configuration for Asterisk: SIP configuration and dial-plan configuration. The Asterisk dial plan routes every incoming and outgoing call in the system from its source through various applications to its final destination. In Asterisk, a dial plan is coded in the file extensions.conf, which contains a collection of extensions. The basic syntax of an extension is:

exten => name,priority,application()

where name is the user name or phone number of a user, application() is the application to run together with its arguments, and priority gives the order in which the series of applications is executed. We have designed our algorithm using this configuration, as can be seen in the next section.

A. Experimental Environment

We have set up test beds for two scenarios. In scenario 1 the caller is a human; in scenario 2 we use a SPIT generation tool to initiate the calls.

In scenario 1, the author used a soft phone (X-Lite) on a Windows operating system to initiate the call to the system. The system is connected to the VoIP server (Asterisk PBX) with one user registered on the server. The test of this call is shown in figure 5.

In scenario 2, the author used a SPIT generation tool called "spitter", which uses the open source Asterisk IP PBX as the SPIT generation platform. This system is also connected to the VoIP server (Asterisk PBX) with one user registered on the server.
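To make the challenge logic of Section IV.B concrete, here is a minimal Python sketch of the same idea: one addition question whose operands are drawn from the ranges 1 to 5 and 1 to 4, and at most three attempts to answer it. It is an illustration under stated assumptions, not the macro that runs on the server. The function names and the read_digit callback are hypothetical stand-ins for the IVR prompt and the keypad input.

```python
import random


def make_challenge(rng=random):
    """Pick two addends (1-5 and 1-4) so that the sum is always a single digit."""
    a = rng.randint(1, 5)
    b = rng.randint(1, 4)
    return a, b


def run_turing_test(read_digit, max_attempts=3, rng=random):
    """Return True if the caller gives the correct sum within max_attempts.

    read_digit is a hypothetical callback: it receives the two addends and
    returns the caller's keypad input as a string ("" if nothing was entered).
    """
    a, b = make_challenge(rng)  # the same question is repeated on every attempt
    for _ in range(max_attempts):
        answer = read_digit(a, b)
        if answer == "":
            continue  # no input still uses up one attempt
        if answer == str(a + b):
            return True  # human caller: the call may be forwarded
    return False  # treated as an automated call and terminated


if __name__ == "__main__":
    # A caller that always computes the right sum passes; silence fails.
    print(run_turing_test(lambda a, b: str(a + b)))
    print(run_turing_test(lambda a, b: ""))
```

Keeping both addends small means that a single keypad digit is enough for the answer, which keeps the prompt short for a human caller.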

Figure 3: Flow of the ATTS System

The code written for this machine is as follows:

exten => s,1,Set(var1=${RAND(1,5)})
exten => s,n,Set(var2=${RAND(1,4)})
exten => s,n,Set(attempt=0)
exten => s,n,Set(MAX_ATTEMPT=3)
exten => s,n,Playback(/var/lib/asterisk/sounds/on-the-turing-test)
exten => s,n(start),NoOp(${attempt})
exten => s,n,GotoIf($[${attempt} = ${MAX_ATTEMPT}]?dial3)
exten => s,n,Playback(/var/lib/asterisk/sounds/how-much-is)
exten => s,n,SayDigits(${var1})
exten => s,n,Playback(/var/lib/asterisk/sounds/adds-by)
exten => s,n,SayDigits(${var2})
exten => s,n(collect),Set(attempt=$[${attempt} + 1])
exten => s,n,Read(digit,,1)
exten => s,n,GotoIf($["${digit}" = ""]?start)
exten => s,n,SayNumber(${digit})
exten => s,n,Set(i=${MATH(${var1}+${var2},int)})
exten => s,n,NoOp(${attempt})
exten => s,n,GotoIf($[${i} = ${digit}]?dial1:dial2)
exten => s,n(dial1),NoOp("dial1")
exten => s,n,Playback(/var/lib/asterisk/sounds/thank-you)
; one line is missing from the listing here; presumably it dials the callee
exten => s,n,Hangup
exten => s,n(dial2),NoOp("dial2 mismatch")
exten => s,n,Playback(/var/lib/asterisk/sounds/wrong-answer)
exten => s,n,Goto(start)
exten => s,n(dial3),NoOp("dial3 HANGUP")
exten => s,n,Hangup

The system uses the SIP protocol to communicate. Since different SIP service providers do not know each other's IP addresses, the caller needs to initiate the session by sending an INVITE message to the target server. The caller's soft phone may be configured with the address of this server or may obtain it through DHCP. The initiation process is shown in figure 4.

Figure 4: User-Proxy Server-User Session Initiation.

B. Results and conclusions

In scenario 1, all calls that we initiated were connected to the callee after we had passed the Turing test. In scenario 2, all calls were dropped at the Turing test. The results show that our audio Turing test server can distinguish between human calls and computer-generated automated calls. This will remain true as long as the spammer does not use a voice recognizer to listen to the question and recognize the numbers that are to be added in the Turing test.

The system is quite annoying for regular callers, since the call must go through the Turing test each time. To overcome this problem we can add a white/black-list system, in which unauthorized callers are blacklisted. The caller's information would be added to a list the first time the caller attempts the Turing test, so further calls from the same number would not need to be tested again. The system would do this by checking the white/black-list before deciding whether the call should get through or start the Turing test.
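The white/black-list check described above can be sketched in Python as follows. This is a hedged illustration rather than part of the implemented system: screen_call, the in-memory sets and the turing_test callback are hypothetical names, and a real deployment would need persistent storage and a policy for removing callers from a list.

```python
def screen_call(caller_id, whitelist, blacklist, turing_test):
    """Decide whether to forward or reject a call.

    whitelist and blacklist are sets of caller IDs. turing_test is a
    hypothetical callback that runs the audio challenge for caller_id
    and returns True if the caller passed it.
    """
    if caller_id in whitelist:
        return "forward"  # known human caller: skip the test
    if caller_id in blacklist:
        return "reject"  # known automated caller: drop immediately
    # First call from this number: run the test once and remember the result.
    if turing_test(caller_id):
        whitelist.add(caller_id)
        return "forward"
    blacklist.add(caller_id)
    return "reject"


if __name__ == "__main__":
    whitelist, blacklist = set(), set()
    # The first call runs the test; the second call is decided by the list alone.
    print(screen_call("1001", whitelist, blacklist, lambda caller: True))
    print(screen_call("1001", whitelist, blacklist, lambda caller: False))
```

Checking the lists before launching the test means a regular caller hears the challenge only once, which addresses the annoyance noted above.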


Figure 5: Demonstrating a human call through ATTS

VI. CONCLUSIONS AND FUTURE WORK

The results of our experiments support the view that our audio Turing test server can distinguish between automatically generated and human calls and can thus prevent SPIT calls from being passed to the callee. However, we also found that the reverse Turing test we use for authentication is not 100% reliable when the caller uses a voice recognizer. We can address this problem by inserting background noise into the prompts so that only a human can pass the test.

This is not the end of the work; much future work is under way in this field, including the detection of spam in video, push-to-talk and instant message sessions. The Turing test can also be enhanced further to make it more effective for spam detection. A good future study would be to implement an anti-spam system that includes both diversified reverse Turing tests and black and white lists.

ACKNOWLEDGMENT

I would like to thank Dr. Woraphon Lilakiatsakun for his invaluable advice on conducting this work. I would also like to thank my family and friends for their emotional support.

REFERENCES

[1] Hyperlink: March/ html
[2] Luis von Ahn, Manuel Blum, and John Langford, Telling Humans and Computers Apart (Automatically) or How Lazy Cryptographers do AI, to appear in Communications of the ACM.
[3] Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford, CAPTCHA: Using Hard AI Problems For Security, in Proceedings of Eurocrypt '03, International Conference on the Theory and Applications of Cryptographic Techniques, LNCS 2656, Springer-Verlag, Berlin Heidelberg.
[4] Mobih.com. VoIP Voice Spam. 13 February.
[5] Searchunifiedcommunications.com Definitions. SPIT. 07 Mar. < 86_gci ,00.html>.
[6] S. Felipe and M. Deshpande. SPAM over IP Telephony (SPIT), Identification and prevention techniques. Georgia Institute of Technology.
[7] A. M. Turing (1950), Computing Machinery and Intelligence, Mind, 59:236.
[8] Vincent M. Quinten, Remco van de Meent and Aiko Pras. Lecture Notes in Computer Science, Vol. 4606/2007.
