Designing Software that Detects and Blocks Phishing Attacks

Priyanka R. Raut (1), Samiksha Bharne (2)

(1) Department of Computer Science and Engineering, Ballarpur Institute of Technology, Ballashah (M.H.), India
(2) Professor, Department of Computer Science and Engineering, Ballarpur Institute of Technology, Ballashah (M.H.), India

Abstract

Phishing is a significant security threat to the Internet and causes tremendous economic loss every year. Phishing is an attempt to acquire sensitive information such as usernames, passwords, and credit card details by masquerading as a trustworthy entity in an electronic communication. This is similar to fishing, where a fisherman puts bait on the hook and thus presents it as genuine food for a fish. This paper presents an end-host based anti-phishing approach that combines TF-IDF keyword retrieval with a LOGO detection technique. Our experiments verified that TF-IDF and LOGO detection are effective in detecting and preventing both known and unknown phishing attacks.

Keywords: Web-Browser, TF-IDF, LOGO, Phishing Detection.

1. INTRODUCTION

The word phishing emerged in the 1990s. Early hackers often used "ph" in place of "f" to coin new words in the hacker community, since they usually hacked by phone. Phishing is a new word derived from fishing. It refers to an attack in which the attacker lures users to a fake website by sending them fake e-mails (or instant messages) and stealthily obtains the victim's personal information, such as user name, password, and national security ID. This information can then be used for targeted advertisements or even identity theft attacks (e.g., transferring money from the victim's bank account). The most frequently used attack method is to send potential victims e-mails that appear to come from banks, online organizations, or ISPs.
In some e-mails the attackers make up a pretext, e.g., that the password of your credit card has been mis-entered many times, or that they are providing upgrade services, to lure you to their websites to confirm or modify your account number and password through the hyperlink provided in the e-mail. After clicking such a link you are taken to a counterfeit website. The style, the functions performed, and sometimes even the URL of these fake websites are similar to those of the real websites. Phishing itself is not a new concept, but in recent years phishers have increasingly used it to steal users' information and commit business crimes. According to Gartner Inc., for the 12 months ending April 2004, "there were 1.8 million phishing attack victims, and the fraud incurred by phishing victims totaled $1.2 billion" [1]. The Anti-Phishing Working Group (APWG) provides a solution directory [2] that lists most of the major anti-phishing companies in the world. However, automatic anti-phishing methods are seldom reported.

2. BACKGROUND

In an organization, different people hold information that can be considered sensitive or that can be particularly useful to outside parties. A phishing attacker uses non-technical methods (such as social engineering) to gain that information. Although financial gain is the major motivating factor for phishing, phishers also pursue other goals such as industrial espionage and malware distribution. Phishing attacks usually target:

- Bank information, such as VISA and PayPal accounts.
- Username and password information.
- Social security numbers.
- Mother's maiden name or other information that can be used to retrieve forgotten or lost credentials.

Volume 5, Issue 1, January 2017 Page 20
The above information allows scammers to:

- Make fraudulent charges on your credit or debit card.
- Use your credentials on different online services, such as eBay, Amazon, and others, to commit crimes without being caught (making it appear as though you committed the criminal action).

Various anti-phishing techniques have evolved to protect websites, links, and personal information from phishing attacks [1].

The list-based approach is possibly the most straightforward anti-phishing solution. A white list contains the URLs of known legitimate sites. This kind of solution is generally deployed as a toolbar or web browser extension that tells users whether they are visiting a safe website. Blacklists suffer from a window of vulnerability between the time a phishing site is launched and the time the site is added to the blacklist, because the list requires frequent updating. The same is true of white lists [2].

PhishZoo can detect current phishing sites that look like legitimate sites by matching their content against a saved profile. To avoid detection, a phishing site must look fundamentally different from the genuine website [3].

PwdHash is a well-known anti-phishing solution in the literature. It generates domain-specific passwords that are rendered unusable if they are submitted to another domain (e.g., a password for www.hotmail.com will be different if submitted to www.phisher.com). In comparison, AntiPhish takes a different approach and keeps track of where sensitive data is being submitted. That is, if it detects that confidential information such as a password is being entered into a form on a fake web site, a warning is generated and the pending operation is canceled. The main disadvantage of AntiPhish is that it requires user interaction to specify which sensitive information should be captured and monitored.
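The idea of domain-specific passwords described for PwdHash above can be illustrated with a minimal Python sketch. It is not the actual PwdHash derivation: the HMAC-SHA256 construction, the function names domain_of and domain_password, and the 16-character output length are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercase host name of a URL (the scheme is optional)."""
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def domain_password(master_password: str, url: str, length: int = 16) -> str:
    """Derive a per-domain password from a master password.

    Illustrative construction only (assumption): the master password keys
    an HMAC over the host name, so each domain receives a different value.
    """
    digest = hmac.new(
        master_password.encode("utf-8"),
        domain_of(url).encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")[:length]


if __name__ == "__main__":
    # The same master password yields unrelated values on the two domains,
    # so a value captured by the phishing domain is useless on the real one.
    print(domain_password("my secret", "www.hotmail.com"))
    print(domain_password("my secret", "www.phisher.com"))
```

Because the derived value depends on the host name, a password typed into a look-alike site differs from the one the legitimate site expects.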
3. METHODOLOGY

A. The WebGuard Algorithm

The WebGuard algorithm first analyzes the difference between the visual link and the actual link. Then, to reduce false positives and false negatives, it calculates the weights of different terms and compares the suspected phishing site with the legitimate site based on the highest-scoring terms. The approach thus makes use of TF-IDF and LOGO detection for detecting phishing sites. TF-IDF is a well-known information retrieval algorithm that can be used for comparing and classifying documents, as well as for retrieving documents from a large corpus.

The Working of WebGuard

The WebGuard algorithm works as follows:

Step 1: In its main routine, WebGuard first extracts the DNS names from the actual and visual links and compares them. If the names differ, this is phishing of category 1.
Step 2: It then checks for dotted decimal notation. If a dotted decimal IP address is used directly in the actual DNS, this is possibly phishing of category 2.
Step 3: If the actual link or the visual link is encoded (categories 3 and 4), the links are first decoded before being checked against our database (whitelist and blacklist).
Step 4: When there is no destination information (DNS name or dotted IP address) in the visual link (category 5), the actual DNS name is analyzed by the subroutine AnalyzeDNS.

WebGuard therefore handles all 5 categories of phishing attacks.

B. Functions of Subroutines

In the subroutine AnalyzeDNS, if the actual DNS name is contained in the blacklist, we are sure that it is a phishing attack. Similarly, if the actual DNS name is contained in the whitelist, it is not a phishing attack. If the actual DNS name is contained in neither the whitelist nor the blacklist, then:

Step 5: Perform LOGO detection, which carries out a pixel-by-pixel comparison.
Step 6: Calculate the TF-IDF score of each term on the web page.
Step 7: Generate a set consisting of the five terms with the highest TF-IDF weights.
Step 8: Feed this set to a search engine, which in this case is Google.
Step 9: If the domain name of the current web page matches a domain name among the N top search results, the page is considered a legitimate web site. Otherwise, it is considered a phishing site.

4. CATEGORIES OF HYPERLINKS

In general, the structure of a hyperlink is as follows:

<a href="URI">anchor text</a>
where URI denotes the uniform resource identifier. The URI provides the resource information of the hyperlink, and the anchor text provides information about the URI. The point to note is that the user sees only the anchor text, while the URI is hidden. Phishers take advantage of this to succeed in their attacks. Let us call the URI in the hyperlink the actual link and the anchor text the visible link. According to APWG reports [1], phishers use the following 5 categories of hyperlinks in phishing attacks:

Category 1: The hyperlink provides a DNS domain name in the visible link, but the visible link does not match the actual link. For example, the hyperlink <a href="www.phishing.com">www.onlinesbi.com</a> seems to link to SBI online net banking but actually links to the phishing site www.phishing.com.

Category 2: A dotted IP address is used in the URI or anchor text in place of a DNS domain name. For example: <a href=https://16.123.35.20>click here</a>

Category 3: Attackers nowadays use encoded hyperlinks to trick users.
a. The characters of the DNS name in the hyperlink are encoded as their corresponding ASCII codes. Consider the link <a href=http://%34%2e%33%34%2e%31%39%35%2e%34%31%34%39%30%33/%6c%69%6E%64%65%78%2E%68%74%6D>www.onlinesbi.com</a>. It seems to link to the online SBI site but actually links to the phishing site http://14.34.195.41:34/l/index.htm.
b. Phishers also use special characters (such as @ in the visible link) to make users believe that the e-mail was sent from a legitimate site. For instance, consider the link http://www.amazon.com:fvthsgblijhtcs83infodate@69.10.142.134. It appears to be a legitimate Amazon site but actually links to the phishing site 69.10.142.134.

Category 4: Sometimes the hyperlink does not provide destination information in its anchor text and uses a DNS name in its URI.
The DNS name in the URI is similar to that of a well-known company or organization. For instance, consider the link <a href="http://www.sbionline.in/webscr.php?cmd=login">Click here to confirm your account</a>. The e-mail seems to be sent by SBI online, but the domain is actually registered by the phisher to make users believe that it has something to do with SBI.

Category 5: Attackers also exploit vulnerabilities of the target website to redirect users to their phishing sites. For instance, consider the following link: <a href="http://passport.india.gov.in/dyredir.jsp?rdirl=http://200.251.251.101.verified/">Click here</a>. Once clicked, this link redirects the user to the phishing site 200.251.251.10 because of a vulnerability in passport.india.gov.in.
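The link checks described in Steps 1 to 4 and in the AnalyzeDNS subroutine can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names extract_host and classify_link, the regular expressions, and the returned labels are hypothetical, and the whitelist and blacklist are plain sets supplied by the caller.

```python
import re
from typing import Optional, Set
from urllib.parse import unquote, urlsplit

# Assumed patterns: a dotted decimal IPv4 address and a dotted host name.
DOTTED_IP = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
HOST_LIKE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")


def extract_host(text: str) -> Optional[str]:
    """Return the host in a link or anchor text, or None if it has none."""
    candidate = unquote(text.strip())      # decode %xx-encoded links
    if "://" not in candidate:
        candidate = "http://" + candidate
    try:
        host = urlsplit(candidate).hostname  # drops any "user:pass@" prefix
    except ValueError:
        return None
    if host and HOST_LIKE.match(host):
        return host
    return None


def classify_link(href: str, anchor_text: str,
                  whitelist: Set[str], blacklist: Set[str]) -> str:
    """Apply the link checks in order and return a hypothetical label."""
    actual = extract_host(href)
    visible = extract_host(anchor_text)
    if actual is None:
        return "unknown"
    if visible is not None and visible != actual:
        return "phishing: category 1"            # visible and actual differ
    if DOTTED_IP.match(actual):
        return "possible phishing: category 2"   # raw IP address as host
    if actual in blacklist:
        return "phishing: blacklisted"
    if actual in whitelist:
        return "legitimate"
    return "needs content analysis"              # continue with Steps 5 to 9


if __name__ == "__main__":
    print(classify_link("www.phishing.com", "www.onlinesbi.com", set(), set()))
    print(classify_link("https://16.123.35.20", "click here", set(), set()))
```

Decoding is folded into extract_host so that encoded links are compared in their decoded form. A link that passes all checks without a list match is handed on to the content-based steps.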
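Steps 6, 7, and 9 of the procedure in Section 3 (score each term by TF-IDF, keep the five highest, and compare the page's domain with the top search results) can be sketched in Python as follows. The sketch rests on several assumptions: the background corpus used for the document frequencies, the simple tokenizer, the function names top_terms and looks_legitimate, and the default of 10 results for N are all hypothetical. The search engine query of Step 8 is not performed, so the result URLs are passed in by the caller.

```python
import math
import re
from collections import Counter
from typing import List
from urllib.parse import urlsplit


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphabetic terms (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(page_text: str, corpus: List[str], k: int = 5) -> List[str]:
    """Return the k terms of page_text with the highest TF-IDF weight."""
    page_tokens = tokenize(page_text)
    if not page_tokens:
        return []
    counts = Counter(page_tokens)
    doc_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(doc_sets) + 1                     # corpus plus the page
    scores = {}
    for term, count in counts.items():
        tf = count / len(page_tokens)              # term frequency
        df = 1 + sum(term in s for s in doc_sets)  # document frequency
        scores[term] = tf * math.log(n_docs / df)  # TF-IDF weight
    # Highest weight first; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def looks_legitimate(page_url: str, result_urls: List[str], n: int = 10) -> bool:
    """Step 9: is the page's host among the hosts of the top n results?"""
    host = urlsplit(page_url).hostname
    top_hosts = {urlsplit(u).hostname for u in result_urls[:n]}
    return host is not None and host in top_hosts


if __name__ == "__main__":
    corpus = ["the weather is nice today", "the bank of the river"]
    page = "sbi sbi sbi online banking login the the"
    print(top_terms(page, corpus))
```

Terms that appear in every background document receive a weight of zero, which is why common words rarely enter the five-term signature.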
5. CONCLUSION

Phishing has become a severe Internet security problem. We propose a phishing web page detection method that uses TF-IDF and LOGO detection. The approach works at the pixel level as well as the text level of web pages and can thus detect phishing web pages. Our experiments also show that the method achieves satisfactory classification precision and phishing recall, and that its computation time is acceptable for online users.

References

[1] http://www.tartarus.org/~martin/porterstemmer/csharp2.txt
[2] The Anti-Phishing Working Group, http://www.antiphishing.org
[3] Juan Chen and Chuanxiong Guo, Online Detection and Prevention of Phishing Attacks, IEEE conference, 2009.
[4] Y. Zhang, J. Hong, and L. Cranor, A Content based approach to detecting pharming websites, in Proceedings of the International World Wide Web Conference (WWW), 2007.
[5] Georgina Stanley, Internet Security - Gone Phishing, http://www.cyota.com/news.asp?id=114
[6] Jonathan B. Postel, Simple Mail Transfer Protocol, RFC 821, http://www.ietf.org/rfc/rfc0821.txt
[7] The HubPages, tomsum.hubpages.com/hub/What-are-Phishing-Emails
[8] C.E. Drake, J.J. Oliver, and E.J. Koontz, Anatomy of a Phishing Email, in Conference on E-mail and Anti-Spam, 1841 Page Mill Road, Palo Alto, CA 94304, USA, 2010, MailFrontier, Inc.
[9] E. Kirda and C. Kruegel, Protecting users against phishing attacks, The Computer Journal, 2005.
[10] E. Kirda and C. Kruegel, Protecting users against phishing attacks, The Computer Journal, 2005.