deseo: Combating Search-Result Poisoning Yu USF

deseo: Combating Search-Result Poisoning Yu Jin @MSCS USF

Your Google is not SAFE! SEO Poisoning - A new way to spread malware!

Why choose SE?
- 22.4% of Google searches return links to malicious pages in the top 100 results
- For very popular key phrases, more than 50% of searches show such a link on the first page

Why are attackers attracted to search engines?
- Low cost
- Legitimate appearance
- People trust Google, Bing, Yahoo, etc.

Background of SE
- PageRank
  - Ranks the web pages in the search index
  - Depends on the number of incoming links and on the ranks of the pages those links come from
- Features on pages
  - Not disclosed, to deter spammers
  - Estimated at over 200 features, e.g. words in the title, URL, and content of the page
- Search Engine Optimization (SEO): optimize web pages so that they are ranked higher by search engines
  - White-hat: pages built primarily for the end user; sitemap, appropriate headings and subheadings; follows the guidelines recommended by SEs
  - Black-hat: EVIL!
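To make the PageRank bullet concrete, here is a minimal power-iteration sketch. It is an illustration only, not how any search engine actually ranks pages: the damping factor 0.85, the iteration count, and the four-page toy web are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy web, "c" ends up ranked highest because it has the most incoming links, and "a" outranks "b" and "d" because its single incoming link comes from a highly ranked page. This is the property that dense link structures try to exploit.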

Black-Hat SEO
- Games the rankings and does not follow SE guidelines
- Keyword stuffing: filling the page with lots of irrelevant keywords
- Cloaking: providing different content to crawlers and users
- Redirects
- Participating in link farms

Detecting Black-Hat SEO pages
- Content of pages
- Presence of cloaking
- Link structures leading to the pages

SEO attacks studied in this paper (Trojan.FakeAV case)
- Mixed behavior, hard to detect, less training data
- Servers are originally legitimate
- Main websites operate normally even after being compromised
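One simple way to test for the presence of cloaking is to compare what a page serves to a crawler with what it serves to a browser. The sketch below assumes the two HTML responses have already been fetched (for example, with two different User-Agent strings). The function names, the crude tag stripping, and the 0.5 threshold are all hypothetical choices for illustration, not the method used in the paper.

```python
import re


def visible_words(html):
    """Crude text extraction: drop script/style blocks and tags, keep lowercase words."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_cloaked(crawler_html, browser_html, threshold=0.5):
    """Flag a page whose crawler view and browser view share few words.

    The threshold is an assumed value for this sketch, not a tuned one.
    """
    similarity = jaccard(visible_words(crawler_html), visible_words(browser_html))
    return similarity < threshold


# Hypothetical responses from the same URL under two User-Agent strings.
crawler_page = "<html><body><h1>Lisa Roberts Gillan news</h1><p>Latest news and photos</p></body></html>"
browser_page = ("<html><body><script>window.location='http://example.invalid/scan'</script>"
                "<p>Your computer is infected</p></body></html>")
```

A word-overlap check like this would miss pages that cloak only through redirects, so it is best read as one signal among the three listed above rather than a complete detector.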

Overview of SEO Attack
- How a victim falls into the trap of SEO poisoning
- Three major players: compromised web server, redirection server, exploit server

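Compromised Server
[Figure: attack setup on a compromised web server. Recoverable labels: the attacker queries for osCommerce sites (Site A, Site B, Site C); uploads files (PHP scripts, graphical shell), e.g. under images/; key phrases come from Google Trends; a non-malicious page is served to SE bots (cloaking); the fake SEO page lives at images/page.php?page=<keys>.]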

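SEO Page
[Figure: flow of the SEO page script.]
- Obfuscated PHP is passed through eval() to yield the un-obfuscated code
- Is the visitor an SE? Decided from the User-Agent string
- Logs: time, user-agent, URL, etc.
- If yes, generate links and content:
  - 40 links to the same server, from key.txt
  - 5 links to other domains
  - Content relevant to the key phrase, taken from the Google top 100 results (URL, snippets)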

Redirecting & Exploit Server
[Figure: redirection chain. Recoverable labels: Compromised Domain (twice), NailCash, Cache URL, Redirect URL: feed2.fancyskirt.com/<paras> with cmd=gettdsurl and productid=3 (FakeAV), Exploit Server (Fake AV stored).]

Observations
- Link structure: the fewer out-links a site has, the more likely it is to be compromised
- Timeline: mostly in the initial phase of the attack
- Duration before the first SE crawl: half < 4 hrs; over 85% < 1 day
  - SE crawlers are aggressive
  - Attackers submit pages to the SE
- Distribution of key phrases: high-ranked phrases appear more often
- Traffic from victims: over 5000 compromised sites, over 40 million SEO pages; found by monitoring the logs of redirect servers (number of visits for Fake AV)

Why do SEO attacks succeed?
3 key observations:
- Generating pages with relevant content
- Targeting multiple popular search keywords to increase coverage
- Creating dense link structures to boost PageRank

deseo focuses on:
- Websites change behavior: many new links are added after a compromise, usually with URL structures different from the old URLs
- Harmful URLs have similar structures: looking for newly created pages that share the same structure on different domains helps to identify group attacks

History-based Detection
- Study and compare a server's URLs with its snapshot history
- Example: http://www.askania-fachmaerkte.de/images/news.php?page=lisa+roberts+gillan
- Did the website have URLs of the form /images/news.php?page= before?
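The history check can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: a URL "structure" is taken here to mean the path plus the set of argument names, the helper names are invented for the example, and the example.com URLs are hypothetical stand-ins for a site's snapshot history.

```python
from urllib.parse import parse_qsl, urlsplit


def url_structure(url):
    """Reduce a URL to (path, sorted argument names), dropping argument values."""
    parts = urlsplit(url)
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    return parts.path, tuple(sorted(names))


def new_structures(history_urls, current_urls):
    """Return the current URLs whose structure never appears in the history."""
    seen = {url_structure(u) for u in history_urls}
    return [u for u in current_urls if url_structure(u) not in seen]


# Hypothetical snapshot history and current crawl of one site.
history = [
    "http://www.example.com/index.php?id=1",
    "http://www.example.com/products/list.php?cat=3",
]
current = [
    "http://www.example.com/index.php?id=7",
    "http://www.example.com/images/news.php?page=lisa+roberts+gillan",
]
suspicious = new_structures(history, current)
```

Here only the /images/news.php?page= URL is reported, because index.php?id= was already present in the history even though its argument value changed.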

Clustering of Suspicious Domains
Three kinds of lexical features from URLs:
- String features: separator between keywords, argument name, filename, subdirectory name before the keywords
- Numerical features: number of arguments in the URL, length of arguments, length of filename, length of keywords
- Bag of words: keywords
Sub-domains are preferred over domains (abcd.blogspot.com vs. blogspot.com)
K-means++ method
- Results are sensitive to neither the weight selection nor the threshold selection
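The three feature groups can be extracted from a URL roughly as shown below. This is a sketch under assumptions: it treats the value of the last query argument as the key phrase, does not decode percent-escapes, and uses invented field names. The resulting values would still need to be encoded numerically and weighted before being passed to a clustering step such as K-means++.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Extract string, numerical, and bag-of-words features from one URL."""
    parts = urlsplit(url)
    directory, _, filename = parts.path.rpartition("/")
    args = [pair.partition("=") for pair in parts.query.split("&") if pair]
    # Assumption: the key phrase is the value of the last argument.
    arg_name, _, raw_keywords = args[-1] if args else ("", "", "")
    separators = re.findall(r"[^A-Za-z0-9]+", raw_keywords)
    words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", raw_keywords) if w]
    return {
        # String features
        "separator": separators[0] if separators else "",
        "arg_name": arg_name,
        "filename": filename,
        "subdirectory": directory.rsplit("/", 1)[-1],
        # Numerical features
        "num_args": len(args),
        "args_length": len(parts.query),
        "filename_length": len(filename),
        "keywords_length": len(raw_keywords),
        # Bag of words
        "bag_of_words": sorted(set(words)),
    }


# Hypothetical host; the path and arguments mirror the example URL structure.
features = lexical_features(
    "http://www.example.com/images/news.php?page=lisa+roberts+gillan"
)
```

For this URL the separator is "+", the argument name is "page", the filename is "news.php", and the subdirectory is "images", which is the kind of shared pattern the clustering is meant to pick up across different domains.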

Group Analysis
- SEO links in one campaign share a similar page structure (not just the URL structure)
- Measure the similarity of web pages
- Compare the parsed HTML structure trees
- AutoRE: signature generator system
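As a rough stand-in for comparing parsed HTML structure trees, the sketch below flattens each page into the sequence of tag paths met while parsing and compares the two sequences with the standard library's difflib. This is a deliberately simplified substitute, not the tree comparison or the AutoRE signature generation named on the slide. The class and function names are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Record the path of open tags (e.g. html/body/ul/li) at every start tag."""

    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def structure_similarity(html_a, html_b):
    """Similarity in [0, 1] of two pages' tag-path sequences; text is ignored."""
    matcher = SequenceMatcher(None, tag_paths(html_a), tag_paths(html_b), autojunk=False)
    return matcher.ratio()


# Hypothetical pages: a and b share a template, c does not.
page_a = "<html><body><h1>lisa roberts</h1><ul><li><a href='x'>a</a></li><li><a href='y'>b</a></li></ul></body></html>"
page_b = "<html><body><h1>other phrase</h1><ul><li><a href='p'>c</a></li><li><a href='q'>d</a></li></ul></body></html>"
page_c = "<html><body><table><tr><td>hi</td></tr></table></body></html>"
```

Pages generated from the same template score 1.0 regardless of their keywords, while an unrelated layout scores much lower, which matches the observation that pages in one campaign share structure rather than text.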

Results
Data sets (Bing.com):
- June, 2010: historical snapshots
- Sep, 2010 Jan, 2011
- Trendy keywords: Google Trends, May 28th, 2010 to Feb 3rd, 2011 (20/day)
History-based detection:
- Over 100 billion URLs at the beginning
- Filtered by the Alexa Top 10000
- 2/3 move on to the next step

Results
Clustering
- K = 100
Group Analysis
- A small number of groups have high peak values; most have small peak values
- Threshold = 0.45
- 20 groups remain

A New Attack
Found another SEO attack that uses a different methodology for:
- setting up SEO pages
- boosting their page ranks
- polluting the search index
Differences from the previous attack:
- Pages do not link to each other; they rely on incoming links for page rank
- Uses cloaking
- Extra level of redirection to the exploit server

Study of SEO Attack Queries
- Less than 5% of the top 500 Alexa web sites ever submitted queries during September 2010, while 46% of the compromised servers did
- Queries from the IPs of compromised servers are more frequent than those from legitimate sites
- Matching Google and Bing queries