Semantic Analysis of Search-Autocomplete Manipulations


1 Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations. Peng Wang (1), Xianghang Mi (1), Xiaojing Liao (2), XiaoFeng Wang (1), Kan Yuan (1), Feng Qian (1), Raheem Beyah (3). Affiliations: (1) Indiana University Bloomington, (2) William and Mary, (3) Georgia Institute of Technology. NDSS 2018, San Diego

2 Autocomplete

3 Autocomplete: how predictions are made (popular searches, web content)

4 Winter is here

5 Winter is here: promotion target

6 Autocomplete Manipulation: pollute search logs

8 Autocomplete Manipulation: pollute search logs; pollute web content (compromised websites, spam hosting webpages)

10 Challenges. Search log analysis can only be done by search providers. Web content analysis: a thorough study is non-trivial on massive data. There is little understanding of the real-world impact of illicit promotions.

11 Sacabuche (Search AutoComplete Abuse Checking). The first detection system that works without access to search logs. Novel NLP techniques make it highly efficient, accurate, and scalable. The first large-scale analysis of autocomplete missuggestions, and a first step toward understanding the ecosystem of this underground business.

17 Observation: semantic inconsistency (sentence similarity between trigger and suggestion). Trigger: online backup free download. Legitimate: online backup software free download (semsim=0.96). Manipulated: strongvault online backup free download (semsim=0.43). Legitimate: norton online backup free download (semsim=0.49).
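
The semantic-inconsistency observation can be illustrated with a minimal sketch. It assumes sentence similarity is approximated by the cosine of averaged word vectors, which is a stand-in and not necessarily the measure behind the semsim scores on the slide. The three-dimensional vectors in `TOY_VECTORS` are invented for illustration, so the output will not match the slide's 0.96 and 0.43.

```python
import math

# Hypothetical toy word vectors, invented for illustration only.
# A real system would load pretrained word embeddings instead.
TOY_VECTORS = {
    "online": (0.9, 0.1, 0.0),
    "backup": (0.8, 0.3, 0.1),
    "software": (0.7, 0.4, 0.1),
    "free": (0.2, 0.9, 0.1),
    "download": (0.3, 0.8, 0.2),
    "strongvault": (0.0, 0.1, 0.9),
}


def sentence_vector(sentence):
    """Average the vectors of the known words in a sentence."""
    vectors = [TOY_VECTORS[w] for w in sentence.split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in sentence")
    dims = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dims))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semsim(trigger, suggestion):
    """Assumed stand-in for the slide's semsim score."""
    return cosine(sentence_vector(trigger), sentence_vector(suggestion))


trigger = "online backup free download"
legit = "online backup software free download"
manipulated = "strongvault online backup free download"

print("legitimate:", semsim(trigger, legit))
print("manipulated:", semsim(trigger, manipulated))
```

With these toy vectors the manipulated suggestion scores lower than the legitimate one because the promoted name points in a direction unrelated to the trigger's words.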

20 Observation: search results inconsistency (search results similarity). Trigger: online backup free download. Suggestion: norton online backup free download. Missuggestion: strongvault online backup free download.
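
One way to picture search results inconsistency is to compare the result pages of the trigger with those of each suggestion. The sketch below assumes Jaccard overlap of result host names as the similarity measure; that choice is an assumption made for illustration, and the URLs are hypothetical placeholders.

```python
from urllib.parse import urlparse


def result_domains(urls):
    """Return the set of host names that appear in a list of result URLs."""
    return {urlparse(u).netloc.lower() for u in urls}


def jaccard(a, b):
    """Jaccard similarity of two sets; treated as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical result lists, invented for illustration.
trigger_results = [
    "https://a.example/backup",
    "https://b.example/reviews",
    "https://c.example/download",
]
suggestion_results = [
    "https://a.example/backup-tools",
    "https://b.example/top-10",
    "https://d.example/guide",
]
missuggestion_results = [
    "https://x.example/promo",
    "https://y.example/promo",
    "https://c.example/download",
]

trigger_set = result_domains(trigger_results)
sim_suggestion = jaccard(trigger_set, result_domains(suggestion_results))
sim_missuggestion = jaccard(trigger_set, result_domains(missuggestion_results))
print(sim_suggestion, sim_missuggestion)
```

A low overlap alone would not prove manipulation; it is one signal that a classifier could combine with others.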

21 Architecture

22 Prediction Finder: seeds, API, preprocessing

23 Search Term Analyzer: semantic features, classifier

28 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence-level similarity compares "strongvault online backup free download" with "online backup free download". Words (strongvault, online, backup, free, download) are mapped to word vectors; phrases (strongvault online, online backup, backup free, free download) yield phrase similarity; phrase similarity in turn yields sentence similarity.
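
The decomposition on this slide can be sketched in a few lines. The function names are hypothetical; the only assumption is that phrases are adjacent word pairs, which matches the four phrases listed on the slide.

```python
def split_words(sentence):
    """Split a suggestion into its words."""
    return sentence.split()


def bigram_phrases(sentence):
    """Return adjacent word pairs, matching the phrases shown on the slide."""
    tokens = sentence.split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


suggestion = "strongvault online backup free download"
print(split_words(suggestion))
print(bigram_phrases(suggestion))
```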

29 Semantic Features: sentence similarity, word similarity, infrequency. [Formulas garbled in transcription.]

30 Search Result Analyzer: search result features, classifier

31 Search Result Features: result similarity, content impact, result popularity, result size. [Formulas garbled in transcription.]

32 Evaluation. Datasets: Badset (150 missuggestions, 296 result pages); Goodset (300 legitimate suggestions, 593 result pages); Unknown set (114 million trigger-suggestion pairs, 1.6 million result pages). Accuracy and coverage: on ground truth, precision 96.23% and recall 95.63%; on the unknown set, precision 95.4% over 1K suspicious trigger-suggestion pairs. Performance: 1.5 s per trigger-suggestion pair.
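
For readers less familiar with the two accuracy metrics, here is a minimal sketch of how precision and recall follow from confusion counts. The counts used are hypothetical and are not the counts behind the percentages on the slide.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from true/false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

Precision answers how many flagged pairs were really missuggestions; recall answers how many of the real missuggestions were flagged.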

35 Scope and magnitude. Number of missuggestions on each platform (Google: 0.48%, Bing: 0.37%, Yahoo: 0.2%). Categories of the polluted triggers. 257K polluted triggers, 383K missuggestions.

36 Evolution and lifetime. Number of missuggestions over time: % of newly-appeared missuggestions related to newly-appeared polluted triggers; 1.9% of triggers were polluted on average. Lifetime distribution of missuggestions: % of missuggestions stay > 30 days; 34 days vs. 63 days (missuggestion vs. legitimate).

41 Missuggestion content and pattern. 20% of missuggestions are related to more than one trigger; for example, "free web hosting and domain name registration services by doteasy.com" is related to 123 triggers. Missuggestion grammatical pattern: top 5 missuggestion patterns.

42 Revenue analysis. Manipulation service provider ixiala: 10K sites request suggestion manipulation. $54K/week commission earned by manipulation operators. $515K/week for 465K manipulated suggestions.
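
As a rough worked figure, the last line implies an average weekly amount per manipulated suggestion. The sketch assumes the $515K/week figure covers exactly the 465K suggestions; the slide does not say how the amount is distributed.

```python
weekly_revenue_usd = 515_000       # from the slide: $515K/week
manipulated_suggestions = 465_000  # from the slide: 465K manipulated suggestions

# Average only; the per-suggestion price may vary widely in practice.
per_suggestion = weekly_revenue_usd / manipulated_suggestions
print(f"about ${per_suggestion:.2f} per suggestion per week")
```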

44 Discussion. Limitations: an adversary can make manipulations mimic benign suggestions; ground truth is lacking, so manual effort is involved. Lessons learned: unpopular targets related to triggers; similar keyword patterns.

45 Conclusion. The first large-scale analysis of autocomplete missuggestions, and a first step toward understanding the underground ecosystem. Novel NLP techniques used to build the first detection system that works without access to search logs.

46 Questions & Answers

47 Data collection. Datasets: Unknown set with 114,275,000 suggestions, 1,000,900 triggers, and 1,607,951 result pages (Badset and Goodset counts not transcribed). Validation criteria: a missuggestion must promote a target whose own reputation cannot make it stand out in the search results of the trigger; the missuggestion and its search results conflict with the user's original search intention.

48 Semantic Consistency Classifier. Training data: 100 missuggestions, plus legitimate trigger-suggestion pairs. SVM classification model with 5-fold cross-validation: precision 94.59%, recall 95.89%. Feature F-scores: sentence similarity 0.597; word similarity 0.741; infrequency (score not transcribed).

49 Missuggestion Classifier. Training data: 150 missuggestions, plus legitimate trigger-suggestion pairs. SVM classification model with 5-fold cross-validation: precision 96.23%, recall 95.63%. Feature F-scores: result similarity 0.782; content impact 0.808; result popularity 0.632; result size (score not transcribed).
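
Both classifier slides rely on 5-fold cross-validation. The sketch below shows the procedure in plain Python. To stay dependency-free it swaps the SVM for a one-feature threshold classifier, and the similarity scores are hypothetical; all function names are illustrative.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=5):
    """Train on k-1 folds, test on the held-out fold, and return per-fold accuracy."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def train_threshold(xs, ys):
    """Stand-in for the SVM: threshold at the midpoint of the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    """Label 1 (legitimate) at or above the threshold, else 0 (missuggestion)."""
    return 1 if x >= threshold else 0


# Hypothetical similarity scores: legitimate pairs high, missuggestions low.
samples = [0.90 + 0.01 * i for i in range(10)] + [0.30 + 0.01 * i for i in range(10)]
labels = [1] * 10 + [0] * 10

scores = cross_validate(samples, labels, train_threshold, predict_threshold)
print(scores)
```

Because every sample is held out exactly once, the averaged score uses all labeled data for testing without ever testing on a sample the model was trained on.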

50 Evaluation. Accuracy and coverage on ground truth: two-step analysis, precision 96.23% and recall 95.63%; one-step analysis, precision 97.68% and recall 95.59%. Performance: two-step analysis, 0.016 s/pair (94x faster); one-step analysis, 1.5 s/pair.
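
The speed difference is easier to see with a sketch of a two-step cascade. It assumes the first step is a cheap screen and the second, expensive step runs only on pairs the screen flags. Both checks below are hypothetical stand-ins, not the actual analyzers.

```python
def two_step_check(pairs, cheap_filter, expensive_check):
    """Run the expensive check only on pairs flagged by the cheap filter."""
    flagged = [p for p in pairs if cheap_filter(p)]
    return [p for p in flagged if expensive_check(p)]


calls = {"expensive": 0}


def cheap_filter(pair):
    """Hypothetical screen: flag suggestions that lead with a word absent from the trigger."""
    trigger, suggestion = pair
    return suggestion.split()[0] not in trigger.split()


def expensive_check(pair):
    """Hypothetical stand-in for search result analysis; counts how often it runs."""
    calls["expensive"] += 1
    return pair[1].startswith("strongvault")


pairs = [
    ("online backup free download", "online backup software free download"),
    ("online backup free download", "strongvault online backup free download"),
    ("online backup free download", "norton online backup free download"),
]

result = two_step_check(pairs, cheap_filter, expensive_check)
print(result, calls["expensive"])

# Speedup reported on the slide: one-step time per pair / two-step time per pair.
speedup = 1.5 / 0.016
print(round(speedup))
```

The expensive check runs on two of the three pairs here; on real traffic, where most suggestions pass the screen unflagged, the saving is what makes the per-pair average drop.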

Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations

Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations Peng Wang, Xianghang Mi, Xiaojing Liao, XiaoFeng Wang, Kan Yuan, Feng Qian, Raheem Beyah Indiana University Bloomington, William

More information

An Empirical Characterization of IFTTT

An Empirical Characterization of IFTTT An Empirical Characterization of IFTTT Ecosystem, Usage, and Performance Xianghang Mi, Feng Qian, Ying Zhang, XiaoFeng Wang Indiana University Bloomington, Facebook Research 1 Outline 2 What is IFTTT IFTTT

More information

Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures

Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures Zhou Li, Indiana University Bloomington Sumayah Alrwais, Indiana University Bloomington

More information

Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits

Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits Carl Sabottke Octavian Suciu Tudor Dumitraș University of Maryland 2 Problem Increasing number

More information

Unstructured Data. CS102 Winter 2019

Unstructured Data. CS102 Winter 2019 Winter 2019 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for patterns in data

More information

Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale

Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale Kai Chen,, Peng Wang, Yeonjoon Lee, XiaoFeng Wang, Nan Zhang, Heqing Huang, Wei Zou, Peng Liu Indiana University,

More information

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES

STUDYING OF CLASSIFYING CHINESE SMS MESSAGES STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2

More information

Using Natural Language Processing and Machine Learning to Assist First-Level Customer Support for Contract Management

Using Natural Language Processing and Machine Learning to Assist First-Level Customer Support for Contract Management Using Natural Language Processing and Machine Learning to Assist First-Level Customer Support for Contract Management Master thesis Final presentation Michael Legenc Advisor: Daniel Braun Munich, 08.01.2018

More information

Identifying Fraudulently Promoted Online Videos

Identifying Fraudulently Promoted Online Videos Identifying Fraudulently Promoted Online Videos Vlad Bulakh, Christopher W. Dunn, Minaxi Gupta April 7, 2014 April 7, 2014 Vlad Bulakh 2 Motivation Online video sharing websites are visited by millions

More information

An Empirical Study of Web Resource Manipulation in Real-world Mobile Applications

An Empirical Study of Web Resource Manipulation in Real-world Mobile Applications An Empirical Study of Web Resource Manipulation in Real-world Mobile Applications Xiaohan Zhang, Yuan Zhang, Qianqian Mo, Hao Xia, Zhemin Yang, Min Yang XiaoFeng Wang, Long Lu, and Haixin Duan Background

More information

Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets

Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Rahul Potharaju (Purdue University) Navendu Jain (Microsoft Research) Cristina Nita-Rotaru (Purdue University) April

More information

NeighborWatcher: A Content-Agnostic Comment Spam Inference System

NeighborWatcher: A Content-Agnostic Comment Spam Inference System NeighborWatcher: A Content-Agnostic Comment Spam Inference System Jialong Zhang and Guofei Gu Secure Communication and Computer Systems Lab Department of Computer Science & Engineering Texas A&M University

More information

A study of Video Response Spam Detection on YouTube

A study of Video Response Spam Detection on YouTube A study of Video Response Spam Detection on YouTube Suman 1 and Vipin Arora 2 1 Research Scholar, Department of CSE, BITS, Bhiwani, Haryana (India) 2 Asst. Prof., Department of CSE, BITS, Bhiwani, Haryana

More information

Finding Clues For Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps

Finding Clues For Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps Finding Clues For Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps Yuhong Nan, Zhemin Yang, Yuan Zhang, Donglai Zhu and Min Yang Fudan University Xiaofeng Wang Indiana University

More information

Patrick Krabbe Fachbereich Informatik Seminar aus maschinellem Lernen 1

Patrick Krabbe Fachbereich Informatik Seminar aus maschinellem Lernen 1 Towards a Machine Learning Algorithm for Predicting Truck Compressor Failures Using Logged Vehicle Data By S. Nowaczyk, R. Prytz, T. Rögnvaldsson, S. Byttner 14.07.2015 Patrick Krabbe Fachbereich Informatik

More information

Topology-Based Spam Avoidance in Large-Scale Web Crawls

Topology-Based Spam Avoidance in Large-Scale Web Crawls Topology-Based Spam Avoidance in Large-Scale Web Crawls Clint Sparkman Joint work with Hsin-Tsang Lee and Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M

More information

Detecting Spam Web Pages

Detecting Spam Web Pages Detecting Spam Web Pages Marc Najork Microsoft Research Silicon Valley About me 1989-1993: UIUC (home of NCSA Mosaic) 1993-2001: Digital Equipment/Compaq Started working on web search in 1997 Mercator

More information

SMig: A Stream Migration Extension For HTTP/2

SMig: A Stream Migration Extension For HTTP/2 SMig: A Stream Migration Extension For HTTP/2 Xianghang Mi Feng Qian XiaoFeng Wang Department of Computer Science Indiana University Bloomington IETF 98 httpbis Meeting Chicago IL, 3/31/2017 Motivations

More information

D B M G Data Base and Data Mining Group of Politecnico di Torino

D B M G Data Base and Data Mining Group of Politecnico di Torino DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science

Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science 310 Million + Current Domain Names 11 Billion+ Historical Domain Profiles 5 Million+ New Domain Profiles Daily

More information

Studying the Impact of Text Summarization on Contextual Advertising

Studying the Impact of Text Summarization on Contextual Advertising Studying the Impact of Text Summarization on Contextual Advertising G. Armano, A. Giuliani, and E. Vargiu Intelligent Agents and Soft-Computing Group Dept. of Electrical and Electronic Engineering University

More information

Clone Detection and Maintenance with AI Techniques. Na Meng Virginia Tech

Clone Detection and Maintenance with AI Techniques. Na Meng Virginia Tech Clone Detection and Maintenance with AI Techniques Na Meng Virginia Tech Code Clones Developers copy and paste code to improve programming productivity Clone detections tools are needed to help bug fixes

More information

A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets

A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets Varsha Palandulkar 1, Siddhesh Bhujbal 2, Aayesha Momin 3, Vandana Kirane 4, Raj Naybal 5 Professor, AISSMS Polytechnic

More information

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2

A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,

More information

Dimensionality reduction as a defense against evasion attacks on machine learning classifiers

Dimensionality reduction as a defense against evasion attacks on machine learning classifiers Dimensionality reduction as a defense against evasion attacks on machine learning classifiers Arjun Nitin Bhagoji and Prateek Mittal Princeton University DC-Area Anonymity, Privacy, and Security Seminar,

More information

Automated Website Fingerprinting through Deep Learning

Automated Website Fingerprinting through Deep Learning Automated Website Fingerprinting through Deep Learning Vera Rimmer 1, Davy Preuveneers 1, Marc Juarez 2, Tom Van Goethem 1 and Wouter Joosen 1 NDSS 2018 Feb 19th (San Diego, USA) 1 2 Website Fingerprinting

More information

Dependency-Preserving Data Compaction for Scalable Forensic Analysis 1

Dependency-Preserving Data Compaction for Scalable Forensic Analysis 1 Intro Reductions Optimizations Evaluation Summary Dependency-Preserving Data Compaction for Scalable Forensic Analysis 1 Md Nahid Hossain, Junao Wang, R. Sekar, and Scott D. Stoller 1 This work was supported

More information

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University

CPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University CPSC 4/5 PP Lookup Service Ennan Zhai Computer Science Department Yale University Recall: Lec- Network basics: - OSI model and how Internet works - Socket APIs red PP network (Gnutella, KaZaA, etc.) UseNet

More information

Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics. Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng

Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics. Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng NEC Laboratories America, Cupertino, CA AIRWeb Workshop 2007

More information

Advances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation

Advances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Information Retrieval Using Collaborative Filtering and Item Based Recommendation

More information

Countering Spam Using Classification Techniques. Steve Webb Data Mining Guest Lecture February 21, 2008

Countering Spam Using Classification Techniques. Steve Webb Data Mining Guest Lecture February 21, 2008 Countering Spam Using Classification Techniques Steve Webb webb@cc.gatech.edu Data Mining Guest Lecture February 21, 2008 Overview Introduction Countering Email Spam Problem Description Classification

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She

ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4 Prof. James She james.she@ust.hk 1 Selected Works of Activity 4 2 Selected Works of Activity 4 3 Last lecture 4 Mid-term

More information

Detecting Malicious Activity with DNS Backscatter Kensuke Fukuda John Heidemann Proc. of ACM IMC '15, pp , 2015.

Detecting Malicious Activity with DNS Backscatter Kensuke Fukuda John Heidemann Proc. of ACM IMC '15, pp , 2015. Detecting Malicious Activity with DNS Backscatter Kensuke Fukuda John Heidemann Proc. of ACM IMC '15, pp. 197-210, 2015. Presented by Xintong Wang and Han Zhang Challenges in Network Monitoring Need a

More information

Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection

Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection Zhen Qin (University of California, Riverside) Peter van Beek & Xu Chen (SHARP Labs of America, Camas, WA) 2015/8/30

More information

Data Mining Course Overview

Data Mining Course Overview Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical

More information

Artificial Intelligence applied to IPC and Nice classifications. Patrick FIÉVET. Geneva May 25, 2018

Artificial Intelligence applied to IPC and Nice classifications. Patrick FIÉVET. Geneva May 25, 2018 Artificial Intelligence applied to IPC and Nice classifications Patrick FIÉVET Geneva May 25, 2018 2 IPCCAT-neural : automatic text categorization in the IPC What is it about? Patent Classifications :

More information

DEFENDING AGAINST MALICIOUS NODES USING AN SVM BASED REPUTATION SYSTEM

DEFENDING AGAINST MALICIOUS NODES USING AN SVM BASED REPUTATION SYSTEM DEFENDING AGAINST MALICIOUS NODES USING AN SVM BASED REPUTATION SYSTEM Rehan Akbani, Turgay Korkmaz, and G. V. S. Raju {rakbani@cs.utsa.edu, korkmaz@cs.utsa.edu, and gvs.raju@utsa.edu} University of Texas

More information

Information Retrieval

Information Retrieval Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,

More information

Introduction. Can we use Google for networking research?

Introduction. Can we use Google for networking research? Unconstrained Profiling of Internet Endpoints via Information on the Web ( Googling the Internet) Ionut Trestian1 Soups Ranjan2 Aleksandar Kuzmanovic1 Antonio Nucci2 1 Northwestern 2 Narus University Inc.

More information

Adaptive Learning of an Accurate Skin-Color Model

Adaptive Learning of an Accurate Skin-Color Model Adaptive Learning of an Accurate Skin-Color Model Q. Zhu K.T. Cheng C. T. Wu Y. L. Wu Electrical & Computer Engineering University of California, Santa Barbara Presented by: H.T Wang Outline Generic Skin

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

An Implementation of Hierarchical Multi-Label Classification System User Manual. by Thanawut Ananpiriyakul Piyapan Poomsilivilai

An Implementation of Hierarchical Multi-Label Classification System User Manual. by Thanawut Ananpiriyakul Piyapan Poomsilivilai An Implementation of Hierarchical Multi-Label Classification System User Manual by 5331028421 Thanawut Ananpiriyakul 5331039321 Piyapan Poomsilivilai Supervisor Dr. Peerapon Vateekul Department of Computer

More information

Aiding the Detection of Fake Accounts in Large Scale Social Online Services

Aiding the Detection of Fake Accounts in Large Scale Social Online Services Aiding the Detection of Fake Accounts in Large Scale Social Online Services Qiang Cao Duke University Michael Sirivianos Xiaowei Yang Tiago Pregueiro Cyprus Univ. of Technology Duke University Tuenti,

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach

Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA

More information

Automated Assessment of Security Risks for Mobile Applications

Automated Assessment of Security Risks for Mobile Applications Automated Assessment of Security Risks for Mobile Applications Rahul Pandita, Xusheng Xiao, Wei Yang, William Enck, and Tao Xie Department of Computer Science North Carolina State University Lookout Mobile

More information

TorontoCity: Seeing the World with a Million Eyes

TorontoCity: Seeing the World with a Million Eyes TorontoCity: Seeing the World with a Million Eyes Authors Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun * Project Completed

More information

CAP 6412 Advanced Computer Vision

CAP 6412 Advanced Computer Vision CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha

More information

Feng Qian. Address: 323 Lindley Hall, 150 S Woodlawn Ave, Bloomington IN Homepage:

Feng Qian. Address: 323 Lindley Hall, 150 S Woodlawn Ave, Bloomington IN Homepage: CONTACT Feng Qian fengqian@indiana.edu Address: 323 Lindley Hall, 150 S Woodlawn Ave, Bloomington IN 47405-7104 Homepage: http://fengqian.org WORK AND EDUCATION 01/2015 Present: Indiana University, Bloomington

More information

Domain-specific Concept-based Information Retrieval System

Domain-specific Concept-based Information Retrieval System Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical

More information

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng

Math Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources

More information

Categorization of Phishing Detection Features. And Using the Feature Vectors to Classify Phishing Websites. Bhuvana Namasivayam

by Bhuvana Namasivayam A Thesis Presented in Partial Fulfillment of the Requirements for the Degree

Part I: Data Mining Foundations

Table of Contents: 1. Introduction; 1.1. What is the World Wide Web?; 1.2. A Brief History of the Web and the Internet; 1.3. Web Data Mining; 1.3.1. What is Data Mining?; 1.3.2. What is Web Mining?

Detecting Malicious URLs. Justin Ma, Lawrence Saul, Stefan Savage, Geoff Voelker. Presented by Gaspar Modelo-Howard September 29, 2010.

Publications Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey

No Plan Survives Contact

Experience with Cybercrime Measurement Chris Kanich Neha Chachra Damon McCoy Chris Grier David Wang Marti Motoyama Kirill Levchenko Stefan Savage Geoffrey M. Voelker UC San Diego

Detecting Malicious Web Links and Identifying Their Attack Types

Anti-Spam Team Cellopoint July 3, 2013 Introduction References A great effort has been directed towards detection of malicious URLs Blacklisting

SEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL

IMAGE DESCRIPTIONS IN THE WILD (IDW-CNN) LARGE KERNEL MATTERS (GCN) DEEP LEARNING SEMINAR, TAU NOVEMBER 2017 TOPICS IDW-CNN: Improving Semantic Segmentation

Opportunities and challenges in personalization of online hotel search

David Zibriczky Data Science & Analytics Lead, User Profiling Introduction 2 Introduction About Mission: Helping the travelers to

ENTERPRISE ENDPOINT PROTECTION BUYER'S GUIDE

TABLE OF CONTENTS: Overview; A Multi-Layer Approach to Endpoint Security; Known Attack Detection; Machine Learning; Behavioral Analysis; Exploit

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as

What's in Your Dongle and Bank Account? Mandatory and Discretionary Protection of Android External Resources

Soteris Demetriou, Xiaoyong Zhou, Muhammad Naveed, Yeonjoon Lee, Kan Yuan, XiaoFeng Wang, Carl

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha

2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016). Wan Pu, Wang Lisha, Physics Institute,

Supporting Information

Ullman et al. 10.1073/pnas.1513198113 SI Methods Training Models on Full-Object Images. The human average MIRC recall was 0.81, and the sub-MIRC recall was 0.10. The models average

In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google,

1.1 Introduction. In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to

Feng Qian. Address: 480 Digital Technology Center, 117 Pleasant St SE, Minneapolis MN Homepage:

CONTACT EMPLOYMENT EDUCATION Feng Qian fengqian@umn.edu Address: 480 Digital Technology Center, 117 Pleasant St SE, Minneapolis MN 55455 Homepage: http://fengqian.org 08/2018 Present: University of Minnesota

IC-SDV 2018 Automatic Text categorization in the International Patent Classification

IPCCAT-Neural Patrick FIÉVET & Jacques GUYOT Nice April 23, 2018 2 IPCCAT-neural : automatic text categorization in

Opening the Black Box Data Driven Visualization of Neural N

Opening the Black Box Data Driven Visualization of Neural Networks September 20, 2006 Artificial Neural Networks Limitations of ANNs Use of Visualization (ANNs) mimic the processes found in biological

On Mobile Malware Infections N. Asokan

(joint work with Hien Thi Thu Truong, Eemil Lagerspetz, Petteri Nurmi, Adam J. Oliner, Sasu Tarkoma, Sourav Bhattacharya) Mobile malware alarm bells Google Search

Service-Centric Networking for the Developing World

GAIA workshop Arjuna Sathiaseelan, Liang Wang, Andrius Aucinas, Gareth Tyson*, Jon Crowcroft N4D Lab liang.wang@cl.cam.ac.uk Cambridge University, UK

LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION

Evgeny Kharitonov *, ***, Anton Slesarev *, ***, Ilya Muchnik **, ***, Fedor Romanenko ***, Dmitry Belyaev ***, Dmitry Kotlyarov *** * Moscow Institute

deseo: Combating Search-Result Poisoning Yu USF

Yu Jin @MSCS USF Your Google is not SAFE! SEO Poisoning - A new way to spread malware! Why choose SE? 22.4% of Google searches in the top 100 results > 50% for

Adversarial Machine Learning An Introduction. With slides from: Binghui Wang

Outline Machine Learning (ML) Adversarial ML Attack Taxonomy Capability Adversarial Training Conclusion Outline Machine Learning

Hebei University of Technology A Text-Mining-based Patent Analysis in Product Innovative Process

Liang Yanhong, Tan Runhua, Hebei University of Technology. Abstract: Patent documents contain important technical knowledge and research results.

Using Network Traffic to Remotely Identify the Type of Applications Executing on Mobile Devices. Lanier Watkins, PhD

LanierWatkins@gmail.com Outline Introduction Contributions and Assumptions Related Work

Documentation for: MTA developers

This document contains implementation guidelines for developers of MTA products/appliances willing to use Spamhaus products to block as much spam as possible. No reference is made to specific products.

Dynamic Feature Selection for Dependency Parsing

He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana

Extraction of Web Image Information: Semantic or Visual Cues?

Georgina Tryfou and Nicolas Tsapatsoulis Cyprus University of Technology, Department of Communication and Internet Studies, Limassol, Cyprus

MALICIOUS URL DETECTION AND PREVENTION AT BROWSER LEVEL FRAMEWORK

International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 12, December 2017, pp. 536-541, Article ID: IJMET_08_12_054 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=12

Instructor: Dr. Mehmet Aktaş. Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

MEASURING AND FINGERPRINTING CLICK-SPAM IN AD NETWORKS

Vacha Dave *, Saikat Guha and Yin Zhang * * The University of Texas at Austin Microsoft Research India Internet Advertising Today 2 Online advertising

WHAT'S NEW WITH OBSERVEIT: INSIDER THREAT MANAGEMENT VERSION 6.5

ObserveIT's award-winning insider threat management software combines user monitoring, behavioral analytics, and now policy enforcement

Approach Research of Keyword Extraction Based on Web Pages Document

2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Yangxin

Data-Intensive Computing with MapReduce

Session 6: Similar Item Detection Jimmy Lin University of Maryland Thursday, February 28, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share

Facebook Immune System 人人安全中心姚海阔

Immune A realtime system to protect our users and the social graph Big data, Real time 25B checks per day 650K per second at peak Realtime checks and classifications on

Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool

Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

Extracting Rankings for Spatial Keyword Queries from GPS Data

Ilkcan Keles Christian S. Jensen Simonas Saltenis Aalborg University Outline Introduction Motivation Problem Definition Proposed Method Overview

IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual

Lesson 3: Organizing Data and Information (6 Hrs) Instructional Objectives Students

Distress Image Library for Precision and Bias of Fully Automated Pavement Cracking Survey

Kelvin C.P. Wang, Ran Ji, and Cheng Chen kelvin.wang@okstate.edu Oklahoma State University/WayLink School of Civil

Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients

Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an

Information Retrieval

CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating

Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM's Imperial College of Engg. and Research,

Test Automation. Fundamentals. Mikó Szilárd

2016 EPAM 2 Blue-chip clients rely on EPAM 3 SCHEDULE 9.12 Intro 9.19 Unit testing 1 9.26 Unit testing 2 10.03 Continuous integration 1 10.10 Continuous integration

Detecting and Quantifying Abusive IPv6 SMTP!

Casey Deccio Verisign Labs Internet2 2014 Technical Exchange October 30, 2014 Spam, IPv4 Reputation and DNSBL Spam is pervasive Annoying (pharmaceuticals) Dangerous

Natural Language Processing as Key Component to Successful Information Products

Yves Schabes, Teragram Corporation. Tera: "monster" in Greek; 2^40 (= 1,099,511,627,776), roughly 10^12 (one trillion). Gram: something written

Artificial Intelligence. Programming Styles

Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to