Semantic Analysis of Search-Autocomplete Manipulations
- Erin Hudson
Transcription
1 Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations. Peng Wang (1), Xianghang Mi (1), Xiaojing Liao (2), XiaoFeng Wang (1), Kan Yuan (1), Feng Qian (1), Raheem Beyah (3). (1) Indiana University Bloomington, (2) William and Mary, (3) Georgia Institute of Technology. NDSS 2018, San Diego
2 Autocomplete
3 Autocomplete: how predictions are made (popular searches, web content)
4 Winter is here
5 Winter is here: promotion target
6 Autocomplete Manipulation: pollute search logs
8 Autocomplete Manipulation: pollute search logs; pollute web content (compromised websites, spam hosting webpages)
10 Challenges: search log analysis can only be done by search providers; web content analysis: a thorough study is non-trivial on massive data; little understanding of the real-world impact of illicit promotions
11 Sacabuche (Search AutoComplete Abuse Checking): first detection system that works without access to search logs; novel NLP techniques; highly efficient, accurate, and scalable; first large-scale analysis of autocomplete missuggestions; first step toward understanding the ecosystem of this underground business
12 Observation: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download; manipulated: strongvault online backup free download
13 Observation: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download (semsim=0.96); manipulated: strongvault online backup free download
14 Observation: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download (semsim=0.96); manipulated: strongvault online backup free download (semsim=0.43)
15 Sentence Similarity: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download (semsim=0.96); manipulated: strongvault online backup free download (semsim=0.43)
16 Observation: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download (semsim=0.96); manipulated: strongvault online backup free download (semsim=0.43); legitimate: norton online backup free download
17 Observation: semantic inconsistency. trigger: online backup free download; legitimate: online backup software free download (semsim=0.96); manipulated: strongvault online backup free download (semsim=0.43); legitimate: norton online backup free download (semsim=0.49)
18 Observation: search results inconsistency. trigger: online backup free download; missuggestion: strongvault online backup free download; suggestion: norton online backup free download
20 Search Results Similarity: search results inconsistency. trigger: online backup free download; missuggestion: strongvault online backup free download; suggestion: norton online backup free download
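The inconsistency on this slide can be made concrete by comparing the result pages returned for the trigger with those returned for a suggestion. The Python sketch below assumes Jaccard overlap of result domains as the measure. That is a stand-in chosen for illustration, not the similarity defined in the talk, and the domain lists are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, a value in [0, 1]."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty result sets are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical result domains, for illustration only.
trigger_results = ["backup-a.example", "backup-b.example", "reviews.example"]
suggestion_results = ["backup-a.example", "reviews.example", "vendor.example"]
missuggestion_results = ["promo.example", "promo-mirror.example", "reviews.example"]

print(jaccard(trigger_results, suggestion_results))     # 0.5
print(jaccard(trigger_results, missuggestion_results))  # 0.2
```

Under this assumed measure, a suggestion whose results barely overlap with the trigger's results would stand out as a candidate for closer inspection.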
21 Architecture
22 Prediction Finder: seeds, API, preprocessing
23 Search Term Analyzer: semantic features, classifier
24 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence-level similarity: strongvault online backup free download vs. online backup free download
25 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence-level similarity: strongvault online backup free download vs. online backup free download. Phrases: strongvault online, online backup, backup free, free download
26 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence-level similarity: strongvault online backup free download vs. online backup free download. Phrases: strongvault online, online backup, backup free, free download. Words: strongvault, online, backup, free, download
27 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence-level similarity: strongvault online backup free download vs. online backup free download. Phrases (phrase similarity): strongvault online, online backup, backup free, free download. Words (word vector): strongvault, online, backup, free, download
28 Semantic Feature example: online backup free download -> strongvault online backup free download. Sentence level (sentence similarity): strongvault online backup free download vs. online backup free download. Phrases (phrase similarity): strongvault online, online backup, backup free, free download. Words (word vector): strongvault, online, backup, free, download
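The decomposition shown in this example (sentence, then two-word phrases, then words) can be sketched in a few lines of Python. The function names `words` and `phrases` are hypothetical, and whitespace tokenization is an assumption. The slide does not say how queries are tokenized.

```python
def words(query):
    """Split a query into lowercase word tokens (whitespace tokenization assumed)."""
    return query.lower().split()


def phrases(query, n=2):
    """Return the consecutive n-word phrases of a query (n=2 matches the slide)."""
    tokens = words(query)
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


suggestion = "strongvault online backup free download"
print(words(suggestion))
print(phrases(suggestion))
```

For the suggestion above, `phrases` yields the same four phrases listed on the slide: strongvault online, online backup, backup free, free download.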
29 Semantic Features: sentence similarity (built from phrase-level and word-level terms; the word-level term has the form (1 + cos(...)) over two word vectors), word similarity, infrequency [formula symbols garbled in the transcription]
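The features on this slide rest on comparing word vectors by cosine. The Python sketch below shows one common way to turn a cosine into a similarity in [0, 1]. The exact form `((1 + cos) / 2) ** alpha`, the parameter `alpha`, and the toy vectors are all assumptions, since the slide's formulas did not survive transcription.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def word_kernel(u, v, alpha=1.0):
    """Map the cosine from [-1, 1] to [0, 1]; alpha is a hypothetical sharpening exponent."""
    return ((1.0 + cosine(u, v)) / 2.0) ** alpha


# Toy 2-d vectors, for illustration only; real word vectors have many more dimensions.
print(word_kernel([1.0, 0.0], [1.0, 0.0]))   # identical direction
print(word_kernel([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(word_kernel([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

A phrase-level or sentence-level score could then be assembled from these word-level values, in line with the sentence, phrase, and word levels shown in the preceding example.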
30 Search Result Analyzer: search results features, classifier
31 Search Result Features: result similarity, content impact, result popularity, result size [formula symbols garbled in the transcription]
32 Evaluation. Datasets: Badset: 150 missuggestions, 296 result pages; Goodset: 300 legitimate suggestions, 593 result pages; Unknown set: 114 million trigger-suggestion pairs, 1.6 million result pages. Accuracy and coverage: ground truth: precision 96.23%, recall 95.63%; unknown set: precision 95.4% on 1K suspicious trigger-suggestion pairs. Performance: 1.5s per trigger-suggestion pair
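For readers less familiar with the two metrics reported here, the following Python sketch shows how precision and recall are computed from classification counts. The counts are hypothetical and are not taken from the talk.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


# Hypothetical counts: 90 true positives, 10 false positives, 30 false negatives.
precision, recall = precision_recall(tp=90, fp=10, fn=30)
print(precision, recall)  # 0.9 0.75
```

Precision answers "of the pairs flagged as missuggestions, how many really are?", while recall answers "of the real missuggestions, how many were flagged?".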
33 Scope and magnitude: number of missuggestions on each platform (Google: 0.48%, Bing: 0.37%, Yahoo: 0.2%); categories of the polluted triggers
35 Scope and magnitude: number of missuggestions on each platform (Google: 0.48%, Bing: 0.37%, Yahoo: 0.2%); categories of the polluted triggers; 257K polluted triggers; 383K missuggestions
36 Evolution and lifetime. Number of missuggestions over time: % of newly-appeared missuggestions related to newly-appeared polluted triggers; 1.9% of triggers were polluted on average. Lifetime distribution of missuggestions: % of missuggestions stay > 30 days; 34 days vs. 63 days (missuggestion vs. legitimate)
38 Missuggestion content and pattern: 20% of missuggestions are related to more than one trigger; for example, "free web hosting and domain name registration services by doteasy.com" is related to 123 triggers
39 Missuggestion content and pattern: 20% of missuggestions are related to more than one trigger; for example, "free web hosting and domain name registration services by doteasy.com" is related to 123 triggers; missuggestion grammatical pattern; top 5 missuggestion patterns
42 Revenue analysis: manipulation service provider ixiala; 10K sites request suggestion manipulation; $54K/week commission earned by manipulation operators; $515K/week for 465K manipulated suggestions
43 Discussion. Limitations: an adversary can make manipulations mimic benign ones; lack of ground truth, manual effort involved
44 Discussion. Limitations: an adversary can make manipulations mimic benign ones; lack of ground truth, manual effort involved. Lessons learned: unpopular targets related to triggers; similar keyword patterns
45 Conclusion: first large-scale analysis of autocomplete missuggestions and a first step toward understanding the underground ecosystem; novel NLP techniques used to build the first detection system that works without access to search logs
46 Questions & Answers
47 Data collection. Datasets (# of suggestions / # of triggers / # of result pages): Badset; Goodset; Unknown set: 114,275,000 / 1,000,900 / 1,607,951. Validation criteria: a missuggestion must promote a target whose own reputation cannot make it stand out in the search results of the trigger; the missuggestion and its search results conflict with the user's original search intention
48 Semantic Consistency Classifier: 100 missuggestions, legitimate trigger-suggestion pairs; SVM classification model with 5-fold cross validation; precision 94.59%, recall 95.89%. Feature (F-score): sentence similarity (0.597), word similarity (0.741), infrequency
49 Missuggestion Classifier: 150 missuggestions, legitimate trigger-suggestion pairs; SVM classification model with 5-fold cross validation; precision 96.23%, recall 95.63%. Feature (F-score): result similarity (0.782), content impact (0.808), result popularity (0.632), result size
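Both classifiers are evaluated with 5-fold cross validation. As a rough illustration of what that means, the Python sketch below splits sample indices into five folds, each used once as the test set. It uses only the standard library, does no shuffling or stratification (real evaluations usually do), and the helper name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Ten hypothetical samples split into five folds of two test samples each.
for train, test in k_fold_indices(10, k=5):
    print(test, len(train))
```

In each round a model would be trained on the `train` indices and scored on the `test` indices, and the reported precision and recall would summarize the five rounds.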
50 Evaluation. Accuracy and coverage: two-step analysis: precision 96.23%, recall 95.63% on ground truth; one-step analysis: precision 97.68%, recall 95.59% on ground truth. Performance: two-step analysis: 0.016s/pair (94X faster); one-step analysis: 1.5s/pair
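The "94X faster" figure follows from the two per-pair timings on this slide. A short Python check, using only those two numbers:

```python
one_step_seconds = 1.5    # one-step analysis, seconds per pair (from the slide)
two_step_seconds = 0.016  # two-step analysis, seconds per pair (from the slide)

speedup = one_step_seconds / two_step_seconds
print(round(speedup, 2))  # 93.75, which rounds to the 94X on the slide
```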
Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations
Game of Missuggestions: Semantic Analysis of Search-Autocomplete Manipulations Peng Wang, Xianghang Mi, Xiaojing Liao, XiaoFeng Wang, Kan Yuan, Feng Qian, Raheem Beyah Indiana University Bloomington, William
More informationAn Empirical Characterization of IFTTT
An Empirical Characterization of IFTTT Ecosystem, Usage, and Performance Xianghang Mi, Feng Qian, Ying Zhang, XiaoFeng Wang Indiana University Bloomington, Facebook Research 1 Outline 2 What is IFTTT IFTTT
More informationFinding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures
Finding the Linchpins of the Dark Web: A Study on Topologically Dedicated Hosts on Malicious Web Infrastructures Zhou Li, Indiana University Bloomington Sumayah Alrwais, Indiana University Bloomington
More informationVulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits
Vulnerability Disclosure in the Age of Social Media: Exploiting Twitter for Predicting Real-World Exploits Carl Sabottke Octavian Suciu Tudor Dumitraș University of Maryland 2 Problem Increasing number
More informationUnstructured Data. CS102 Winter 2019
Winter 2019 Big Data Tools and Techniques Basic Data Manipulation and Analysis Performing well-defined computations or asking well-defined questions ( queries ) Data Mining Looking for patterns in data
More informationFinding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale
Finding Unknown Malice in 10 Seconds: Mass Vetting for New Threats at the Google-Play Scale Kai Chen,, Peng Wang, Yeonjoon Lee, XiaoFeng Wang, Nan Zhang, Heqing Huang, Wei Zou, Peng Liu Indiana University,
More informationSTUDYING OF CLASSIFYING CHINESE SMS MESSAGES
STUDYING OF CLASSIFYING CHINESE SMS MESSAGES BASED ON BAYESIAN CLASSIFICATION 1 LI FENG, 2 LI JIGANG 1,2 Computer Science Department, DongHua University, Shanghai, China E-mail: 1 Lifeng@dhu.edu.cn, 2
More informationUsing Natural Language Processing and Machine Learning to Assist First-Level Customer Support for Contract Management
Using Natural Language Processing and Machine Learning to Assist First-Level Customer Support for Contract Management Master thesis Final presentation Michael Legenc Advisor: Daniel Braun Munich, 08.01.2018
More informationIdentifying Fraudulently Promoted Online Videos
Identifying Fraudulently Promoted Online Videos Vlad Bulakh, Christopher W. Dunn, Minaxi Gupta April 7, 2014 April 7, 2014 Vlad Bulakh 2 Motivation Online video sharing websites are visited by millions
More informationAn Empirical Study of Web Resource Manipulation in Real-world Mobile Applications
An Empirical Study of Web Resource Manipulation in Real-world Mobile Applications Xiaohan Zhang, Yuan Zhang, Qianqian Mo, Hao Xia, Zhemin Yang, Min Yang XiaoFeng Wang, Long Lu, and Haixin Duan Background
More informationJuggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets
Juggling the Jigsaw Towards Automated Problem Inference from Network Trouble Tickets Rahul Potharaju (Purdue University) Navendu Jain (Microsoft Research) Cristina Nita-Rotaru (Purdue University) April
More informationNeighborWatcher: A Content-Agnostic Comment Spam Inference System
NeighborWatcher: A Content-Agnostic Comment Spam Inference System Jialong Zhang and Guofei Gu Secure Communication and Computer Systems Lab Department of Computer Science & Engineering Texas A&M University
More informationA study of Video Response Spam Detection on YouTube
A study of Video Response Spam Detection on YouTube Suman 1 and Vipin Arora 2 1 Research Scholar, Department of CSE, BITS, Bhiwani, Haryana (India) 2 Asst. Prof., Department of CSE, BITS, Bhiwani, Haryana
More informationFinding Clues For Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps
Finding Clues For Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps Yuhong Nan, Zhemin Yang, Yuan Zhang, Donglai Zhu and Min Yang Fudan University Xiaofeng Wang Indiana University
More informationPatrick Krabbe Fachbereich Informatik Seminar aus maschinellem Lernen 1
Towards a Machine Learning Algorithm for Predicting Truck Compressor Failures Using Logged Vehicle Data By S. Nowaczyk, R. Prytz, T. Rögnvaldsson, S. Byttner 14.07.2015 Patrick Krabbe Fachbereich Informatik
More informationTopology-Based Spam Avoidance in Large-Scale Web Crawls
Topology-Based Spam Avoidance in Large-Scale Web Crawls Clint Sparkman Joint work with Hsin-Tsang Lee and Dmitri Loguinov Internet Research Lab Department of Computer Science and Engineering Texas A&M
More informationDetecting Spam Web Pages
Detecting Spam Web Pages Marc Najork Microsoft Research Silicon Valley About me 1989-1993: UIUC (home of NCSA Mosaic) 1993-2001: Digital Equipment/Compaq Started working on web search in 1997 Mercator
More informationSMig: A Stream Migration Extension For HTTP/2
SMig: A Stream Migration Extension For HTTP/2 Xianghang Mi Feng Qian XiaoFeng Wang Department of Computer Science Indiana University Bloomington IETF 98 httpbis Meeting Chicago IL, 3/31/2017 Motivations
More informationD B M G Data Base and Data Mining Group of Politecnico di Torino
DataBase and Data Mining Group of Data mining fundamentals Data Base and Data Mining Group of Data analysis Most companies own huge databases containing operational data textual documents experiment results
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationTechnical Brief: Domain Risk Score Proactively uncover threats using DNS and data science
Technical Brief: Domain Risk Score Proactively uncover threats using DNS and data science 310 Million + Current Domain Names 11 Billion+ Historical Domain Profiles 5 Million+ New Domain Profiles Daily
More informationStudying the Impact of Text Summarization on Contextual Advertising
Studying the Impact of Text Summarization on Contextual Advertising G. Armano, A. Giuliani, and E. Vargiu Intelligent Agents and Soft-Computing Group Dept. of Electrical and Electronic Engineering University
More informationClone Detection and Maintenance with AI Techniques. Na Meng Virginia Tech
Clone Detection and Maintenance with AI Techniques Na Meng Virginia Tech Code Clones Developers copy and paste code to improve programming productivity Clone detections tools are needed to help bug fixes
More informationA Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets
A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets Varsha Palandulkar 1, Siddhesh Bhujbal 2, Aayesha Momin 3, Vandana Kirane 4, Raj Naybal 5 Professor, AISSMS Polytechnic
More informationA Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2
A Survey Of Different Text Mining Techniques Varsha C. Pande 1 and Dr. A.S. Khandelwal 2 1 Department of Electronics & Comp. Sc, RTMNU, Nagpur, India 2 Department of Computer Science, Hislop College, Nagpur,
More informationDimensionality reduction as a defense against evasion attacks on machine learning classifiers
Dimensionality reduction as a defense against evasion attacks on machine learning classifiers Arjun Nitin Bhagoji and Prateek Mittal Princeton University DC-Area Anonymity, Privacy, and Security Seminar,
More informationAutomated Website Fingerprinting through Deep Learning
Automated Website Fingerprinting through Deep Learning Vera Rimmer 1, Davy Preuveneers 1, Marc Juarez 2, Tom Van Goethem 1 and Wouter Joosen 1 NDSS 2018 Feb 19th (San Diego, USA) 1 2 Website Fingerprinting
More informationDependency-Preserving Data Compaction for Scalable Forensic Analysis 1
Intro Reductions Optimizations Evaluation Summary Dependency-Preserving Data Compaction for Scalable Forensic Analysis 1 Md Nahid Hossain, Junao Wang, R. Sekar, and Scott D. Stoller 1 This work was supported
More informationCPSC 426/526. P2P Lookup Service. Ennan Zhai. Computer Science Department Yale University
CPSC 4/5 PP Lookup Service Ennan Zhai Computer Science Department Yale University Recall: Lec- Network basics: - OSI model and how Internet works - Socket APIs red PP network (Gnutella, KaZaA, etc.) UseNet
More informationSplog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics. Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng
Splog Detection Using Self-Similarity Analysis on Blog Temporal Dynamics Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura and Belle Tseng NEC Laboratories America, Cupertino, CA AIRWeb Workshop 2007
More informationAdvances in Natural and Applied Sciences. Information Retrieval Using Collaborative Filtering and Item Based Recommendation
AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Information Retrieval Using Collaborative Filtering and Item Based Recommendation
More informationCountering Spam Using Classification Techniques. Steve Webb Data Mining Guest Lecture February 21, 2008
Countering Spam Using Classification Techniques Steve Webb webb@cc.gatech.edu Data Mining Guest Lecture February 21, 2008 Overview Introduction Countering Email Spam Problem Description Classification
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationBing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer
Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web
More informationELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4. Prof. James She
ELEC6910Q Analytics and Systems for Social Media and Big Data Applications Lecture 4 Prof. James She james.she@ust.hk 1 Selected Works of Activity 4 2 Selected Works of Activity 4 3 Last lecture 4 Mid-term
More informationDetecting Malicious Activity with DNS Backscatter Kensuke Fukuda John Heidemann Proc. of ACM IMC '15, pp , 2015.
Detecting Malicious Activity with DNS Backscatter Kensuke Fukuda John Heidemann Proc. of ACM IMC '15, pp. 197-210, 2015. Presented by Xintong Wang and Han Zhang Challenges in Network Monitoring Need a
More informationDirect Matrix Factorization and Alignment Refinement: Application to Defect Detection
Direct Matrix Factorization and Alignment Refinement: Application to Defect Detection Zhen Qin (University of California, Riverside) Peter van Beek & Xu Chen (SHARP Labs of America, Camas, WA) 2015/8/30
More informationData Mining Course Overview
Data Mining Course Overview 1 Data Mining Overview Understanding Data Classification: Decision Trees and Bayesian classifiers, ANN, SVM Association Rules Mining: APriori, FP-growth Clustering: Hierarchical
More informationArtificial Intelligence applied to IPC and Nice classifications. Patrick FIÉVET. Geneva May 25, 2018
Artificial Intelligence applied to IPC and Nice classifications Patrick FIÉVET Geneva May 25, 2018 2 IPCCAT-neural : automatic text categorization in the IPC What is it about? Patent Classifications :
More informationDEFENDING AGAINST MALICIOUS NODES USING AN SVM BASED REPUTATION SYSTEM
DEFENDING AGAINST MALICIOUS NODES USING AN SVM BASED REPUTATION SYSTEM Rehan Akbani, Turgay Korkmaz, and G. V. S. Raju {rakbani@cs.utsa.edu, korkmaz@cs.utsa.edu, and gvs.raju@utsa.edu} University of Texas
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationIntroduction. Can we use Google for networking research?
Unconstrained Profiling of Internet Endpoints via Information on the Web ( Googling the Internet) Ionut Trestian1 Soups Ranjan2 Aleksandar Kuzmanovic1 Antonio Nucci2 1 Northwestern 2 Narus University Inc.
More informationAdaptive Learning of an Accurate Skin-Color Model
Adaptive Learning of an Accurate Skin-Color Model Q. Zhu K.T. Cheng C. T. Wu Y. L. Wu Electrical & Computer Engineering University of California, Santa Barbara Presented by: H.T Wang Outline Generic Skin
More informationDeveloping Focused Crawlers for Genre Specific Search Engines
Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com
More informationAn Implementation of Hierarchical Multi-Label Classification System User Manual. by Thanawut Ananpiriyakul Piyapan Poomsilivilai
An Implementation of Hierarchical Multi-Label Classification System User Manual by 5331028421 Thanawut Ananpiriyakul 5331039321 Piyapan Poomsilivilai Supervisor Dr. Peerapon Vateekul Department of Computer
More informationAiding the Detection of Fake Accounts in Large Scale Social Online Services
Aiding the Detection of Fake Accounts in Large Scale Social Online Services Qiang Cao Duke University Michael Sirivianos Xiaowei Yang Tiago Pregueiro Cyprus Univ. of Technology Duke University Tuenti,
More informationFeature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani
Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate
More informationDetecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach
Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach Alex Hai Wang College of Information Sciences and Technology, The Pennsylvania State University, Dunmore, PA 18512, USA
More informationAutomated Assessment of Security Risks for Mobile Applications
Automated Assessment of Security Risks for Mobile Applications Rahul Pandita, Xusheng Xiao, Wei Yang, William Enck, and Tao Xie Department of Computer Science North Carolina State University Lookout Mobile
More informationTorontoCity: Seeing the World with a Million Eyes
TorontoCity: Seeing the World with a Million Eyes Authors Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun * Project Completed
More informationCAP 6412 Advanced Computer Vision
CAP 6412 Advanced Computer Vision http://www.cs.ucf.edu/~bgong/cap6412.html Boqing Gong April 21st, 2016 Today Administrivia Free parameters in an approach, model, or algorithm? Egocentric videos by Aisha
More informationFeng Qian. Address: 323 Lindley Hall, 150 S Woodlawn Ave, Bloomington IN Homepage:
CONTACT Feng Qian fengqian@indiana.edu Address: 323 Lindley Hall, 150 S Woodlawn Ave, Bloomington IN 47405-7104 Homepage: http://fengqian.org WORK AND EDUCATION 01/2015 Present: Indiana University, Bloomington
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationMath Information Retrieval: User Requirements and Prototype Implementation. Jin Zhao, Min Yen Kan and Yin Leng Theng
Math Information Retrieval: User Requirements and Prototype Implementation Jin Zhao, Min Yen Kan and Yin Leng Theng Why Math Information Retrieval? Examples: Looking for formulas Collect teaching resources
More informationCategorization of Phishing Detection Features. And Using the Feature Vectors to Classify Phishing Websites. Bhuvana Namasivayam
Categorization of Phishing Detection Features And Using the Feature Vectors to Classify Phishing Websites by Bhuvana Namasivayam A Thesis Presented in Partial Fulfillment of the Requirements for the Degree
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More informationDetecting Malicious URLs. Justin Ma, Lawrence Saul, Stefan Savage, Geoff Voelker. Presented by Gaspar Modelo-Howard September 29, 2010.
Detecting Malicious URLs Justin Ma, Lawrence Saul, Stefan Savage, Geoff Voelker Presented by Gaspar Modelo-Howard September 29, 2010 Publications Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey
More informationNo Plan Survives Contact
No Plan Survives Contact Experience with Cybercrime Measurement Chris Kanich Neha Chachra Damon McCoy Chris Grier David Wang Marti Motoyama Kirill Levchenko Stefan Savage Geoffrey M. Voelker UC San Diego
More informationDetecting Malicious Web Links and Identifying Their Attack Types
Detecting Malicious Web Links and Identifying Their Attack Types Anti-Spam Team Cellopoint July 3, 2013 Introduction References A great effort has been directed towards detection of malicious URLs Blacklisting
More informationSEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL
SEMANTIC SEGMENTATION AVIRAM BAR HAIM & IRIS TAL IMAGE DESCRIPTIONS IN THE WILD (IDW-CNN) LARGE KERNEL MATTERS (GCN) DEEP LEARNING SEMINAR, TAU NOVEMBER 2017 TOPICS IDW-CNN: Improving Semantic Segmentation
More informationOpportunities and challenges in personalization of online hotel search
Opportunities and challenges in personalization of online hotel search David Zibriczky Data Science & Analytics Lead, User Profiling Introduction 2 Introduction About Mission: Helping the travelers to
More informationENTERPRISE ENDPOINT PROTECTION BUYER S GUIDE
ENTERPRISE ENDPOINT PROTECTION BUYER S GUIDE TABLE OF CONTENTS Overview...3 A Multi-Layer Approach to Endpoint Security...4 Known Attack Detection...5 Machine Learning...6 Behavioral Analysis...7 Exploit
More informationA Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2
A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2 1 Student, M.E., (Computer science and Engineering) in M.G University, India, 2 Associate Professor
More informationDeep Character-Level Click-Through Rate Prediction for Sponsored Search
Deep Character-Level Click-Through Rate Prediction for Sponsored Search Bora Edizel - Phd Student UPF Amin Mantrach - Criteo Research Xiao Bai - Oath This work was done at Yahoo and will be presented as
More informationWhat s in Your Dongle and Bank Account? Mandatory and Discretionary Protection of Android External Resources
What s in Your Dongle and Bank Account? Mandatory and Discretionary Protection of Android External Resources Soteris Demetriou, Xiaoyong Zhou, Muhammad Naveed, Yeonjoon Lee, Kan Yuan, XiaoFeng Wang, Carl
More informationResearch and implementation of search engine based on Lucene Wan Pu, Wang Lisha
2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha Physics Institute,
More informationSupporting Information
Supporting Information Ullman et al. 10.1073/pnas.1513198113 SI Methods Training Models on Full-Object Images. The human average MIRC recall was 0.81, and the sub-mirc recall was 0.10. The models average
More informationIn the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,
1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to
More informationFeng Qian. Address: 480 Digital Technology Center, 117 Pleasant St SE, Minneapolis MN Homepage:
CONTACT EMPLOYMENT EDUCATION Feng Qian fengqian@umn.edu Address: 480 Digital Technology Center, 117 Pleasant St SE, Minneapolis MN 55455 Homepage: http://fengqian.org 08/2018 Present: University of Minnesota
More informationIC-SDV 2018 Automatic Text categorization in the International Patent Classification
IC-SDV 2018 Automatic Text categorization in the International Patent Classification IPCCAT-Neural Patrick FIÉVET & Jacques GUYOT Nice April 23, 2018 2 IPCCAT-neural : automatic text categorization in
More informationOpening the Black Box Data Driven Visualizaion of Neural N
Opening the Black Box Data Driven Visualizaion of Neural Networks September 20, 2006 Aritificial Neural Networks Limitations of ANNs Use of Visualization (ANNs) mimic the processes found in biological
On Mobile Malware Infections N. Asokan
On Mobile Malware Infections N. Asokan (joint work with Hien Thi Thu Truong, Eemil Lagerspetz, Petteri Nurmi, Adam J. Oliner, Sasu Tarkoma, Sourav Bhattacharya) Mobile malware alarm bells Google Search
Service-Centric Networking for the Developing World
GAIA workshop Service-Centric Networking for the Developing World Arjuna Sathiaseelan, Liang Wang, Andrius Aucinas, Gareth Tyson*, Jon Crowcroft N4D Lab liang.wang@cl.cam.ac.uk Cambridge University, UK
LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION
LINK GRAPH ANALYSIS FOR ADULT IMAGES CLASSIFICATION Evgeny Kharitonov *, ***, Anton Slesarev *, ***, Ilya Muchnik **, ***, Fedor Romanenko ***, Dmitry Belyaev ***, Dmitry Kotlyarov *** * Moscow Institute
deseo: Combating Search-Result Poisoning Yu USF
deseo: Combating Search-Result Poisoning Yu Jin @MSCS USF Your Google is not SAFE! SEO Poisoning - A new way to spread malware! Why choose SE? 22.4% of Google searches in the top 100 results > 50% for
Adversarial Machine Learning An Introduction. With slides from: Binghui Wang
Adversarial Machine Learning An Introduction With slides from: Binghui Wang Outline Machine Learning (ML) Adversarial ML Attack Taxonomy Capability Adversarial Training Conclusion Outline Machine Learning
Hebei University of Technology A Text-Mining-based Patent Analysis in Product Innovative Process
A Text-Mining-based Patent Analysis in Product Innovative Process Liang Yanhong, Tan Runhua Abstract Hebei University of Technology Patent documents contain important technical knowledge and research results.
Using Network Traffic to Remotely Identify the Type of Applications Executing on Mobile Devices. Lanier Watkins, PhD
Using Network Traffic to Remotely Identify the Type of Applications Executing on Mobile Devices Lanier Watkins, PhD LanierWatkins@gmail.com Outline Introduction Contributions and Assumptions Related Work
Documentation for: MTA developers
This document contains implementation guidelines for developers of MTA products/appliances willing to use Spamhaus products to block as much spam as possible. No reference is made to specific products.
Dynamic Feature Selection for Dependency Parsing
Dynamic Feature Selection for Dependency Parsing He He, Hal Daumé III and Jason Eisner EMNLP 2013, Seattle Structured Prediction in NLP Part-of-Speech Tagging Parsing N N V Det N Fruit flies like a banana
Extraction of Web Image Information: Semantic or Visual Cues?
Extraction of Web Image Information: Semantic or Visual Cues? Georgina Tryfou and Nicolas Tsapatsoulis Cyprus University of Technology, Department of Communication and Internet Studies, Limassol, Cyprus
MALICIOUS URL DETECTION AND PREVENTION AT BROWSER LEVEL FRAMEWORK
International Journal of Mechanical Engineering and Technology (IJMET) Volume 8, Issue 12, December 2017, pp. 536 541, Article ID: IJMET_08_12_054 Available online at http://www.iaeme.com/ijmet/issues.asp?jtype=ijmet&vtype=8&itype=12
Instructor: Dr. Mehmet Aktaş. Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University
Instructor: Dr. Mehmet Aktaş Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
MEASURING AND FINGERPRINTING CLICK-SPAM IN AD NETWORKS
MEASURING AND FINGERPRINTING CLICK-SPAM IN AD NETWORKS Vacha Dave *, Saikat Guha and Yin Zhang * * The University of Texas at Austin Microsoft Research India Internet Advertising Today 2 Online advertising
WHAT'S NEW WITH OBSERVEIT: INSIDER THREAT MANAGEMENT VERSION 6.5
WHAT'S NEW WITH OBSERVEIT: INSIDER THREAT MANAGEMENT VERSION 6.5 ObserveIT's award-winning insider threat management software combines user monitoring, behavioral analytics, and now policy enforcement
Approach Research of Keyword Extraction Based on Web Pages Document
2017 3rd International Conference on Electronic Information Technology and Intellectualization (ICEITI 2017) ISBN: 978-1-60595-512-4 Approach Research of Keyword Extraction Based on Web Pages Document Yangxin
Data-Intensive Computing with MapReduce
Data-Intensive Computing with MapReduce Session 6: Similar Item Detection Jimmy Lin University of Maryland Thursday, February 28, 2013 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
Facebook Immune System 人人安全中心姚海阔
Facebook Immune System 人人安全中心姚海阔 Immune A realtime system to protect our users and the social graph Big data, Real time 25B checks per day 650K per second at peak Realtime checks and classifications on
Predicting ground-level scene Layout from Aerial imagery. Muhammad Hasan Maqbool
Predicting ground-level scene Layout from Aerial imagery Muhammad Hasan Maqbool Objective Given the overhead image predict its ground level semantic segmentation Predicted ground level labeling Overhead/Aerial
CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
Extracting Rankings for Spatial Keyword Queries from GPS Data
Extracting Rankings for Spatial Keyword Queries from GPS Data Ilkcan Keles Christian S. Jensen Simonas Saltenis Aalborg University Outline Introduction Motivation Problem Definition Proposed Method Overview
IT1105 Information Systems and Technology. BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing. Student Manual
IT1105 Information Systems and Technology BIT 1 ST YEAR SEMESTER 1 University of Colombo School of Computing Student Manual Lesson 3: Organizing Data and Information (6 Hrs) Instructional Objectives Students
Distress Image Library for Precision and Bias of Fully Automated Pavement Cracking Survey
Distress Image Library for Precision and Bias of Fully Automated Pavement Cracking Survey Kelvin C.P. Wang, Ran Ji, and Cheng Chen kelvin.wang@okstate.edu Oklahoma State University/WayLink School of Civil
Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients
Three-Dimensional Object Detection and Layout Prediction using Clouds of Oriented Gradients Authors: Zhile Ren, Erik B. Sudderth Presented by: Shannon Kao, Max Wang October 19, 2016 Introduction Given an
Information Retrieval
Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating
Combining Review Text Content and Reviewer-Item Rating Matrix to Predict Review Rating Dipak J Kakade, Nilesh P Sable Department of Computer Engineering, JSPM's Imperial College of Engg. And Research,
Test Automation. Fundamentals. Mikó Szilárd
Test Automation Fundamentals Mikó Szilárd 2016 EPAM 2 Blue-chip clients rely on EPAM 3 SCHEDULE 9.12 Intro 9.19 Unit testing 1 9.26 Unit testing 2 10.03 Continuous integration 1 10.10 Continuous integration
Detecting and Quantifying Abusive IPv6 SMTP!
Detecting and Quantifying Abusive IPv6 SMTP Casey Deccio Verisign Labs Internet2 2014 Technical Exchange October 30, 2014 Spam, IPv4 Reputation and DNSBL Spam is pervasive Annoying (pharmaceuticals) Dangerous
Natural Language Processing as Key Component to Successful Information Products
Natural Language Processing as Key Component to Successful Information Products Yves Schabes Teragram Corporation Tera monster in Greek 2^40 (= 1,099,511,627,776) 10^12 (one trillion) gram something written
Artificial Intelligence. Programming Styles
Artificial Intelligence Intro to Machine Learning Programming Styles Standard CS: Explicitly program computer to do something Early AI: Derive a problem description (state) and use general algorithms to