Mining Spammers in Social Media: Techniques and Applications

Tutorial at the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining
Data Mining and Machine Learning Lab

Social Media

Social Spamming

With the growing availability of social media services, social spamming has become rampant. Social spammers are employed to unfairly overwhelm normal users.

A New Type of Spammer on Social Media

Social spammers send unwanted spam content, appearing on social networks and on any website with user-generated content, to targeted users, often collaborating to boost their social influence, legitimacy, and credibility. Spam content can take many forms, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information. (Wikipedia)

Examples from Twitter

"Spam describes a variety of prohibited behaviors that violate the Twitter Rules." (Twitter) [Figure labels: content information; network information.] Common tactics that spam accounts use include: posting harmful links (including links to phishing or malware sites); aggressive following behavior (mass following and mass unfollowing for attention); abusing the @reply/@mention function to post unwanted messages to users; creating multiple accounts (either manually or with automated tools); posting repeatedly to trending topics to grab attention; repeatedly posting duplicate updates; and posting links with unrelated tweets.

Spamming on Facebook: An Example

The spammer population on social media is large: 83 million accounts (8.7%) on Facebook [Facebook], and over 27% of the followers of the top 10 Twitter accounts are fake. Spammers are used to share or send spam, steal users' personal information, commit fake-like and click fraud, and spread malicious URLs. Examples shown include an offer of 50 likes per dollar and a survey for prizes.

Spamming on Twitter: An Example

The spammer population on social media is large: over 27% of the followers of the top 10 Twitter accounts are fake. Political astroturfing: [Figure: an account's follower count (axis 700K to 900K) from July 4 to August 1, gaining about 4,000 new followers per day, then 100,000 new followers in one day on July 21.]

Characteristics of Social Spammers

Content information: short text; unconventional use of language; adaptation to specific events. For example, a Twitter spam bot replies with prize offers tied to events such as the NFL or Miley Cyrus.

Characteristics of Social Spammers

Why is the detection of social spammers so hard? It is easy to establish an arbitrarily large number of social trust relations via Twitter follower markets [Stringhini et al. 2013].

Characteristics of Social Spammers

Social network information: collaborative link farming is widespread on Twitter, with spammers trying to infiltrate the Twitter network by building social relationships with normal users and with other spammers [Ghosh et al. 2012]. In social media, many users simply follow back when someone follows them, out of courtesy; this is called reflexive reciprocity [Hu et al. 2013].

Combating Social Spammers for Users

In a world without social spammers, from the users' perspective, information on social media services would be easier to access, and social media would be more conducive and rewarding. Social media would also be less prone to cyber-attacks when users acquire useful information, and more trustworthy.

Combating Social Spammers for Companies

Spam can inflict damage on companies. Spammers on social media can smear a brand and turn fans and followers into doubters. When a company's product advertisements are mixed with spam, it can have a profoundly negative impact on the company's social media marketing return on investment (ROI).

How to Study the Problem?

A typical framework of spammer detection [Lee et al. 2010] has three components: data collection, mining social spammers, and applications.

Outline

Data collection: crawling and identification; crowdsourcing; social honeypot-based approach; active learning. Mining social spammers: network-based methods; content-based methods; methods with hybrid information; online learning; cross-media learning. Applications.

Crawling and Identification

An alternative approach to crawling and identifying Twitter accounts [Hu et al. 2013, Thomas et al. 2011]. Step 1: crawl a Twitter dataset from a given period (July 2012 to September 2012) via Twitter's streaming API. Step 2: query Twitter's API to identify accounts that no longer have records, due to either deletion or suspension, then request each missing account's Twitter profile via the web to identify requests that redirect to …

Crowdsourcing

Two groups of users are compared to assess their effectiveness [Wang et al. 2013]: experts (CS professors and graduate students) and turkers (crowdworkers from online crowdsourcing systems).

Real or Fake? Why?

[Figure: the crowdsourcing interface, with navigation buttons for browsing profiles, a screenshot of each profile (links cannot be clicked), and a form for classifying profiles as real or fake and explaining why.]

System Architecture [Wang et al. 2013]

[Figure: in the social network, heuristics and user reports flag suspicious users, producing suspicious profiles. A crowdsourcing layer, designed to maximize the utility of crowdsourcing, uses turker selection to route profiles from all turkers to accurate, very accurate, and highly accurate turkers and to OSN employees; continuous quality control locates malicious workers, who are rejected; the output is the identified Sybils.]

Crowdsourcing for Data Collection

A crowdsourcing spammer detection system [Wang et al. 2013]: false positive and false negative rates below 1%; resistant to infiltration by malicious workers; low cost.

Social Honeypot-based Approach

Two ways of collecting spammer evidence: human experts, and users reporting spammers.

Social Honeypot-based Approach

Create and deploy social honeypots in social networking sites (SNS) [Lee et al. 2010, 2011].

Social Honeypot-based Approach

Create and deploy social honeypots in social networks [Lee et al. 2011]: 60 social honeypots were deployed and collected 36,000 content polluters over seven months. Advantages of social honeypots: they collect evidence of spammers automatically; they do not interfere with or intrude on the activities of normal users; and they support robust, ongoing spammer identification and filtering.

Who are the social spammers? How can labeled data be collected effectively?

Active Learning

[Figure: active learning on traditional data.]

Representativeness

Active learning can select representative instances.

Informativeness

Active learning can select informative instances.

Challenges: Networked Data

How do we select instances by taking advantage of relation information?

Selection Strategies for Networked Data

Strategy 1: global selection. The globally most important nodes in the network are selected.

Selection Strategies for Networked Data

Strategy 2: local selection. The important nodes from each community are selected.
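
The two selection strategies can be sketched as follows. This is a hypothetical illustration, not the ActNeT algorithm: it assumes that node degree serves as the importance score and that community labels are already available (for example, from one of the community-detection approaches discussed later). The helper names are illustrative only.

```python
from collections import Counter


def degree_counts(edges, keep=None):
    """Count incident edges per node, optionally keeping only edges where keep(u, v) is true."""
    degree = Counter()
    for u, v in edges:
        if keep is None or keep(u, v):
            degree[u] += 1
            degree[v] += 1
    return degree


def global_selection(edges, k):
    """Strategy 1 (sketch): the k nodes with the highest degree in the whole network."""
    return [node for node, _ in degree_counts(edges).most_common(k)]


def local_selection(edges, communities):
    """Strategy 2 (sketch): the highest-degree node inside each community.

    `communities` maps node -> community id and is assumed to be precomputed.
    """
    def same_community(u, v):
        return communities[u] == communities[v]

    degree = degree_counts(edges, keep=same_community)  # within-community ties only
    best = {}
    for node, comm in communities.items():
        if comm not in best or degree[node] > degree[best[comm]]:
            best[comm] = node
    return sorted(best.values())


# Toy network: a dense community A and a small community B joined by one edge.
edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "a4"),
         ("a1", "a2"), ("a1", "a3"), ("b", "b1"), ("b", "a")]
communities = {"a": "A", "a1": "A", "a2": "A", "a3": "A", "a4": "A",
               "b": "B", "b1": "B"}
print(global_selection(edges, 2))           # ['a', 'a1'], both from community A
print(local_selection(edges, communities))  # ['a', 'b'], one node per community
```

The toy output shows the trade-off: global selection can concentrate the labeling budget in one dense community, whereas local selection spreads it across communities.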

ActNeT Framework

The ActNeT framework consists of (1) modeling relations (A) from the source data S; (2) modeling text content; and (3) selecting instances based on relations.

Network-based Methods

A traditional assumption is that spammers cannot be influential in a social network.

Network-based Methods

Q1: How do we measure influence in a social network? Q2: Are spammers less influential in social networks? Q3: What are the following patterns of spammers, normal users, and influential users in the social network?

How to Measure the Influence of Individuals?

Centrality is widely used to measure influence in social networks. Important or prominent actors are those that are linked or involved with other actors extensively: a person with extensive contacts (links) or communications with many other people in the organization is considered more important than a person with relatively fewer contacts. Links are also called ties; a central actor is one with many ties.

Degree Centrality

Degree centrality ranks nodes with more connections higher: $C_d(v_i) = d_i$, where $d_i$ is the degree (number of adjacent edges) of vertex $v_i$. In the example graph, the degree centrality of vertex $v_1$ is $d_1 = 8$, and for every other vertex it is $d_j = 1$, $j \neq 1$.
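
A minimal sketch of degree centrality on an undirected edge list. Because the slide's figure is not reproduced, the example assumes the star graph implied by the numbers above ($v_1$ linked to eight other vertices); the function name is illustrative.

```python
from collections import defaultdict


def degree_centrality(edges):
    """C_d(v_i) = d_i: the number of edges incident to each vertex (undirected graph)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


# Assumed star graph consistent with the example: v1 is linked to v2, ..., v9.
star = [("v1", f"v{j}") for j in range(2, 10)]
centrality = degree_centrality(star)
print(centrality["v1"], centrality["v2"])  # 8 1
```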

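PageRank

The centrality I derive from my network neighbors is proportional to their centrality divided by their out-degree:

$$x_i = \alpha \sum_j A_{ij} \frac{x_j}{d_j^{out}} + \beta,$$

where $A_{ij} = 1$ if $v_j$ links to $v_i$. With $D = \mathrm{diag}(d_1^{out}, d_2^{out}, \ldots, d_n^{out})$, this becomes

$$x = \alpha A D^{-1} x + \beta \mathbf{1}, \qquad \text{so} \qquad x = \beta \left(I - \alpha A D^{-1}\right)^{-1} \mathbf{1}.$$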
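
The matrix form can be solved directly with NumPy. This is a sketch under stated assumptions: $\alpha = 0.85$ and $\beta = 1$ are illustrative defaults, and nodes with zero out-degree are treated as having out-degree 1 (a common convention that keeps $D$ invertible), which the slide does not specify.

```python
import numpy as np


def pagerank(A, alpha=0.85, beta=1.0):
    """Solve x = alpha * A D^{-1} x + beta * 1, where A[i, j] = 1 if v_j links to v_i."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_degree = A.sum(axis=0)          # column j sums to the out-degree of v_j
    out_degree[out_degree == 0] = 1.0   # assumption: treat sinks as out-degree 1
    D_inv = np.diag(1.0 / out_degree)
    # x = beta * (I - alpha A D^{-1})^{-1} 1; solving the system avoids an explicit inverse.
    return np.linalg.solve(np.eye(n) - alpha * A @ D_inv, beta * np.ones(n))


# Directed 3-cycle v1 -> v2 -> v3 -> v1: by symmetry every node gets beta / (1 - alpha).
cycle = np.array([[0, 0, 1],
                  [1, 0, 0],
                  [0, 1, 0]])
x = pagerank(cycle)
print(x)
```

Solving the linear system is exact for small graphs; for networks the size of Twitter, one would instead iterate $x \leftarrow \alpha A D^{-1} x + \beta \mathbf{1}$ with sparse matrices.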

Influence of Spammer Communities

Another assumption is that spammers form tight-knit communities [Danezis et al. 2009, Yu et al. 2008]. [Figure: normal users and spammers.] How do we find communities? How do we measure the influence of a community?

How to Find Communities?

A network-centric criterion considers the connections within a network globally. Goal: partition the nodes of a network into disjoint sets. Approaches: clustering based on vertex similarity; latent space models; block model approximation; spectral clustering; modularity maximization [Tang et al., Community Detection and Mining in Social Media, Morgan & Claypool Publishers].

Clustering Based on Vertex Similarity

Apply k-means or another similarity-based clustering algorithm, where the similarity of two vertices is defined in terms of the similarity of their neighborhoods. Structural equivalence: two nodes are structurally equivalent iff they connect to the same set of actors. In the example graph, nodes 1 and 3 are structurally equivalent, and so are nodes 5 and 7. Structural equivalence is too restrictive for practical use.

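Vertex Similarity

Jaccard similarity: $\mathrm{Jaccard}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$. Cosine similarity: $\mathrm{Cosine}(v_i, v_j) = \frac{|N_i \cap N_j|}{\sqrt{|N_i| \cdot |N_j|}}$. Here $N_i$ is the set of neighbors of $v_i$.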
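
A small sketch of structural equivalence and the two neighborhood-based similarities. The graph is hypothetical (the slides' example figure is not reproduced), and the helper names are illustrative.

```python
import math


def neighbor_sets(edges):
    """Map each vertex to the set of its neighbors N_i (undirected graph)."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def structurally_equivalent(nbrs, i, j):
    """True iff v_i and v_j connect to exactly the same set of actors."""
    return nbrs[i] == nbrs[j]


def jaccard(nbrs, i, j):
    union = nbrs[i] | nbrs[j]
    return len(nbrs[i] & nbrs[j]) / len(union) if union else 0.0


def cosine(nbrs, i, j):
    denom = math.sqrt(len(nbrs[i]) * len(nbrs[j]))
    return len(nbrs[i] & nbrs[j]) / denom if denom else 0.0


# Hypothetical graph: nodes 1 and 3 share the neighbor set {2, 4}.
nbrs = neighbor_sets([(1, 2), (1, 4), (3, 2), (3, 4), (5, 4), (5, 6)])
print(structurally_equivalent(nbrs, 1, 3))  # True
print(jaccard(nbrs, 1, 5), cosine(nbrs, 1, 5))  # 1/3 and 0.5
```

Unlike the all-or-nothing structural equivalence test, the Jaccard and cosine scores give partial credit to nodes 1 and 5, which share only neighbor 4.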

Latent Space Models

Map nodes into a low-dimensional space such that the proximity between nodes based on network connectivity is preserved in the new space, then apply k-means clustering. Multi-dimensional scaling (MDS): given a network, construct a proximity matrix $P$ representing the pairwise distances between nodes (e.g., geodesic distances). Let $S \in \mathbb{R}^{n \times l}$ denote the coordinates of the nodes in the low-dimensional space. Objective function:

$$\min_S \left\| S S^T - \Delta(P) \right\|_F^2, \qquad \Delta(P) = -\frac{1}{2}\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right)(P \circ P)\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\right).$$

Solution: $S = V \Lambda^{1/2}$, where $V$ holds the top $l$ eigenvectors of $\Delta(P)$ and $\Lambda$ is a diagonal matrix of the top $l$ eigenvalues.
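
A NumPy sketch of classical MDS following the objective and solution above. It assumes $P$ holds distances (e.g., geodesic distances between nodes); the function name is illustrative.

```python
import numpy as np


def classical_mds(P, l):
    """Return S (n x l) with S S^T approximating Delta(P), built from distance matrix P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix I - (1/n) 1 1^T
    delta = -0.5 * H @ (P * P) @ H           # Delta(P): double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(delta)  # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:l]      # indices of the l largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)   # guard against tiny negative values
    return eigvecs[:, top] * np.sqrt(lam)    # S = V Lambda^{1/2}


# Geodesic distances on the path graph 0 - 1 - 2 - 3 embed exactly on a line.
path_dist = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
S = classical_mds(path_dist, l=1)
print(S)
```

The rows of `S` can then be clustered with k-means, as described above.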

How to Measure the Influence of a Community?

All centrality measures defined so far measure the centrality of a single node. These measures can be generalized to a group of nodes. A simple approach is to replace all nodes in a group with a super node, disregarding the group's internal structure. Let $S$ denote the set of nodes in the group and $V - S$ the set of outsiders.

Group Centrality

Group degree centrality: $C_d^{group}(S) = \left|\{v_i \in V - S \mid v_i \text{ is connected to some } v_j \in S\}\right|$. It can be normalized by dividing by $|V - S|$. Example: for $S = \{v_2, v_3\}$ in the example graph, the group degree centrality is 3.
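
A sketch of group degree centrality with optional normalization by $|V - S|$. The graph is hypothetical, chosen so that $S = \{v_2, v_3\}$ also reaches three outsiders; it is not the slide's figure. It assumes every node appears in at least one edge.

```python
def group_degree_centrality(edges, group, normalize=False):
    """Count outsiders (V - S) adjacent to at least one member of the group S.

    Assumes every node appears in at least one edge (isolated nodes are not seen).
    """
    group = set(group)
    nodes = {x for edge in edges for x in edge}
    outsiders = nodes - group
    reached = set()
    for u, v in edges:
        if u in group and v in outsiders:
            reached.add(v)
        elif v in group and u in outsiders:
            reached.add(u)
    if normalize:
        return len(reached) / len(outsiders) if outsiders else 0.0
    return len(reached)


# Hypothetical graph: the edge v2-v3 lies inside S and is ignored.
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v3", "v5"), ("v5", "v6")]
print(group_degree_centrality(edges, {"v2", "v3"}))                  # 3
print(group_degree_centrality(edges, {"v2", "v3"}, normalize=True))  # 0.75
```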

Are Spammers Less Influential?

Social media services have become a target for link farming, where users try to acquire large numbers of follower links [Ghosh et al. 2012]. Link farming on the Web: websites exchange reciprocal links with other sites to improve their search-engine ranking. Link farming on social media: spammers follow other users and try to get them to follow back.

Link Farming by Spammers

Spammers farm links at large scale [Ghosh et al. 2012]: over 15 million users (27% of the total) were targeted by 41,352 spammers (0.08% of the total). There were 1.3 million spam-followers, 82% of whom had been targeted by spammers; spammers get most of their links through reciprocation.

Influential Social Spammers

Spammers get more followers than the average Twitter user, and some acquire very high PageRank scores: 304 spammers rank within the top 100,000 users (the top 0.18% of all users). Social spammers are not necessarily less influential.

Spammer Communities?

Key assumption: spammers form tight-knit communities. [Figure: edges between Sybils vs. edges to normal users.] Spammers don't necessarily form communities on social media.

What Are the Following Patterns?

Reflexive reciprocity is widespread on social media: many users simply follow back when someone follows them, out of courtesy. Who are the spam-followers? Who are the top link-farmers? Before answering these two questions, we briefly introduce reciprocity.

Reciprocity on Social Networks

In directed networks, the frequency of loops of length two is measured by reciprocity, which tells how likely it is that a vertex you point to also points back at you. Directed edges between $i$ and $j$ are reciprocated iff $(i, j) \in E$ and $(j, i) \in E$.

Reciprocity on Social Networks

Reciprocity $r$ is the fraction of edges that are reciprocated:

$$r = \frac{1}{m} \sum_{i,j} A_{ij} A_{ji} = \frac{1}{m} \mathrm{Tr}\left(A^2\right),$$

where $A_{ij} A_{ji} = 1$ iff there is an edge from $i$ to $j$ and also from $j$ to $i$, and $m$ is the total number of edges.
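
The trace formula translates directly into NumPy. This sketch assumes a 0/1 adjacency matrix without self-loops (a self-loop would be counted as a reciprocated edge); the example graph is hypothetical.

```python
import numpy as np


def reciprocity(A):
    """r = Tr(A^2) / m for a 0/1 adjacency matrix with no self-loops."""
    A = np.asarray(A)
    m = A.sum()  # total number of directed edges
    return np.trace(A @ A) / m


# Edges 1 -> 2 and 2 -> 1 (reciprocated) plus 2 -> 3 (not reciprocated).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 0]])
print(reciprocity(A))  # 2/3: two of the three edges are reciprocated
```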

Reflexive Reciprocity

72.4% of twitterers follow more than 80% of their followers [Weng et al. 2010], and 80.5% of twitterers have 80% of their friends follow them back.

Farming Links on Twitter

A Twitter account was created and followed some of the top targeted spam-followers: 500 users selected at random from the top 100K spam-followers. Within 3 days, 65 of them reciprocated by following back, and the account ranked within the top 9% of all Twitter users. This shows that there is a set of users from whom social links (and hence social influence) can be farmed easily.

Who Are the Spam-Followers?

Non-targeted spam-followers are mostly spammers or hired helpers of spammers, and most have since been suspended by Twitter. Targeted spam-followers are ranked by the number of their links to spammers: 60% of the follow-links acquired by spammers come from the top 100,000 targeted followers, and top spam-followers tend to reciprocate almost all links that spammers establish to them.

Who Are the Top Link-Farmers?

They are not spammers themselves: 76% have not been suspended by Twitter in the last two years, and 235 are verified by Twitter as real, well-known users. They have much higher indegree as well as outdegree than spammers, and most of their tweets contain valid URLs.

Who Are the Top Link-Farmers?

They are highly influential users, ranking within the top 5% by PageRank, follower-rank, and retweet-rank. They are mostly social marketers and entrepreneurs who want to promote an online business or website. They interconnect heavily with each other: their subgraph is much denser than the whole graph, whose density is $10^{-7}$. Aim: to acquire social capital.

Combating the Link-Farmers

It is not practical for Twitter to suspend or blacklist the top link-farmers. Solutions [Ghosh et al. 2012]: a strategy that disincentivizes users from following or reciprocating to unknown people by penalizing users for following spammers; an algorithm that is the inverse of PageRank, which negatively biases a small set of known spammers and propagates negative scores from spammers to spam-followers.

Collusionrank

A user is penalized for following spammers, but not for being followed by spammers.
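
A minimal sketch in the spirit of Collusionrank, assuming a personalized-PageRank-style iteration: known spammers receive a negative seed score, and each user's score is $\alpha$ times the scores of the accounts that user follows (each split among that account's followers) plus the seed. The damping factor, iteration count, and normalization here are assumptions; Ghosh et al. may define them differently.

```python
def collusionrank(follows, spammers, alpha=0.85, iters=100):
    """Propagate negative scores from known spammers to the users who follow them.

    `follows` holds (u, v) pairs meaning "u follows v".
    """
    nodes = {x for edge in follows for x in edge} | set(spammers)
    n_followers = {x: 0 for x in nodes}
    followees = {x: [] for x in nodes}
    for u, v in follows:
        n_followers[v] += 1
        followees[u].append(v)
    # Negative bias on the small set of known spammers.
    seed = {x: (-1.0 / len(spammers) if x in spammers else 0.0) for x in nodes}
    score = dict(seed)
    for _ in range(iters):
        # A user inherits a share of the score of every account it follows,
        # so following spammers is penalized but being followed by them is not.
        score = {
            u: alpha * sum(score[v] / n_followers[v] for v in followees[u])
            + (1 - alpha) * seed[u]
            for u in nodes
        }
    return score


# "f" follows spammer "s"; "s" also follows "v", who follows nobody.
follows = [("f", "s"), ("s", "f"), ("s", "v"), ("n", "m")]
scores = collusionrank(follows, {"s"})
print(scores["f"] < 0, scores["v"], scores["n"])  # True 0.0 0.0
```

Ranking users by the sum of their PageRank and this (non-positive) score then demotes accounts that follow spammers, while accounts that are merely followed by spammers keep their PageRank.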

Pagerank + Collusionrank

Collusionrank was computed using 600 known spammers, and users were ranked by Pagerank + Collusionrank. This effectively filters spammers and link-farmers (top spam-followers) out of the top ranks.

Pagerank + Collusionrank

The combination selectively penalizes spammers and link-farmers. Of the top 100K users by PageRank, 20K are demoted heavily, while the remaining 80% are not affected much (inset figure). The heavily demoted 20K follow many more spammers than the rest (main figure).

Content-based Methods

Q1: What types of features can we use? Q2: How do we model content information?

Feature Engineering

Features used to detect Foursquare spammers and their $\chi^2$ rankings [Aggarwal et al. 2013].
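
Ranking features by $\chi^2$ can be sketched with the standard 2x2 contingency-table statistic for a binary feature against a binary spammer label. The feature names and values below are hypothetical and are not the Foursquare features from the cited work.

```python
def chi2_score(feature, labels):
    """Chi-square statistic of a binary feature against binary labels (1 = spammer)."""
    pairs = list(zip(feature, labels))
    a = sum(1 for f, y in pairs if f and y)          # feature present, spammer
    b = sum(1 for f, y in pairs if f and not y)      # feature present, normal user
    c = sum(1 for f, y in pairs if not f and y)      # feature absent, spammer
    d = sum(1 for f, y in pairs if not f and not y)  # feature absent, normal user
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {  # hypothetical feature names
    "has_url": [1, 1, 1, 0, 0, 0, 0, 1],
    "posted_at_night": [1, 0, 1, 0, 1, 0, 1, 0],
}
ranking = sorted(features, key=lambda k: chi2_score(features[k], labels), reverse=True)
print(ranking)  # ['has_url', 'posted_at_night']
```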

Feature Engineering

Features can be grouped into four classes whose scopes are, respectively, the message, the user, the topic, and propagation [Castillo et al. 2011].

Modeling Content Information

Supervised learning methods such as least squares are widely used to model content information, learning weights $W$ that map the content matrix $X$ to the labels $Y$ ($XW \approx Y$).
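
A least-squares sketch of $XW \approx Y$ using NumPy's `lstsq`. The feature columns and the +1/-1 label encoding are assumptions made for illustration, not the tutorial's setup.

```python
import numpy as np


def fit_least_squares(X, Y):
    """W = argmin_W ||X W - Y||_F^2, solved with NumPy's least-squares routine."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def predict(X, W):
    """Threshold the linear scores: +1 = spammer, -1 = normal user (assumed encoding)."""
    return np.where(X @ W >= 0, 1, -1)


# Hypothetical content features: [bias, fraction of tweets with URLs, lexical diversity].
X = np.array([[1.0, 0.9, 0.2],
              [1.0, 0.8, 0.1],
              [1.0, 0.1, 0.7],
              [1.0, 0.2, 0.9]])
Y = np.array([1, 1, -1, -1])
W = fit_least_squares(X, Y)
print(predict(X, W))
```

On this toy data the fitted weights recover the training labels; with thousands of sparse text features, the unregularized fit overfits, which motivates the sparse learning and matrix factorization approaches that follow.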

Sparse Learning

Sparse learning has been introduced to tackle the curse of dimensionality: it learns a sparse weight matrix $W$ in $XW \approx Y$, so that only a few features are used.
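
One common instance of sparse learning is the lasso ($\ell_1$-regularized least squares). The sketch below solves it with ISTA (proximal gradient descent); the choice of lasso, the regularization weight, and the synthetic data are assumptions, not the specific method of the cited papers.

```python
import numpy as np


def soft_threshold(Z, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)


def lasso_ista(X, Y, lam, iters=500):
    """Minimize 0.5 * ||X W - Y||^2 + lam * ||W||_1 with ISTA (proximal gradient)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1],) + Y.shape[1:])
    for _ in range(iters):
        grad = X.T @ (X @ W - Y)
        W = soft_threshold(W - step * grad, step * lam)
    return W


# Synthetic data: only the first of 10 features determines the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
Y = 3.0 * X[:, 0]
W = lasso_ista(X, Y, lam=5.0)
print(np.round(W, 2))  # first weight close to 3, the others zero
```

The soft-thresholding step is what sets irrelevant weights exactly to zero, so the learned model uses only a few features.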

Matrix Factorization

Another effective way to tackle the curse of dimensionality is matrix factorization, which approximates $X$ by the product of two low-rank matrices, $X \approx UV$.
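
A sketch of $X \approx UV$ using truncated SVD, one standard way to compute a rank-$k$ factorization. The methods discussed in this tutorial optimize their own objectives (for example, with social regularization), so this illustrates only the basic factorization, on a synthetic matrix.

```python
import numpy as np


def low_rank_factorize(X, k):
    """Approximate X (n x d) by U V with U: n x k and V: k x d (truncated SVD)."""
    P, s, Qt = np.linalg.svd(X, full_matrices=False)
    U = P[:, :k] * s[:k]  # latent user factors, scaled by the singular values
    V = Qt[:k, :]         # latent-factor-by-feature matrix
    return U, V


# A synthetic 6 x 5 user-feature matrix of rank 2 is recovered exactly with k = 2.
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 5))
U, V = low_rank_factorize(X, k=2)
print(U.shape, V.shape, np.allclose(U @ V, X))  # (6, 2) (2, 5) True
```

Each user is then described by $k$ latent factors (a row of `U`) instead of $d$ raw features.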

Decision Tree

Decision trees can be used for classification and feature analysis, which is effective for understanding the purposes of spamming.

Two Classification Strategies

Flat classification: users are classified directly as promoters (P), spammers (S), or legitimate users (L). Hierarchical classification: first separate promoters (P) from non-promoters (NP); then separate promoters into heavy (HP) and light promoters (LP), and non-promoters into legitimate users (L) and spammers (S).

Methods with Hybrid Information

How can content and relations be used collectively for social spammer detection?

Modeling Social Networks

There are four types of following relations on social networks: [spammer, spammer], [normal, normal], [normal, spammer], and [spammer, normal]. These relations are modeled with a directed graph Laplacian.

Social Spammer Detection

The proposed formulation combines network and content information in a single objective function.

Dataset for Study

A Twitter dataset was crawled from July 2012 to September 2012 via the Twitter Search API. Users suspended by Twitter during this period are treated as the gold standard of spammers in the experiments.

Social Spammer Detection Results

The experiments use training data of different sizes and compare the method with possible alternative solutions. Precision, recall, and F1-measure are used as metrics.
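
The three metrics, computed for the spammer class, can be sketched as follows. The labels are made up for illustration, with 1 marking a spammer (for example, an account suspended by Twitter).

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the spammer class (label `positive`)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# 1 = spammer, 0 = normal user; 2 true positives, 1 false positive, 1 false negative.
p, r, f = precision_recall_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(p, r, f)
```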

MFSR with Supervised Information

Label-informed matrix factorization with social relations (MFSR) incorporates both supervised label information and network information.

Spammers Evolve Fast

"Behaviors that constitute spamming will continue to evolve as we respond to new tactics by spammers." (Twitter) Social spammers show dynamic content patterns in social media.

Online Learning

Existing systems rely on building a new model to capture newly emerging content-based and network-based patterns of social spammers. Because these patterns evolve rapidly, a framework is needed that efficiently reflects the effect of newly emerging data in social spammer detection. How do we update an existing model to efficiently incorporate newly emerging data objects?
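
The idea of updating a model instead of rebuilding it can be illustrated with ridge regression, whose sufficient statistics $X^T X$ and $X^T y$ simply accumulate as new data arrive. This is a generic sketch, not the online framework of Hu et al.; the class name, the regularization weight, and the random data are assumptions.

```python
import numpy as np


class OnlineRidge:
    """Ridge regression W = (X^T X + lam I)^{-1} X^T y, updated batch by batch."""

    def __init__(self, n_features, lam=1.0):
        self.G = lam * np.eye(n_features)  # running X^T X + lam I
        self.b = np.zeros(n_features)      # running X^T y

    def update(self, X_new, y_new):
        """Fold in a new batch without revisiting earlier data."""
        self.G += X_new.T @ X_new
        self.b += X_new.T @ y_new
        return np.linalg.solve(self.G, self.b)


rng = np.random.default_rng(2)
X1, X2 = rng.normal(size=(20, 3)), rng.normal(size=(5, 3))
y1, y2 = rng.choice([-1.0, 1.0], size=20), rng.choice([-1.0, 1.0], size=5)
model = OnlineRidge(n_features=3, lam=0.5)
model.update(X1, y1)             # model built on the initial data
W_online = model.update(X2, y2)  # cheap update when new data arrive
print(W_online)
```

Each update costs time proportional to the new batch (plus a $d \times d$ solve) rather than to all data seen so far, yet yields the same weights as retraining on the combined data.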

Problem Statement

Learning the Basic Model

Learning a New Model

Reformulated Objective Function

Experiments

How effective is the proposed framework compared with other methods of social spammer detection? How efficient is the proposed online learning framework compared with other modeling methods?

Social Spammer Detection Results

Cross-Media Learning

A straightforward way to perform content-based spammer detection is to model the task as a supervised learning problem. Although social spamming is a relatively new problem, spam has been studied extensively for years on other platforms, e.g., communication, SMS, and the Web.

Cross-Media Learning

Are resources from other media potentially helpful for spammer detection in microblogging? How do we explicitly model and use resources from other media for spammer detection? Is the knowledge learned from other media helpful for microblogging spammer detection?

Lexical Analysis

Are resources from other media potentially helpful for spammer detection in microblogging? Lexical analysis shows that microblogging data is not significantly different from datasets in other media.

Modeling Knowledge across Media

Social Spammer Detection Results

Open Research Issues

Open issues span the three components of the framework: data collection, mining social spammers, and applications.

Research Issues in Data Collection

Quality datasets are needed: large-scale, accurate, and up-to-date. Labeling issues: active learning; crowdsourcing.

Research Issues in Mining Social Spammers

Mining social networks: sophisticated centrality-based measures; community detection. Mining content information: feature engineering. Machine learning methods: sparse learning; online learning; multi-source learning.

Potential Applications

Social sciences: understanding the purposes of social spammers in different events, e.g., natural disasters; understanding the spamming behavior of normal users on social media; geographical and temporal patterns of social spammers.

Acknowledgments

Members of the Data Mining and Machine Learning Lab at ASU; the Office of Naval Research and the Army Research Office; everyone attending our tutorial.

References

Aggarwal et al. "Detection of spam tipping behaviour on Foursquare." In WWW Companion, 2013.
Castillo et al. "Information credibility on Twitter." In WWW, 2011.
Danezis, George, and Prateek Mittal. "SybilInfer: Detecting Sybil nodes using social networks." In NDSS, 2009.
Ghosh, Saptarshi, et al. "Understanding and combating link farming in the Twitter social network." In WWW, 2012.
Grier, Chris, et al. "@spam: the underground on 140 characters or less." In CCS, 2010.
Hu et al. "Social spammer detection in microblogging." In IJCAI, 2013.
Hu et al. "Leveraging knowledge across media for spammer detection in microblogging." In SIGIR, 2013.
Hu et al. "Online social spammer detection." In AAAI, 2014.
Hu et al. "ActNeT: Active learning for networked texts in microblogging." In SDM, 2014.
Lee et al. "Uncovering social spammers: social honeypots + machine learning." In SIGIR, 2010.
Lee et al. "Seven months with the devils: a long-term study of content polluters on Twitter." In ICWSM, 2011.
Stringhini et al. "Follow the green: growth and dynamics in Twitter follower markets." In IMC, 2013.
Tan, Enhua, et al. "UNIK: unsupervised social network spam detection." In CIKM.
Tang et al. Community Detection and Mining in Social Media. Morgan & Claypool Publishers, 2010.
Thomas et al. "Suspended accounts in retrospect: an analysis of Twitter spam." In IMC, 2011.
Viswanath et al. "An analysis of social network-based Sybil defenses." ACM SIGCOMM Computer Communication Review, 41(4), 2011.
Wang et al. "Social Turing tests: crowdsourcing Sybil detection." In NDSS, 2013.
Weng, Jianshu, et al. "TwitterRank: finding topic-sensitive influential twitterers." In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), 2010.
Yang, Zhi, et al. "Uncovering social network Sybils in the wild." In IMC, 2011.
Yu, Haifeng, et al. "SybilLimit: a near-optimal social network defense against Sybil attacks." In IEEE S&P, 2008.
Zafarani, R., M. A. Abbasi, and H. Liu. Social Media Mining: An Introduction. Cambridge University Press, 2014.
Zhu et al. "Discovering spammers in social networks." In AAAI, 2012.


More information

Fighting Spam, Phishing and Malware With Recurrent Pattern Detection

Fighting Spam, Phishing and Malware With Recurrent Pattern Detection Fighting Spam, Phishing and Malware With Recurrent Pattern Detection White Paper September 2017 www.cyren.com 1 White Paper September 2017 Fighting Spam, Phishing and Malware With Recurrent Pattern Detection

More information

Twi$er s Trending Topics exploita4on pa$erns

Twi$er s Trending Topics exploita4on pa$erns Twi$er s Trending Topics exploita4on pa$erns Despoina Antonakaki Paraskevi Fragopoulou, So6ris Ioannidis isocial Mee6ng, February 4-5th, 2014 Online Users World popula6ons percentage of online users: 39%

More information

Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog

Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog Analysis and Identification of Spamming Behaviors in Sina Weibo Microblog Chengfeng Lin alex_lin@sjtu.edu.cn Yi Zhou zy_21th@sjtu.edu.cn Kai Chen kchen@sjtu.edu.cn Jianhua He Aston University j.he7@aston.ac.uk

More information

Slides based on those in:

Slides based on those in: Spyros Kontogiannis & Christos Zaroliagis Slides based on those in: http://www.mmds.org A 3.3 B 38.4 C 34.3 D 3.9 E 8.1 F 3.9 1.6 1.6 1.6 1.6 1.6 2 y 0.8 ½+0.2 ⅓ M 1/2 1/2 0 0.8 1/2 0 0 + 0.2 0 1/2 1 [1/N]

More information

ISSN: (Online) Volume 2, Issue 2, February 2014 International Journal of Advance Research in Computer Science and Management Studies

ISSN: (Online) Volume 2, Issue 2, February 2014 International Journal of Advance Research in Computer Science and Management Studies ISSN: 2321-7782 (Online) Volume 2, Issue 2, February 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Paper / Case Study Available online at:

More information

CS425: Algorithms for Web Scale Data

CS425: Algorithms for Web Scale Data CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original slides can be accessed at: www.mmds.org J.

More information

Web Structure Mining Community Detection and Evaluation

Web Structure Mining Community Detection and Evaluation Web Structure Mining Community Detection and Evaluation 1 Community Community. It is formed by individuals such that those within a group interact with each other more frequently than with those outside

More information

Part I: Data Mining Foundations

Part I: Data Mining Foundations Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?

More information

Mining Social Network Graphs

Mining Social Network Graphs Mining Social Network Graphs Analysis of Large Graphs: Community Detection Rafael Ferreira da Silva rafsilva@isi.edu http://rafaelsilva.com Note to other teachers and users of these slides: We would be

More information

Machine-Powered Learning for People-Centered Security

Machine-Powered Learning for People-Centered Security White paper Machine-Powered Learning for People-Centered Security Protecting Email with the Proofpoint Stateful Composite Scoring Service www.proofpoint.com INTRODUCTION: OUTGUNNED AND OVERWHELMED Today

More information

Consequences of Compromise: Characterizing Account Hijacking on Twitter

Consequences of Compromise: Characterizing Account Hijacking on Twitter Consequences of Compromise: Characterizing Account Hijacking on Twitter Frank Li UC Berkeley With: Kurt Thomas (UCB Google), Chris Grier (UCB/ICSI Databricks), Vern Paxson (UCB/ICSI) Accounts on Social

More information

Chapter 27 Introduction to Information Retrieval and Web Search

Chapter 27 Introduction to Information Retrieval and Web Search Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval

More information

UNIK: Unsupervised Social Network Spam Detection

UNIK: Unsupervised Social Network Spam Detection UNIK: Unsupervised Social Network Spam Detection Enhua Tan,LeiGuo 2,SongqingChen 3,XiaodongZhang 2,andYihong(Eric)Zhao 4 LinkedIn entan@linkedin.com 2 Ohio State University 3 George Mason {lguo,zhang}@cse.ohiostate.edu

More information

Think before RT: An Experimental Study of Abusing Twitter Trends

Think before RT: An Experimental Study of Abusing Twitter Trends Think before RT: An Experimental Study of Abusing Twitter Trends Despoina Antonakaki 1, Iasonas Polakis 2, Elias Athanasopoulos 1, Sotiris Ioannidis 1, and Paraskevi Fragopoulou 1 1 FORTH-ICS, Greece {despoina,elathan,sotiris,fragopou}@ics.forth.gr

More information

Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion

Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion Introduction Types of Social Network Analysis Social Networks in the Online Age Data Mining for Social Network Analysis Applications Conclusion References Social Network Social Network Analysis Sociocentric

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu SPAM FARMING 2/11/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2 2/11/2013 Jure Leskovec, Stanford

More information

A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets

A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets A Performance Evaluation of Lfun Algorithm on the Detection of Drifted Spam Tweets Varsha Palandulkar 1, Siddhesh Bhujbal 2, Aayesha Momin 3, Vandana Kirane 4, Raj Naybal 5 Professor, AISSMS Polytechnic

More information

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing

Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Detecting and Analyzing Communities in Social Network Graphs for Targeted Marketing Gautam Bhat, Rajeev Kumar Singh Department of Computer Science and Engineering Shiv Nadar University Gautam Buddh Nagar,

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Lecture #11: Link Analysis 3 Seoul National University 1 In This Lecture WebSpam: definition and method of attacks TrustRank: how to combat WebSpam HITS algorithm: another algorithm

More information

Stochastic Blockmodels as an unsupervised approach to detect botnet infected clusters in networked data

Stochastic Blockmodels as an unsupervised approach to detect botnet infected clusters in networked data Stochastic Blockmodels as an unsupervised approach to detect botnet infected clusters in networked data Mark Patrick Roeling & Geoff Nicholls Department of Statistics University of Oxford Data Science

More information

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings.

Karami, A., Zhou, B. (2015). Online Review Spam Detection by New Linguistic Features. In iconference 2015 Proceedings. Online Review Spam Detection by New Linguistic Features Amir Karam, University of Maryland Baltimore County Bin Zhou, University of Maryland Baltimore County Karami, A., Zhou, B. (2015). Online Review

More information

deseo: Combating Search-Result Poisoning Yu USF

deseo: Combating Search-Result Poisoning Yu USF deseo: Combating Search-Result Poisoning Yu Jin @MSCS USF Your Google is not SAFE! SEO Poisoning - A new way to spread malware! Why choose SE? 22.4% of Google searches in the top 100 results > 50% for

More information

A Generic Statistical Approach for Spam Detection in Online Social Networks

A Generic Statistical Approach for Spam Detection in Online Social Networks Final version of the accepted paper. Cite as: F. Ahmad and M. Abulaish, A Generic Statistical Approach for Spam Detection in Online Social Networks, Computer Communications, 36(10-11), Elsevier, pp. 1120-1129,

More information

Mining Web Data. Lijun Zhang

Mining Web Data. Lijun Zhang Mining Web Data Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

More information

Graph and Link Mining

Graph and Link Mining Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }

More information

V2: Measures and Metrics (II)

V2: Measures and Metrics (II) - Betweenness Centrality V2: Measures and Metrics (II) - Groups of Vertices - Transitivity - Reciprocity - Signed Edges and Structural Balance - Similarity - Homophily and Assortative Mixing 1 Betweenness

More information

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Effective Latent Space Graph-based Re-ranking Model with Global Consistency Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case

More information

Identifying Fraudulently Promoted Online Videos

Identifying Fraudulently Promoted Online Videos Identifying Fraudulently Promoted Online Videos Vlad Bulakh, Christopher W. Dunn, Minaxi Gupta April 7, 2014 April 7, 2014 Vlad Bulakh 2 Motivation Online video sharing websites are visited by millions

More information

A study of Video Response Spam Detection on YouTube

A study of Video Response Spam Detection on YouTube A study of Video Response Spam Detection on YouTube Suman 1 and Vipin Arora 2 1 Research Scholar, Department of CSE, BITS, Bhiwani, Haryana (India) 2 Asst. Prof., Department of CSE, BITS, Bhiwani, Haryana

More information

Social-Network Graphs

Social-Network Graphs Social-Network Graphs Mining Social Networks Facebook, Google+, Twitter Email Networks, Collaboration Networks Identify communities Similar to clustering Communities usually overlap Identify similarities

More information

Identifying Suspended Accounts In Twitter

Identifying Suspended Accounts In Twitter University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations 2016 Identifying Suspended Accounts In Twitter Xiutian Cui University of Windsor Follow this and additional works at: https://scholar.uwindsor.ca/etd

More information

Communities Against Deception in Online Social Networks

Communities Against Deception in Online Social Networks Final version of the accepted paper. Cite as: Communities Against Deception in Online Social Networks, Computer Fraud and Security, Volume 2014, Issue 2, Elsevier, pp. 8-16, Feb. 2014. Communities Against

More information

Classification Methods for Spam Detection In Online Social Network

Classification Methods for Spam Detection In Online Social Network International Research Journal of Engineering and Technology (IRJET) e-issn: 2395-56 Volume: 4 Issue: 7 July-217 www.irjet.net p-issn: 2395-72 Classification Methods for Spam Detection In Online Social

More information

Social Network Analysis

Social Network Analysis Chirayu Wongchokprasitti, PhD University of Pittsburgh Center for Causal Discovery Department of Biomedical Informatics chw20@pitt.edu http://www.pitt.edu/~chw20 Overview Centrality Analysis techniques

More information

3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today

3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today 3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

More information

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How

More information

Fake and Spam Messages: Detecting Misinformation During Natural Disasters on Social Media

Fake and Spam Messages: Detecting Misinformation During Natural Disasters on Social Media Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 5-2015 Fake and Spam Messages: Detecting Misinformation During Natural Disasters on Social Media Meet Rajdev

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

Social Network Mining An Introduction

Social Network Mining An Introduction Social Network Mining An Introduction Jiawei Zhang Assistant Professor Florida State University Big Data A Questionnaire Please raise your hands, if you (1) use Facebook (2) use Instagram (3) use Snapchat

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley Umass - Amherst Joint work with P.H. Wang, J.Z. Zhou, J.C.S. Lui, X. Guan Measuring, Analyzing Large Networks - large networks can be represented

More information

Detection of Spam Tipping Behaviour on Foursquare

Detection of Spam Tipping Behaviour on Foursquare Detection of Spam Tipping Behaviour on Foursquare Anupama Aggarwal IIIT - Delhi New Delhi, India anupamaa@iiitd.ac.in Jussara Almeida UFMG Brazil jussara@dcc.ufmg.br Ponnurangam Kumaraguru IIIT - Delhi

More information

Online Social Spammer Detection

Online Social Spammer Detection Online Social Spammer Detection Xia Hu, Jiliang Tang, Huan Liu Computer Science and Engineering, Arizona State University, USA {xiahu, jiliang.tang, huan.liu}@asu.edu Abstract The explosive use of social

More information

Community Detection Using Random Walk Label Propagation Algorithm and PageRank Algorithm over Social Network

Community Detection Using Random Walk Label Propagation Algorithm and PageRank Algorithm over Social Network Community Detection Using Random Walk Label Propagation Algorithm and PageRank Algorithm over Social Network 1 Monika Kasondra, 2 Prof. Kamal Sutaria, 1 M.E. Student, 2 Assistent Professor, 1 Computer

More information

Sampling Large Graphs: Algorithms and Applications

Sampling Large Graphs: Algorithms and Applications Sampling Large Graphs: Algorithms and Applications Don Towsley College of Information & Computer Science Umass - Amherst Collaborators: P.H. Wang, J.C.S. Lui, J.Z. Zhou, X. Guan Measuring, analyzing large

More information

Online Social Networks and Media

Online Social Networks and Media Online Social Networks and Media Absorbing Random Walks Link Prediction Why does the Power Method work? If a matrix R is real and symmetric, it has real eigenvalues and eigenvectors: λ, w, λ 2, w 2,, (λ

More information

Bruno Martins. 1 st Semester 2012/2013

Bruno Martins. 1 st Semester 2012/2013 Link Analysis Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2012/2013 Slides baseados nos slides oficiais do livro Mining the Web c Soumen Chakrabarti. Outline 1 2 3 4

More information

Fooji Code of Conduct

Fooji Code of Conduct Fooji Code of Conduct In order to protect the experience of the Fooji fans and employees, there are some limitations on the type of content and behavior that we allow. These limitations are set forth in

More information

Using PageRank in Feature Selection

Using PageRank in Feature Selection Using PageRank in Feature Selection Dino Ienco, Rosa Meo, and Marco Botta Dipartimento di Informatica, Università di Torino, Italy {ienco,meo,botta}@di.unito.it Abstract. Feature selection is an important

More information

Anomaly Detection on Data Streams with High Dimensional Data Environment

Anomaly Detection on Data Streams with High Dimensional Data Environment Anomaly Detection on Data Streams with High Dimensional Data Environment Mr. D. Gokul Prasath 1, Dr. R. Sivaraj, M.E, Ph.D., 2 Department of CSE, Velalar College of Engineering & Technology, Erode 1 Assistant

More information

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR

SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR SCALABLE KNOWLEDGE BASED AGGREGATION OF COLLECTIVE BEHAVIOR P.SHENBAGAVALLI M.E., Research Scholar, Assistant professor/cse MPNMJ Engineering college Sspshenba2@gmail.com J.SARAVANAKUMAR B.Tech(IT)., PG

More information

Information Retrieval

Information Retrieval Information Retrieval CSC 375, Fall 2016 An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have

More information

Part 1: Link Analysis & Page Rank

Part 1: Link Analysis & Page Rank Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Graph Data: Social Networks [Source: 4-degrees of separation, Backstrom-Boldi-Rosa-Ugander-Vigna,

More information

Developing Focused Crawlers for Genre Specific Search Engines

Developing Focused Crawlers for Genre Specific Search Engines Developing Focused Crawlers for Genre Specific Search Engines Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

More information

CS246: Mining Massive Datasets Jure Leskovec, Stanford University

CS246: Mining Massive Datasets Jure Leskovec, Stanford University CS6: Mining Massive Datasets Jure Leskovec, Stanford University http://cs6.stanford.edu //8 Jure Leskovec, Stanford CS6: Mining Massive Datasets High dim. data Graph data Infinite data Machine learning

More information

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic

Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic SEMANTIC COMPUTING Lecture 6: Unsupervised Machine Learning Dagmar Gromann International Center For Computational Logic TU Dresden, 23 November 2018 Overview Unsupervised Machine Learning overview Association

More information

10 KEY WAYS THE FINANCIAL SERVICES INDUSTRY CAN COMBAT CYBER THREATS

10 KEY WAYS THE FINANCIAL SERVICES INDUSTRY CAN COMBAT CYBER THREATS 10 KEY WAYS THE FINANCIAL SERVICES INDUSTRY CAN COMBAT CYBER THREATS WHITE PAPER INTRODUCTION BANKS ARE A COMMON TARGET FOR CYBER CRIMINALS AND OVER THE LAST YEAR, FIREEYE HAS BEEN HELPING CUSTOMERS RESPOND

More information

SOCIAL NETWORKING IN TODAY S BUSINESS WORLD

SOCIAL NETWORKING IN TODAY S BUSINESS WORLD SOCIAL NETWORKING IN TODAY S BUSINESS WORLD AGENDA Review the use of social networking applications within the business environment Review current trends in threats, attacks and incidents Understand how

More information

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge

Centralities (4) By: Ralucca Gera, NPS. Excellence Through Knowledge Centralities (4) By: Ralucca Gera, NPS Excellence Through Knowledge Some slide from last week that we didn t talk about in class: 2 PageRank algorithm Eigenvector centrality: i s Rank score is the sum

More information

SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation

SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation This paper was presented as part of the main technical program at IEEE INFOCOM 2011 SocialFilter: Introducing Social Trust to Collaborative Spam Mitigation Michael Sirivianos Kyungbaek Kim Xiaowei Yang

More information

Big Data Analytics CSCI 4030

Big Data Analytics CSCI 4030 High dim. data Graph data Infinite data Machine learning Apps Locality sensitive hashing PageRank, SimRank Filtering data streams SVM Recommen der systems Clustering Community Detection Web advertising

More information

The Cost of Phishing. Understanding the True Cost Dynamics Behind Phishing Attacks A CYVEILLANCE WHITE PAPER MAY 2015

The Cost of Phishing. Understanding the True Cost Dynamics Behind Phishing Attacks A CYVEILLANCE WHITE PAPER MAY 2015 The Cost of Phishing Understanding the True Cost Dynamics Behind Phishing Attacks A CYVEILLANCE WHITE PAPER MAY 2015 Executive Summary.... 3 The Costs... 4 How To Estimate the Cost of an Attack.... 5 Table

More information

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at

International Journal of Research in Advent Technology, Vol.7, No.3, March 2019 E-ISSN: Available online at Performance Evaluation of Ensemble Method Based Outlier Detection Algorithm Priya. M 1, M. Karthikeyan 2 Department of Computer and Information Science, Annamalai University, Annamalai Nagar, Tamil Nadu,

More information

SOCIAL MEDIA MINING. Data Mining Essentials

SOCIAL MEDIA MINING. Data Mining Essentials SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate

More information

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a

1 Starting around 1996, researchers began to work on. 2 In Feb, 1997, Yanhong Li (Scotch Plains, NJ) filed a !"#$ %#& ' Introduction ' Social network analysis ' Co-citation and bibliographic coupling ' PageRank ' HIS ' Summary ()*+,-/*,) Early search engines mainly compare content similarity of the query and

More information

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer Bing Liu Web Data Mining Exploring Hyperlinks, Contents, and Usage Data With 177 Figures Springer Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web

More information

Countering Spam Using Classification Techniques. Steve Webb Data Mining Guest Lecture February 21, 2008

Countering Spam Using Classification Techniques. Steve Webb Data Mining Guest Lecture February 21, 2008 Countering Spam Using Classification Techniques Steve Webb webb@cc.gatech.edu Data Mining Guest Lecture February 21, 2008 Overview Introduction Countering Email Spam Problem Description Classification

More information

Telling Experts from Spammers Expertise Ranking in Folksonomies

Telling Experts from Spammers Expertise Ranking in Folksonomies 32 nd Annual ACM SIGIR 09 Boston, USA, Jul 19-23 2009 Telling Experts from Spammers Expertise Ranking in Folksonomies Michael G. Noll (Albert) Ching-Man Au Yeung Christoph Meinel Nicholas Gibbins Nigel

More information

Clustering Results. Result List Example. Clustering Results. Information Retrieval

Clustering Results. Result List Example. Clustering Results. Information Retrieval Information Retrieval INFO 4300 / CS 4300! Presenting Results Clustering Clustering Results! Result lists often contain documents related to different aspects of the query topic! Clustering is used to

More information

with Advanced Protection

with Advanced  Protection with Advanced Email Protection OVERVIEW Today s sophisticated threats are changing. They re multiplying. They re morphing into new variants. And they re targeting people, not just technology. As organizations

More information

Community-Based Recommendations: a Solution to the Cold Start Problem

Community-Based Recommendations: a Solution to the Cold Start Problem Community-Based Recommendations: a Solution to the Cold Start Problem Shaghayegh Sahebi Intelligent Systems Program University of Pittsburgh sahebi@cs.pitt.edu William W. Cohen Machine Learning Department

More information

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks

CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks CS224W: Social and Information Network Analysis Project Report: Edge Detection in Review Networks Archana Sulebele, Usha Prabhu, William Yang (Group 29) Keywords: Link Prediction, Review Networks, Adamic/Adar,

More information

Graph Mining and Social Network Analysis

Graph Mining and Social Network Analysis Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References q Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann

More information

Discover (n.): This feature surfaces personalized content tailored to your interests.

Discover (n.): This feature surfaces personalized content tailored to your interests. Glossary: General Terms @: The @ sign is used to call out usernames in Tweets: "Hello @twitter!" People will use your @username to mention you in Tweets, send you a message or link to your profile. @username:

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 3, March 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue:

More information

Supervised Random Walks

Supervised Random Walks Supervised Random Walks Pawan Goyal CSE, IITKGP September 8, 2014 Pawan Goyal (IIT Kharagpur) Supervised Random Walks September 8, 2014 1 / 17 Correlation Discovery by random walk Problem definition Estimate

More information

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google,

In the recent past, the World Wide Web has been witnessing an. explosive growth. All the leading web search engines, namely, Google, 1 1.1 Introduction In the recent past, the World Wide Web has been witnessing an explosive growth. All the leading web search engines, namely, Google, Yahoo, Askjeeves, etc. are vying with each other to

More information

SOFIA: Social Filtering for Niche Markets

SOFIA: Social Filtering for Niche Markets Social Filtering for Niche Markets Matteo Dell'Amico Licia Capra University College London UCL MobiSys Seminar 9 October 2007 : Social Filtering for Niche Markets Outline 1 Social Filtering Competence:

More information