SOFIA: Social Filtering for Niche Markets
1 SOFIA: Social Filtering for Niche Markets. Matteo Dell'Amico, Licia Capra. University College London. UCL MobiSys Seminar, 9 October 2007.
2 Outline: 1 Social Filtering (Competence: Taste Similarity; Intent: Trust Transitivity; The Social Filtering Pattern). 2 From HITS to SOFIA. 3 Datasets; Hidden Judgements; Sybil Attacks.
3 The Long Tail (Chris Anderson, 2006). Digital distribution: millions of different products are available to consumers. An enormous market for niche content is appearing. Users need help to find interesting content. Filters are essential to connect supply and demand. Our Problem: creating an efficient and robust filter.
6 Collaborative Filtering. Which items might I like? Let's look at what similar users did. Similarity in reviews, behaviour... They are competent: they express (subjective!) judgements we agree with.
7 Propagating Trust: Competence. Alice expressed judgement X ("I like eating at SOAS"). Bob agrees with Alice on X, therefore Alice ranks Bob as a competent evaluator. Bob also expressed judgement Y ("They make good burgers at ULU"). Alice decides to trust Bob's advice and tries ULU.
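As a minimal sketch of the competence idea in the Alice and Bob example, the snippet below scores judgements Alice has not yet expressed by the taste similarity of the users who did express them. The Jaccard measure, the `recommend` helper, and the toy data are assumptions made for illustration; the slides do not prescribe a particular similarity function.

```python
def jaccard(a, b):
    """Share of judgements two users have in common (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, judgements):
    """Rank judgements unknown to `target` by the summed similarity of their authors."""
    scores = {}
    for user, items in judgements.items():
        if user == target:
            continue
        sim = jaccard(judgements[target], items)
        for item in items - judgements[target]:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical toy data mirroring the slide: Bob agrees with Alice on SOAS.
judgements = {
    "alice": {"SOAS"},
    "bob": {"SOAS", "ULU"},
    "carol": {"Pizza", "Sushi"},
}

print(recommend("alice", judgements))
```

Bob shares a judgement with Alice, so his other judgement (ULU) outranks the ones coming from Carol, who has nothing in common with her.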
8 Sybil Attack. How to trick Alice? Create lots of false users (Sybils) that copy Alice's judgements. All Sybils vote for a malicious judgement whose ranking they want to increase. Since the Sybils look competent to Alice, she will trust them. Also known as: profile injection, shilling (CF), web spam (webpage ranking). In Social Filtering, Alice leverages her social ties to isolate Sybils.
11 Propagating Trust: Intent. Web of Trust: a social network where A links to B if A trusts B to behave honestly. Created explicitly by users (e.g., Facebook) or automatically (e.g., logs). Trust Transitivity: I trust the friends of my friends. Alice thinks Bob is honest. Bob recommends Charlie to Alice. Since Alice trusts Bob, she decides to trust Charlie as well. Iteratively, Alice derives trust for Dave.
12 Isolating Sybils. There is no way to recognize legitimate users only by looking at their judgements. It is costly for the attacker to convince honest users to trust it. A small number of honest users are connected to the Sybil network via attack edges (Yu et al., ACM SIGCOMM '06). We can isolate Sybils if we limit the amount of trust propagated through the attack edges.
13 Discussing Trust Transitivity. Pro: Users get trusted if they behave honestly. If reciprocative behaviour is adopted, the rational choice for selfish users is to behave honestly (Feldman et al., ACM EC '04). Sybils can get isolated. Con: Trust transitivity does not take into account the tastes of the users. This is a big problem in niches, where subjectivity is extreme.
16 Propagating Trust: Social Filtering. We trust users who are both willing and able to give good judgements. Alice trusts Dave's intent because a path in the web of trust connects her to him. She trusts his competence because they agree on X. Since Dave is honest and competent, Alice trusts his judgement Y.
18 PageRank. Google's algorithm to rank the importance of web pages. Intuitive consideration: an authoritative page is linked to by many authoritative pages. A random surfer following links at random is more likely to stumble upon more important pages. With probability 1 − α of stopping at each step, PageRank computes the probability that the random surfer stops at any given page. In Webs of Trust: the same principle applies: reputable users are recommended by other reputable users. We swap the WWW graph with the social network.
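The random-surfer description above can be sketched as a short power iteration. This is an illustrative sketch, not the implementation used in the talk: the `pagerank` function, the dictionary graph format, the uniform restart, the handling of pages without out-links, and the toy graph are all assumptions.

```python
def pagerank(graph, alpha=0.85, iters=100):
    """Power iteration for the random-surfer model.

    graph maps each node to the list of nodes it links to; every node must
    appear as a key. alpha is the probability that the surfer continues.
    """
    nodes = sorted(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # With probability 1 - alpha the surfer restarts at a uniformly random node.
        nxt = {u: (1.0 - alpha) / n for u in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = alpha * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumption: a node without out-links spreads its weight uniformly.
                for v in nodes:
                    nxt[v] += alpha * rank[u] / n
        rank = nxt
    return rank


# Hypothetical toy graph: "c" is linked to by every other node.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))
```

Replacing `web` with a web of trust (user to trusted users) gives the reputation reading described on the slide.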
20 Personalized PageRank. PageRank does not take into account subjectivity, which is essential to isolate Sybil nodes. We force the random walk to start at the evaluating node: this ensures that the walk starts at an honest node. The trust obtained by Sybil nodes is limited by the probability of following an attack edge.
21 Personalized PageRank - The α Parameter (1). α: probability that our random walk continues at each step. Low α implies shorter paths. Pro: fast convergence; close social ties may have related tastes (e.g., my friends listen to similar music). Con: we don't trust honest users who are socially far away.
22 Personalized PageRank - The α Parameter (2). High α implies longer paths. Pro: we have more information about nodes. Con: attack edges are more likely to be traversed, so attack resilience is lower.
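To make the role of α concrete, here is a small sketch of Personalized PageRank on a toy web of trust with a single attack edge. The function name, the rule that a walk reaching a dead end returns to the evaluating node, and the graph itself are assumptions for illustration only.

```python
def personalized_pagerank(graph, source, alpha=0.5, iters=200):
    """Random walk that always restarts at `source`.

    graph maps each node to the list of nodes it trusts (all nodes are keys);
    alpha is the probability that the walk continues at each step.
    """
    rank = {u: 0.0 for u in graph}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        nxt[source] = 1.0 - alpha  # the walk stops and restarts at the evaluator
        for u, trusted in graph.items():
            if trusted:
                for v in trusted:
                    nxt[v] += alpha * rank[u] / len(trusted)
            else:
                # Assumption: a walk stuck at a dead end goes back to the source.
                nxt[source] += alpha * rank[u]
        rank = nxt
    return rank


# Hypothetical web of trust: three honest users, three Sybils,
# and one attack edge (bob -> s1).
web_of_trust = {
    "alice": ["bob", "charlie"],
    "bob": ["alice", "charlie", "s1"],
    "charlie": ["alice", "bob"],
    "s1": ["s2", "s3"],
    "s2": ["s1", "s3"],
    "s3": ["s1", "s2"],
}


def sybil_trust(alpha):
    """Total trust that Alice assigns to the Sybil nodes."""
    rank = personalized_pagerank(web_of_trust, "alice", alpha)
    return sum(w for u, w in rank.items() if u.startswith("s"))


print(sybil_trust(0.5), sybil_trust(0.9))
```

In this toy graph the Sybils collect far less trust with α = 0.5 than with α = 0.9, which matches the accuracy versus resilience tension described on the two α slides.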
24 HITS: the Idea (Jon Kleinberg, JACM 1999). Web pages are seen as hubs and authorities: authorities are the authoritative pages; hubs are pages that link to authorities. Good hubs point to good authorities; good authorities are pointed to by good hubs. In our case: users instead of hubs; judgements instead of authorities.
25 HITS: the Algorithm. We have a bipartite graph with hubs/users (circles) and authorities/judgements (squares). All hubs start with the same weight. Iteratively, until convergence: weights on authorities are the sum of weights on all hubs that link to them; weights on hubs become the sum of weights on the authorities they link to; weights on hubs get renormalized.
26 HITS: Example (1) Initialization. Weights on hubs get initialized.
27 HITS: Example (2) Forward step. Weights on authorities are the sum of the weights of the hubs that link to them.
28 HITS: Example (3) Backward step. Weights on hubs are the sum of the weights of the authorities they link to.
29 HITS: Example (4) Normalization. Weights on hubs get renormalized. Back to the forward step.
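The four steps of the example (initialization, forward, backward, normalization) can be sketched as follows on a bipartite user/judgement graph. The `hits` function, the choice of normalizing hub weights so that they sum to 1, and the toy data are assumptions; the slides only say that hub weights are renormalized.

```python
def hits(links, iters=50):
    """HITS on a bipartite graph: users are hubs, judgements are authorities.

    links maps each user to the (non-empty) set of judgements they expressed.
    """
    users = sorted(links)
    # Initialization: all hubs start with the same weight.
    hub = {u: 1.0 / len(users) for u in users}
    auth = {}
    for _ in range(iters):
        # Forward step: an authority sums the weights of the hubs linking to it.
        auth = {}
        for u in users:
            for j in links[u]:
                auth[j] = auth.get(j, 0.0) + hub[u]
        # Backward step: a hub sums the weights of the authorities it links to.
        hub = {u: sum(auth[j] for j in links[u]) for u in users}
        # Normalization (assumption: hub weights are rescaled to sum to 1).
        total = sum(hub.values())
        hub = {u: w / total for u, w in hub.items()}
    return hub, auth


# Hypothetical data: a dense community around x and y, and a lone user with z.
links = {"u1": {"x", "y"}, "u2": {"x", "y"}, "u3": {"x"}, "u4": {"z"}}
hub, auth = hits(links)
print(sorted(auth, key=auth.get, reverse=True))
```

On this toy graph the weight of the isolated judgement `z` shrinks towards zero as the iterations proceed, since the denser community absorbs nearly all of the hub weight.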
31 Change 1: Tightly Knit Communities (SALSA: Lempel and Moran, 2001). HITS disproportionately rewards communities where users and judgements are highly correlated. In the graph on the left, the ranking of nodes in the less dense blue community goes to 0. Fix: perform a random walk on the judgement graph and compute the equilibrium distribution. Side effect: niche judgements are rewarded, since their weight is redistributed to fewer nodes.
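A SALSA-style random walk differs from HITS in that each step splits a node's weight among its neighbours instead of copying it to all of them. The sketch below is a simplified illustration under assumed names (`salsa`, `links`) and toy data, starting from a uniform distribution over judgements.

```python
def salsa(links, iters=100):
    """Random walk on judgements: judgement -> random rater -> random judgement.

    links maps each user to the (non-empty) set of judgements they expressed.
    Returns the distribution over judgements after `iters` two-hop steps.
    """
    users = sorted(links)
    items = sorted({j for js in links.values() for j in js})
    raters = {j: [u for u in users if j in links[u]] for j in items}
    auth = {j: 1.0 / len(items) for j in items}
    for _ in range(iters):
        # Backward hop: each judgement splits its weight among its raters.
        hub = {u: 0.0 for u in users}
        for j in items:
            for u in raters[j]:
                hub[u] += auth[j] / len(raters[j])
        # Forward hop: each user splits their weight among their judgements.
        auth = {j: 0.0 for j in items}
        for u in users:
            for j in links[u]:
                auth[j] += hub[u] / len(links[u])
    return auth


# Hypothetical data: a dense community around x and y, and a lone user with z.
links = {"u1": {"x", "y"}, "u2": {"x", "y"}, "u3": {"x"}, "u4": {"z"}}
auth = salsa(links)
print(auth)
```

Because weight is divided rather than duplicated, the niche judgement `z` keeps the share it started with instead of fading away, which is the side effect noted on the slide.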
32 Change 2: Subjective Ranking. Problem: the results of HITS are independent of the tastes of the evaluating node. It is essential to have personalized results. Fix: same approach as in PageRank: we start the random walk from the evaluating node. To reward shorter paths, we stop at each iteration with probability 1 − β. Low β implies higher subjectivity and faster convergence. High β favours longer paths of trust propagation.
34 Change 3: Take Intent into Account. Problem: as said before, we don't want to trust dishonest nodes. Culprit for HITS: the backward step. The fact that a user expressed a judgement does not ensure they are well intentioned. Fix: 1 Compute intent ranking using Personalized PageRank. 2 Redistribute trust to users proportionally to their intent ranking.
36 SOFIA in Synthesis. SOFIA: SOcial FIltering Algorithm. HITS-like trust propagating algorithm. 3 key modifications: 1 Random walk trust propagation as proposed in SALSA. 2 The starting point is the evaluating node; the random walk continues at each step with probability β. 3 In the backward step, trust is redistributed from judgements to users according to their intent ranking, computed using Personalized PageRank on the web of trust.
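The slides give no formulas for combining the three modifications, so the following is only one possible reading, written as a sketch. Every name (`sofia_sketch`, `personalized_pagerank`), the scoring rule (probability that the walk stops on a judgement), the dead-end handling, and the toy data are assumptions, not the authors' implementation.

```python
def personalized_pagerank(trust, source, alpha, iters=100):
    """Intent ranking: random walk on the web of trust restarting at `source`."""
    rank = {u: 0.0 for u in trust}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in trust}
        nxt[source] = 1.0 - alpha
        for u, friends in trust.items():
            if friends:
                for v in friends:
                    nxt[v] += alpha * rank[u] / len(friends)
            else:
                nxt[source] += alpha * rank[u]  # assumption: dead ends restart
        rank = nxt
    return rank


def sofia_sketch(trust, judgements, source, alpha=0.5, beta=0.5, steps=50):
    """Score judgements for `source` (an assumed reading of the three changes)."""
    intent = personalized_pagerank(trust, source, alpha)
    raters = {}
    for u, items in judgements.items():
        for j in items:
            raters.setdefault(j, []).append(u)
    users = {source: 1.0}              # change 2: start at the evaluating node
    scores = {j: 0.0 for j in raters}
    weight = 1.0 - beta                # probability of stopping at this step
    for _ in range(steps):
        # Forward step (change 1): users split their weight among judgements.
        items = {}
        for u, mass in users.items():
            for j in judgements[u]:
                items[j] = items.get(j, 0.0) + mass / len(judgements[u])
        for j, mass in items.items():
            scores[j] += weight * mass
        # Backward step (change 3): weight returns to raters by intent ranking.
        users = {}
        for j, mass in items.items():
            total = sum(intent[u] for u in raters[j])
            if total == 0.0:
                continue
            for u in raters[j]:
                if intent[u] > 0.0:
                    users[u] = users.get(u, 0.0) + mass * intent[u] / total
        weight *= beta                 # the walk continues with probability beta
    return scores


# Hypothetical data: nobody trusts s1, who copies Alice's judgement X and adds Z.
trust = {"alice": ["bob"], "bob": ["alice", "dave"], "dave": ["bob"], "s1": ["alice"]}
judgements = {"alice": {"X"}, "bob": {"W"}, "dave": {"X", "Y"}, "s1": {"X", "Z"}}
scores = sofia_sketch(trust, judgements, "alice")
print(scores)
```

In this toy run Dave's judgement Y receives a positive score because Alice reaches Dave through the web of trust and agrees with him on X, while the judgement Z pushed by the untrusted node receives none.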
38 CiteSeer. Large dataset of scientific collaborations. Social network: co-authorship data. Authors A and B are connected if they wrote papers together. Judgements: citations. If X cites Y, the implicit judgement is "Y is relevant to X's topic". Graph data: a highly clustered subset of the whole graph. 10,000 authors. 182,675 papers.
39 Last.fm. Social networking website devoted to music. Social network: friend lists. Same as Facebook, MySpace,... Judgements: most listened artists chart for each user. Implicit judgement: "I like to listen to songs by X". Graph data: a BFS crawl of 10,000 users. 51,654 different artists.
41 Hidden Judgements. How to evaluate the accuracy of SOFIA's ranking on judgements? We want to rank highly the judgements that a user would approve. We hide a random judgement and execute SOFIA. If the algorithm performs well, the hidden judgement will have a high ranking. In Citeseer, we try to guess a missing citation from a paper. In Last.fm, we try to find the missing artist in a chart.
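The hidden-judgement protocol can be sketched as a small evaluation harness. The ranker used here is a plain popularity count standing in for the algorithm under test; it, the function names, and the toy data are assumptions for illustration. A rank of 1 means the hidden judgement came out on top.

```python
import random
from statistics import median


def popularity_ranker(judgements, user):
    """Stand-in ranker: order unseen judgements by how many other users express them."""
    counts = {}
    for other, items in judgements.items():
        if other != user:
            for j in items:
                counts[j] = counts.get(j, 0) + 1
    candidates = [j for j in counts if j not in judgements[user]]
    return sorted(candidates, key=lambda j: (-counts[j], j))


def hidden_judgement_rank(judgements, user, ranker, rng):
    """Hide one random judgement of `user` and return its 1-based rank (None if unranked)."""
    hidden = rng.choice(sorted(judgements[user]))
    masked = dict(judgements)                      # shallow copy; original untouched
    masked[user] = judgements[user] - {hidden}
    ranking = ranker(masked, user)
    return ranking.index(hidden) + 1 if hidden in ranking else None


# Hypothetical toy data.
judgements = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "c"},
    "u4": {"a"},
}
rng = random.Random(0)
ranks = [hidden_judgement_rank(judgements, u, popularity_ranker, rng)
         for u in sorted(judgements)]
print(median(ranks))
```

Swapping `popularity_ranker` for another function with the same signature lets different algorithms be compared by their median rank, which is how the results on the following slides are summarized.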
42 Hidden Judgements - Citeseer. [Plot: Ratio vs. Rank for SOFIA, SOFIA - no intent ranking, and Personalized PageRank.] Medians: 4 (SOFIA), 12 (SOFIA - no intent ranking), 30 (Personalized PageRank).
43 Hidden Judgements - Last.fm. [Plot: Ratio vs. Rank for SOFIA, SOFIA - no intent ranking, and Personalized PageRank.] Medians: 174 (SOFIA), 157 (SOFIA - no intent ranking), 344 (Personalized PageRank).
45 Sybil Attack. We simulated an attack trying to inflate the rating of a malicious judgement X on a victim node A. A coalition of 100 Sybil nodes is created. All Sybils copy A's judgements, then add a link to X. We study how the ranking of X changes before and after the attack, on the victim node A and on other nodes.
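The attack setup can be sketched as a helper that adds the Sybil coalition to a web of trust and a judgement table. The function name, the fully connected Sybil coalition, the `tricked` parameter (honest users who each contribute one attack edge), and the toy data are assumptions; the slides only state that 100 Sybils copy A's judgements and add a link to X.

```python
def inject_sybils(trust, judgements, victim, target, n_sybils=100, tricked=()):
    """Return copies of both graphs with a Sybil coalition added.

    trust:      user -> list of trusted users
    judgements: user -> set of judgements
    tricked:    honest users who each add one attack edge to a Sybil
    """
    trust = {u: list(vs) for u, vs in trust.items()}
    judgements = {u: set(js) for u, js in judgements.items()}
    sybils = [f"sybil{i}" for i in range(n_sybils)]
    for s in sybils:
        # Assumption: Sybils all trust each other.
        trust[s] = [t for t in sybils if t != s]
        # Each Sybil copies the victim's judgements, then adds the malicious one.
        judgements[s] = set(judgements[victim]) | {target}
    for i, honest in enumerate(tricked):
        trust[honest].append(sybils[i % n_sybils])  # attack edge
    return trust, judgements, sybils


# Hypothetical toy data: B is tricked into trusting one Sybil.
trust = {"A": ["B"], "B": ["A"]}
judgements = {"A": {"j1", "j2"}, "B": {"j2"}}
attacked_trust, attacked_judgements, sybils = inject_sybils(
    trust, judgements, victim="A", target="X", tricked=["B"]
)
print(len(sybils), attacked_trust["B"])
```

Running a ranking algorithm on the original and on the attacked graphs, and comparing the position of X in both, gives the before/after measurement described on the slide.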
46 Sybil Attack - Last.fm (1). Columns: Algorithm, Attack edges, Role, Percentiles. Any no attack 12,914 25,827 38,741 SOFIA - no intent victim other 348 1,185 3, ,730 20,493 33,322 Pers. PageRank 10 4,759 8,757 13, ,092 2,012 3,101 1 victim 3,406 11,182 31,765 other 9,599 19,186 33, victim 469 1,311 2,815 other 4,612 8,779 14, victim other 1,040 2,649 5,571
47 Sybil Attack - Last.fm (2). Columns: Algorithm, Attack edges, Role, Percentiles. SOFIA (α = 0.9) 100 SOFIA (α = 0.5) 100 victim other 1,040 2,649 5,571 victim other 1,578 3,106 5,128 Tradeoff between accuracy and attack resilience.
48 Conclusions. Social Filtering: integrating information about social networks and subjective preferences, we obtain recommendations that are: accurate (due mainly to preferences); attack resilient (thanks to social networks). Incorporating the social network may increase accuracy. SOFIA: a particular implementation of Social Filtering. Future Work: P2P/mobile decentralised implementation. Other social filtering algorithms?
More informationCS 6604: Data Mining Large Networks and Time-Series
CS 6604: Data Mining Large Networks and Time-Series Soumya Vundekode Lecture #12: Centrality Metrics Prof. B Aditya Prakash Agenda Link Analysis and Web Search Searching the Web: The Problem of Ranking
More informationGraph and Link Mining
Graph and Link Mining Graphs - Basics A graph is a powerful abstraction for modeling entities and their pairwise relationships. G = (V,E) Set of nodes V = v,, v 5 Set of edges E = { v, v 2, v 4, v 5 }
More informationRoadmap. Roadmap. Ranking Web Pages. PageRank. Roadmap. Random Walks in Ranking Query Results in Semistructured Databases
Roadmap Random Walks in Ranking Query in Vagelis Hristidis Roadmap Ranking Web Pages Rank according to Relevance of page to query Quality of page Roadmap PageRank Stanford project Lawrence Page, Sergey
More informationWeb Spam. Seminar: Future Of Web Search. Know Your Neighbors: Web Spam Detection using the Web Topology
Seminar: Future Of Web Search University of Saarland Web Spam Know Your Neighbors: Web Spam Detection using the Web Topology Presenter: Sadia Masood Tutor : Klaus Berberich Date : 17-Jan-2008 The Agenda
More informationInformation Retrieval and Web Search Engines
Information Retrieval and Web Search Engines Lecture 12: Link Analysis January 28 th, 2016 Wolf-Tilo Balke and Younes Ghammad Institut für Informationssysteme Technische Universität Braunschweig An Overview
More informationWeb search before Google. (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.)
' Sta306b May 11, 2012 $ PageRank: 1 Web search before Google (Taken from Page et al. (1999), The PageRank Citation Ranking: Bringing Order to the Web.) & % Sta306b May 11, 2012 PageRank: 2 Web search
More informationDe#anonymizing,Social,Networks, and,inferring,private,attributes, Using,Knowledge,Graphs,
De#anonymizing,Social,Networks, and,inferring,private,attributes, Using,Knowledge,Graphs, Jianwei Qian Illinois Tech Chunhong Zhang BUPT Xiang#Yang Li USTC,/Illinois Tech Linlin Chen Illinois Tech Outline
More informationNetwork Centrality. Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017
Network Centrality Saptarshi Ghosh Department of CSE, IIT Kharagpur Social Computing course, CS60017 Node centrality n Relative importance of a node in a network n How influential a person is within a
More informationSybil defenses via social networks
Sybil defenses via social networks Abhishek University of Oslo, Norway 19/04/2012 1 / 24 Sybil identities Single user pretends many fake/sybil identities i.e., creating multiple accounts observed in real-world
More informationThe link prediction problem for social networks
The link prediction problem for social networks Alexandra Chouldechova STATS 319, February 1, 2011 Motivation Recommending new friends in in online social networks. Suggesting interactions between the
More informationPersonalized Information Retrieval
Personalized Information Retrieval Shihn Yuarn Chen Traditional Information Retrieval Content based approaches Statistical and natural language techniques Results that contain a specific set of words or
More informationMatrix-Vector Multiplication by MapReduce. From Rajaraman / Ullman- Ch.2 Part 1
Matrix-Vector Multiplication by MapReduce From Rajaraman / Ullman- Ch.2 Part 1 Google implementation of MapReduce created to execute very large matrix-vector multiplications When ranking of Web pages that
More informationA Case For OneSwarm. Tom Anderson University of Washington.
A Case For OneSwarm Tom Anderson University of Washington http://oneswarm.cs.washington.edu/ With: Jarret Falkner, Tomas Isdal, Alex Jaffe, John P. John, Arvind Krishnamurthy, Harsha Madhyastha and Mike
More informationTrust in the Internet of Things From Personal Experience to Global Reputation. 1 Nguyen Truong PhD student, Liverpool John Moores University
Trust in the Internet of Things From Personal Experience to Global Reputation 1 Nguyen Truong PhD student, Liverpool John Moores University 2 Outline I. Background on Trust in Computer Science II. Overview
More informationLink Analysis. CSE 454 Advanced Internet Systems University of Washington. 1/26/12 16:36 1 Copyright D.S.Weld
Link Analysis CSE 454 Advanced Internet Systems University of Washington 1/26/12 16:36 1 Ranking Search Results TF / IDF or BM25 Tag Information Title, headers Font Size / Capitalization Anchor Text on
More informationGraphs / Networks. CSE 6242/ CX 4242 Feb 18, Centrality measures, algorithms, interactive applications. Duen Horng (Polo) Chau Georgia Tech
CSE 6242/ CX 4242 Feb 18, 2014 Graphs / Networks Centrality measures, algorithms, interactive applications Duen Horng (Polo) Chau Georgia Tech Partly based on materials by Professors Guy Lebanon, Jeffrey
More informationLink Analysis: Web Structure and Search
Link Analysis: Web Structure and Search Web Science (VU) (706716) Elisabeth Lex ISDS, TU Graz June 12, 2017 Elisabeth Lex (ISDS, TU Graz) Links June 12, 2017 1 / 69 Outline 1 Information Networks 2 Paths
More informationRecent Researches on Web Page Ranking
Recent Researches on Web Page Pradipta Biswas School of Information Technology Indian Institute of Technology Kharagpur, India Importance of Web Page Internet Surfers generally do not bother to go through
More informationSEO: SEARCH ENGINE OPTIMISATION
SEO: SEARCH ENGINE OPTIMISATION SEO IN 11 BASIC STEPS EXPLAINED What is all the commotion about this SEO, why is it important? I have had a professional content writer produce my content to make sure that
More informationSocial Interaction Based Video Recommendation: Recommending YouTube Videos to Facebook Users
Social Interaction Based Video Recommendation: Recommending YouTube Videos to Facebook Users Bin Nie, Honggang Zhang, Yong Liu Fordham University, Bronx, NY. Email: {bnie, hzhang44}@fordham.edu NYU Poly,
More informationCPSC 426/526. Reputation Systems. Ennan Zhai. Computer Science Department Yale University
CPSC 426/526 Reputation Systems Ennan Zhai Computer Science Department Yale University Recall: Lec-4 P2P search models: - How Chord works - Provable guarantees in Chord - Other DHTs, e.g., CAN and Pastry
More informationGraph and Web Mining - Motivation, Applications and Algorithms PROF. EHUD GUDES DEPARTMENT OF COMPUTER SCIENCE BEN-GURION UNIVERSITY, ISRAEL
Graph and Web Mining - Motivation, Applications and Algorithms PROF. EHUD GUDES DEPARTMENT OF COMPUTER SCIENCE BEN-GURION UNIVERSITY, ISRAEL Web mining - Outline Introduction Web Content Mining Web usage
More informationLink Structure Analysis
Link Structure Analysis Kira Radinsky All of the following slides are courtesy of Ronny Lempel (Yahoo!) Link Analysis In the Lecture HITS: topic-specific algorithm Assigns each page two scores a hub score
More informationExtracting Information from Complex Networks
Extracting Information from Complex Networks 1 Complex Networks Networks that arise from modeling complex systems: relationships Social networks Biological networks Distinguish from random networks uniform
More informationWeb Structure Mining using Link Analysis Algorithms
Web Structure Mining using Link Analysis Algorithms Ronak Jain Aditya Chavan Sindhu Nair Assistant Professor Abstract- The World Wide Web is a huge repository of data which includes audio, text and video.
More informationTopic mash II: assortativity, resilience, link prediction CS224W
Topic mash II: assortativity, resilience, link prediction CS224W Outline Node vs. edge percolation Resilience of randomly vs. preferentially grown networks Resilience in real-world networks network resilience
More informationHome Page. Title Page. Page 1 of 14. Go Back. Full Screen. Close. Quit
Page 1 of 14 Retrieving Information from the Web Database and Information Retrieval (IR) Systems both manage data! The data of an IR system is a collection of documents (or pages) User tasks: Browsing
More informationWeb consists of web pages and hyperlinks between pages. A page receiving many links from other pages may be a hint of the authority of the page
Link Analysis Links Web consists of web pages and hyperlinks between pages A page receiving many links from other pages may be a hint of the authority of the page Links are also popular in some other information
More informationMIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns
MIDTERM EXAMINATION Networked Life (NETS 112) November 21, 2013 Prof. Michael Kearns This is a closed-book exam. You should have no material on your desk other than the exam itself and a pencil or pen.
More informationLink Farming in Twitter
Link Farming in Twitter Pawan Goyal CSE, IITKGP Nov 11, 2016 Pawan Goyal (IIT Kharagpur) Link Farming in Twitter Nov 11, 2016 1 / 1 Reference Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar
More informationLink Analysis. Hongning Wang
Link Analysis Hongning Wang CS@UVa Structured v.s. unstructured data Our claim before IR v.s. DB = unstructured data v.s. structured data As a result, we have assumed Document = a sequence of words Query
More informationAn Improved Computation of the PageRank Algorithm 1
An Improved Computation of the PageRank Algorithm Sung Jin Kim, Sang Ho Lee School of Computing, Soongsil University, Korea ace@nowuri.net, shlee@computing.ssu.ac.kr http://orion.soongsil.ac.kr/ Abstract.
More information3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today
3 announcements: Thanks for filling out the HW1 poll HW2 is due today 5pm (scans must be readable) HW3 will be posted today CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu
More informationCollaborative Filtering using Euclidean Distance in Recommendation Engine
Indian Journal of Science and Technology, Vol 9(37), DOI: 10.17485/ijst/2016/v9i37/102074, October 2016 ISSN (Print) : 0974-6846 ISSN (Online) : 0974-5645 Collaborative Filtering using Euclidean Distance
More informationAutomatically Building Research Reading Lists
Automatically Building Research Reading Lists Michael D. Ekstrand 1 Praveen Kanaan 1 James A. Stemper 2 John T. Butler 2 Joseph A. Konstan 1 John T. Riedl 1 ekstrand@cs.umn.edu 1 GroupLens Research Department
More informationCPSC 532L Project Development and Axiomatization of a Ranking System
CPSC 532L Project Development and Axiomatization of a Ranking System Catherine Gamroth cgamroth@cs.ubc.ca Hammad Ali hammada@cs.ubc.ca April 22, 2009 Abstract Ranking systems are central to many internet
More informationDeep Web Crawling and Mining for Building Advanced Search Application
Deep Web Crawling and Mining for Building Advanced Search Application Zhigang Hua, Dan Hou, Yu Liu, Xin Sun, Yanbing Yu {hua, houdan, yuliu, xinsun, yyu}@cc.gatech.edu College of computing, Georgia Tech
More informationA STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE
A STUDY OF RANKING ALGORITHM USED BY VARIOUS SEARCH ENGINE Bohar Singh 1, Gursewak Singh 2 1, 2 Computer Science and Application, Govt College Sri Muktsar sahib Abstract The World Wide Web is a popular
More informationLink Analysis in Web Mining
Problem formulation (998) Link Analysis in Web Mining Hubs and Authorities Spam Detection Suppose we are given a collection of documents on some broad topic e.g., stanford, evolution, iraq perhaps obtained
More informationCountering Sparsity and Vulnerabilities in Reputation Systems
Countering Sparsity and Vulnerabilities in Reputation Systems Li Xiong Department of Mathematics and Computer Science Emory University lxiong@mathcs.emory.edu Ling Liu, Mustaque Ahamad College of Computing
More information