CPSC 426/526 Reputation Systems Ennan Zhai Computer Science Department Yale University
Recall: Lec-4
P2P search models:
- How Chord works
- Provable guarantees in Chord
- Other DHTs, e.g., CAN and Pastry
- Comparison between structured and unstructured P2P models
  - Unstructured P2P network (Gnutella, KaZaA, etc.)
  - Structured P2P network (Chord, CAN, Pastry, etc.)
  - More (Hybrid P2P, BitTorrent, etc.)
Lecture Roadmap Background Reputation Systems Case Study: Credence
Background
Alice (A) searches for "foo" and receives the result list below. The original figure brackets entries in this list as polluted.

NO  Name   Ava/Source
1   Foo1   70/78
2   foo1   60/66
3   foo    11/40
4   Foo-4  2/3
5   ...
Pollution Attacks
There are many ways to pollute files:
- Corrupting the majority of a file's content
- Corrupting the block that is downloaded at 99% progress
User's view: "I spent 3 hours downloading this file, but it is not what I wanted..."
Background
Figure: 100 files versus 100,000 polluted files for Alice's (A) query "foo".
It is highly likely that users will get bad files in their search results!
Background
Why are there so many pollution attacks:
- Publishers (e.g., music companies) want to protect their copyright
- They employ many programmers to launch these attacks
The majority of content is polluted:
- The service quality of P2P content sharing became low
- Users gave up on P2P content sharing systems
Lecture Roadmap Background Reputation Systems Case Study: Credence
Reputation Systems
Reputation Systems
What is a reputation system:
- Rating users or objects based on historical activities
- Like credit cards
- Assumption: high-reputation users publish good content
Types of reputation systems:
- Global reputation model, e.g., PageRank
- Personalized reputation model, e.g., EigenTrust and Credence
The global reputation model is mainly used in centralized systems.
The personalized reputation model is mainly used in P2P systems.
How does a reputation system work?
Step 1: Alice posts "I like Yale". Her score is 0 and the message has no votes yet.

Messages                  Author (Score)  Votes
I like Yale               Alice (0)       0
...

Step 2: Bob, Eve, and Dave each like the message, so Alice's score becomes Vi = 1 + 1 + 1 = 3.

Messages                  Author (Score)  Votes
I like Yale               Alice (3)       Like: 3
...

Step 3: Alice posts "Don't play with AlphaGo", and Bob (score 1) posts "I hate C++". Alice and Dave both dislike Bob's message, so Bob's score becomes Vi = 1 - 1 - 1 = -1.

Messages                  Author (Score)  Votes
I like Yale               Alice (3)       Like: 3
Don't play with AlphaGo   Alice (3)       0
I hate C++                Bob (-1)        Dislike: 2
...
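The vote bookkeeping in this example can be sketched in a few lines of Python. This is only an illustration of the slide's arithmetic: the `Board` class and its method names are hypothetical, and the starting scores (Alice 0, Bob 1) are simply the values shown on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical message board: each vote on a message moves its author's score by +1 or -1."""
    scores: dict = field(default_factory=dict)   # author -> current score
    authors: dict = field(default_factory=dict)  # message -> author
    votes: dict = field(default_factory=dict)    # message -> list of (voter, value)

    def post(self, author, message, initial_score=0):
        # The initial score is only applied the first time an author appears.
        self.scores.setdefault(author, initial_score)
        self.authors[message] = author
        self.votes[message] = []

    def vote(self, voter, message, value):
        if value not in (+1, -1):
            raise ValueError("a vote must be +1 (like) or -1 (dislike)")
        self.votes[message].append((voter, value))
        self.scores[self.authors[message]] += value


board = Board()
board.post("Alice", "I like Yale")                 # Alice starts at 0
for voter in ("Bob", "Eve", "Dave"):
    board.vote(voter, "I like Yale", +1)           # three likes

board.post("Alice", "Don't play with AlphaGo")
board.post("Bob", "I hate C++", initial_score=1)   # Bob starts at 1, as on the slide
for voter in ("Alice", "Dave"):
    board.vote(voter, "I hate C++", -1)            # two dislikes

print(board.scores)
```

With these inputs Alice ends at 0 + 1 + 1 + 1 = 3 and Bob at 1 - 1 - 1 = -1, matching the tables above.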
Examples
Global Reputation System
Every entity has only one reputation score
PageRank is a global peer-based trust model
- Who plays the peer role in this context?
- What are the historical activities of peers in this context?
PageRank [WWW 98]
Example: three web pages A, B, and C.
- The number of web pages N = 3
- The damping parameter d = 0.7

\[ PR(A) = (1-d) \cdot \frac{1}{N} + d \cdot \frac{PR(C)}{a} \]
\[ PR(B) = (1-d) \cdot \frac{1}{N} + d \cdot \frac{PR(A)}{b} \]
\[ PR(C) = (1-d) \cdot \frac{1}{N} + d \cdot \frac{PR(B)}{c} \]

The divisors a, b, and c are numbers of outgoing links (each is 1 in this example). With (1-d)/N = 0.3/3 = 0.1:

\[ PR(A) = 0.1 + 0.7 \cdot PR(C) \]
\[ PR(B) = 0.1 + 0.7 \cdot PR(A) \]
\[ PR(C) = 0.1 + 0.7 \cdot PR(B) \]

By solving the linear equations: PR(A) = PR(B) = PR(C) = 1/3 ≈ 0.33
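The three equations on this slide can also be solved numerically by repeated substitution. The sketch below is a minimal illustration, not the production algorithm: the function name `pagerank` is hypothetical, and the link structure A -> B -> C -> A is inferred from the equations (PR(A) depends on PR(C), and so on).

```python
def pagerank(links, d=0.7, iterations=100):
    """Iterate PR(p) = (1-d)/N + d * sum(PR(q) / outdegree(q)) over pages q linking to p.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start from a uniform score
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) / n + d * incoming
        pr = new_pr
    return pr


# Link structure inferred from the slide's equations: A -> B -> C -> A.
links = {"A": ["B"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page, score in ranks.items():
    print(page, round(score, 2))
```

Each page ends at about 0.33, in agreement with the closed-form solution 0.1 / (1 - 0.7) = 1/3.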
PageRank [WWW 98]
Figure: example link graphs over pages A, B, and C (edges not recoverable from the extracted text).
What are the PageRanks? What's the problem in this example?
Personalized Reputation Model
We focus on the personalized reputation model
Used in P2P content sharing systems
Three different types:
- Peer-based reputation systems, e.g., EigenTrust [WWW 03]
- Object-based reputation systems, e.g., Credence [NSDI 06]
- Hybrid reputation systems, e.g., Scrubber [P2P 07]
Figure: Alice sends a request to a decentralized file-sharing system.
EigenTrust [WWW 03]
EigenTrust is the first peer-based reputation system:
- Similar to PageRank
- Each peer is assigned a personalized reputation score
- Assumption: a good peer does not publish polluted files
Problems:
- Relies on recommenders
- Cannot offer a fine-grained reputation for each object
- Parameters are difficult to choose in practice
Lecture Roadmap Background Reputation Systems Case Study: Credence
Credence [NSDI 06]
Credence is the first object-based reputation system:
- In Alice's view, each object is assigned a reputation score
- Defends against malicious recommenders
- Fine-grained reputation for each object
Credence [NSDI 06]: Computing each file's reputation
Alice's view of the candidate files:

Files  Providers      Voters
F10    P2, P6         P4 (+1), P6 (-1)
F22    P2, P6, P8     P2 (-1), P7 (-1)
F4     P2, P4         P2 (+1), P4 (-1), P7 (-1)
F6     P11, P13, P14  P11 (+1)
...
A file's reputation
For Alice, the reputation score of each object is the average of the votes cast on it, weighted by each voter's similarity to Alice:

\[ Rep(F) = \frac{\sum_{i=1}^{n} V_i \cdot \theta(Alice, Voter_i)}{\sum_{i=1}^{n} \theta(Alice, Voter_i)} \in [-1, 1] \]

- V_i: the vote cast by voter i on F (+1 or -1)
- \theta(Alice, Voter_i): the similarity between Alice and voter i; its range is [-1, +1]
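The weighted average on this slide can be sketched as follows. The function name `file_reputation` is hypothetical, and the similarity values 0.9 and 0.1 are made-up inputs chosen only for illustration; the votes on F10 are the ones listed in the earlier table. The sketch follows the formula as written and assumes the sum of similarities is nonzero.

```python
def file_reputation(votes, similarity):
    """Weighted average of votes on one file.

    votes:      voter -> +1 or -1 (the V_i on the slide)
    similarity: voter -> theta(Alice, voter)
    """
    numerator = sum(vote * similarity[voter] for voter, vote in votes.items())
    denominator = sum(similarity[voter] for voter in votes)
    if denominator == 0:
        # The formula as written is undefined here; a real system needs a policy for this case.
        raise ValueError("sum of similarities is zero")
    return numerator / denominator


votes_f10 = {"P4": +1, "P6": -1}   # votes on F10 from the table
theta = {"P4": 0.9, "P6": 0.1}     # hypothetical similarities to Alice
rep_f10 = file_reputation(votes_f10, theta)
print(round(rep_f10, 2))
```

Here the vote of the voter Alice agrees with most dominates: (0.9 * 1 + 0.1 * (-1)) / (0.9 + 0.1) = 0.8.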
How to compute similarity

\[ Sim = \frac{p - ab}{\sqrt{a(1-a)\,b(1-b)}} \]

For the overlapping voting set S between Alice and C_i:
- a is the number of positive votes cast by Alice on the files in S, divided by the number of all votes cast by Alice on the files in S
- b is the number of positive votes cast by C_i on the files in S, divided by the number of all votes cast by C_i on the files in S
- p is the number of files in S on which both Alice and C_i cast positive votes, divided by the number of files in S that both Alice and C_i voted on
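A minimal sketch of this similarity, assuming the denominator is the square root of a(1-a)b(1-b) so that the result is a correlation in [-1, +1]. The function name, the vote dictionaries, and the choice to return 0.0 when the denominator vanishes are all assumptions made for illustration.

```python
from math import sqrt


def correlation_similarity(alice_votes, other_votes):
    """Sim = (p - a*b) / sqrt(a(1-a) b(1-b)) over the files both peers voted on.

    Each argument maps file name -> +1 or -1.
    Returns None when the two peers share no voted files.
    """
    shared = alice_votes.keys() & other_votes.keys()  # the overlapping set S
    if not shared:
        return None
    n = len(shared)
    a = sum(alice_votes[f] > 0 for f in shared) / n
    b = sum(other_votes[f] > 0 for f in shared) / n
    p = sum(alice_votes[f] > 0 and other_votes[f] > 0 for f in shared) / n
    denominator = sqrt(a * (1 - a) * b * (1 - b))
    if denominator == 0:
        # Assumption: if either peer voted the same way on every shared file,
        # the correlation is undefined, so report "no signal".
        return 0.0
    return (p - a * b) / denominator


alice = {"f1": +1, "f2": -1, "f3": +1, "f4": -1}
twin = dict(alice)                                  # always agrees with Alice
opposite = {f: -v for f, v in alice.items()}        # always disagrees with Alice
print(correlation_similarity(alice, twin), correlation_similarity(alice, opposite))
```

A peer that always agrees with Alice scores 1.0 and a peer that always disagrees scores -1.0, which is the behavior the weighted average of votes relies on.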
How to compute similarity

\[ Sim = \frac{p - ab}{\sqrt{a(1-a)\,b(1-b)}} \]

Simplify it!

\[ Sim(A, B) = \frac{\#\text{ of the same votes on } S}{|S|} \]

where S is the set of overlapping files voted on by both A and B
Example
Votes cast on each file (A is Alice, the peer evaluating the files):
- File1: B: +1, C: -1, D: +1
- File2: A: +1, B: +1, C: -1
- File3: A: -1, B: -1
- File4: A: -1, C: +1, D: -1
- File5: C: +1, D: -1
- File6: A: +1, D: -1
- File7: C: +1, G: -1
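The simplified similarity (same votes on S divided by |S|) can be tried on this example. The sketch assumes each group of votes on the slide belongs to the file label that follows it; the function name `simple_similarity` is hypothetical.

```python
# Votes per file, as read from the example slide (assumed grouping: votes, then file label).
votes = {
    "File1": {"B": +1, "C": -1, "D": +1},
    "File2": {"A": +1, "B": +1, "C": -1},
    "File3": {"A": -1, "B": -1},
    "File4": {"A": -1, "C": +1, "D": -1},
    "File5": {"C": +1, "D": -1},
    "File6": {"A": +1, "D": -1},
    "File7": {"C": +1, "G": -1},
}


def simple_similarity(votes, x, y):
    """Fraction of files voted on by both x and y where their votes agree."""
    shared = [f for f, cast in votes.items() if x in cast and y in cast]
    if not shared:
        return None  # no overlapping voting history
    same = sum(votes[f][x] == votes[f][y] for f in shared)
    return same / len(shared)


sims = {peer: simple_similarity(votes, "A", peer) for peer in "BCDG"}
print(sims)
```

Under this reading, A agrees with B on both shared files (1.0), with C on none (0.0), and with D on one of two (0.5). A and G share no files, so no direct similarity exists between them.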
Practical Issues There are several practical issues in Credence: - Cold start - Lack of enough overlapping voting history
Solution: Flow-based Reputation
Alice (A): "I want to compute similarity with C, but I do not have direct similarity with C."
A reaches C through B. The two links on the path A - B - C carry similarities 0.8 and 0.9, so the inferred similarity between A and C is 0.8 * 0.9 = 0.72.
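The multiplication along the path can be sketched as below. The function name and the `direct` dictionary are hypothetical, and assigning 0.8 to the A-B link and 0.9 to the B-C link is an assumption (the product is the same either way). How to combine several alternative paths is not covered by the slide and is left out.

```python
from math import prod


def path_similarity(direct, path):
    """Multiply the direct similarities along consecutive hops of a path."""
    return prod(direct[(u, v)] for u, v in zip(path, path[1:]))


# Direct similarities from the slide (edge assignment assumed).
direct = {("A", "B"): 0.8, ("B", "C"): 0.9}
sim_ac = path_similarity(direct, ["A", "B", "C"])
print(round(sim_ac, 2))
```

Because every factor lies in [-1, +1], the inferred value can never exceed the weakest link in magnitude, so trust decays with distance.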
Credence [NSDI 06]
Alice computes a reputation score for each file, then picks the file with the highest reputation:

Files     Providers      Voters
F10=0.8   P2, P6         P2, P4, P6
F22=0.5   P2, P6, P8     P2, P7
F4=0.9    P2, P4         P2, P4, P7
F6=0.6    P11, P13, P14  P11
...
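The final selection step is a maximum over the computed scores. A minimal sketch using the scores and providers shown on this slide (variable names are hypothetical):

```python
# Reputation scores and providers as listed on the slide.
reputation = {"F10": 0.8, "F22": 0.5, "F4": 0.9, "F6": 0.6}
providers = {
    "F10": ["P2", "P6"],
    "F22": ["P2", "P6", "P8"],
    "F4": ["P2", "P4"],
    "F6": ["P11", "P13", "P14"],
}

best_file = max(reputation, key=reputation.get)  # file with the highest score
print(best_file, reputation[best_file], providers[best_file])
```

Alice would then download F4 from one of its providers, P2 or P4.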
Discussion: Does it work?
Credence works under the following assumptions:
- Malicious nodes publish many polluted files
- Malicious users cast misleading votes
- The fraction of malicious users must stay below 1/2, say 20%
- There is no large-scale sybil attack in the setting
Next Lecture
In Lec-6, I will cover:
- What a sybil attack is
- How to defend against sybil attacks
- Case studies: SybilGuard and DSybil